Commit 6424f7e

[None][doc] Clarify the perf best practice and supported hardware for gptoss (#8665)
Signed-off-by: Dongfeng Yu <[email protected]>
Signed-off-by: dongfengy <[email protected]>
Parent: afa75c9

File tree

1 file changed (+5, -5)


docs/source/deployment-guide/quick-start-recipe-for-gpt-oss-on-trtllm.md

Lines changed: 5 additions & 5 deletions
@@ -23,12 +23,12 @@ The guide is intended for developers and practitioners seeking high-throughput o
 
 There are multiple MOE backends inside TensorRT LLM. Here are the support matrix of the MOE backends.
 
-| Device | Activation Type | MoE Weights Type | MoE Backend | Use Case |
-|------------|------------------|------------------|-------------|----------------|
-| B200/GB200 | MXFP8 | MXFP4 | TRTLLM | Low Latency |
-| B200/GB200 | MXFP8 | MXFP4 | CUTLASS | Max Throughput |
+| Device | Activation Type | MoE Weights Type | MoE Backend | Use Case |
+|---------------------- |-----------------|------------------|-------------|--------------------------------|
+| B200/GB200/B300/GB300 | MXFP8 | MXFP4 | TRTLLM | Low Latency and Max Throughput |
 
-The default moe backend is `CUTLASS`, so for the combination which is not supported by `CUTLASS`, one must set the `moe_config.backend` explicitly to run the model.
+The default moe backend is `CUTLASS`, so for the best possible perf, one must set the `moe_config.backend` explicitly to run the model.
+`CUTLASS` was better for max throughput at first but now we have optimized `TRTLLM` moe to be universally faster.
 
 ## Deployment Steps
 
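For context on the changed recommendation: since `CUTLASS` remains the default, the explicit backend selection the doc now calls for is typically done through the extra LLM API options YAML that this deployment guide's serving flow uses. A minimal sketch, assuming the `trtllm-serve` flow from the guide (the file name `extra-llm-api-config.yml` is illustrative):

```yaml
# extra-llm-api-config.yml (illustrative name)
# Select the TRTLLM MoE backend explicitly; CUTLASS is otherwise the default.
moe_config:
  backend: TRTLLM
```

The file is then passed at launch, e.g. `trtllm-serve openai/gpt-oss-120b --extra_llm_api_options extra-llm-api-config.yml`; verify the flag spelling and the accepted backend names against your installed TensorRT LLM version.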