
Commit f8e7837
[None][doc] Clarify the perf best practice and supported hardware for gptoss
Signed-off-by: Dongfeng Yu <[email protected]>
1 parent e47c787 commit f8e7837

File tree: 1 file changed, +3 −3 lines changed


docs/source/deployment-guide/quick-start-recipe-for-gpt-oss-on-trtllm.md

Lines changed: 3 additions & 3 deletions
Lines changed: 3 additions & 3 deletions

@@ -25,10 +25,10 @@ There are multiple MoE backends inside TensorRT LLM. Here is the support matrix:
 
 | Device | Activation Type | MoE Weights Type | MoE Backend | Use Case |
 |------------|------------------|------------------|-------------|----------------|
-| B200/GB200 | MXFP8 | MXFP4 | TRTLLM | Low Latency |
-| B200/GB200 | MXFP8 | MXFP4 | CUTLASS | Max Throughput |
+| B200/GB200/B300 | MXFP8 | MXFP4 | TRTLLM | Low Latency and Max Throughput |
 
-The default MoE backend is `CUTLASS`, so for any combination that `CUTLASS` does not support, one must set `moe_config.backend` explicitly to run the model.
+The default MoE backend is `CUTLASS`, so for the best possible performance, one must set `moe_config.backend` explicitly to `TRTLLM` when running the model.
+`CUTLASS` was initially faster for max-throughput workloads, but the `TRTLLM` MoE backend has since been optimized to be universally faster.
 
 ## Deployment Steps

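The backend selection described in the diff is typically expressed as a small YAML file of extra LLM API options passed to the serving command. A minimal sketch follows; the file name is illustrative and the exact option layout is an assumption based on common TensorRT LLM deployment recipes, not part of this commit:

```yaml
# extra-llm-api-config.yml — hypothetical file name, for illustration only.
# Selects the TRTLLM MoE backend instead of the default CUTLASS one,
# which per this commit is the best-performing choice on B200/GB200/B300.
moe_config:
  backend: TRTLLM
```

The file would then be supplied at serve time, e.g. `trtllm-serve <model> --extra_llm_api_options extra-llm-api-config.yml` (flag name assumed from TensorRT LLM's serving CLI).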