diff --git a/docs/source/deployment-guide/quick-start-recipe-for-gpt-oss-on-trtllm.md b/docs/source/deployment-guide/quick-start-recipe-for-gpt-oss-on-trtllm.md
index 16da732a1d2..17e16583092 100644
--- a/docs/source/deployment-guide/quick-start-recipe-for-gpt-oss-on-trtllm.md
+++ b/docs/source/deployment-guide/quick-start-recipe-for-gpt-oss-on-trtllm.md
@@ -23,12 +23,23 @@ The guide is intended for developers and practitioners seeking high-throughput o
 
-There are multiple MOE backends inside TensorRT LLM. Here are the support matrix of the MOE backends.
+There are multiple MoE backends inside TensorRT LLM. Here is the support matrix of the MoE backends.
 
-| Device     | Activation Type  | MoE Weights Type | MoE Backend | Use Case       |
-|------------|------------------|------------------|-------------|----------------|
-| B200/GB200 | MXFP8            | MXFP4            | TRTLLM      | Low Latency    |
-| B200/GB200 | MXFP8            | MXFP4            | CUTLASS     | Max Throughput |
+| Device                | Activation Type | MoE Weights Type | MoE Backend | Use Case                       |
+|-----------------------|-----------------|------------------|-------------|--------------------------------|
+| B200/GB200/B300/GB300 | MXFP8           | MXFP4            | TRTLLM      | Low Latency and Max Throughput |
 
-The default moe backend is `CUTLASS`, so for the combination which is not supported by `CUTLASS`, one must set the `moe_config.backend` explicitly to run the model.
+The default MoE backend is `CUTLASS`, so for the best possible performance one must set `moe_config.backend` to `TRTLLM` explicitly when running the model.
+`CUTLASS` was initially faster for max throughput, but the `TRTLLM` MoE backend has since been optimized and is now faster for both low-latency and max-throughput use cases.
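+
+The snippet below is a minimal sketch of selecting the backend through an extra LLM API options file; the file name `extra_llm_api_options.yaml` and the serving command are illustrative assumptions, not required names:
+
+```yaml
+# extra_llm_api_options.yaml (illustrative file name)
+# Override the default CUTLASS backend with the TRTLLM MoE backend.
+moe_config:
+  backend: TRTLLM
+```
+
+Pass the file when launching the server, for example `trtllm-serve <model> --extra_llm_api_options extra_llm_api_options.yaml`.
 
 ## Deployment Steps
 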