
Commit f8e7837
[None][doc] Clarify the perf best practice and supported hardware for gptoss
Signed-off-by: Dongfeng Yu <[email protected]>
1 parent e47c787 commit f8e7837

File tree: 1 file changed, +3 −3 lines changed


docs/source/deployment-guide/quick-start-recipe-for-gpt-oss-on-trtllm.md

Lines changed: 3 additions & 3 deletions
Lines changed: 3 additions & 3 deletions

@@ -25,10 +25,10 @@ There are multiple MoE backends inside TensorRT LLM. Here is the support matrix:
 
 | Device | Activation Type | MoE Weights Type | MoE Backend | Use Case |
 |------------|------------------|------------------|-------------|----------------|
-| B200/GB200 | MXFP8 | MXFP4 | TRTLLM | Low Latency |
-| B200/GB200 | MXFP8 | MXFP4 | CUTLASS | Max Throughput |
+| B200/GB200/B300 | MXFP8 | MXFP4 | TRTLLM | Low Latency and Max Throughput |
 
-The default MoE backend is `CUTLASS`, so for any combination that `CUTLASS` does not support, one must set `moe_config.backend` explicitly to run the model.
+The default MoE backend is `CUTLASS`, so for the best possible performance, one must set `moe_config.backend` explicitly to `TRTLLM` when running the model.
+`CUTLASS` was initially faster for max-throughput workloads, but the `TRTLLM` MoE backend has since been optimized to be universally faster.
 
 ## Deployment Steps

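The backend selection described in the diff is typically expressed as a small YAML file of extra LLM API options passed to the serving command. A minimal sketch follows; the file name is illustrative and the exact option layout is an assumption based on common TensorRT LLM deployment recipes, not part of this commit:

```yaml
# extra-llm-api-config.yml — hypothetical file name, for illustration only.
# Selects the TRTLLM MoE backend instead of the default CUTLASS one,
# which per this commit is the best-performing choice on B200/GB200/B300.
moe_config:
  backend: TRTLLM
```

The file would then be supplied at serve time, e.g. `trtllm-serve <model> --extra_llm_api_options extra-llm-api-config.yml` (flag name assumed from TensorRT LLM's serving CLI).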