Commit 6424f7e

[None][doc] Clarify the perf best practice and supported hardware for gptoss (#8665)
Signed-off-by: Dongfeng Yu <[email protected]>
Signed-off-by: dongfengy <[email protected]>
Parent: afa75c9

File tree

1 file changed (+5, -5)


docs/source/deployment-guide/quick-start-recipe-for-gpt-oss-on-trtllm.md

Lines changed: 5 additions & 5 deletions
@@ -23,12 +23,12 @@ The guide is intended for developers and practitioners seeking high-throughput o
 
 There are multiple MOE backends inside TensorRT LLM. Here are the support matrix of the MOE backends.
 
-| Device | Activation Type | MoE Weights Type | MoE Backend | Use Case |
-|------------|------------------|------------------|-------------|----------------|
-| B200/GB200 | MXFP8 | MXFP4 | TRTLLM | Low Latency |
-| B200/GB200 | MXFP8 | MXFP4 | CUTLASS | Max Throughput |
+| Device | Activation Type | MoE Weights Type | MoE Backend | Use Case |
+|---------------------- |-----------------|------------------|-------------|--------------------------------|
+| B200/GB200/B300/GB300 | MXFP8 | MXFP4 | TRTLLM | Low Latency and Max Throughput |
 
-The default moe backend is `CUTLASS`, so for the combination which is not supported by `CUTLASS`, one must set the `moe_config.backend` explicitly to run the model.
+The default moe backend is `CUTLASS`, so for the best possible perf, one must set the `moe_config.backend` explicitly to run the model.
+`CUTLASS` was better for max throughput at first but now we have optimized `TRTLLM` moe to be universally faster.
 
 ## Deployment Steps
 
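For context on the changed recommendation: since `CUTLASS` remains the default, the explicit backend selection the doc now calls for is typically done through the extra LLM API options YAML that this deployment guide's serving flow uses. A minimal sketch, assuming the `trtllm-serve` flow from the guide (the file name `extra-llm-api-config.yml` is illustrative):

```yaml
# extra-llm-api-config.yml (illustrative name)
# Select the TRTLLM MoE backend explicitly; CUTLASS is otherwise the default.
moe_config:
  backend: TRTLLM
```

The file is then passed at launch, e.g. `trtllm-serve openai/gpt-oss-120b --extra_llm_api_options extra-llm-api-config.yml`; verify the flag spelling and the accepted backend names against your installed TensorRT LLM version.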