Commit 9da4664

Update quick-start-recipe-for-gpt-oss-on-trtllm.md
Signed-off-by: dongfengy <[email protected]>
1 parent ff7a29f commit 9da4664

File tree

1 file changed (+3, -3 lines)


docs/source/deployment-guide/quick-start-recipe-for-gpt-oss-on-trtllm.md

Lines changed: 3 additions & 3 deletions
@@ -23,9 +23,9 @@ The guide is intended for developers and practitioners seeking high-throughput o
 
 There are multiple MoE backends inside TensorRT LLM. Here is the support matrix of the MoE backends.
 
-| Device | Activation Type | MoE Weights Type | MoE Backend | Use Case |
-|------------|------------------|------------------|-------------|----------------|
-| B200/GB200/B300/GB300 | MXFP8 | MXFP4 | TRTLLM | Low Latency and Max Throughput |
+| Device                | Activation Type | MoE Weights Type | MoE Backend | Use Case                       |
+|---------------------- |-----------------|------------------|-------------|--------------------------------|
+| B200/GB200/B300/GB300 | MXFP8           | MXFP4            | TRTLLM      | Low Latency and Max Throughput |
 
 The default MoE backend is `CUTLASS`, so for the best possible performance you must set `moe_config.backend` explicitly when running the model.
 `CUTLASS` was initially faster for max-throughput workloads, but the `TRTLLM` MoE backend has since been optimized to be universally faster.
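
The `moe_config.backend` override mentioned in the changed text is typically supplied through an LLM API options file passed to the server. A minimal sketch, assuming the `trtllm-serve --extra_llm_api_options` flow used elsewhere in TensorRT LLM's deployment guides (the filename is hypothetical):

```yaml
# extra-llm-api-config.yml  (hypothetical filename)
# Override the default CUTLASS MoE backend with the faster TRTLLM backend
# on B200/GB200/B300/GB300 with MXFP8 activations and MXFP4 MoE weights.
moe_config:
  backend: TRTLLM
```

This would be passed at serve time, e.g. `trtllm-serve <model> --extra_llm_api_options extra-llm-api-config.yml`; check the version of TensorRT LLM you are running for the exact flag and accepted backend names.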

0 commit comments
