
There are multiple MoE backends inside TensorRT LLM. Here is the support matrix of the MoE backends.

| Device                 | Activation Type | MoE Weights Type | MoE Backend | Use Case                       |
|------------------------|-----------------|------------------|-------------|--------------------------------|
| B200/GB200/B300/GB300  | MXFP8           | MXFP4            | TRTLLM      | Low Latency and Max Throughput |

The default MoE backend is `CUTLASS`, so for the best possible performance one must set `moe_config.backend` explicitly to `TRTLLM` when running the model. `CUTLASS` was initially faster for max-throughput scenarios, but the `TRTLLM` MoE backend has since been optimized to be universally faster.
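
As a minimal sketch of what this looks like in practice, the backend can be selected through the Python LLM API. The `MoeConfig` import path, the `moe_config` keyword, and the model path below are illustrative assumptions and may differ across TensorRT LLM releases:

```python
from tensorrt_llm import LLM
from tensorrt_llm.llmapi import MoeConfig  # assumed import path

# Explicitly select the TRTLLM MoE backend; the default is CUTLASS.
llm = LLM(
    model="/path/to/checkpoint",  # placeholder model path
    moe_config=MoeConfig(backend="TRTLLM"),
)

for output in llm.generate(["Hello, my name is"]):
    print(output.outputs[0].text)
```

When serving with `trtllm-serve`, the equivalent setting can typically be supplied as a `moe_config.backend` entry in the YAML file passed via `--extra_llm_api_options`.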

## Deployment Steps
