From 32d9ecd474546560376ee16a25f264dcd68b2c38 Mon Sep 17 00:00:00 2001
From: Dongfeng Yu
Date: Sat, 25 Oct 2025 21:16:32 +0000
Subject: [PATCH 1/4] [None][doc] Clarify the perf best practice and supported
 hardware for gptoss

Signed-off-by: Dongfeng Yu
---
 .../quick-start-recipe-for-gpt-oss-on-trtllm.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/docs/source/deployment-guide/quick-start-recipe-for-gpt-oss-on-trtllm.md b/docs/source/deployment-guide/quick-start-recipe-for-gpt-oss-on-trtllm.md
index 16da732a1d2..c5842c5cb17 100644
--- a/docs/source/deployment-guide/quick-start-recipe-for-gpt-oss-on-trtllm.md
+++ b/docs/source/deployment-guide/quick-start-recipe-for-gpt-oss-on-trtllm.md
@@ -25,10 +25,10 @@ There are multiple MOE backends inside TensorRT LLM. Here are the support matrix
 
 | Device | Activation Type | MoE Weights Type | MoE Backend | Use Case |
 |------------|------------------|------------------|-------------|----------------|
-| B200/GB200 | MXFP8 | MXFP4 | TRTLLM | Low Latency |
-| B200/GB200 | MXFP8 | MXFP4 | CUTLASS | Max Throughput |
+| B200/GB200/B300 | MXFP8 | MXFP4 | TRTLLM | Low Latency and Max Throughput |
 
-The default moe backend is `CUTLASS`, so for the combination which is not supported by `CUTLASS`, one must set the `moe_config.backend` explicitly to run the model.
+The default moe backend is `CUTLASS`, so for the best possible perf, one must set the `moe_config.backend` explicitly to run the model.
+`CUTLASS` was better for max throughput at first but now we have optimized `TRTLLM` moe to be universally faster.
 
 ## Deployment Steps
 

From f7dadde930691148705e310169b4f0e3cfd8877e Mon Sep 17 00:00:00 2001
From: Dongfeng Yu
Date: Sat, 25 Oct 2025 21:17:49 +0000
Subject: [PATCH 2/4] Update doc

Signed-off-by: Dongfeng Yu
---
 .../quick-start-recipe-for-gpt-oss-on-trtllm.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/source/deployment-guide/quick-start-recipe-for-gpt-oss-on-trtllm.md b/docs/source/deployment-guide/quick-start-recipe-for-gpt-oss-on-trtllm.md
index c5842c5cb17..a6a02040b12 100644
--- a/docs/source/deployment-guide/quick-start-recipe-for-gpt-oss-on-trtllm.md
+++ b/docs/source/deployment-guide/quick-start-recipe-for-gpt-oss-on-trtllm.md
@@ -25,7 +25,7 @@ There are multiple MOE backends inside TensorRT LLM. Here are the support matrix
 
 | Device | Activation Type | MoE Weights Type | MoE Backend | Use Case |
 |------------|------------------|------------------|-------------|----------------|
-| B200/GB200/B300 | MXFP8 | MXFP4 | TRTLLM | Low Latency and Max Throughput |
+| B200/GB200/B300/GB300 | MXFP8 | MXFP4 | TRTLLM | Low Latency and Max Throughput |
 
 The default moe backend is `CUTLASS`, so for the best possible perf, one must set the `moe_config.backend` explicitly to run the model.
 `CUTLASS` was better for max throughput at first but now we have optimized `TRTLLM` moe to be universally faster.
From 3d4018e169a5a0c60d28ed2316f0aba1d995fc98 Mon Sep 17 00:00:00 2001
From: dongfengy <99041270+dongfengy@users.noreply.github.com>
Date: Mon, 27 Oct 2025 13:14:25 -0700
Subject: [PATCH 3/4] Update quick-start-recipe-for-gpt-oss-on-trtllm.md

Signed-off-by: dongfengy <99041270+dongfengy@users.noreply.github.com>
---
 .../quick-start-recipe-for-gpt-oss-on-trtllm.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/docs/source/deployment-guide/quick-start-recipe-for-gpt-oss-on-trtllm.md b/docs/source/deployment-guide/quick-start-recipe-for-gpt-oss-on-trtllm.md
index a6a02040b12..7b32e24025b 100644
--- a/docs/source/deployment-guide/quick-start-recipe-for-gpt-oss-on-trtllm.md
+++ b/docs/source/deployment-guide/quick-start-recipe-for-gpt-oss-on-trtllm.md
@@ -23,9 +23,9 @@ The guide is intended for developers and practitioners seeking high-throughput o
 
 There are multiple MOE backends inside TensorRT LLM. Here are the support matrix of the MOE backends.
 
-| Device | Activation Type | MoE Weights Type | MoE Backend | Use Case |
-|------------|------------------|------------------|-------------|----------------|
-| B200/GB200/B300/GB300 | MXFP8 | MXFP4 | TRTLLM | Low Latency and Max Throughput |
+| Device                | Activation Type | MoE Weights Type | MoE Backend | Use Case                       |
+|---------------------- |-----------------|------------------|-------------|--------------------------------|
+| B200/GB200/B300/GB300 | MXFP8 | MXFP4 | TRTLLM | Low Latency and Max Throughput |
 
 The default moe backend is `CUTLASS`, so for the best possible perf, one must set the `moe_config.backend` explicitly to run the model.
 `CUTLASS` was better for max throughput at first but now we have optimized `TRTLLM` moe to be universally faster.
From 58d6cf14525f4b0c131588c7647727590a48e8ce Mon Sep 17 00:00:00 2001
From: dongfengy <99041270+dongfengy@users.noreply.github.com>
Date: Mon, 27 Oct 2025 13:16:31 -0700
Subject: [PATCH 4/4] Update quick-start-recipe-for-gpt-oss-on-trtllm.md

Signed-off-by: dongfengy <99041270+dongfengy@users.noreply.github.com>
---
 .../quick-start-recipe-for-gpt-oss-on-trtllm.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/source/deployment-guide/quick-start-recipe-for-gpt-oss-on-trtllm.md b/docs/source/deployment-guide/quick-start-recipe-for-gpt-oss-on-trtllm.md
index 7b32e24025b..17e16583092 100644
--- a/docs/source/deployment-guide/quick-start-recipe-for-gpt-oss-on-trtllm.md
+++ b/docs/source/deployment-guide/quick-start-recipe-for-gpt-oss-on-trtllm.md
@@ -25,7 +25,7 @@ There are multiple MOE backends inside TensorRT LLM. Here are the support matrix
 
 | Device                | Activation Type | MoE Weights Type | MoE Backend | Use Case                       |
 |---------------------- |-----------------|------------------|-------------|--------------------------------|
-| B200/GB200/B300/GB300 | MXFP8 | MXFP4 | TRTLLM | Low Latency and Max Throughput |
+| B200/GB200/B300/GB300 | MXFP8           | MXFP4            | TRTLLM      | Low Latency and Max Throughput |
 
 The default moe backend is `CUTLASS`, so for the best possible perf, one must set the `moe_config.backend` explicitly to run the model.
 `CUTLASS` was better for max throughput at first but now we have optimized `TRTLLM` moe to be universally faster.
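
Note for reviewers: the `moe_config.backend` override that these patches tell readers to set is typically supplied to `trtllm-serve` as a YAML file via `--extra_llm_api_options`. A minimal sketch of such a file (the file name is an arbitrary choice, not part of the patched doc):

```yaml
# extra-llm-api-config.yml (hypothetical file name)
# Select the TRTLLM MoE backend instead of the CUTLASS default,
# as the patched guide recommends for best perf on B200/GB200/B300/GB300.
moe_config:
  backend: TRTLLM
```

This would then be passed at serve time, e.g. `trtllm-serve <model> --extra_llm_api_options extra-llm-api-config.yml`.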