
Conversation

@nvchenghaoz (Collaborator) commented Oct 26, 2025

Summary by CodeRabbit

  • Tests
    • Extended mixture-of-experts testing to cover additional routing and token distribution scenarios.
  • Performance
    • Optimized tensor operations in Mamba model inference through streamlined data handling.

Signed-off-by: nvchenghaoz <[email protected]>
@nvchenghaoz (Collaborator, Author)

/bot run

@tensorrt-cicd (Collaborator)

PR_Github #22521 [ run ] triggered by Bot. Commit: c5bbc31

@coderabbitai bot (Contributor) commented Oct 26, 2025

📝 Walkthrough

The changes refactor the Mamba Triton backend to replace index-based tensor selections with direct slicing operations for both prefill and decode stages, removing the prefill_idx construct. Additionally, test cases for MoE Triton kernels are extended with early_exit parameterization to cover both balanced and imbalanced routing scenarios.

Changes

Cohort / File(s) — Summary

  • Mamba Backend Optimization — tensorrt_llm/_torch/auto_deploy/custom_ops/mamba/triton_backend_mamba.py
    Replaced index-based selections with direct tensor slicing for prefill and decode data construction. Removed prefill_idx usage and simplified the derivation of hs_prefill, B_prefill, C_prefill, dt_prefill, x_decode, B_decode, C_decode, and dt_decode. Updated output assignment for both prefill and decode stages to use slice assignment instead of index_copy_ (see the sketch below).
  • MoE Test Parameterization — tests/unittest/_torch/auto_deploy/unit/singlegpu/custom_ops/triton_kernels/test_triton_moe.py
    Added early_exit parameterization (False, True) to test_triton_moe_matches_torch_moe_mlp_relu2 and test_triton_quant_fp8_moe_matches_torch_quant_fp8_moe. Extended the tests to conditionally vary the M parameter and apply either random top-k or imbalanced routing (75% token concentration on the first two experts) depending on the early_exit flag.
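
To make the refactor concrete, here is a minimal sketch of the pattern involved, assuming prefill tokens are packed contiguously at the front of the flattened token tensor. The tensor names and shapes are hypothetical stand-ins, not the backend's actual code:

    import torch

    # Hypothetical shapes: hs_flat is [total_tokens, H, D]; the first
    # total_prefill_tokens rows belong to prefill requests, the rest to decode.
    total_tokens, H, D = 10, 4, 8
    total_prefill_tokens = 6
    hs_flat = torch.randn(total_tokens, H, D)
    y = torch.empty_like(hs_flat)

    # Before: gather and scatter through an explicit index tensor.
    prefill_idx = torch.arange(total_prefill_tokens)
    hs_prefill = hs_flat.index_select(0, prefill_idx).unsqueeze(0)  # [1, S_p, H, D]
    y.index_copy_(0, prefill_idx, hs_prefill.squeeze(0))

    # After: a plain slice returns a view (no copy), and slice assignment
    # replaces index_copy_; decode tokens are simply the remaining rows.
    hs_prefill = hs_flat[:total_prefill_tokens].unsqueeze(0)  # [1, S_p, H, D]
    y[:total_prefill_tokens] = hs_prefill.squeeze(0)
    x_decode = hs_flat[total_prefill_tokens:]

Because the prefill/decode split is a contiguous partition, the index tensor carried no information beyond a boundary offset, which is what makes the slice-based form both simpler and cheaper.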

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

  • tensorrt_llm/_torch/auto_deploy/custom_ops/mamba/triton_backend_mamba.py: Requires careful verification that the direct slicing semantics correctly replace index-based operations across prefill and decode paths without altering tensor shapes or data ordering.
  • test file: Verify that the early_exit branching produces the intended token distributions and routing imbalance in both test cases.

Pre-merge checks and finishing touches

❌ Failed checks (2 warnings)

  • Description Check — ⚠️ Warning. The pull request description is incomplete and does not follow the required template. The author provided only "@coderabbitai summary," which is a command invoking CodeRabbit's AI rather than an actual description. The required template specifies sections for Description (explaining the issue and solution), Test Coverage (listing relevant tests), and a PR Checklist confirming various requirements; none of these sections has been filled out with substantive information about what was changed and why.
  • Docstring Coverage — ⚠️ Warning. Docstring coverage is 75.00%, which is below the required threshold of 80.00%. Run @coderabbitai generate docstrings to improve coverage.
✅ Passed checks (1 passed)

  • Title Check — ✅ Passed. The pull request title "[None][feat] Autodeploy: Update the ssm to use slice" directly relates to the main changes in the pull request. The primary modification refactors the Mamba Triton backend to replace index-based selections with direct slicing operations for both prefill and decode paths, which is accurately captured by the phrase "use slice." The title follows the required format, with "[None]" indicating no ticket reference and "[feat]" denoting a feature change. It is concise and specific enough for a developer scanning history to understand the core refactoring, and it does not include extraneous information.


@coderabbitai bot (Contributor) left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
tensorrt_llm/_torch/auto_deploy/custom_ops/mamba/triton_backend_mamba.py (1)

137-137: Consider computing total_prefill_tokens once to avoid redundancy.

The value is recomputed here because it was previously computed inside the if num_prefill > 0 block (line 82) and isn't in scope. Consider computing total_prefill_tokens before both conditional blocks to avoid the redundant calculation.

Apply this diff to eliminate redundant computation:

     # Prefill: concatenate tokens at the front and run combined scan
+    total_prefill_tokens = 0 if num_prefill == 0 else int(seq_len[:num_prefill].sum().item())
+
     if num_prefill > 0:
         seq_len_prefill = seq_len[:num_prefill].to(torch.int32)
-        total_prefill_tokens = int(seq_len_prefill.sum().item())

         hs_prefill = hs_flat[:total_prefill_tokens].unsqueeze(0)  # [1, S_p, H, D]

And remove the recomputation at line 137:

     # Decode: batch single-token updates via selective_state_update
     if num_decode > 0:
-        total_prefill_tokens = 0 if num_prefill == 0 else int(seq_len[:num_prefill].sum().item())
         slot_idx_decode = slot_idx[num_prefill:].to(torch.long)
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a6d20f6 and c5bbc31.

📒 Files selected for processing (2)
  • tensorrt_llm/_torch/auto_deploy/custom_ops/mamba/triton_backend_mamba.py (3 hunks)
  • tests/unittest/_torch/auto_deploy/unit/singlegpu/custom_ops/triton_kernels/test_triton_moe.py (5 hunks)
🧰 Additional context used
📓 Path-based instructions (3)
**/*.{h,hpp,hh,hxx,cpp,cxx,cc,cu,cuh,py}

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

Use only spaces, no tabs; indent with 4 spaces.

Files:

  • tests/unittest/_torch/auto_deploy/unit/singlegpu/custom_ops/triton_kernels/test_triton_moe.py
  • tensorrt_llm/_torch/auto_deploy/custom_ops/mamba/triton_backend_mamba.py
**/*.py

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

**/*.py: Python code must target Python 3.8+.
Indent Python code with 4 spaces; do not use tabs.
Maintain module namespace when importing; prefer 'from package.subpackage import foo' then 'foo.SomeClass()' instead of importing the class directly.
Python filenames should be snake_case (e.g., some_file.py).
Python classes use PascalCase names.
Functions and methods use snake_case names.
Local variables use snake_case; prefix 'k' for variables that start with a number (e.g., k_99th_percentile).
Global variables use upper SNAKE_CASE prefixed with 'G' (e.g., G_MY_GLOBAL).
Constants use upper SNAKE_CASE (e.g., MY_CONSTANT).
Avoid shadowing variables from an outer scope.
Initialize all externally visible members of a class in the constructor.
Prefer docstrings for interfaces that may be used outside a file; comments for in-function or file-local interfaces.
Use Google-style docstrings for classes and functions (Sphinx-parsable).
Document attributes and variables inline so they render under the class/function docstring.
Avoid reflection when a simpler, explicit approach suffices (e.g., avoid dict(**locals()) patterns).
In try/except, catch the most specific exceptions possible.
For duck-typing try/except, keep the try body minimal and use else for the main logic.

Files:

  • tests/unittest/_torch/auto_deploy/unit/singlegpu/custom_ops/triton_kernels/test_triton_moe.py
  • tensorrt_llm/_torch/auto_deploy/custom_ops/mamba/triton_backend_mamba.py
**/*.{cpp,cxx,cc,h,hpp,hh,hxx,cu,cuh,py}

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

Prepend the NVIDIA Apache-2.0 copyright header with current year to the top of all source files (e.g., .cpp, .h, .cu, .py).

Files:

  • tests/unittest/_torch/auto_deploy/unit/singlegpu/custom_ops/triton_kernels/test_triton_moe.py
  • tensorrt_llm/_torch/auto_deploy/custom_ops/mamba/triton_backend_mamba.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Pre-commit Check
🔇 Additional comments (2)
tests/unittest/_torch/auto_deploy/unit/singlegpu/custom_ops/triton_kernels/test_triton_moe.py (2)

78-125: LGTM! Good test coverage enhancement.

The parameterization of early_exit to test both balanced and imbalanced routing scenarios is well-designed. The imbalanced routing (concentrating 75% of tokens on first 2 experts) will help validate the MoE kernel's behavior under skewed load distribution.
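
As a rough illustration of how such an imbalanced routing setup can be constructed (a sketch with hypothetical names and values, not the test's actual code):

    import torch

    # Hypothetical test parameters; the real test's values may differ.
    M, num_experts, top_k = 64, 8, 2
    early_exit = True

    logits = torch.randn(M, num_experts)
    if early_exit:
        # Imbalanced routing: bias the logits so the first two experts
        # dominate for ~75% of the tokens; the rest stay random.
        cutoff = int(0.75 * M)
        logits[:cutoff, :2] += 10.0

    # Standard top-k routing: pick the k highest-probability experts
    # per token and renormalize their weights.
    probs = torch.softmax(logits, dim=-1)
    topk_weights, topk_ids = torch.topk(probs, top_k, dim=-1)
    topk_weights = topk_weights / topk_weights.sum(dim=-1, keepdim=True)

Concentrating tokens on a couple of experts leaves the others nearly empty, which is the kind of skewed per-expert load an early-exit branch in the kernel must handle correctly.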


237-348: LGTM! Consistent test parameterization for FP8 quantized MoE.

The parameterization follows the same sound pattern as the BF16 test, appropriately adjusted for larger token counts in the FP8 test. The routing logic correctly implements both balanced and imbalanced scenarios.

@tensorrt-cicd (Collaborator)

PR_Github #22521 [ run ] completed with state SUCCESS. Commit: c5bbc31
/LLM/main/L0_MergeRequest_PR pipeline #16977 completed with status: 'FAILURE'

@nvchenghaoz (Collaborator, Author)

/bot run

2 similar comments
@nvchenghaoz (Collaborator, Author)

/bot run

@nvchenghaoz (Collaborator, Author)

/bot run

@tensorrt-cicd (Collaborator)

PR_Github #22568 [ run ] triggered by Bot. Commit: c5bbc31

@tensorrt-cicd (Collaborator)

PR_Github #22568 [ run ] completed with state SUCCESS. Commit: c5bbc31
/LLM/main/L0_MergeRequest_PR pipeline #17012 completed with status: 'SUCCESS'


Labels: none yet. Projects: Backlog. 3 participants.