Fix: Verify scales are not None for Cutlass FP8 FusedMoE #1961

amirkl94 · 2025-10-21T08:53:44Z

📌 Description

Verify that the FP8 quant scales are non-null in the Cutlass FusedMoE path. Currently, if these tensors are passed as None from Python, the call results in a segmentation fault.

Summary by CodeRabbit

Bug Fixes
- Enhanced validation for FP8 quantization parameters to improve system robustness and prevent potential null reference issues during quantization operations.

Signed-off-by: Amir Klein <[email protected]>

gemini-code-assist · 2025-10-21T08:53:57Z

Summary of Changes

Hello @amirkl94, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses a critical stability issue in the Cutlass FP8 FusedMoE implementation by adding robust validation for quantization scale tensors. By ensuring these tensors are not null, the change effectively prevents segmentation faults that could arise from None values being passed from Python, thereby improving the overall reliability and robustness of the FusedMoE computation.

Highlights

Null Pointer Safeguard: Introduced explicit null checks for critical quantization scale tensors (fc1_dequant, fc2_quant, fc2_dequant, fc1_input_dequant) within the Cutlass FP8 FusedMoE path.
Segmentation Fault Prevention: The added checks prevent segmentation faults that previously occurred when these quantization scale tensors were passed as None from Python, enhancing the stability of the FusedMoE operation.


Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩

coderabbitai · 2025-10-21T08:54:12Z

Walkthrough

Added runtime null checks for FP8 quantization parameters (fc1_dequant, fc2_quant, fc2_dequant, fc1_input_dequant) in the getQuantParams function before type validation, tightening input validation for quantization scales.
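The ordering described above (null check first, type check second) can be sketched in isolation. This is a minimal illustration, not the code from the PR: `Tensor`, `TensorRef`, `DType`, and `validate_fp8_scale` are hypothetical stand-ins, a `std::shared_ptr` models a handle whose `get()` may return `nullptr`, and the float32 requirement is an assumption made only to give the second check something to test.

```cpp
#include <memory>
#include <stdexcept>
#include <string>

// Hypothetical stand-ins; the real binding uses its own tensor types.
enum class DType { Float32, Int32 };

struct Tensor {
  DType dtype;
};

using TensorRef = std::shared_ptr<Tensor>;

// Validates one scale tensor. The null check must come first because the
// dtype check dereferences the handle.
void validate_fp8_scale(const TensorRef& t, const std::string& name) {
  if (t.get() == nullptr) {
    throw std::invalid_argument("Expecting " + name + " to be non null");
  }
  // Assumed dtype requirement, for illustration only.
  if (t->dtype != DType::Float32) {
    throw std::invalid_argument("Expecting " + name + " to be float32");
  }
}
```

With the checks in this order, a missing tensor produces a readable error naming the parameter instead of undefined behavior inside the type check.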

Changes

Cohort / File(s)	Change Summary
FP8 Quantization Null Checks `csrc/fused_moe/cutlass_backend/flashinfer_cutlass_fused_moe_sm100_binding.cu`	Added runtime null validation for FP8 dequantization and quantization parameter pointers prior to type checks to ensure input validity

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes

Poem

🐰 A hop through the code with checks so fine,
Null pointers caught before they cross the line,
FP8 quantization now stands tall,
Validated checks prevent them all!

Pre-merge checks and finishing touches

❌ Failed checks (2 warnings)

Check name	Status	Explanation	Resolution
Description Check	⚠️ Warning	The pull request description provides a concise explanation of what the PR does and why it is needed, addressing the core "Description" section of the template. However, the description is significantly incomplete relative to the provided template. The author has only filled out the Description section but has omitted the "Related Issues" section entirely and has not included any of the "Pull Request Checklist" items, such as pre-commit checks verification or tests. This represents a substantial gap in following the repository's expected pull request structure, covering only approximately one-third of the template's required sections.	The author should complete the pull request description by adding the missing sections: the "Related Issues" section (if applicable) and the "Pull Request Checklist" with confirmations for pre-commit checks and tests. Even if some items (such as "Related Issues") are not applicable, they should be explicitly noted as such rather than omitted entirely. This ensures the description follows the repository's standards and provides reviewers with complete context about the changes.
Docstring Coverage	⚠️ Warning	Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%.	You can run `@coderabbitai generate docstrings` to improve docstring coverage.

✅ Passed checks (1 passed)

Check name	Status	Explanation
Title Check	✅ Passed	The pull request title "Fix: Verify scales are not None for Cutlass FP8 FusedMoE" is clear, specific, and directly aligned with the main change in the changeset. The raw summary confirms that the PR adds runtime null checks for FP8 quantization parameters, and the title accurately captures this core fix. The title is concise and uses descriptive language that clearly communicates the purpose of the change without being vague or generic.


gemini-code-assist

Code Review

This pull request adds important null pointer checks for quantization scales in the FP8 FusedMoE path, which is a good defensive measure to prevent potential segmentation faults. The change is correct and addresses the issue described. I've found a minor typo in one of the new error messages and have provided a suggestion to fix it.

gemini-code-assist · 2025-10-21T08:54:50Z

csrc/fused_moe/cutlass_backend/flashinfer_cutlass_fused_moe_sm100_binding.cu

+      TVM_FFI_ICHECK(fc2_dequant.get() != nullptr)
+          << "Expecting fc1fc2_dequant_dequant to be non null";


There appears to be a copy-paste error in this error message. It should refer to fc2_dequant to match the variable being checked.

TVM_FFI_ICHECK(fc2_dequant.get() != nullptr) << "Expecting fc2_dequant to be non null";
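One way to rule out this class of copy-paste error is to derive the message from the checked expression itself. The sketch below is an assumption-laden illustration, not something proposed in the PR: `CHECK_NON_NULL` and `load_scale` are hypothetical names, and the macro throws a standard exception rather than using `TVM_FFI_ICHECK`.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical helper macro: #ptr stringifies the argument, so the error
// message always names the expression that was actually checked.
#define CHECK_NON_NULL(ptr)                                     \
  do {                                                          \
    if ((ptr) == nullptr) {                                     \
      throw std::invalid_argument(std::string("Expecting ") +  \
                                  #ptr + " to be non null");   \
    }                                                           \
  } while (0)

// Example use, with a raw pointer standing in for `tensor.get()`.
float load_scale(const float* fc2_dequant) {
  CHECK_NON_NULL(fc2_dequant);
  return *fc2_dequant;
}
```

Note that if the macro were applied to `fc2_dequant.get()`, the message would contain that full expression text, which is slightly noisier but still cannot drift from the variable being tested.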

yongwww (reply): Please take a look at this comment from Gemini.

coderabbitai

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

csrc/fused_moe/cutlass_backend/flashinfer_cutlass_fused_moe_sm100_binding.cu (1)

839-843: Add individual null checks to other quantization paths like the FP8 path.

The FP8 path (lines 803-808) checks each extracted tensor element for null (fc1_dequant.get() != nullptr), but other quantization modes only verify quant_scales.value().size() before directly accessing array elements. Since Python tests pass quant_scales=None to these other paths (NVFP4, W4A8_MXFP4_FP8, W4A8_MXFP4_MXFP8, BlockScaling, W4A16, INT4), they share the same segfault vulnerability.

Add per-item null checks after extracting each tensor from quant_scales.value() in:

W4A8_MXFP4_FP8 (lines 839-843): after extracting fc1_weight_block, fc1_global, fc2_act_global, fc2_weight_block, fc2_global

W4A8_MXFP4_MXFP8 (lines 904-907): after extracting fc1_weight_block, fc1_global, fc2_weight_block, fc2_global

NVFP4 (lines 963-968): after extracting all 6 scale tensors

BlockScaling (lines 1028-1029): after extracting fc1_scales, fc2_scales

W4A16 (lines 1037-1038): after extracting fc1_weight_scales, fc2_weight_scales

INT4 (lines 1048-1055): after extracting all 8 scale tensors
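The per-item checks suggested above could be factored into one helper that verifies the list size and then every element. The following is a rough sketch under stated assumptions: `Tensor`, `TensorRef`, and `check_scales` are hypothetical, a `std::vector` of `std::shared_ptr` models the extracted `quant_scales` entries, and the names passed in are whatever labels the caller wants in the error message.

```cpp
#include <cstddef>
#include <memory>
#include <stdexcept>
#include <string>
#include <vector>

struct Tensor {};  // hypothetical stand-in for the real tensor type
using TensorRef = std::shared_ptr<Tensor>;

// Checks that `scales` has one entry per expected name, then that no entry
// is null. `names` supplies both the expected count and the error labels.
void check_scales(const std::vector<TensorRef>& scales,
                  const std::vector<std::string>& names) {
  if (scales.size() != names.size()) {
    throw std::invalid_argument("Expecting " + std::to_string(names.size()) +
                                " quant scales");
  }
  for (std::size_t i = 0; i < scales.size(); ++i) {
    if (scales[i].get() == nullptr) {
      throw std::invalid_argument("Expecting " + names[i] +
                                  " to be non null");
    }
  }
}
```

For the BlockScaling path, for instance, a caller might pass the two extracted tensors together with the labels `fc1_scales` and `fc2_scales`; each other path would supply its own list.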

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c3f2596 and 4bce8ff.

📒 Files selected for processing (1)

csrc/fused_moe/cutlass_backend/flashinfer_cutlass_fused_moe_sm100_binding.cu (1 hunks)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)

GitHub Check: Deploy Docs

coderabbitai · 2025-10-21T08:57:12Z

csrc/fused_moe/cutlass_backend/flashinfer_cutlass_fused_moe_sm100_binding.cu

+      TVM_FFI_ICHECK(fc1_dequant.get() != nullptr) << "Expecting fc1_dequant to be non null";
+      TVM_FFI_ICHECK(fc2_quant.get() != nullptr) << "Expecting fc2_quant to be non null";
+      TVM_FFI_ICHECK(fc2_dequant.get() != nullptr)
+          << "Expecting fc1fc2_dequant_dequant to be non null";
+      TVM_FFI_ICHECK(fc1_input_dequant.get() != nullptr)
+          << "Expecting fc1_input_dequant to be non null";


⚠️ Potential issue | 🟡 Minor

Fix typo in error message on line 806.

The error message on line 806 reads "Expecting fc1fc2_dequant_dequant to be non null" but should say "Expecting fc2_dequant to be non null".

Apply this diff to fix the typo:

-      TVM_FFI_ICHECK(fc2_dequant.get() != nullptr)
-          << "Expecting fc1fc2_dequant_dequant to be non null";
+      TVM_FFI_ICHECK(fc2_dequant.get() != nullptr)
+          << "Expecting fc2_dequant to be non null";

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

-      TVM_FFI_ICHECK(fc1_dequant.get() != nullptr) << "Expecting fc1_dequant to be non null";
-      TVM_FFI_ICHECK(fc2_quant.get() != nullptr) << "Expecting fc2_quant to be non null";
-      TVM_FFI_ICHECK(fc2_dequant.get() != nullptr)
-          << "Expecting fc1fc2_dequant_dequant to be non null";
-      TVM_FFI_ICHECK(fc1_input_dequant.get() != nullptr)
-          << "Expecting fc1_input_dequant to be non null";
+      TVM_FFI_ICHECK(fc1_dequant.get() != nullptr) << "Expecting fc1_dequant to be non null";
+      TVM_FFI_ICHECK(fc2_quant.get() != nullptr) << "Expecting fc2_quant to be non null";
+      TVM_FFI_ICHECK(fc2_dequant.get() != nullptr)
+          << "Expecting fc2_dequant to be non null";
+      TVM_FFI_ICHECK(fc1_input_dequant.get() != nullptr)
+          << "Expecting fc1_input_dequant to be non null";

🤖 Prompt for AI Agents

In csrc/fused_moe/cutlass_backend/flashinfer_cutlass_fused_moe_sm100_binding.cu around lines 803 to 808, the TVM_FFI_ICHECK error message for fc2_dequant contains a typo ("Expecting fc1fc2_dequant_dequant to be non null"); update that string to the correct text "Expecting fc2_dequant to be non null" so the check reports the right variable name.

yongwww · 2025-10-21T16:07:02Z

/bot run

flashinfer-bot · 2025-10-21T16:07:38Z

GitLab MR !88 has been created, and the CI pipeline #36988944 is currently running. I'll report back once the pipeline job completes.

flashinfer-bot · 2025-10-22T00:07:17Z

[FAILED] Pipeline #36988944: 1/17 passed

Verify scales are not None for Cutlass FP8 FusedMoE

4bce8ff

Signed-off-by: Amir Klein <[email protected]>

amirkl94 requested review from cyx-6, wenscarl, yongwww and yzh119 as code owners October 21, 2025 08:53
