@amirkl94
Contributor

@amirkl94 amirkl94 commented Oct 21, 2025

📌 Description

Verify that the FP8 quant scales are non-null in the Cutlass FusedMoE path. Currently, passing any of these tensors as None from Python results in a segmentation fault.
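
For illustration only, here is a minimal sketch of the guard pattern described above. It does not use the real tensor types or the `TVM_FFI_ICHECK` macro: `FakeScale`, `FakeScaleHandle`, and `readScaleOrThrow` are hypothetical stand-ins, and the sketch assumes that a `None` passed from Python arrives as a handle whose `get()` returns `nullptr`.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for a scale tensor; not the real FlashInfer/TVM FFI type.
struct FakeScale {
  float value;
};

// Hypothetical handle: assumed to hold nullptr when the Python caller passed None.
using FakeScaleHandle = std::shared_ptr<FakeScale>;

// Reads the scale only after confirming the handle is non-null, so a missing
// tensor surfaces as an exception naming the parameter instead of a crash
// caused by dereferencing a null pointer.
float readScaleOrThrow(const FakeScaleHandle& handle, const std::string& name) {
  if (handle.get() == nullptr) {
    throw std::invalid_argument("Expecting " + name + " to be non null");
  }
  return handle->value;
}
```

The check runs before the first dereference, which is the same ordering this PR applies to the FP8 scales.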

Summary by CodeRabbit

  • Bug Fixes
    • Enhanced validation for FP8 quantization parameters to improve system robustness and prevent potential null reference issues during quantization operations.

@gemini-code-assist
Contributor

Summary of Changes

Hello @amirkl94, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses a critical stability issue in the Cutlass FP8 FusedMoE implementation by adding robust validation for quantization scale tensors. By ensuring these tensors are not null, the change effectively prevents segmentation faults that could arise from None values being passed from Python, thereby improving the overall reliability and robustness of the FusedMoE computation.

Highlights

  • Null Pointer Safeguard: Introduced explicit null checks for critical quantization scale tensors (fc1_dequant, fc2_quant, fc2_dequant, fc1_input_dequant) within the Cutlass FP8 FusedMoE path.
  • Segmentation Fault Prevention: The added checks prevent segmentation faults that previously occurred when these quantization scale tensors were passed as None from Python, enhancing the stability of the FusedMoE operation.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@coderabbitai
Contributor

coderabbitai bot commented Oct 21, 2025

Walkthrough

Added runtime null checks for FP8 quantization parameters (fc1_dequant, fc2_quant, fc2_dequant, fc1_input_dequant) in the getQuantParams function before type validation, tightening input validation for quantization scales.

Changes

| Cohort / File(s) | Change Summary |
|---|---|
| FP8 Quantization Null Checks: csrc/fused_moe/cutlass_backend/flashinfer_cutlass_fused_moe_sm100_binding.cu | Added runtime null validation for FP8 dequantization and quantization parameter pointers prior to type checks to ensure input validity |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes

Poem

🐰 A hop through the code with checks so fine,
Null pointers caught before they cross the line,
FP8 quantization now stands tall,
Validated checks prevent them all!

Pre-merge checks and finishing touches

❌ Failed checks (2 warnings)
| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Description Check | ⚠️ Warning | The pull request description provides a concise explanation of what the PR does and why it is needed, addressing the core "Description" section of the template. However, the description is significantly incomplete relative to the provided template. The author has only filled out the Description section but has omitted the "Related Issues" section entirely and has not included any of the "Pull Request Checklist" items, such as pre-commit checks verification or tests. This represents a substantial gap in following the repository's expected pull request structure, covering only approximately one-third of the template's required sections. | The author should complete the pull request description by adding the missing sections: the "Related Issues" section (if applicable) and the "Pull Request Checklist" with confirmations for pre-commit checks and tests. Even if some items (such as "Related Issues") are not applicable, they should be explicitly noted as such rather than omitted entirely. This ensures the description follows the repository's standards and provides reviewers with complete context about the changes. |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | You can run @coderabbitai generate docstrings to improve docstring coverage. |
✅ Passed checks (1 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title Check | ✅ Passed | The pull request title "Fix: Verify scales are not None for Cutlass FP8 FusedMoE" is clear, specific, and directly aligned with the main change in the changeset. The raw summary confirms that the PR adds runtime null checks for FP8 quantization parameters, and the title accurately captures this core fix. The title is concise and uses descriptive language that clearly communicates the purpose of the change without being vague or generic. |

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request adds important null pointer checks for quantization scales in the FP8 FusedMoE path, which is a good defensive measure to prevent potential segmentation faults. The change is correct and addresses the issue described. I've found a minor typo in one of the new error messages and have provided a suggestion to fix it.

Comment on lines +805 to +806
TVM_FFI_ICHECK(fc2_dequant.get() != nullptr)
    << "Expecting fc1fc2_dequant_dequant to be non null";
Contributor

medium

There appears to be a copy-paste error in this error message. It should refer to fc2_dequant to match the variable being checked.

      TVM_FFI_ICHECK(fc2_dequant.get() != nullptr)
          << "Expecting fc2_dequant to be non null";

Collaborator

pls take a look at this comment from gemini

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
csrc/fused_moe/cutlass_backend/flashinfer_cutlass_fused_moe_sm100_binding.cu (1)

839-843: Add individual null checks to other quantization paths like the FP8 path.

The FP8 path (lines 803-808) checks each extracted tensor element for null (fc1_dequant.get() != nullptr), but other quantization modes only verify quant_scales.value().size() before directly accessing array elements. Since Python tests pass quant_scales=None to these other paths (NVFP4, W4A8_MXFP4_FP8, W4A8_MXFP4_MXFP8, BlockScaling, W4A16, INT4), they share the same segfault vulnerability.

Add per-item null checks after extracting each tensor from quant_scales.value() in:

  • W4A8_MXFP4_FP8 (lines 839-843): after extracting fc1_weight_block, fc1_global, fc2_act_global, fc2_weight_block, fc2_global
  • W4A8_MXFP4_MXFP8 (lines 904-907): after extracting fc1_weight_block, fc1_global, fc2_weight_block, fc2_global
  • NVFP4 (lines 963-968): after extracting all 6 scale tensors
  • BlockScaling (lines 1028-1029): after extracting fc1_scales, fc2_scales
  • W4A16 (lines 1037-1038): after extracting fc1_weight_scales, fc2_weight_scales
  • INT4 (lines 1048-1055): after extracting all 8 scale tensors
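
The per-item checks suggested above could be factored into one helper, sketched here under stated assumptions. Nothing in this block comes from the FlashInfer code base: `FakeTensor`, `FakeTensorHandle`, and `requireAllScales` are hypothetical names, a null handle is assumed to represent a `None` entry from Python, and a plain exception stands in for `TVM_FFI_ICHECK`.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for a scale tensor; not the real FlashInfer/TVM FFI type.
struct FakeTensor {};
using FakeTensorHandle = std::shared_ptr<FakeTensor>;

// Validates a list of extracted scale tensors against the names expected for
// one quantization mode: first the count, then each entry for null.
void requireAllScales(const std::vector<FakeTensorHandle>& scales,
                      const std::vector<std::string>& names) {
  if (scales.size() != names.size()) {
    throw std::invalid_argument("Expecting " + std::to_string(names.size()) +
                                " quant scales, got " +
                                std::to_string(scales.size()));
  }
  for (std::size_t i = 0; i < scales.size(); ++i) {
    if (scales[i].get() == nullptr) {
      // Name the offending tensor so the Python caller can see what is missing.
      throw std::invalid_argument("Expecting " + names[i] + " to be non null");
    }
  }
}
```

Each quantization path would pass its own name list (for example the two BlockScaling scales), so the size check and the null checks stay together and the error message names the tensor that was missing.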
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c3f2596 and 4bce8ff.

📒 Files selected for processing (1)
  • csrc/fused_moe/cutlass_backend/flashinfer_cutlass_fused_moe_sm100_binding.cu (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Deploy Docs

Comment on lines +803 to +808
TVM_FFI_ICHECK(fc1_dequant.get() != nullptr) << "Expecting fc1_dequant to be non null";
TVM_FFI_ICHECK(fc2_quant.get() != nullptr) << "Expecting fc2_quant to be non null";
TVM_FFI_ICHECK(fc2_dequant.get() != nullptr)
    << "Expecting fc1fc2_dequant_dequant to be non null";
TVM_FFI_ICHECK(fc1_input_dequant.get() != nullptr)
    << "Expecting fc1_input_dequant to be non null";
Contributor

⚠️ Potential issue | 🟡 Minor

Fix typo in error message on line 806.

The error message on line 806 reads "Expecting fc1fc2_dequant_dequant to be non null" but should say "Expecting fc2_dequant to be non null".

Apply this diff to fix the typo:

-      TVM_FFI_ICHECK(fc2_dequant.get() != nullptr)
-          << "Expecting fc1fc2_dequant_dequant to be non null";
+      TVM_FFI_ICHECK(fc2_dequant.get() != nullptr)
+          << "Expecting fc2_dequant to be non null";
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-      TVM_FFI_ICHECK(fc1_dequant.get() != nullptr) << "Expecting fc1_dequant to be non null";
-      TVM_FFI_ICHECK(fc2_quant.get() != nullptr) << "Expecting fc2_quant to be non null";
-      TVM_FFI_ICHECK(fc2_dequant.get() != nullptr)
-          << "Expecting fc1fc2_dequant_dequant to be non null";
-      TVM_FFI_ICHECK(fc1_input_dequant.get() != nullptr)
-          << "Expecting fc1_input_dequant to be non null";
+      TVM_FFI_ICHECK(fc1_dequant.get() != nullptr) << "Expecting fc1_dequant to be non null";
+      TVM_FFI_ICHECK(fc2_quant.get() != nullptr) << "Expecting fc2_quant to be non null";
+      TVM_FFI_ICHECK(fc2_dequant.get() != nullptr)
+          << "Expecting fc2_dequant to be non null";
+      TVM_FFI_ICHECK(fc1_input_dequant.get() != nullptr)
+          << "Expecting fc1_input_dequant to be non null";
🤖 Prompt for AI Agents
In csrc/fused_moe/cutlass_backend/flashinfer_cutlass_fused_moe_sm100_binding.cu
around lines 803 to 808, the TVM_FFI_ICHECK error message for fc2_dequant
contains a typo ("Expecting fc1fc2_dequant_dequant to be non null"); update that
string to the correct text "Expecting fc2_dequant to be non null" so the check
reports the right variable name.

@yongwww
Collaborator

yongwww commented Oct 21, 2025

/bot run

@flashinfer-bot
Collaborator

GitLab MR !88 has been created, and the CI pipeline #36988944 is currently running. I'll report back once the pipeline job completes.

@flashinfer-bot
Collaborator

[FAILED] Pipeline #36988944: 1/17 passed


3 participants