
Conversation


@guyueh1 guyueh1 commented Feb 5, 2026

What does this PR do?

In the sequence packing case, the packed sequence length must be padded to a multiple of X. For BF16, X is 1; for FP8, X depends on the recipe: 16 for the delayed or tensorwise recipes, 128 for the blockwise recipe, and 32 for the mxfp8 recipe.
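
As a minimal sketch of this rule (the helper name and the recipe strings are assumptions for illustration, not the code in this PR):

```python
def pad_multiple_for_recipe(fp8_recipe: str | None) -> int:
    """Hypothetical helper: smallest multiple the packed sequence
    length must be padded to under each quantization recipe."""
    if fp8_recipe is None:
        return 1    # BF16: no FP8 alignment constraint
    if fp8_recipe == "blockwise":
        return 128  # blockwise FP8 quantizes in 128-wide blocks
    if fp8_recipe == "mxfp8":
        return 32   # mxfp8 scales in 32-element groups
    return 16       # delayed / tensorwise recipes
```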

Issues

List issues that this PR closes:

Usage

  • A usage sketch is shown below (illustrative only).
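
A minimal sketch of the rounding this padding implies; pad_packed_length and the token counts are hypothetical, chosen only to illustrate the recipe-specific multiples:

```python
def pad_packed_length(total_tokens: int, multiple: int) -> int:
    # Round a packed sequence length up to the next required multiple.
    return ((total_tokens + multiple - 1) // multiple) * multiple

# With the mxfp8 recipe (multiple of 32), 1000 packed tokens pad to 1024;
# with the blockwise recipe (multiple of 128), they also pad to 1024.
assert pad_packed_length(1000, 32) == 1024
assert pad_packed_length(1000, 128) == 1024
```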

Before your PR is "Ready for review"

Pre checks:

  • Make sure you have read and followed the Contributor guidelines
  • Did you write any new necessary tests?
  • Did you run the unit tests and functional tests locally? Visit our Testing Guide for how to run tests
  • Did you add or update any necessary documentation? Visit our Document Development Guide for how to write, build and test the docs.

Additional Information

  • ...

Summary by CodeRabbit

  • Bug Fixes
    • Improved FP8 quantization handling for sequence packing. The padding multiple now adjusts automatically to the selected FP8 recipe (16 for delayed/tensorwise, 128 for blockwise, 32 for mxfp8), keeping packed sequences aligned with each recipe's quantization block size.

Signed-off-by: Guyue Huang <guyueh@nvidia.com>
@guyueh1 guyueh1 self-assigned this Feb 5, 2026
@guyueh1 guyueh1 requested a review from a team as a code owner February 5, 2026 16:53

coderabbitai bot commented Feb 5, 2026

📝 Walkthrough

Modifies FP8 handling in Megatron sequence packing by refactoring the divisor calculation for pad_packed_seq_to_multiple_of to be conditional on the fp8_recipe parameter (16 by default, 128 for "blockwise", 32 for "mxfp8") while removing the intermediate use_blockwise_fp8 variable.
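
A minimal sketch of what that refactor looks like in shape; this is not the actual diff, and everything except fp8_recipe and use_blockwise_fp8 is assumed:

```python
def _pad_divisor(fp8_recipe: str) -> int:
    # Before (per the walkthrough): use_blockwise_fp8 = fp8_recipe == "blockwise"
    # chose only between 128 and 16, so mxfp8 fell through to 16.
    # After: the divisor is keyed directly on the recipe, covering mxfp8.
    if fp8_recipe == "blockwise":
        return 128
    if fp8_recipe == "mxfp8":
        return 32
    return 16  # default: delayed / tensorwise recipes
```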

Changes

Cohort / File(s): FP8 Packing Logic (nemo_rl/models/megatron/data.py)
Summary: Refactors the FP8 divisor calculation in _get_pack_sequence_parameters_for_megatron to use recipe-specific values (16, 128, 32) instead of boolean-based blockwise detection.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes

Possibly related PRs

Suggested labels

CI:L2

Suggested reviewers

  • terrykong
  • yaoyu-33
  • zpqiu
🚥 Pre-merge checks | ✅ 3 | ❌ 1
❌ Failed checks (1 warning)

Test Results For Major Changes: ⚠️ Warning
Explanation: The PR modifies FP8 sequence padding behavior for mxfp8 training with a recipe-dependent divisor calculation, but lacks test results and validation data demonstrating no numeric regression or convergence impact.
Resolution: Add to the PR description: (1) confirmation that unit tests pass, (2) mxfp8 training validation results showing correct behavior and no convergence regression, (3) comparison metrics between FP8 recipes, and (4) the specific model configuration used for validation.
✅ Passed checks (3 passed)
Description Check: ✅ Passed. Check skipped: CodeRabbit's high-level summary is enabled.
Title check: ✅ Passed. The PR title 'fix: Mxfp8 training fix sequence padding' directly relates to the main change, adjusting sequence padding logic for the mxfp8 recipe, and accurately captures the primary purpose of the changeset.
Docstring Coverage: ✅ Passed. Docstring coverage is 100.00%, which is sufficient; the required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.



@guyueh1 guyueh1 added super-v3 CI:L1 Run doctests, unit tests, and functional tests CI:L2 Run doctests, unit tests, functional tests, and convergence tests and removed CI:L1 Run doctests, unit tests, and functional tests labels Feb 5, 2026
Signed-off-by: Guyue Huang <guyueh@nvidia.com>
@guyueh1 guyueh1 requested a review from a team as a code owner February 9, 2026 21:21
@guyueh1 guyueh1 added CI:L2 Run doctests, unit tests, functional tests, and convergence tests and removed CI:L2 Run doctests, unit tests, functional tests, and convergence tests labels Feb 9, 2026
Signed-off-by: Guyue Huang <guyueh@nvidia.com>
@guyueh1 guyueh1 added CI:L2 Run doctests, unit tests, functional tests, and convergence tests and removed CI:L2 Run doctests, unit tests, functional tests, and convergence tests labels Feb 9, 2026
Signed-off-by: Guyue Huang <guyueh@nvidia.com>
@guyueh1 guyueh1 added CI:L2 Run doctests, unit tests, functional tests, and convergence tests and removed CI:L2 Run doctests, unit tests, functional tests, and convergence tests labels Feb 9, 2026
@guyueh1 guyueh1 added CI:L2 Run doctests, unit tests, functional tests, and convergence tests and removed CI:L2 Run doctests, unit tests, functional tests, and convergence tests labels Feb 10, 2026
Signed-off-by: root <root@gpu-254.slurm-workers-slurm.slurm.svc.cluster.local>
@guyueh1 guyueh1 added CI:L2 Run doctests, unit tests, functional tests, and convergence tests and removed CI:L2 Run doctests, unit tests, functional tests, and convergence tests labels Feb 10, 2026

Labels

CI:L2 (Run doctests, unit tests, functional tests, and convergence tests), super-v3

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants