fix: Mxfp8 training fix sequence padding #1884
base: main
Conversation
Signed-off-by: Guyue Huang <guyueh@nvidia.com>
📝 Walkthrough: Modifies FP8 handling in Megatron sequence packing by refactoring the divisor calculation used to pad packed sequence lengths, so the padding multiple depends on the FP8 recipe in use.
Signed-off-by: Guyue Huang <guyueh@nvidia.com>
What does this PR do?
In the sequence packing case, the packed sequence length must be padded to a multiple of X, where X depends on the precision recipe: for BF16, X is 1; for FP8 with the delayed or tensorwise recipe, X is 16; for the blockwise recipe, X is 128; and for the MXFP8 recipe, X is 32. A sketch of this rule is shown below.
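A minimal sketch of the padding rule described above. The helper names (`fp8_padding_divisor`, `pad_packed_seq_len`) and recipe strings are illustrative assumptions, not the identifiers used in this PR:

```python
# Illustrative sketch only: function and recipe names are assumptions,
# not the exact identifiers changed in this PR.

def fp8_padding_divisor(precision: str, fp8_recipe: str | None = None) -> int:
    """Return the multiple that packed sequence lengths must be padded to."""
    if precision == "bf16" or fp8_recipe is None:
        return 1
    if fp8_recipe in ("delayed", "tensorwise"):
        return 16
    if fp8_recipe == "blockwise":
        return 128
    if fp8_recipe == "mxfp8":
        return 32
    raise ValueError(f"Unknown FP8 recipe: {fp8_recipe}")


def pad_packed_seq_len(seq_len: int, divisor: int) -> int:
    """Round a packed sequence length up to the nearest multiple of divisor."""
    return ((seq_len + divisor - 1) // divisor) * divisor


# Example: a 1000-token packed sequence under the MXFP8 recipe pads to 1024.
assert pad_packed_seq_len(1000, fp8_padding_divisor("fp8", "mxfp8")) == 1024
```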
Issues
List issues that this PR closes:
Usage
```python
# Add a code snippet demonstrating how to use this
```
Before your PR is "Ready for review"
Pre checks:
Additional Information