
GPT OSS fp8 recipes #2687

Open
weijiac0619 wants to merge 3 commits into main from weijia/gpt_oss_fp8_examples

Conversation

@weijiac0619
Contributor

@weijiac0619 weijiac0619 commented Mar 6, 2026

What does this PR do?

Add explicit GPT-OSS 20B Hopper FP8 current-scaling recipe variants for pretrain, SFT, and PEFT, while keeping the existing GPT-OSS recipes as the BF16 baseline.

Changelog

  • Add specific line-by-line info of high-level changes in this PR.

GitHub Actions CI

See the CI section in the Contributing doc for how to trigger the CI. An NVIDIA developer will need to approve and trigger the CI for external contributors.

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you add or update any necessary documentation?
  • Does the PR affect components that are optional to install? (Ex: Numba, Pynini, Apex etc)
    • Reviewer: Does the PR have correct import guards for all optional libraries?

If you haven't finished some of the above items, you can still open a "Draft" PR.

Additional Information

  • Related to # (issue)

Summary by CodeRabbit

  • New Features
    • Added new recipe configurations for GPT-OSS 20B models with FP8 current scaling support for pretraining, supervised fine-tuning, and parameter-efficient fine-tuning workflows.
    • Enhanced training scripts with configurable recipe variables for greater flexibility in experiment setup.

@copy-pr-bot

copy-pr-bot bot commented Mar 6, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@weijiac0619 weijiac0619 requested a review from cuichenx March 6, 2026 19:29
@weijiac0619 weijiac0619 marked this pull request as ready for review March 6, 2026 19:55
@coderabbitai
Contributor

coderabbitai bot commented Mar 6, 2026

📝 Walkthrough


The PR introduces FP8 current scaling configuration wrappers for GPT-OSS 20B models and parameterizes recipe names in SLURM execution scripts. Three shell scripts now use a RECIPE_NAME variable for flexibility, while the gpt_oss recipe module adds a helper function and three configuration factory wrappers for FP8 current scaling.

Changes

  • SLURM Script Updates (`examples/models/gpt_oss/slurm_peft.sh`, `examples/models/gpt_oss/slurm_pretrain.sh`, `examples/models/gpt_oss/slurm_sft.sh`): Added a RECIPE_NAME variable with a default value specific to each script (peft, pretrain, sft) and replaced hardcoded recipe references with ${RECIPE_NAME} for improved flexibility.
  • Recipe Module Exports (`src/megatron/bridge/recipes/gpt_oss/__init__.py`): Added imports and public exports for three new FP8 current scaling configuration functions: gpt_oss_20b_peft_fp8_current_scaling_config, gpt_oss_20b_pretrain_fp8_current_scaling_config, and gpt_oss_20b_sft_fp8_current_scaling_config.
  • FP8 Current Scaling Configurations (`src/megatron/bridge/recipes/gpt_oss/gpt_oss.py`): Introduced an _enable_gpt_oss_hopper_fp8_current_scaling() helper and three wrapper config factories that apply FP8 current scaling (setting mixed_precision to "bf16_with_fp8_current_scaling_mixed" and enabling moe_router_padding_for_fp8) to the existing pretrain, SFT, and PEFT configurations.
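Based on the summary above, the wrapper pattern likely looks something like the following sketch. The helper name, the mixed_precision value, and the moe_router_padding_for_fp8 flag come from the walkthrough; the RecipeConfig stand-in and the base factory's signature are assumptions for illustration, not the actual megatron.bridge API.

```python
from dataclasses import dataclass


@dataclass
class RecipeConfig:
    """Hypothetical stand-in for the real recipe config container."""

    mixed_precision: str = "bf16_mixed"
    moe_router_padding_for_fp8: bool = False


def gpt_oss_20b_pretrain_config() -> RecipeConfig:
    """Stand-in for the existing BF16 baseline pretrain recipe factory."""
    return RecipeConfig()


def _enable_gpt_oss_hopper_fp8_current_scaling(cfg: RecipeConfig) -> RecipeConfig:
    # Per the PR walkthrough: switch the mixed-precision recipe to FP8
    # current scaling and enable MoE router padding for FP8.
    cfg.mixed_precision = "bf16_with_fp8_current_scaling_mixed"
    cfg.moe_router_padding_for_fp8 = True
    return cfg


def gpt_oss_20b_pretrain_fp8_current_scaling_config() -> RecipeConfig:
    """FP8 current-scaling wrapper around the BF16 baseline recipe."""
    return _enable_gpt_oss_hopper_fp8_current_scaling(gpt_oss_20b_pretrain_config())
```

Wrapping the existing factory rather than duplicating it keeps the BF16 recipe as the single source of truth, so the FP8 variant only records the deltas.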

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Possibly related PRs

  • GPT-OSS examples #2422: Introduced the same GPT-OSS SLURM example scripts and earlier recipe configurations that are now being enhanced with FP8 current scaling support and parameterized recipe names.

Suggested reviewers

  • cuichenx
  • yaoyu-33

🚥 Pre-merge checks (3 passed, 1 failed)

❌ Failed checks (1 warning)

  • Test Results For Major Changes (⚠️ Warning): The PR introduces FP8 recipe configurations for GPT-OSS 20B models, affecting numerical precision, but lacks test coverage, performance benchmarks, validation data, and documentation of convergence results. Resolution: add comprehensive unit tests for the FP8 configurations in the existing test files and create new tests similar to the Nemotron/Llama recipes, then update the PR description with validation data, performance metrics, and hardware context.

✅ Passed checks (3 passed)

  • Description Check (✅ Passed): Check skipped; CodeRabbit's high-level summary is enabled.
  • Title check (✅ Passed): The title "GPT OSS fp8 recipes" accurately captures the main change: introducing FP8 current scaling recipe configurations for GPT-OSS models across pretrain, SFT, and PEFT workflows.
  • Docstring Coverage (✅ Passed): Docstring coverage is 100.00%, which is sufficient; the required threshold is 80.00%.


Contributor

@coderabbitai coderabbitai bot left a comment


🧹 Nitpick comments (1)
examples/models/gpt_oss/slurm_pretrain.sh (1)

52-53: Inconsistent default pattern compared to SFT and PEFT scripts.

The pretrain script hardcodes the FP8 recipe (${MODEL_NAME}_pretrain_fp8_current_scaling_config) without allowing environment variable override, while PEFT and SFT scripts use the pattern RECIPE_NAME="${RECIPE_NAME:-${MODEL_NAME}_*_config}" which allows overriding via RECIPE_NAME env var.

If FP8 is the intended default for pretrain, consider using the same pattern for consistency:

♻️ Suggested fix for consistency

```diff
-# RECIPE_NAME="${RECIPE_NAME:-${MODEL_NAME}_pretrain_config}"
-RECIPE_NAME="${MODEL_NAME}_pretrain_fp8_current_scaling_config"
+RECIPE_NAME="${RECIPE_NAME:-${MODEL_NAME}_pretrain_fp8_current_scaling_config}"
+# RECIPE_NAME="${MODEL_NAME}_pretrain_config"
```

This maintains FP8 as the default while allowing users to override via RECIPE_NAME environment variable, consistent with the other scripts.
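The `${VAR:-default}` pattern the reviewer suggests is standard POSIX parameter expansion: the variable keeps its value when set and non-empty, and falls back to the default otherwise. A minimal, self-contained sketch (the MODEL_NAME value mirrors the scripts; the filename in the usage note is illustrative):

```shell
#!/bin/sh
# ${VAR:-default} expands to $VAR if it is set and non-empty,
# otherwise to the default on the right of ":-".
MODEL_NAME="gpt_oss_20b"
RECIPE_NAME="${RECIPE_NAME:-${MODEL_NAME}_pretrain_fp8_current_scaling_config}"
echo "Using recipe: ${RECIPE_NAME}"
```

With this in place, running the script plain keeps the FP8 default, while something like `RECIPE_NAME=gpt_oss_20b_pretrain_config sbatch slurm_pretrain.sh` would select the BF16 baseline instead.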



📥 Commits

Reviewing files that changed from the base of the PR and between 1d25ea2 and 5cf47ac.

📒 Files selected for processing (5)
  • examples/models/gpt_oss/slurm_peft.sh
  • examples/models/gpt_oss/slurm_pretrain.sh
  • examples/models/gpt_oss/slurm_sft.sh
  • src/megatron/bridge/recipes/gpt_oss/__init__.py
  • src/megatron/bridge/recipes/gpt_oss/gpt_oss.py

Contributor

@cuichenx cuichenx left a comment


LGTM. Could you mention the new FP8 recipes in the README file as well?
Will Blackwell FP8 recipes be in a separate PR?

@weijiac0619
Copy link
Contributor Author

> LGTM. Could you mention the new FP8 recipes in the README file as well? Will Blackwell FP8 recipes be in a separate PR?

Yes, will update the README file soon. And yes, decided to make it a separate PR.

