Conversation
📝 Walkthrough
The PR introduces FP8 current-scaling configuration wrappers for GPT-OSS 20B models and parameterizes recipe names in the SLURM execution scripts. Three shell scripts now select the recipe via a `RECIPE_NAME` variable.
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~10 minutes
🚥 Pre-merge checks: ✅ 3 passed | ❌ 1 failed (1 warning)
🧹 Nitpick comments (1)
examples/models/gpt_oss/slurm_pretrain.sh (1)
52-53: Inconsistent default pattern compared to the SFT and PEFT scripts.

The pretrain script hardcodes the FP8 recipe (`${MODEL_NAME}_pretrain_fp8_current_scaling_config`) without allowing an environment-variable override, while the PEFT and SFT scripts use the pattern `RECIPE_NAME="${RECIPE_NAME:-${MODEL_NAME}_*_config}"`, which allows overriding via the `RECIPE_NAME` env var. If FP8 is the intended default for pretrain, consider using the same pattern for consistency:
♻️ Suggested fix for consistency
```diff
-# RECIPE_NAME="${RECIPE_NAME:-${MODEL_NAME}_pretrain_config}"
-RECIPE_NAME="${MODEL_NAME}_pretrain_fp8_current_scaling_config"
+RECIPE_NAME="${RECIPE_NAME:-${MODEL_NAME}_pretrain_fp8_current_scaling_config}"
+# RECIPE_NAME="${MODEL_NAME}_pretrain_config"
```

This maintains FP8 as the default while allowing users to override via the `RECIPE_NAME` environment variable, consistent with the other scripts.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `examples/models/gpt_oss/slurm_pretrain.sh` around lines 52-53: the script currently hardcodes RECIPE_NAME to "${MODEL_NAME}_pretrain_fp8_current_scaling_config", which prevents overriding via the environment; update the assignment to use shell parameter expansion like RECIPE_NAME="${RECIPE_NAME:-${MODEL_NAME}_pretrain_fp8_current_scaling_config}" so FP8 remains the default but users can override RECIPE_NAME (match the pattern used in the SFT/PEFT scripts and keep the symbol RECIPE_NAME and the existing default recipe name unchanged).
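The `${VAR:-default}` parameter expansion at the heart of this suggestion is easy to verify in isolation. A minimal sketch (the model and recipe names mirror the ones above; the surrounding script context is omitted):

```shell
MODEL_NAME="gpt_oss_20b"

# ${RECIPE_NAME:-default} keeps RECIPE_NAME if it is already set and
# non-empty, and otherwise falls back to the FP8 current-scaling recipe.
RECIPE_NAME="${RECIPE_NAME:-${MODEL_NAME}_pretrain_fp8_current_scaling_config}"
echo "recipe: $RECIPE_NAME"
```

Run with `RECIPE_NAME` unset, this prints the FP8 default; invoking it as `RECIPE_NAME=gpt_oss_20b_pretrain_config sh script.sh` selects the BF16 baseline instead. Note that `:-` also replaces an empty string, whereas `-` (without the colon) only replaces an unset variable.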
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: c36d08ed-be06-400b-ba8e-e74ba1e6f510
📒 Files selected for processing (5)
- examples/models/gpt_oss/slurm_peft.sh
- examples/models/gpt_oss/slurm_pretrain.sh
- examples/models/gpt_oss/slurm_sft.sh
- src/megatron/bridge/recipes/gpt_oss/__init__.py
- src/megatron/bridge/recipes/gpt_oss/gpt_oss.py
cuichenx left a comment
LGTM, could you mention the new fp8 recipes in the readme file as well?
will blackwell fp8 recipes be in a separate PR?
Yes, will update the readme file soon. Yup, decided to make it a separate PR.
What does this PR do?
Add explicit GPT-OSS 20B Hopper FP8 current-scaling recipe variants for pretrain, SFT, and PEFT, while keeping the existing GPT-OSS recipes as the BF16 baseline.
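Naming-wise, the new variants appear to follow the existing `<model>_<mode>_config` convention with an `fp8_current_scaling` infix. Only the pretrain name is spelled out in the review above, so the SFT and PEFT names in this sketch are inferred by analogy and may differ from the actual code:

```shell
MODEL_NAME="gpt_oss_20b"

# Collect the expected FP8 current-scaling recipe names for all three
# training modes (sft/peft names are inferred, not confirmed by the PR).
FP8_RECIPES=""
for MODE in pretrain sft peft; do
  FP8_RECIPES="$FP8_RECIPES ${MODEL_NAME}_${MODE}_fp8_current_scaling_config"
done
echo "$FP8_RECIPES"
```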
Changelog
GitHub Actions CI
See the CI section in the Contributing doc for how to trigger the CI. An NVIDIA developer will need to approve and trigger the CI for external contributors.
Before your PR is "Ready for review"
Pre checks:
If you haven't finished some of the above items you can still open "Draft" PR.
Additional Information