Draft: Use Megatron-Bridge recipes for megatron_cfg. #1894
examples/configs/recipes/llm/distillation-qwen3-32b-to-1.7b-base-1n8g-megatron-tp2pp2cp2-pack.yaml

```diff
@@ -22,6 +22,7 @@ policy:
       ${.megatron_cfg.context_parallel_size}}, 2}
   megatron_cfg:
     enabled: true
+    megatron_recipe: megatron.bridge.recipes.qwen.qwen3.qwen3_32b_pretrain_config
```
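For context, a consolidated sketch of the policy (student) block after this change is shown below. The `model_name` is not visible in the captured diff; it is inferred from the recipe filename (a 32B-teacher-to-1.7B-student distillation) and should be treated as an assumption.

```yaml
# Hedged sketch, not the literal file contents.
# model_name is assumed from the recipe filename
# (distillation-qwen3-32b-to-1.7b-base-...); only the
# megatron_recipe line is taken verbatim from this PR's diff.
policy:
  model_name: Qwen/Qwen3-1.7B-Base  # assumption: the 1.7B student
  megatron_cfg:
    enabled: true
    megatron_recipe: megatron.bridge.recipes.qwen.qwen3.qwen3_32b_pretrain_config
```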
Contributor
🧩 Analysis chain

🏁 Script executed:
```bash
#!/bin/bash
# Find all Qwen3 related recipe files and configs
echo "=== Searching for Qwen3 recipe definitions ==="
find . -type f \( -name "*.py" -o -name "*.yaml" \) | xargs grep -l "qwen3.*recipe\|qwen3.*config" | head -20
echo -e "\n=== Looking for available Qwen3 megatron recipes ==="
rg "qwen3.*pretrain_config|def.*qwen3.*config" --type py -A 2 | head -50
echo -e "\n=== Checking for 1.7b Qwen3 recipe variants ==="
rg "qwen3.*1[._]7" --type py --type yaml
echo -e "\n=== Checking the specific recipe module path ==="
find . -path "*megatron/bridge/recipes/qwen/qwen3*" -type f | head -10
```

Repository: NVIDIA-NeMo/RL
Length of output: 3262

🏁 Script executed:
```bash
#!/bin/bash
# Check the file in question and nearby distillation configs
echo "=== Current file content ==="
head -40 examples/configs/recipes/llm/distillation-qwen3-32b-to-1.7b-base-1n8g-megatron-tp2pp2cp2-pack.yaml
echo -e "\n=== Check other distillation YAML files for megatron_recipe usage ==="
find examples/configs/recipes/llm -name "*distillation*.yaml" -type f | xargs grep -l "megatron_recipe"
echo -e "\n=== Show megatron_recipe usage in distillation configs ==="
find examples/configs/recipes/llm -name "*distillation*.yaml" -type f -exec grep -H -B 2 -A 2 "megatron_recipe" {} \;
```

Repository: NVIDIA-NeMo/RL
Length of output: 2029

🏁 Script executed:
```bash
#!/bin/bash
# Search for how megatron_recipe is used in the codebase (in Python code)
echo "=== How megatron_recipe is referenced in code ==="
rg "megatron_recipe" --type py -B 3 -A 3 | head -100
echo -e "\n=== Check if there's a mapping or factory for Qwen3 recipes ==="
rg "qwen3_.*pretrain_config|qwen3_.*model_config" --type pyRepository: NVIDIA-NeMo/RL Length of output: 2617 🏁 Script executed: #!/bin/bash
# Search for megatron.bridge recipe implementations
echo "=== Searching for megatron bridge recipes ==="
find . -path "*megatron*bridge*recipe*" -o -path "*megatron*recipe*" | grep -E "(\.py|\.yaml)$" | head -20
echo -e "\n=== Check if there's a separate install for megatron-bridge ==="
ls -la requirements*.txt pyproject.toml setup.py 2>/dev/null | head -10
echo -e "\n=== Search for any import of megatron.bridge ==="
rg "from megatron|import megatron" --type py -B 1 -A 1 | head -40
echo -e "\n=== Check the distillation_math.yaml base config ==="
cat examples/configs/distillation_math.yaml | head -100
```

Repository: NVIDIA-NeMo/RL
Length of output: 6630

🏁 Script executed:
```bash
#!/bin/bash
# Check if there are test files or other distillation configs that might hint at expected models
echo "=== Check other distillation YAML configs to understand model/recipe patterns ==="
find examples/configs/recipes/llm -name "*distillation*.yaml" -type f -exec basename {} \; | sort
echo -e "\n=== Look at the FSDP variant for comparison ==="
head -40 examples/configs/recipes/llm/distillation-qwen3-32b-to-1.7b-base-1n8g-fsdp2tp1.v1.yaml
```

Repository: NVIDIA-NeMo/RL
Length of output: 1702

🏁 Script executed:
```bash
#!/bin/bash
# Check if the policy model_name is specified elsewhere or if it's inferred
echo "=== Search for how policy model_name is determined ==="
rg "model_name|student|policy" nemo_rl/models/ --type py -B 2 -A 2 | grep -A 5 -B 5 "1.7\|1_7\|student" | head -60
echo -e "\n=== Check if megatron_recipe can override model size or just provide base config ==="
rg "megatron_recipe|load_recipe" nemo_rl/models/ --type py -B 3 -A 3 | head -80Repository: NVIDIA-NeMo/RL Length of output: 4475 🏁 Script executed: #!/bin/bash
# Search for any Qwen3 recipe definitions in megatron-bridge or available recipes
echo "=== Search for all available Qwen3 recipes ==="
rg "qwen3.*pretrain_config|def.*qwen3_.*_config" -g "*.py"
echo -e "\n=== Check pyproject.toml for megatron-bridge dependency and version ==="
grep -A 10 -B 2 "megatron" pyproject.toml
echo -e "\n=== Check if there's a mapping of model sizes to recipes ==="
rg "1.7|1_7|1\.7|4b|8b|32b" nemo_rl/models/ --type py | grep -i recipeRepository: NVIDIA-NeMo/RL Length of output: 2883 🏁 Script executed: #!/bin/bash
# Check the git history or comments around this change
echo "=== Check if there are any recent changes or TODOs about this recipe ==="
rg "TODO|FIXME|XXX|qwen3_32b_pretrain_config" nemo_rl/ examples/ --type py --type yaml -B 2 -A 2
echo -e "\n=== Compare the megatron recipe variant with FSDP variant more carefully ==="
echo "=== FSDP (without megatron_recipe) ==="
grep -A 30 "^policy:" examples/configs/recipes/llm/distillation-qwen3-32b-to-1.7b-base-1n8g-fsdp2tp1.v1.yaml | head -40
echo -e "\n=== Megatron (with megatron_recipe) ==="
grep -A 30 "^policy:" examples/configs/recipes/llm/distillation-qwen3-32b-to-1.7b-base-1n8g-megatron-tp2pp2cp2-pack.yaml | head -40Repository: NVIDIA-NeMo/RL Length of output: 24191 Policy uses 32B megatron_recipe but should use 1.7B recipe for the student model. The policy (student model) is configured with 🤖 Prompt for AI Agents |
```diff
 teacher:
   model_name: Qwen/Qwen3-32B
   dtensor_cfg:
@@ -30,6 +31,7 @@ teacher:
     enabled: false
   sequence_packing:
     enabled: true
+  megatron_recipe: megatron.bridge.recipes.qwen.qwen3.qwen3_32b_pretrain_config
   megatron_cfg:
     enabled: true
     tensor_model_parallel_size: 4
```
🧩 Analysis chain
🏁 Script executed:
Repository: NVIDIA-NeMo/RL
Length of output: 3275
🏁 Script executed:
Repository: NVIDIA-NeMo/RL
Length of output: 2726
Update model to Llama 3.1 or verify recipe compatibility with Llama 3.3.
Line 9 specifies `meta-llama/Llama-3.3-70B-Instruct`, but line 25 references `llama31_70b_pretrain_config` (Llama 3.1). No `llama33_70b_pretrain_config` exists in the codebase. Either change the model to `meta-llama/Llama-3.1-70B-Instruct` to match the recipe, or provide an alternative Llama 3.3–compatible recipe reference. All other similar configurations (e.g., `grpo_math_8B_megatron.yaml`, `sft_openmathinstruct2_megatron.yaml`) use matching model and recipe versions.
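A sketch of the first option (aligning the model with the existing Llama 3.1 recipe) follows. The recipe's full dotted path is assumed by analogy with the Qwen3 path used elsewhere in this PR; only the function name `llama31_70b_pretrain_config` is taken from the review.

```yaml
# Hedged sketch of option 1: make the model match the recipe that exists.
# The module path is assumed by analogy with the Qwen3 recipe path.
model_name: meta-llama/Llama-3.1-70B-Instruct   # was: meta-llama/Llama-3.3-70B-Instruct
megatron_cfg:
  enabled: true
  megatron_recipe: megatron.bridge.recipes.llama.llama31_70b_pretrain_config
```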