Opt muon settings #697
Conversation
Pull request overview
This PR adds an optimizer preset system that allows users to quickly apply pre-configured hyperparameter sets for supported optimizers. The "speedrun" preset provides optimized settings for AdamW and Muon optimizers based on proven configurations.
- Adds a new `--optimizer_preset` CLI argument with support for "none" and "speedrun" presets (a sketch follows this list)
- Implements preset application logic that automatically configures hyperparameters for AdamW and Muon optimizers when the speedrun preset is selected
- Adds a new exploration configuration file to compare speedrun presets between optimizers
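For orientation, here is a minimal sketch of how the new flag could be declared in train_args.py, assuming an argparse-based parser. Only the flag name, its default, and its choices come from this PR; the parser object, help text, and surrounding code are illustrative.

```python
import argparse

# Illustrative parser; the real argument definitions live in train_args.py.
parser = argparse.ArgumentParser()
parser.add_argument(
    "--optimizer_preset",
    type=str,
    default="none",
    choices=["none", "speedrun"],
    help="Apply a pre-configured hyperparameter set for the selected optimizer.",
)
```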
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| train_args.py | Adds the --optimizer_preset argument definition with choices for "none" and "speedrun" presets |
| train.py | Implements _apply_optimizer_presets() method that applies speedrun hyperparameters for AdamW and Muon optimizers |
| explorations/muon_vs_adamw.yaml | Adds a third parameter group for testing AdamW with default settings |
| explorations/muon_speedrun_preset.yaml | New configuration file comparing speedrun presets for Muon and AdamW against baseline AdamW |
train.py (excerpt under review):

```python
elif self.args.optimizer == "muon":
    self.args.learning_rate = 2e-2
    self.args.muon_momentum = 0.95
    self.args.weight_decay = 0.0

if self.master_process:
    print(f"Applied {preset} preset for optimizer '{self.args.optimizer}'.")
```
Copilot AI · Dec 12, 2025
When the speedrun preset is specified but the optimizer is neither "adamw" nor "muon", the function will print "Applied speedrun preset for optimizer 'X'" even though no preset values were actually applied. This could mislead users into thinking the preset was applied when it wasn't. Consider adding an else clause that either warns the user or skips the print statement when the optimizer doesn't have a preset implementation.
Suggested change:

```diff
-elif self.args.optimizer == "muon":
-    self.args.learning_rate = 2e-2
-    self.args.muon_momentum = 0.95
-    self.args.weight_decay = 0.0
-if self.master_process:
-    print(f"Applied {preset} preset for optimizer '{self.args.optimizer}'.")
+    if self.master_process:
+        print(f"Applied {preset} preset for optimizer 'adamw'.")
+elif self.args.optimizer == "muon":
+    self.args.learning_rate = 2e-2
+    self.args.muon_momentum = 0.95
+    self.args.weight_decay = 0.0
+    if self.master_process:
+        print(f"Applied {preset} preset for optimizer 'muon'.")
+else:
+    if self.master_process:
+        print(f"Warning: No '{preset}' preset available for optimizer '{self.args.optimizer}'. No preset values were applied.")
```
Exploration YAML excerpt under review:

```yaml
    optimizer_preset: ["speedrun"]
  - optimizer: ["muon"]
    optimizer_preset: ["speedrun"]
    muon_momentum: [0.95]
```
Copilot AI · Dec 12, 2025
The muon_momentum parameter is redundantly specified here since line 512 in the _apply_optimizer_presets() method already sets this to 0.95 when the speedrun preset is used with the muon optimizer. This redundant specification could cause confusion about which value takes precedence.
Suggested change:

```diff
-    muon_momentum: [0.95]
```
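To make the precedence point concrete, here is a small self-contained Python sketch. The namespace and function below are stand-ins rather than the repo's actual objects; only the field names and the 2e-2 / 0.95 / 0.0 values come from the diff shown above. Assuming the preset hook runs after the YAML sweep values have been loaded into the args (as calling it during optimizer creation implies), the preset assignment is the value that takes effect, so the explicit YAML entry adds nothing.

```python
from types import SimpleNamespace

# Stand-in for the parsed run configuration after the YAML group is applied.
args = SimpleNamespace(
    optimizer="muon",
    optimizer_preset="speedrun",
    muon_momentum=0.95,  # value supplied by the YAML parameter group
)

def apply_optimizer_presets(args):
    # Mirrors the speedrun/Muon branch shown in the review excerpt above.
    if args.optimizer_preset == "speedrun" and args.optimizer == "muon":
        args.learning_rate = 2e-2
        args.muon_momentum = 0.95  # overwrites whatever the YAML group set
        args.weight_decay = 0.0

apply_optimizer_presets(args)  # called during optimizer creation
print(args.muon_momentum)      # 0.95 either way; the explicit YAML entry is redundant
```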
This pull request introduces support for optimizer hyperparameter presets, specifically a "speedrun" preset for both the AdamW and Muon optimizers, and updates experiment configuration files to utilize this new feature. The main changes include adding the `--optimizer_preset` argument, implementing logic to apply preset hyperparameters, and updating YAML files to compare optimizers using the new preset.

Optimizer preset support:
- New `--optimizer_preset` argument (default: "none", options: "none", "speedrun") to allow selection of preset hyperparameters for supported optimizers (AdamW and Muon). (train_args.py)
- New `_apply_optimizer_presets` method in the training script assigns preset hyperparameters for "speedrun" runs, affecting learning rate, betas, weight decay, and epsilon for both AdamW and Muon optimizers. This is now called during optimizer creation. (train.py)

Experiment configuration updates:
- New exploration configuration, `explorations/muon_speedrun_preset.yaml`, to compare Muon and AdamW optimizers using the "speedrun" preset on the minipile dataset.
- Updated `explorations/muon_vs_adamw.yaml` to include an AdamW baseline for comparison.
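As a rough sketch of how the pieces described above might fit together at optimizer-creation time: the preset hook runs first and mutates the args, then the optimizer is built from those args. Apart from the `--optimizer_preset` semantics, the `_apply_optimizer_presets` idea, and the Muon values taken from the review excerpt, everything below is an assumption; the AdamW speedrun values are not visible in this excerpt and are deliberately left out.

```python
import torch
import torch.nn as nn
from types import SimpleNamespace

def apply_optimizer_presets(args):
    """Hypothetical stand-in for _apply_optimizer_presets in train.py."""
    if args.optimizer_preset != "speedrun":
        return
    if args.optimizer == "muon":
        # Values taken from the diff shown in the review above.
        args.learning_rate = 2e-2
        args.muon_momentum = 0.95
        args.weight_decay = 0.0
    elif args.optimizer == "adamw":
        # The PR also tunes lr / betas / eps / weight decay here; the concrete
        # numbers are not visible in this excerpt, so none are asserted.
        pass

def create_optimizer(model, args):
    apply_optimizer_presets(args)  # presets are applied during optimizer creation
    if args.optimizer == "adamw":
        return torch.optim.AdamW(
            model.parameters(),
            lr=args.learning_rate,
            weight_decay=args.weight_decay,
        )
    # A Muon optimizer implementation would be constructed here; it is outside
    # the scope of this sketch.
    raise NotImplementedError(f"No optimizer builder in this sketch for {args.optimizer!r}")

# Placeholder values only, to show the call pattern; not the repo's defaults.
args = SimpleNamespace(optimizer="adamw", optimizer_preset="speedrun",
                       learning_rate=3e-4, weight_decay=0.1)
opt = create_optimizer(nn.Linear(4, 4), args)
```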