Conversation

@klei22 (Collaborator) commented Dec 12, 2025

This pull request introduces support for optimizer hyperparameter presets, starting with a "speedrun" preset for the AdamW and Muon optimizers, and updates experiment configuration files to use the new feature. The main changes are the new --optimizer_preset argument, the logic that applies the preset hyperparameters, and updated YAML files that compare optimizers under the new preset.

Optimizer preset support:

  • Added a new command-line argument --optimizer_preset (default: "none"; choices: "none", "speedrun") to select preset hyperparameters for supported optimizers (AdamW and Muon). (train_args.py)
  • Implemented the _apply_optimizer_presets method in the training script, called during optimizer creation, to assign preset hyperparameters for "speedrun" runs: learning rate, betas, weight decay, and epsilon for AdamW; learning rate, momentum, and weight decay for Muon. (train.py) A sketch of both pieces follows this list.
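
A minimal sketch of both pieces. The Muon numbers match the snippet quoted in the first review comment below; the AdamW numbers and the beta1/beta2/epsilon argument names are illustrative assumptions, not the merged values.

# train_args.py -- the new preset flag described above:
parser.add_argument(
    "--optimizer_preset",
    type=str,
    default="none",
    choices=["none", "speedrun"],
    help="Apply preset hyperparameters for supported optimizers (AdamW, Muon).",
)

# train.py -- sketch of the preset hook called during optimizer creation:
def _apply_optimizer_presets(self):
    preset = self.args.optimizer_preset
    if preset != "speedrun":
        return
    if self.args.optimizer == "adamw":
        # Assumed AdamW values; the PR description only says the preset
        # sets the learning rate, betas, weight decay, and epsilon.
        self.args.learning_rate = 1e-3
        self.args.beta1 = 0.9
        self.args.beta2 = 0.95
        self.args.weight_decay = 0.1
        self.args.epsilon = 1e-8
    elif self.args.optimizer == "muon":
        # Values taken from the snippet quoted in the review comment below.
        self.args.learning_rate = 2e-2
        self.args.muon_momentum = 0.95
        self.args.weight_decay = 0.0
    if self.master_process:
        print(f"Applied {preset} preset for optimizer '{self.args.optimizer}'.")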

Experiment configuration updates:

  • Added a new experiment configuration file, explorations/muon_speedrun_preset.yaml, which compares the Muon and AdamW optimizers under the "speedrun" preset on the minipile dataset; a hypothetical sketch follows this list.
  • Updated explorations/muon_vs_adamw.yaml to include an AdamW baseline for comparison.
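
A hypothetical sketch of the new file, extrapolated from the parameter groups quoted in the second review comment below; the baseline group and the dataset key are assumptions:

# explorations/muon_speedrun_preset.yaml (sketch, not the merged file)
- optimizer: ["adamw"]            # assumed baseline with default hyperparameters
  dataset: ["minipile"]
- optimizer: ["adamw"]
  optimizer_preset: ["speedrun"]
  dataset: ["minipile"]
- optimizer: ["muon"]
  optimizer_preset: ["speedrun"]
  muon_momentum: [0.95]           # flagged as redundant in the review below
  dataset: ["minipile"]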

Copilot AI left a comment

Pull request overview

This PR adds an optimizer preset system that allows users to quickly apply pre-configured hyperparameter sets for supported optimizers. The "speedrun" preset provides optimized settings for AdamW and Muon optimizers based on proven configurations.

  • Adds a new --optimizer_preset CLI argument with support for "none" and "speedrun" presets
  • Implements preset application logic that automatically configures hyperparameters for AdamW and Muon optimizers when the speedrun preset is selected
  • Adds a new exploration configuration file to compare speedrun presets between optimizers

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

File                                     Description
train_args.py                            Adds the --optimizer_preset argument definition with choices "none" and "speedrun"
train.py                                 Implements the _apply_optimizer_presets() method, which applies speedrun hyperparameters for the AdamW and Muon optimizers
explorations/muon_vs_adamw.yaml          Adds a third parameter group for testing AdamW with default settings
explorations/muon_speedrun_preset.yaml   New configuration file comparing speedrun presets for Muon and AdamW against a baseline AdamW

Comment on lines +510 to +518
elif self.args.optimizer == "muon":
    self.args.learning_rate = 2e-2
    self.args.muon_momentum = 0.95
    self.args.weight_decay = 0.0

if self.master_process:
    print(f"Applied {preset} preset for optimizer '{self.args.optimizer}'.")

Copilot AI commented Dec 12, 2025

When the speedrun preset is specified but the optimizer is neither "adamw" nor "muon", the function will print "Applied speedrun preset for optimizer 'X'" even though no preset values were actually applied. This could mislead users into thinking the preset was applied when it wasn't. Consider adding an else clause that either warns the user or skips the print statement when the optimizer doesn't have a preset implementation.

Suggested change
-elif self.args.optimizer == "muon":
-    self.args.learning_rate = 2e-2
-    self.args.muon_momentum = 0.95
-    self.args.weight_decay = 0.0
-if self.master_process:
-    print(f"Applied {preset} preset for optimizer '{self.args.optimizer}'.")
+    if self.master_process:
+        print(f"Applied {preset} preset for optimizer 'adamw'.")
+elif self.args.optimizer == "muon":
+    self.args.learning_rate = 2e-2
+    self.args.muon_momentum = 0.95
+    self.args.weight_decay = 0.0
+    if self.master_process:
+        print(f"Applied {preset} preset for optimizer 'muon'.")
+else:
+    if self.master_process:
+        print(f"Warning: No '{preset}' preset available for optimizer '{self.args.optimizer}'. No preset values were applied.")

    optimizer_preset: ["speedrun"]
  - optimizer: ["muon"]
    optimizer_preset: ["speedrun"]
    muon_momentum: [0.95]

Copilot AI commented Dec 12, 2025

The muon_momentum parameter is redundantly specified here since line 512 in the _apply_optimizer_presets() method already sets this to 0.95 when the speedrun preset is used with the muon optimizer. This redundant specification could cause confusion about which value takes precedence.

Suggested change
-    muon_momentum: [0.95]
