Opt muon settings #697
Conversation
Pull request overview
This PR adds an optimizer preset system that allows users to quickly apply pre-configured hyperparameter sets for supported optimizers. The "speedrun" preset provides optimized settings for AdamW and Muon optimizers based on proven configurations.
- Adds a new `--optimizer_preset` CLI argument with support for "none" and "speedrun" presets (a sketch follows this list)
- Implements preset application logic that automatically configures hyperparameters for AdamW and Muon optimizers when the speedrun preset is selected
- Adds a new exploration configuration file to compare speedrun presets between optimizers
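For orientation, here is a minimal sketch of how the new flag could be declared in train_args.py, assuming an argparse-based parser. Only the flag name, its default, and its choices come from this PR; the parser object, help text, and surrounding code are illustrative.

```python
import argparse

# Illustrative parser; the real argument definitions live in train_args.py.
parser = argparse.ArgumentParser()
parser.add_argument(
    "--optimizer_preset",
    type=str,
    default="none",
    choices=["none", "speedrun"],
    help="Apply a pre-configured hyperparameter set for the selected optimizer.",
)
```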
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| train_args.py | Adds the --optimizer_preset argument definition with choices for "none" and "speedrun" presets |
| train.py | Implements _apply_optimizer_presets() method that applies speedrun hyperparameters for AdamW and Muon optimizers |
| explorations/muon_vs_adamw.yaml | Adds a third parameter group for testing AdamW with default settings |
| explorations/muon_speedrun_preset.yaml | New configuration file comparing speedrun presets for Muon and AdamW against baseline AdamW |
train.py (excerpt under review):

```python
elif self.args.optimizer == "muon":
    self.args.learning_rate = 2e-2
    self.args.muon_momentum = 0.95
    self.args.weight_decay = 0.0

if self.master_process:
    print(f"Applied {preset} preset for optimizer '{self.args.optimizer}'.")
```
Copilot AI · Dec 12, 2025
When the speedrun preset is specified but the optimizer is neither "adamw" nor "muon", the function will print "Applied speedrun preset for optimizer 'X'" even though no preset values were actually applied. This could mislead users into thinking the preset was applied when it wasn't. Consider adding an else clause that either warns the user or skips the print statement when the optimizer doesn't have a preset implementation.
Suggested change:

```diff
-elif self.args.optimizer == "muon":
-    self.args.learning_rate = 2e-2
-    self.args.muon_momentum = 0.95
-    self.args.weight_decay = 0.0
-if self.master_process:
-    print(f"Applied {preset} preset for optimizer '{self.args.optimizer}'.")
+    if self.master_process:
+        print(f"Applied {preset} preset for optimizer 'adamw'.")
+elif self.args.optimizer == "muon":
+    self.args.learning_rate = 2e-2
+    self.args.muon_momentum = 0.95
+    self.args.weight_decay = 0.0
+    if self.master_process:
+        print(f"Applied {preset} preset for optimizer 'muon'.")
+else:
+    if self.master_process:
+        print(f"Warning: No '{preset}' preset available for optimizer '{self.args.optimizer}'. No preset values were applied.")
```
Exploration YAML excerpt under review:

```yaml
    optimizer_preset: ["speedrun"]
  - optimizer: ["muon"]
    optimizer_preset: ["speedrun"]
    muon_momentum: [0.95]
```
Copilot AI · Dec 12, 2025
The muon_momentum parameter is redundantly specified here since line 512 in the _apply_optimizer_presets() method already sets this to 0.95 when the speedrun preset is used with the muon optimizer. This redundant specification could cause confusion about which value takes precedence.
Suggested change:

```diff
-    muon_momentum: [0.95]
```
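To make the precedence point concrete, here is a small self-contained Python sketch. The namespace and function below are stand-ins rather than the repo's actual objects; only the field names and the 2e-2 / 0.95 / 0.0 values come from the diff shown above. Assuming the preset hook runs after the YAML sweep values have been loaded into the args (as calling it during optimizer creation implies), the preset assignment is the value that takes effect, so the explicit YAML entry adds nothing.

```python
from types import SimpleNamespace

# Stand-in for the parsed run configuration after the YAML group is applied.
args = SimpleNamespace(
    optimizer="muon",
    optimizer_preset="speedrun",
    muon_momentum=0.95,  # value supplied by the YAML parameter group
)

def apply_optimizer_presets(args):
    # Mirrors the speedrun/Muon branch shown in the review excerpt above.
    if args.optimizer_preset == "speedrun" and args.optimizer == "muon":
        args.learning_rate = 2e-2
        args.muon_momentum = 0.95  # overwrites whatever the YAML group set
        args.weight_decay = 0.0

apply_optimizer_presets(args)  # called during optimizer creation
print(args.muon_momentum)      # 0.95 either way; the explicit YAML entry is redundant
```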
This pull request introduces support for optimizer hyperparameter presets, specifically a "speedrun" preset for both the AdamW and Muon optimizers, and updates experiment configuration files to utilize this new feature. The main changes include adding the `--optimizer_preset` argument, implementing logic to apply preset hyperparameters, and updating YAML files to compare optimizers using the new preset.

Optimizer preset support:
- New `--optimizer_preset` argument (default: "none", options: "none", "speedrun") to allow selection of preset hyperparameters for supported optimizers (AdamW and Muon). (train_args.py)
- New `_apply_optimizer_presets` method in the training script assigns preset hyperparameters for "speedrun" runs, affecting learning rate, betas, weight decay, and epsilon for both AdamW and Muon optimizers. This is now called during optimizer creation. (train.py)

Experiment configuration updates:
- New exploration configuration, `explorations/muon_speedrun_preset.yaml`, to compare Muon and AdamW optimizers using the "speedrun" preset on the minipile dataset.
- Updated `explorations/muon_vs_adamw.yaml` to include an AdamW baseline for comparison.
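As a rough sketch of how the pieces described above might fit together at optimizer-creation time: the preset hook runs first and mutates the args, then the optimizer is built from those args. Apart from the `--optimizer_preset` semantics, the `_apply_optimizer_presets` idea, and the Muon values taken from the review excerpt, everything below is an assumption; the AdamW speedrun values are not visible in this excerpt and are deliberately left out.

```python
import torch
import torch.nn as nn
from types import SimpleNamespace

def apply_optimizer_presets(args):
    """Hypothetical stand-in for _apply_optimizer_presets in train.py."""
    if args.optimizer_preset != "speedrun":
        return
    if args.optimizer == "muon":
        # Values taken from the diff shown in the review above.
        args.learning_rate = 2e-2
        args.muon_momentum = 0.95
        args.weight_decay = 0.0
    elif args.optimizer == "adamw":
        # The PR also tunes lr / betas / eps / weight decay here; the concrete
        # numbers are not visible in this excerpt, so none are asserted.
        pass

def create_optimizer(model, args):
    apply_optimizer_presets(args)  # presets are applied during optimizer creation
    if args.optimizer == "adamw":
        return torch.optim.AdamW(
            model.parameters(),
            lr=args.learning_rate,
            weight_decay=args.weight_decay,
        )
    # A Muon optimizer implementation would be constructed here; it is outside
    # the scope of this sketch.
    raise NotImplementedError(f"No optimizer builder in this sketch for {args.optimizer!r}")

# Placeholder values only, to show the call pattern; not the repo's defaults.
args = SimpleNamespace(optimizer="adamw", optimizer_preset="speedrun",
                       learning_rate=3e-4, weight_decay=0.1)
opt = create_optimizer(nn.Linear(4, 4), args)
```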