Opt muon settings #698
base: master
Conversation
Pull request overview
This pull request introduces optimizer hyperparameter presets to streamline experiment configuration and improve reproducibility. The main addition is a "speedrun" preset that applies predefined hyperparameters for AdamW and Muon optimizers, along with YAML configuration files to leverage this new functionality.
Key Changes:
- Added `--optimizer_preset` CLI argument with "speedrun" preset support for AdamW and Muon optimizers
- Implemented automatic hyperparameter application through an `_apply_optimizer_presets()` method called during optimizer creation (a sketch follows below)
- Created new experiment configurations comparing optimizer performance with and without presets
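For orientation, here is a minimal sketch of what the preset application could look like, using only the values visible in the diff and review comments (AdamW: learning rate 6e-4, betas 0.9/0.95; Muon: momentum 0.95); the actual `_apply_optimizer_presets()` in train.py may set additional fields:

```python
def _apply_optimizer_presets(self):
    """Sketch: apply "speedrun" hyperparameters before the optimizer is built."""
    if self.args.optimizer_preset != "speedrun":
        return  # "none" keeps whatever hyperparameters the user supplied
    if self.args.optimizer == "adamw":
        # Values visible in the diff under review
        self.args.learning_rate = 6e-4
        self.args.beta1, self.args.beta2 = 0.9, 0.95
    elif self.args.optimizer == "muon":
        # Momentum value referenced by the review comment on train.py;
        # any further Muon-specific values are not shown in this excerpt
        self.args.muon_momentum = 0.95
```

With this in place, an experiment would presumably be launched as `python train.py --optimizer muon --optimizer_preset speedrun`, with the preset overwriting conflicting values, which is why the reviewer flags the redundant `muon_momentum: [0.95]` in the YAML below.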
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| train_args.py | Adds --optimizer_preset argument to support predefined optimizer hyperparameter configurations |
| train.py | Implements _apply_optimizer_presets() method to apply speedrun preset values for AdamW and Muon optimizers |
| explorations/muon_vs_adamw.yaml | Adds AdamW parameter group to enable comparison experiments |
| explorations/muon_speedrun_preset.yaml | New configuration file to compare speedrun presets for Muon and AdamW on minipile dataset |
```python
if self.args.optimizer == "adamw":
    self.args.learning_rate = 6e-4
    self.args.beta1, self.args.beta2 = 0.9, 0.95
```
Copilot AI · Dec 12, 2025
The attributes beta1 and beta2 are being set here, but based on the codebase, the AdamW optimizer in optimizer_variants.py uses args.beta1 and args.beta2 (not args.adamw_betas). While this appears correct, it's worth noting that there's also an --adamw_betas argument defined in train_args.py that takes 2 values, which could create confusion. Consider whether the preset should also update adamw_betas for consistency, or if the naming convention should be clarified.
Suggested change:
```python
self.args.beta1, self.args.beta2 = 0.9, 0.95
self.args.adamw_betas = (0.9, 0.95)
```
```yaml
- optimizer: ["muon"]
  optimizer_preset: ["speedrun"]
  muon_momentum: [0.95]
- optimizer: ["adamw"]
```
Copilot AI · Dec 12, 2025
There are three parameter groups defined here, with line 9 appearing to be a duplicate adamw entry without any configuration. The first two groups (lines 4-5 and 6-8) have specific configurations with the speedrun preset, but the third group (line 9) only specifies the optimizer without any preset or other parameters. This creates ambiguity about what configuration this third group represents. Consider removing this duplicate entry or clarifying its purpose if it's intended to represent a baseline adamw configuration without the preset.
Suggested change (remove this line):
```yaml
- optimizer: ["adamw"]
```
```yaml
  optimizer_preset: ["speedrun"]
- optimizer: ["muon"]
  optimizer_preset: ["speedrun"]
  muon_momentum: [0.95]
```
Copilot AI · Dec 12, 2025
The parameter group on line 8 explicitly sets muon_momentum: [0.95], but the speedrun preset (line 7) already sets this exact value in train.py line 512. This creates redundancy where the same value is specified in both the YAML configuration and the preset code. Consider removing the explicit muon_momentum setting from the YAML since the preset handles this, unless there's a specific reason to document or override the preset value.
Suggested change (remove this line):
```yaml
muon_momentum: [0.95]
```
```python
training_group.add_argument(
    "--optimizer_preset",
    type=str,
    default="none",
    choices=["none", "speedrun"],
    help="Optional preset hyperparameters for supported optimizers (currently AdamW and Muon).",
)
```
Copilot AI · Dec 12, 2025
The --optimizer_preset argument is positioned in the middle of optimizer-specific sections (between MUON and ADAMW). Since this preset applies to multiple optimizers (AdamW and Muon as noted in the help text), consider moving this argument to a more general location, such as right after the --optimizer argument or in a separate "Optimizer Presets" section, to better reflect its cross-optimizer nature and improve discoverability.
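As an illustration of that suggestion, a sketch of the relocated registration; the `--optimizer` argument's default and help text here are assumptions, not copied from train_args.py:

```python
# Sketch: keep the cross-optimizer preset flag next to --optimizer rather than
# inside the Muon- or AdamW-specific argument sections.
training_group.add_argument(
    "--optimizer",
    type=str,
    default="adamw",  # default assumed for illustration
    help="Which optimizer variant to use.",
)
training_group.add_argument(
    "--optimizer_preset",
    type=str,
    default="none",
    choices=["none", "speedrun"],
    help="Optional preset hyperparameters for supported optimizers (currently AdamW and Muon).",
)
```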
```yaml
  adamw_eps: ["1e-8"]
- optimizer: ["muon"]
  learning_rate: ["0.05"]
  muon_momentum: [0.95]
```
Copilot AI · Dec 12, 2025
This parameter group only specifies the optimizer without any configuration parameters. While this might be intentional to test default AdamW settings alongside the configured groups above, it creates ambiguity about the purpose of this third group. Consider either removing this if it's not needed, or adding a comment to clarify that this represents a baseline/default AdamW configuration.
Suggested change:
```yaml
muon_momentum: [0.95]
# Baseline/default AdamW configuration (uses all default AdamW settings)
```
This pull request introduces support for optimizer hyperparameter presets, specifically a "speedrun" preset for both AdamW and Muon optimizers, and updates configuration files to utilize this feature for more streamlined experiment setup and reproducibility. The changes also add a new YAML preset for comparing Muon and AdamW optimizers on the minipile dataset.
Optimizer preset support and integration:
- Added an `--optimizer_preset` argument to the training script, allowing users to select from predefined optimizer hyperparameter configurations (currently "none" or "speedrun"). (train_args.py)
- Introduced an `_apply_optimizer_presets` method in the training logic to automatically set optimizer hyperparameters when the "speedrun" preset is selected, with specific values for AdamW and Muon. This method is now called during optimizer creation. (train.py)

Experiment configuration updates:
- Added a new configuration file, `muon_speedrun_preset.yaml`, to compare NanoGPT speedrun optimizer presets for Muon and AdamW on the minipile dataset, leveraging the new preset functionality.
- Updated `muon_vs_adamw.yaml` to include an AdamW optimizer group, enabling direct comparison with Muon in experiments.
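For readers skimming the diff, a rough sketch of the call-site described above ("called during optimizer creation"); the surrounding method name and the `build_optimizer_variant` helper are hypothetical placeholders, and only `_apply_optimizer_presets` comes from this PR:

```python
def create_optimizer(self):
    # Apply preset values first so downstream code reads the (possibly
    # overridden) hyperparameters from self.args.
    self._apply_optimizer_presets()
    # build_optimizer_variant stands in for however optimizer_variants.py
    # actually constructs the optimizer from self.args.
    return build_optimizer_variant(self.args, self.model)
```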