
Conversation

@klei22 (Collaborator) commented Oct 19, 2025

The First MLP is Free

This pull request introduces a new experiment configuration file to explore the impact of varying the MLP size in the first transformer block when using identity attention. The configuration sets up a sweep over several first-layer MLP widths, while keeping the rest of the model parameters constant.

While wte and lm_head still share parameters during training, the longer-term aim is to replace the wte and the first MLP with a single lookup table at inference time.

This sweep is intended to scope whether adding an MLP (or any stateless module) in the first block provides a meaningful improvement for inference.
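Because the first block uses identity attention (no token mixing) and the MLP is stateless, the composition of the embedding and the first-block MLP is a fixed function of the token id, so it can in principle be precomputed offline. A minimal PyTorch sketch of that fold, assuming a plain residual MLP and ignoring positional embeddings and layer norm (module and size names here are illustrative, not the repo's actual API):

```python
import torch
import torch.nn as nn

# Illustrative sizes; the real values come from the experiment config.
vocab_size, n_embd, first_mlp_size = 50304, 384, 1024

wte = nn.Embedding(vocab_size, n_embd)
first_mlp = nn.Sequential(          # stateless: no attention state, no KV cache
    nn.Linear(n_embd, first_mlp_size),
    nn.GELU(),
    nn.Linear(first_mlp_size, n_embd),
)

# Offline fold: precompute "embedding + MLP(embedding)" for every token id once.
with torch.no_grad():
    folded = wte.weight + first_mlp(wte.weight)
lookup = nn.Embedding.from_pretrained(folded, freeze=True)

# At inference, a single table lookup replaces wte plus the first-block MLP.
tokens = torch.randint(0, vocab_size, (1, 8))
expected = wte(tokens) + first_mlp(wte(tokens))
assert torch.allclose(lookup(tokens), expected, atol=1e-5)
```

In this sketch the folded table has the same vocab_size x n_embd shape as wte, so the saving at inference is the first MLP's per-token compute rather than parameter count.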

Experiment setup and parameter sweep:

  • Added explorations/identity_first_layer_mlp_sweep.yaml to define a sweep that varies the first-layer MLP size in a 4-layer transformer, with the first block using identity attention and subsequent blocks using causal attention.
  • Configured the sweep to test five different first-layer MLP sizes (512, 1024, 1536, 2048, 2560), while keeping other layers at the default size of 2048.
  • Set shared base hyperparameters for all runs, including block_size, n_layer, n_head, n_embd, dataset, device, dtype, and compilation settings.
  • Specified per-layer attention variants, with only the first block using "identity" and the rest using "causal" (a sketch of the resulting per-run settings follows this list).
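As a rough sketch of how the five runs relate to one another (key names are illustrative; the actual sweep definition lives in explorations/identity_first_layer_mlp_sweep.yaml):

```python
# Minimal sketch of the sweep expansion: only the first block's MLP width
# varies, everything else stays at the shared base settings.
N_LAYER = 4
DEFAULT_MLP_SIZE = 2048
FIRST_LAYER_MLP_SIZES = [512, 1024, 1536, 2048, 2560]

runs = []
for first_mlp in FIRST_LAYER_MLP_SIZES:
    runs.append({
        "n_layer": N_LAYER,
        # first block's MLP width is swept; the rest stay at the default size
        "mlp_size_per_layer": [first_mlp] + [DEFAULT_MLP_SIZE] * (N_LAYER - 1),
        # identity attention in block 0, causal attention in blocks 1-3
        "attn_variant_per_layer": ["identity"] + ["causal"] * (N_LAYER - 1),
    })

for cfg in runs:
    print(cfg["mlp_size_per_layer"], cfg["attn_variant_per_layer"])
```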

@klei22 requested review from Copilot and gkielian, October 19, 2025 05:46

Copilot AI left a comment

Pull Request Overview

This PR introduces an experimental configuration to evaluate the performance impact of varying MLP sizes in the first transformer block when using identity attention. The goal is to determine whether adding an MLP (or stateless module) provides meaningful improvements during inference, potentially enabling replacement of the word token embedding (wte) and first MLP with a lookup table.

  • Adds a YAML configuration sweep for first-layer MLP size variations (512 to 2560)
  • Configures a 4-layer transformer with identity attention only in the first block
  • Sets up shared hyperparameters across all sweep runs
