11L MLP2x + LeakyReLU² + Legal TTT (val_bpb=1.2201, 3-seed mean, std=0.0015)#1057

Open
Programmerryoki wants to merge 1 commit into openai:main from Programmerryoki:submission/11L-MLP2x-TTT-1.2163

@Programmerryoki

Summary

  • val_bpb: 1.2201 (3-seed mean, std 0.0015)
  • Artifact size: ~15.0 MB (all 3 seeds under the 16 MB limit)
  • Platform: 8×H100 SXM, PyTorch 2.9.1+cu128

3-Seed Results

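| Seed | Steps | Pre-TTT BPB | Post-TTT BPB | Artifact (bytes) |
|------|-------|-------------|--------------|------------------|
| 1337 | 7,821 | 1.3772 | 1.2184 | 14,986,599 |
| 42 | 7,833 | 1.4170 | 1.2212 | 15,010,826 |
| 2025 | 7,863 | 1.3899 | 1.2207 | 14,980,707 |
| Mean | 7,839 | | 1.2201 (std 0.0015) | |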
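
The mean, the spread, and the size claim can be re-derived from the per-seed rows above. This is a minimal sketch, not part of the submission. It assumes that "std" is the sample standard deviation (n - 1 denominator) and that the 16MB cap means 16,000,000 bytes. Neither is stated in this PR, and the variable names are illustrative.

```python
import statistics

# Post-TTT val_bpb and artifact sizes (bytes), copied from the per-seed rows above.
post_ttt_bpb = {1337: 1.2184, 42: 1.2212, 2025: 1.2207}
artifact_bytes = {1337: 14_986_599, 42: 15_010_826, 2025: 14_980_707}

# Assumption: the size cap is 16,000,000 bytes (decimal megabytes).
SIZE_LIMIT_BYTES = 16_000_000

scores = list(post_ttt_bpb.values())
mean_bpb = statistics.mean(scores)
# Assumption: "std" is the sample standard deviation (n - 1 denominator).
std_bpb = statistics.stdev(scores)
all_under_limit = all(size < SIZE_LIMIT_BYTES for size in artifact_bytes.values())

print(f"mean={mean_bpb:.4f} std={std_bpb:.4f} all_under_limit={all_under_limit}")
```

Under those assumptions the rounded values agree with the reported 1.2201 and 0.0015. The population standard deviation (`statistics.pstdev`) would round to 0.0012 instead, which is why the sample form is assumed here.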

Configuration

11L / 512d / 8H / 4KV (GQA), MLP 2× with LeakyReLU(0.5)², seq_len 2048, BigramHash(4096), SmearGate, U-Net skips, XSA (last 4), LN Scale 1/√(L+1), EMA(0.997), int6 QAT + GPTQ-lite + zstd-22
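
A few derived quantities follow from the configuration line above. The sketch below is illustrative only: `ModelConfig` and `leaky_relu_squared` are hypothetical names, not taken from the submission. It assumes that the head dimension is `d_model / n_heads`, that "MLP 2×" means a hidden width of twice `d_model`, and that "LeakyReLU(0.5)²" means LeakyReLU with negative slope 0.5 followed by squaring.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelConfig:
    # Values taken from the configuration line above.
    n_layers: int = 11
    d_model: int = 512
    n_heads: int = 8
    n_kv_heads: int = 4
    mlp_mult: int = 2
    seq_len: int = 2048

    @property
    def head_dim(self) -> int:
        # Assumption: heads split d_model evenly.
        return self.d_model // self.n_heads

    @property
    def queries_per_kv_head(self) -> int:
        # GQA: several query heads share one key/value head.
        return self.n_heads // self.n_kv_heads

    @property
    def mlp_hidden(self) -> int:
        # Assumption: "MLP 2x" is the hidden width relative to d_model.
        return self.mlp_mult * self.d_model


def leaky_relu_squared(x: float, negative_slope: float = 0.5) -> float:
    """Assumed reading of LeakyReLU(0.5)^2: apply LeakyReLU, then square."""
    y = x if x >= 0.0 else negative_slope * x
    return y * y


cfg = ModelConfig()
print(cfg.head_dim, cfg.queries_per_kv_head, cfg.mlp_hidden)
print(leaky_relu_squared(2.0), leaky_relu_squared(-2.0))
```

With 8 query heads and 4 KV heads, each key/value head would serve two query heads under this reading.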

Eval Protocol

Sliding window eval (stride=64) + legal score-first TTT: SGD(lr=0.002, momentum=0.9), 7 epochs per already-scored 32K chunk, cosine LR decay, all blocks unfrozen. Same protocol as PR #549.
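
The ordering that makes the TTT "score-first" can be sketched as a plain loop: each chunk is scored with the current weights before any update that uses it. This is a hedged outline, not the submission's code. `score_chunk` and `train_chunk` are hypothetical callbacks standing in for the model's evaluation and SGD steps, "32K" is assumed to mean 32,768 tokens, and the cosine decay is assumed to run across chunks (the PR does not say over what horizon it decays).

```python
import math
from typing import Callable, List, Sequence

CHUNK_TOKENS = 32 * 1024  # Assumption: "32K" means 32,768 tokens.
BASE_LR = 0.002           # From the eval protocol above.
EPOCHS_PER_CHUNK = 7      # From the eval protocol above.


def cosine_lr(step: int, total_steps: int, base_lr: float = BASE_LR) -> float:
    """Cosine decay from base_lr at step 0 towards 0 at total_steps."""
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * step / total_steps))


def score_first_ttt(
    tokens: Sequence[int],
    score_chunk: Callable[[Sequence[int]], float],
    train_chunk: Callable[[Sequence[int], float], None],
    chunk_tokens: int = CHUNK_TOKENS,
    epochs: int = EPOCHS_PER_CHUNK,
) -> List[float]:
    """Score every chunk before adapting on it, so no token is trained on first."""
    chunks = [tokens[i:i + chunk_tokens] for i in range(0, len(tokens), chunk_tokens)]
    losses: List[float] = []
    for idx, chunk in enumerate(chunks):
        # 1) Score with the weights as they are right now.
        losses.append(score_chunk(chunk))
        # Assumption: the learning rate decays across chunks.
        lr = cosine_lr(idx, len(chunks))
        # 2) Only then adapt on the chunk that was just scored.
        for _ in range(epochs):
            train_chunk(chunk, lr)
    return losses
```

A dry run with stub callbacks shows the interleaving without a model:

```python
events = []
score_first_ttt(
    list(range(10)),
    score_chunk=lambda c: events.append("score") or 0.0,
    train_chunk=lambda c, lr: events.append("train"),
    chunk_tokens=4,
    epochs=2,
)
print(events)
```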

Attribution

Built on PR #414 stack, LeakyReLU² (PR #493), Legal TTT (PR #461), XSA (PR #198), and community contributions.

Full details in records/track_10min_16mb/2026-03-29_11L_MLP2x_LeakyReLU_TTT/README.md

…3-seed mean, std=0.0015)

Records folder with full 3-seed validation.

Config: 11L/512d, 8H/4KV GQA, MLP 2x, seq_len 2048
Training: Muon + AdamW, EMA(0.997), int6 QAT + zstd-22, 600s on 8xH100
Eval: Sliding window stride=64 + Legal score-first TTT (7ep SGD)

3-seed results:
  Seed 1337: val_bpb=1.2184, 7821 steps, 14,986,599 bytes
  Seed 42:   val_bpb=1.2212, 7833 steps, 15,010,826 bytes
  Seed 2025: val_bpb=1.2207, 7863 steps, 14,980,707 bytes
  Mean:      val_bpb=1.2201 (std 0.0015)
  All artifacts under 16MB.