
Record: MuonEq-R + Depth Recurrence + N61 Mixed GPTQ — val_bpb 1.0924 (3-seed mean)#1279

Open
dexhunter wants to merge 1 commit into openai:main from dexhunter:muoneqr-recurrence-n61-mixedquant

Conversation

@dexhunter

Summary

Key Innovation: N_INT6=61

PR #1260 used N_INT6=60. Regenerating a smaller self-extracting mini runner (21,396 bytes vs the ~87K standalone runner) freed enough artifact budget to fit one additional int6 layer. N_INT6=61 improves BPB by ~0.001 per seed with zero architecture change: it is purely a quantization-precision upgrade.
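As a rough illustration of the budget arithmetic (not the submission's actual packer), the sketch below starts with every layer at int5 and greedily upgrades layers to int6 until the artifact budget is exhausted. The layer count and per-layer parameter sizes are invented placeholders chosen only so the toy example happens to land on 61; the real run ranks layers by Hessian sensitivity before upgrading and also accounts for compression.

```python
# Toy illustration of the artifact-budget arithmetic (not the submission's
# packer). Layers are assumed pre-sorted by Hessian sensitivity; the per-layer
# parameter count below is an invented placeholder.

ARTIFACT_BUDGET = 16_000_000  # bytes, hard limit
RUNNER_BYTES = 21_396         # smaller self-extracting mini runner (vs ~87K standalone)

def packed_bytes(n_params: int, bits: int) -> int:
    """Raw size of one tensor packed at the given bit width."""
    return (n_params * bits + 7) // 8

def max_int6_layers(layer_params: list[int], fixed_overhead: int) -> int:
    """Start with every layer at int5, then upgrade layers to int6 in order
    until the next upgrade would exceed the artifact budget."""
    total = fixed_overhead + sum(packed_bytes(p, 5) for p in layer_params)
    n_int6 = 0
    for p in layer_params:
        upgrade = packed_bytes(p, 6) - packed_bytes(p, 5)
        if total + upgrade > ARTIFACT_BUDGET:
            break
        total += upgrade
        n_int6 += 1
    return n_int6

layers = [326_500] * 66  # 66 quantized tensors, placeholder sizes
print(max_int6_layers(layers, RUNNER_BYTES))  # 61 with these placeholder numbers
```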

Results (8xH100 80GB SXM, PyTorch 2.9.1+cu128)

| Seed | Steps | ms/step | Sliding BPB | val_loss (nats) | Artifact (bytes) |
|------|-------|---------|-------------|-----------------|------------------|
| 42   | 5,540 | 106.5   | 1.0917      | 2.51171         | 15,996,591       |
| 0    | 5,536 | 106.6   | 1.0923      | 2.51309         | 15,974,481       |
| 7    | 5,538 | 106.6   | 1.0932      | 2.51522         | 15,982,332       |
| Mean | 5,538 | 106.6   | 1.0924      | 2.51334         | 15,984,468       |

Changes from PR #1218

|                    | PR #1218 | This PR              |
|--------------------|----------|----------------------|
| val_bpb            | 1.09785  | 1.09241 (-0.00544)   |
| Optimizer          | Muon     | MuonEq-R             |
| Depth recurrence   | None     | Layers 4, 5 repeated |
| Mixed quantization | No       | 61 int6 + 5 int5     |
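A minimal sketch of the depth-recurrence row in the table above, under a simplifying assumption (whole blocks 4 and 5 are reused, rather than only their MLPs); `Block` is a stand-in for the real transformer block, not the submission's architecture.

```python
import torch
import torch.nn as nn

# Minimal sketch of depth recurrence: blocks 4 and 5 are run a second time,
# adding two "virtual" layers of compute without adding any parameters.

class Block(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.mlp(self.norm(x))

class RecurrentStack(nn.Module):
    def __init__(self, dim: int, n_blocks: int = 8, repeat: tuple[int, int] = (4, 5)):
        super().__init__()
        self.blocks = nn.ModuleList(Block(dim) for _ in range(n_blocks))
        # Execution schedule: 0..7 with blocks 4 and 5 run a second time,
        # i.e. [0, 1, 2, 3, 4, 5, 4, 5, 6, 7].
        schedule = []
        for i in range(n_blocks):
            schedule.append(i)
            if i == repeat[-1]:
                schedule.extend(repeat)
        self.schedule = schedule

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        for i in self.schedule:
            x = self.blocks[i](x)
        return x

x = torch.randn(2, 16, 64)
print(RecurrentStack(64)(x).shape)  # torch.Size([2, 16, 64])
```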

Credits

Test plan

  • 3-seed verification (42, 0, 7) — all pass artifact + time + score
  • All seeds under 16,000,000 bytes (seed 42 verified 3× with consistent fit)
  • Train < 600s, eval < 600s (see the gate sketch after this list)
  • No TTT, no SLOT, no forbidden techniques
  • Rule checker passed (log + script)
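A hypothetical version of that per-seed gate is sketched below; the report file names and JSON fields are invented for illustration, since the actual rule-checker script is not included on this page.

```python
import json

# Hypothetical per-seed verification gate (file names and JSON fields invented
# for illustration): each seed must satisfy the artifact-size and time limits
# listed in the test plan above.

LIMITS = {"artifact_bytes": 16_000_000, "train_s": 600.0, "eval_s": 600.0}

def seed_passes(report_path: str) -> bool:
    with open(report_path) as f:
        report = json.load(f)  # e.g. {"artifact_bytes": ..., "train_s": ..., "eval_s": ...}
    return all(report[key] < limit for key, limit in LIMITS.items())

for seed in (42, 0, 7):
    status = "PASS" if seed_passes(f"reports/seed_{seed}.json") else "FAIL"
    print(f"seed {seed}: {status}")
```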

… (3-seed mean)

Improves PR openai#1260 (1.0929) by using N_INT6=61 (one more int6 layer)
with a smaller mini runner (21,396 bytes) that creates enough headroom.

3-seed mean: 1.0924 BPB / 2.5133 nats (seeds 42, 0, 7)
All seeds under 16MB (max: 15,996,591 bytes)
No TTT, no SLOT, no eval-time adaptation.

Techniques: MuonEq-R optimizer, depth recurrence (layers 4,5 shared MLP),
61 int6 + 5 int5 Hessian-ranked GPTQ, brotli-11 compression.

Built on PR openai#1218 by @clarkkev.
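The brotli-11 packing and size check mentioned above could look roughly like the following; this is a hedged sketch, not the submission's packer, and the runner-overhead constant is simply the size quoted in the PR text.

```python
import io
import brotli  # pip install brotli
import torch

# Hedged sketch of the artifact packing and size check only: serialize the
# (quantized) state dict, compress with brotli at quality 11, and verify the
# result plus the mini runner fits under the 16,000,000-byte limit.

LIMIT = 16_000_000
RUNNER_BYTES = 21_396  # mini runner size quoted in the PR

def artifact_size(state_dict: dict) -> int:
    buf = io.BytesIO()
    torch.save(state_dict, buf)
    compressed = brotli.compress(buf.getvalue(), quality=11)
    return RUNNER_BYTES + len(compressed)

model = torch.nn.Linear(8, 8)  # stand-in for the real quantized model
size = artifact_size(model.state_dict())
assert size < LIMIT, f"artifact too large: {size} bytes"
print(size)
```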
yuyeon added a commit to yuyeon/parameter-golf that referenced this pull request Apr 3, 2026
New architecture: instead of N independent transformer blocks, use K
shared blocks cycled to N virtual layers, with per-layer FiLM
conditioning (learned scale vectors for attn/mlp/residual per virtual
layer). This saves a large fraction of parameters: 3 shared blocks for 9
virtual layers use ~6.5M vs 17.1M params, freeing artifact budget.

This is genuinely novel for parameter-golf: no submission has tried
feature-wise linear modulation for depth conditioning. The closest
is PR openai#1279's LoRA adapters, but FiLM is much cheaper (1024 params
per virtual layer vs ~8K for LoRA rank-4).

Experiments running: Standard 9L vs FiLM 3→9 vs FiLM 3→18 vs FiLM 1→9.

Also includes best_full_run.log: Kitchen Sink seq2048 at 600s reached
1.2698 BPB (1338 steps, 15.6MB artifact).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
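A hedged sketch of the FiLM-style depth conditioning described in that commit, assuming the simplest reading (K shared blocks cycled across N virtual layers, each virtual layer owning one learned scale vector); the module names and block internals are placeholders, not yuyeon's code.

```python
import torch
import torch.nn as nn

# Sketch of FiLM-style depth conditioning: K shared blocks are cycled across
# N virtual layers, and each virtual layer owns a learned scale vector that
# modulates the shared block's residual branch.

class SharedBlock(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))

    def forward(self, x: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
        # FiLM-style modulation: per-virtual-layer scale on the residual branch.
        return x + scale * self.mlp(self.norm(x))

class FiLMDepthStack(nn.Module):
    def __init__(self, dim: int, k_shared: int = 3, n_virtual: int = 9):
        super().__init__()
        self.blocks = nn.ModuleList(SharedBlock(dim) for _ in range(k_shared))
        # One learned scale vector per virtual layer (dim parameters each).
        self.scales = nn.Parameter(torch.ones(n_virtual, dim))
        self.k_shared = k_shared

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        for v in range(self.scales.shape[0]):
            x = self.blocks[v % self.k_shared](x, self.scales[v])  # cycle shared blocks
        return x

x = torch.randn(2, 16, 64)
print(FiLMDepthStack(64)(x).shape)  # torch.Size([2, 16, 64])
```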