Record: Combined 3-Layer Recurrence + Parallel Residuals + Polar Express + Brotli — val_bpb 1.1067 (3-seed mean) #1396

Closed

erichroepke wants to merge 1 commit into openai:main from erichroepke:record/combined-3layer-recur-parallel-resid-brotli

Conversation

@erichroepke

Summary

  • val_bpb: 1.1067 (3-seed mean, std 0.0013) — beats SOTA (1.1147) by 0.008 BPB
  • Artifact: 13.87 MB (2.13 MB headroom under 16MB cap)
  • Clean submission — no TTT, no SLOT, no n-gram cache
  • 8×H100 SXM, 600s training

Results

Seed   Sliding BPB   Artifact (bytes)
1337   1.1080        13,866,319
42     1.1055        13,871,505
2025   1.1067        13,861,924
Mean   1.1067

What This Is

I'm a documentary filmmaker, not an ML engineer. I used Claude Opus 4.6 as a co-author to systematically analyze all open PRs in the competition, identified that @Omrigotlieb's #1344 and @dexhunter's #1392 each had techniques the other was missing, and merged them into a single stack that neither had tested.

The strategic decisions were mine. The code comprehension and merge engineering were AI-assisted. This is my first ML submission of any kind.

Novel Contribution

First submission combining 3-layer depth recurrence (from #1344) with parallel residuals (from #1392); neither PR tested this combination. The 2.13 MB of unused artifact headroom is flagged as a future optimization opportunity.
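A minimal sketch of how the two merged techniques interact in a forward pass. All names here (`forward`, `recur_layers`, `parallel_start_layer`, the toy `attn`/`mlp` callables) are illustrative, not the PR's actual code: layers in `RECUR_LAYERS` are re-applied with shared weights, and layers at or past `PARALLEL_START_LAYER` use a parallel residual instead of a sequential one.

```python
# Illustrative sketch (hypothetical names, not the PR's code):
# depth recurrence re-applies selected weight-shared layers, and
# parallel residuals let attn and mlp read the same input.

def block_sequential(x, attn, mlp):
    # standard residual block: the mlp sees the attention output
    x = x + attn(x)
    x = x + mlp(x)
    return x

def block_parallel(x, attn, mlp):
    # parallel residual: both branches read the same input
    return x + attn(x) + mlp(x)

def forward(x, blocks, recur_layers=(3, 4, 5), recur_steps=2,
            parallel_start_layer=7):
    for i, (attn, mlp) in enumerate(blocks):
        steps = recur_steps if i in recur_layers else 1
        for _ in range(steps):  # weight-shared re-application
            if i >= parallel_start_layer:
                x = block_parallel(x, attn, mlp)
            else:
                x = block_sequential(x, attn, mlp)
    return x
```

With 8 blocks and `recur_steps=2`, layers 3–5 run twice, so 11 block applications happen in total even though only 8 layers' worth of weights exist.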

Techniques Combined

Technique                     Source               Setting
3-layer depth recurrence      @Omrigotlieb #1344   RECUR_LAYERS=3,4,5
Parallel residuals            @dexhunter #1392     PARALLEL_START_LAYER=7
Polar Express Newton-Schulz   @Omrigotlieb #1344   4 minimax-optimal steps
MuonEq-R                      #1344                Row-norm before NS
Brotli + byte-shuffle         @dexhunter #1392     Replaces lzma
QK-Gain 5.0                   @dexhunter #1392     Per-head gain
WD=0.105                      @Omrigotlieb #1344   Higher WD for compression
Full Hessian GPTQ int6        Both                 Standard
No TTT                        Clean                Removed for compliance
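For readers unfamiliar with the Newton-Schulz entry above: it is an iteration that pushes a matrix's singular values toward 1, i.e. orthogonalizes it. Polar Express (#1344) replaces the fixed coefficients with minimax-optimal per-step ones; the sketch below only shows the base cubic iteration with the classic (1.5, −0.5) coefficients, in pure Python on a 2×2 matrix.

```python
# Plain cubic Newton-Schulz orthogonalization, pure Python, 2x2.
# Polar Express uses minimax-optimal per-step coefficients instead of
# the fixed (1.5, -0.5) here; this only illustrates the base iteration.

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def transpose(a):
    return [[a[j][i] for j in range(2)] for i in range(2)]

def newton_schulz(x, steps=8):
    # X <- 1.5*X - 0.5*(X X^T) X drives every singular value toward 1,
    # provided the input's spectral norm is below sqrt(3).
    for _ in range(steps):
        xxtx = matmul(matmul(x, transpose(x)), x)
        x = [[1.5 * x[i][j] - 0.5 * xxtx[i][j] for j in range(2)]
             for i in range(2)]
    return x
```

On a diagonal input the iteration acts independently on each singular value, which makes the convergence easy to check by hand.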

Reproduction

VOCAB_SIZE=1024 QK_GAIN_INIT=5.0 RECUR_LAYERS="3,4,5" \
PARALLEL_START_LAYER=7 MUON_WD=0.105 MUON_EQ_R=1 \
SEED=1337 torchrun --standalone --nproc_per_node=8 train_gpt.py
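The "Brotli + byte-shuffle" row deserves a sketch, since the shuffle is what makes the entropy coder effective on quantized weights: transposing an array of fixed-size items groups byte 0 of every item together, then byte 1, and so on, so similar bytes become adjacent. Brotli itself is not in the Python stdlib, so this hedged sketch uses zlib as a stand-in compressor; `byte_shuffle` and `pack` are hypothetical names, not the PR's code.

```python
import zlib  # stand-in compressor; the PR uses Brotli (not stdlib)

def byte_shuffle(data: bytes, itemsize: int) -> bytes:
    # Transpose an array of fixed-size items: byte 0 of every item
    # first, then byte 1, etc. Similar bytes end up adjacent, which
    # helps the entropy coder on quantized weight arrays.
    assert len(data) % itemsize == 0
    return bytes(data[i + j] for j in range(itemsize)
                 for i in range(0, len(data), itemsize))

def pack(data: bytes, itemsize: int) -> bytes:
    # shuffle, then compress the byte-transposed stream
    return zlib.compress(byte_shuffle(data, itemsize), level=9)
```

For example, three little-endian 16-bit values sharing a high byte (`01 aa 02 aa 03 aa`) shuffle to `01 02 03 aa aa aa`, giving the compressor a run of identical bytes to exploit.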

Test plan

  • 3-seed validation (1337, 42, 2025)
  • All artifacts under 16,000,000 bytes (max: 13,871,505)
  • Clean — no TTT, no SLOT, no n-gram cache
  • Beats merged SOTA by 0.008 BPB

Credits

The techniques belong to the people who invented them. I combined their work.

🤖 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>


Robby955 commented Apr 6, 2026

Your submission score is higher (worse) than those of the PRs (#1344, #1392) you cite, meaning the combination actually made the model worse, not better.

@erichroepke
Author

Withdrawing — ran with SP1024 instead of SP4096, which negated the combination gains. Will resubmit with proper SP4096 data once generated. Thanks for the feedback.

@erichroepke erichroepke closed this Apr 6, 2026
@erichroepke erichroepke deleted the record/combined-3layer-recur-parallel-resid-brotli branch April 6, 2026 04:17
@erichroepke
Author

Hey — just submitted an updated version as PR #1416 (1.07948 BPB). This one combines @clarkkev's #1394 base with @stukenov's #1364 pre-quant TTT. Supersedes this submission. Thanks!
