
Non-record: 10L E2E TTT-Linear + FlowRefiner — val_bpb 1.1335 ± 0.0010 (4-seed mean)#1166

Open
Christopher-Lee-McClendon wants to merge 3 commits into openai:main from Christopher-Lee-McClendon:submission/10L-e2e-ttt-flow-refiner

Conversation


@Christopher-Lee-McClendon Christopher-Lee-McClendon commented Mar 31, 2026

Non-Record Submission: 10L E2E TTT-Linear + FlowRefiner (E2E was a README request)

val_bpb: 1.1335 ± 0.0010 (4-seed mean ± std, int6 sliding window, stride=64) | ~15.1 MB artifact | 2×A100 PCIe 40GB
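
The stride-64 sliding-window protocol can be illustrated with a small sketch (hypothetical, not the repo's actual eval harness): each window re-reads up to a full context of prior tokens but only scores its final `stride` tokens, so every token is predicted once with long context. `ctx=1024` below is an illustrative assumption.

```python
# Hypothetical sketch of stride-64 sliding-window scoring (not the repo's
# harness). Each span is (ctx_start, end, score_from): the model conditions
# on tokens [ctx_start, end) but only the tokens [score_from, end) count
# toward the BPB total.
def sliding_window_spans(n_tokens, ctx=1024, stride=64):
    spans = []
    pos = 0
    while pos < n_tokens:
        start = max(0, pos + stride - ctx)      # rewind for context
        end = min(pos + stride, n_tokens)
        spans.append((start, end, pos))
        pos += stride
    return spans

spans = sliding_window_spans(4096)
# Every token is scored exactly once across all windows.
scored = sum(end - score_from for _, end, score_from in spans)
assert scored == 4096
```

Scoring only the trailing stride of each window trades extra forward passes for uniformly long context at every scored position, which is why sliding-window BPB is lower than chunked BPB.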

Summary

10-layer transformer with end-to-end TTT-Linear refinement and a 1-step FlowRefiner, compressed to fit under the 16 MB artifact cap. The lightweight FlowRefiner is inspired in part by the FLOWR paper (arXiv:2504.10564), which uses learned flow-matching vector fields with Euler-style transport updates for efficient refinement; here that idea is adapted into a tiny hidden-state refiner rather than a pocket-conditioned 3D ligand generator. Concretely, this PR uses a flow-flavored residual MLP, not true source→target distribution matching (which will be the subject of a later PR).
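
A minimal sketch of the flow-flavored residual refiner idea (hypothetical code, not this PR's implementation): a tiny MLP predicts a velocity field v(h) and one Euler step transports the hidden state. The dimensions below are chosen so the parameter count lands at ~98K, matching the figure quoted later; the zero-initialized output layer is an assumption that makes the refiner start as the identity.

```python
import numpy as np

rng = np.random.default_rng(0)
d, hidden = 512, 96                      # 512*96 + 96*512 = 98,304 params (illustrative)
W1 = rng.normal(scale=0.02, size=(d, hidden))
W2 = np.zeros((hidden, d))               # zero-init: refiner is the identity at step 0

def flow_refine(h, dt=1.0):
    """One Euler transport step h' = h + dt * v(h), with v from a 2-layer MLP."""
    v = np.maximum(h @ W1, 0.0) @ W2     # velocity field (ReLU MLP for brevity)
    return h + dt * v

h = rng.normal(size=(4, d))
h2 = flow_refine(h)
assert h2.shape == h.shape
assert np.allclose(h2, h)                # zero-init W2 => identity before training
```

With `dt=1.0` this is a plain residual MLP; the "flow" framing only changes how the velocity field would be trained, which is exactly the distribution-matching piece deferred to the later PR.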

Key Results — 4-Seed Reproducibility

| Seed | SLURM Job | Sliding Window BPB | Artifact Size (bytes) |
|---|---|---|---|
| 42 | 55383562 | 1.13472 | 15,094,152 |
| 99 | 55392385 | 1.13388 | 15,198,948 |
| 1337 | 55392383 | 1.13269 | 15,070,964 |
| 2025 | 55392384 | 1.13284 | 15,117,416 |
| **Mean ± Std** | | **1.13353 ± 0.00095** | |

All 4 seeds completed successfully. All artifacts under 16MB cap.
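
The headline statistic can be reproduced directly from the per-seed values above (assuming, as the ±0.00095 suggests, a sample standard deviation with n−1):

```python
import statistics

# Per-seed sliding-window BPBs from the table above.
bpb = {42: 1.13472, 99: 1.13388, 1337: 1.13269, 2025: 1.13284}
mean = statistics.mean(bpb.values())
std = statistics.stdev(bpb.values())   # sample std (n-1 denominator)
print(round(mean, 5), round(std, 5))   # 1.13353 0.00095
```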

Three-Variant Comparison (supplementary)

| Variant | Layers | val_bpb (sw) | Total Size | Status |
|---|---|---|---|---|
| A: 11L + 60% warmdown | 11 | 1.1236 | 16.68 MB | Over budget |
| B: 10L (this submission) | 10 | 1.1335 ± 0.0010 (4-seed) | ~15.2 MB | Legal |
| C: 11L + int5 MLP | 11 | 1.1507 | 14.30 MB | Legal |

Prior 11L Ablations on the Same Refiner Pair

These are earlier supporting runs on the same E2E-TTT / FlowRefiner pair from experiments_pr549/ rather than fresh 10-layer ablations for the legal submission:

| Prior 11L run | Sliding BPB | Δ vs 11L baseline |
|---|---|---|
| Baseline | 1.12440473 | |
| + E2E-TTT only | 1.12414225 | -0.00026 |
| + Flow only | 1.12531495 | +0.00091 |
| + Both (Combined) | 1.12344104 | -0.00096 |

Synergy Note

In that earlier 11-layer study, FlowRefiner alone regressed after quantization, while the combined E2E-TTT + Flow model was best. The additive expectation from the isolated deltas is 1.12505247 BPB, whereas the actual combined run reached 1.12344104, a 0.00161 BPB improvement over additive expectation. We treat this as evidence that FlowRefiner is most useful when paired with TTT, while avoiding the claim that the same four-way ablation has already been rerun for the present 10-layer legal artifact.
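
The additive-expectation arithmetic checks out against the ablation table:

```python
# Values from the prior 11L ablation table.
baseline = 1.12440473
ttt_only = 1.12414225
flow_only = 1.12531495
combined = 1.12344104

# Additive expectation: baseline plus the two isolated deltas.
additive = baseline + (ttt_only - baseline) + (flow_only - baseline)
synergy = additive - combined
print(round(additive, 8), round(synergy, 5))  # 1.12505247 0.00161
```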

Architecture

  • 10 layers, 512D, 8H/4KV (GQA), 3×MLP LeakyReLU(0.5)²
  • E2E TTT-Linear (1.08M params): per-head inner-loop SGD during train+eval
  • 1-step FlowRefiner (98K params): latent-space flow matching
  • BigramHash(1536), XSA, U-Net skips, VE128, Partial RoPE, SmearGate
  • EMA + SWA + Late QAT
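
The inner-loop SGD in the TTT-Linear bullet can be sketched as follows (hypothetical toy code, not this PR's implementation): a per-head fast weight matrix W is updated by gradient steps on a key→value reconstruction loss during the forward pass, at train and eval time alike. Dimensions, learning rate, and the toy target are all illustrative assumptions.

```python
import numpy as np

def ttt_linear_step(W, k, v, lr=0.1):
    """One inner-loop SGD step for a TTT-Linear fast weight W.

    Gradient of the loss (1/2n) * ||k @ W - v||_F^2 with respect to W.
    """
    pred = k @ W
    grad = k.T @ (pred - v) / len(k)
    return W - lr * grad

rng = np.random.default_rng(0)
d = 8                                  # toy head dimension
W = np.zeros((d, d))                   # fast weights, re-initialized per sequence
k = rng.normal(size=(16, d))           # toy keys
v = 0.5 * k                           # toy value targets
loss_before = 0.5 * np.mean((k @ W - v) ** 2)
for _ in range(5):                     # a few inner-loop steps during the forward pass
    W = ttt_linear_step(W, k, v)
loss_after = 0.5 * np.mean((k @ W - v) ** 2)
assert loss_after < loss_before        # fast weights adapt to the current sequence
```

Because the inner loop runs at eval time too, the 1.08M TTT parameters buy sequence-specific adaptation rather than a fixed function, which is the property the synergy note above attributes the combined gains to.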

Credits

Built on PR #549 (abaybektursun) and contributions from PR #65 (aquariouseworkman), PR #69 (TevBenji), PR #187 (Idan3011), PR #265 / PR #374 (unnir), PR #315 (jfprincz), PR #77 (samacqua), PR #50 (mattqlf), PR #76 (unixmadtoonslab), and the modded-nanogpt baseline. The flow-inspired framing for the hidden-state refiner was also informed by FLOWR (Cremer et al., arXiv:2504.10564).

See README.md for the detailed writeup, provenance paths to the prior 11-layer ablation logs, and supplementary variant comparison.

- 10-layer, 512D, E2E TTT-Linear + 1-step FlowRefiner
- val_bpb 1.13472408 (int6 sliding window, stride=64, seed=42)
- Artifact: 15,199,107 bytes (800K headroom under 16MB cap)
- BigramHash(1536), LeakyReLU(0.5)², mixed int6/int8 + lzma
- Includes three-variant size-quality comparison (11L/10L/int5)
- Trained on 2×A100 PCIe 40GB, 7185 steps, ~2.2 hours
@Christopher-Lee-McClendon Christopher-Lee-McClendon changed the title Non-record: 10L E2E TTT-Linear + FlowRefiner — val_bpb 1.1347 Non-record: 10L E2E TTT-Linear + FlowRefiner — val_bpb 1.1347 (README request) Mar 31, 2026
Seeds 42, 99, 1337, 2025 all completed successfully on 2×A100 PCIe 40GB.
Mean sliding-window BPB: 1.13353 ± 0.00095 (4-seed std).
Range: [1.13269, 1.13472].
All artifacts under 16MB cap (15.1-15.2 MB).

Includes training logs and SLURM scripts for all seeds in supplementary/.
@Christopher-Lee-McClendon Christopher-Lee-McClendon changed the title Non-record: 10L E2E TTT-Linear + FlowRefiner — val_bpb 1.1347 (README request) Non-record: 10L E2E TTT-Linear + FlowRefiner — val_bpb 1.1335 ± 0.0010 (4-seed mean) Mar 31, 2026