
Non-record: 10L E2E TTT-Linear + FlowRefiner — val_bpb 1.1335 ± 0.0010 (4-seed mean)#1166

Open
Christopher-Lee-McClendon wants to merge 3 commits into openai:main from Christopher-Lee-McClendon:submission/10L-e2e-ttt-flow-refiner

Conversation


@Christopher-Lee-McClendon Christopher-Lee-McClendon commented Mar 31, 2026

Non-Record Submission: 10L E2E TTT-Linear + FlowRefiner (E2E was a README request)

val_bpb: 1.1335 ± 0.0010 (4-seed mean ± std, int6 sliding window, stride=64) | ~15.1 MB artifact | 2×A100 PCIe 40GB
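
The stride-64 sliding-window protocol can be illustrated with a small sketch (hypothetical, not the repo's actual eval harness): each window re-reads up to a full context of prior tokens but only scores its final `stride` tokens, so every token is predicted once with long context. `ctx=1024` below is an illustrative assumption.

```python
# Hypothetical sketch of stride-64 sliding-window scoring (not the repo's
# harness). Each span is (ctx_start, end, score_from): the model conditions
# on tokens [ctx_start, end) but only the tokens [score_from, end) count
# toward the BPB total.
def sliding_window_spans(n_tokens, ctx=1024, stride=64):
    spans = []
    pos = 0
    while pos < n_tokens:
        start = max(0, pos + stride - ctx)      # rewind for context
        end = min(pos + stride, n_tokens)
        spans.append((start, end, pos))
        pos += stride
    return spans

spans = sliding_window_spans(4096)
# Every token is scored exactly once across all windows.
scored = sum(end - score_from for _, end, score_from in spans)
assert scored == 4096
```

Scoring only the trailing stride of each window trades extra forward passes for uniformly long context at every scored position, which is why sliding-window BPB is lower than chunked BPB.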

Summary

10-layer transformer with end-to-end TTT-Linear refinement and a 1-step FlowRefiner, compressed to fit under the 16 MB artifact cap. The lightweight FlowRefiner is inspired in part by the FLOWR paper (arXiv:2504.10564), which uses learned flow-matching vector fields with Euler-style transport updates for efficient refinement; here that idea is adapted into a tiny hidden-state refiner rather than a pocket-conditioned 3D ligand generator. Concretely, this PR uses a flow-flavored residual MLP, not true source→target distribution matching (which will be the subject of a later PR).
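
A minimal sketch of the flow-flavored residual refiner idea (hypothetical code, not this PR's implementation): a tiny MLP predicts a velocity field v(h) and one Euler step transports the hidden state. The dimensions below are chosen so the parameter count lands at ~98K, matching the figure quoted later; the zero-initialized output layer is an assumption that makes the refiner start as the identity.

```python
import numpy as np

rng = np.random.default_rng(0)
d, hidden = 512, 96                      # 512*96 + 96*512 = 98,304 params (illustrative)
W1 = rng.normal(scale=0.02, size=(d, hidden))
W2 = np.zeros((hidden, d))               # zero-init: refiner is the identity at step 0

def flow_refine(h, dt=1.0):
    """One Euler transport step h' = h + dt * v(h), with v from a 2-layer MLP."""
    v = np.maximum(h @ W1, 0.0) @ W2     # velocity field (ReLU MLP for brevity)
    return h + dt * v

h = rng.normal(size=(4, d))
h2 = flow_refine(h)
assert h2.shape == h.shape
assert np.allclose(h2, h)                # zero-init W2 => identity before training
```

With `dt=1.0` this is a plain residual MLP; the "flow" framing only changes how the velocity field would be trained, which is exactly the distribution-matching piece deferred to the later PR.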

Key Results — 4-Seed Reproducibility

| Seed | SLURM Job | Sliding Window BPB | Artifact Size (bytes) |
|---|---|---|---|
| 42 | 55383562 | 1.13472 | 15,094,152 |
| 99 | 55392385 | 1.13388 | 15,198,948 |
| 1337 | 55392383 | 1.13269 | 15,070,964 |
| 2025 | 55392384 | 1.13284 | 15,117,416 |
| **Mean ± Std** | | **1.13353 ± 0.00095** | |

All 4 seeds completed successfully. All artifacts under 16MB cap.
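
The headline statistic can be reproduced directly from the per-seed values above (assuming, as the ±0.00095 suggests, a sample standard deviation with n−1):

```python
import statistics

# Per-seed sliding-window BPBs from the table above.
bpb = {42: 1.13472, 99: 1.13388, 1337: 1.13269, 2025: 1.13284}
mean = statistics.mean(bpb.values())
std = statistics.stdev(bpb.values())   # sample std (n-1 denominator)
print(round(mean, 5), round(std, 5))   # 1.13353 0.00095
```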

Three-Variant Comparison (supplementary)

| Variant | Layers | val_bpb (sw) | Total Size | Status |
|---|---|---|---|---|
| A: 11L + 60% warmdown | 11 | 1.1236 | 16.68 MB | Over budget |
| B: 10L (this submission) | 10 | 1.1335 ± 0.0010 (4-seed) | ~15.2 MB | Legal |
| C: 11L + int5 MLP | 11 | 1.1507 | 14.30 MB | Legal |

Prior 11L Ablations on the Same Refiner Pair

These are earlier supporting runs on the same E2E-TTT / FlowRefiner pair from experiments_pr549/ rather than fresh 10-layer ablations for the legal submission:

| Prior 11L run | Sliding BPB | Δ vs 11L baseline |
|---|---|---|
| Baseline | 1.12440473 | |
| + E2E-TTT only | 1.12414225 | -0.00026 |
| + Flow only | 1.12531495 | +0.00091 |
| + Both (Combined) | 1.12344104 | -0.00096 |

Synergy Note

In that earlier 11-layer study, FlowRefiner alone regressed after quantization, while the combined E2E-TTT + Flow model was best. The additive expectation from the isolated deltas is 1.12505247 BPB, whereas the actual combined run reached 1.12344104, a 0.00161 BPB improvement over additive expectation. We treat this as evidence that FlowRefiner is most useful when paired with TTT, while avoiding the claim that the same four-way ablation has already been rerun for the present 10-layer legal artifact.
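
The additive-expectation arithmetic checks out against the ablation table:

```python
# Values from the prior 11L ablation table.
baseline = 1.12440473
ttt_only = 1.12414225
flow_only = 1.12531495
combined = 1.12344104

# Additive expectation: baseline plus the two isolated deltas.
additive = baseline + (ttt_only - baseline) + (flow_only - baseline)
synergy = additive - combined
print(round(additive, 8), round(synergy, 5))  # 1.12505247 0.00161
```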

Architecture

  • 10 layers, 512D, 8H/4KV (GQA), 3×MLP LeakyReLU(0.5)²
  • E2E TTT-Linear (1.08M params): per-head inner-loop SGD during train+eval
  • 1-step FlowRefiner (98K params): latent-space flow matching
  • BigramHash(1536), XSA, U-Net skips, VE128, Partial RoPE, SmearGate
  • EMA + SWA + Late QAT
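
The inner-loop SGD in the TTT-Linear bullet can be sketched as follows (hypothetical toy code, not this PR's implementation): a per-head fast weight matrix W is updated by gradient steps on a key→value reconstruction loss during the forward pass, at train and eval time alike. Dimensions, learning rate, and the toy target are all illustrative assumptions.

```python
import numpy as np

def ttt_linear_step(W, k, v, lr=0.1):
    """One inner-loop SGD step for a TTT-Linear fast weight W.

    Gradient of the loss (1/2n) * ||k @ W - v||_F^2 with respect to W.
    """
    pred = k @ W
    grad = k.T @ (pred - v) / len(k)
    return W - lr * grad

rng = np.random.default_rng(0)
d = 8                                  # toy head dimension
W = np.zeros((d, d))                   # fast weights, re-initialized per sequence
k = rng.normal(size=(16, d))           # toy keys
v = 0.5 * k                           # toy value targets
loss_before = 0.5 * np.mean((k @ W - v) ** 2)
for _ in range(5):                     # a few inner-loop steps during the forward pass
    W = ttt_linear_step(W, k, v)
loss_after = 0.5 * np.mean((k @ W - v) ** 2)
assert loss_after < loss_before        # fast weights adapt to the current sequence
```

Because the inner loop runs at eval time too, the 1.08M TTT parameters buy sequence-specific adaptation rather than a fixed function, which is the property the synergy note above attributes the combined gains to.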

Credits

Built on PR #549 (abaybektursun) and contributions from PR #65 (aquariouseworkman), PR #69 (TevBenji), PR #187 (Idan3011), PR #265 / PR #374 (unnir), PR #315 (jfprincz), PR #77 (samacqua), PR #50 (mattqlf), PR #76 (unixmadtoonslab), and the modded-nanogpt baseline. The flow-inspired framing for the hidden-state refiner was also informed by FLOWR (Cremer et al., arXiv:2504.10564).

See README.md for the detailed writeup, provenance paths to the prior 11-layer ablation logs, and supplementary variant comparison.

- 10-layer, 512D, E2E TTT-Linear + 1-step FlowRefiner
- val_bpb 1.13472408 (int6 sliding window, stride=64, seed=42)
- Artifact: 15,199,107 bytes (800K headroom under 16MB cap)
- BigramHash(1536), LeakyReLU(0.5)², mixed int6/int8 + lzma
- Includes three-variant size-quality comparison (11L/10L/int5)
- Trained on 2×A100 PCIe 40GB, 7185 steps, ~2.2 hours
@Christopher-Lee-McClendon Christopher-Lee-McClendon changed the title Non-record: 10L E2E TTT-Linear + FlowRefiner — val_bpb 1.1347 Non-record: 10L E2E TTT-Linear + FlowRefiner — val_bpb 1.1347 (README request) Mar 31, 2026
Seeds 42, 99, 1337, 2025 all completed successfully on 2×A100 PCIe 40GB.
Mean sliding-window BPB: 1.13353 ± 0.00095 (4-seed std).
Range: [1.13269, 1.13472].
All artifacts under 16MB cap (15.1-15.2 MB).

Includes training logs and SLURM scripts for all seeds in supplementary/.
@Christopher-Lee-McClendon Christopher-Lee-McClendon changed the title Non-record: 10L E2E TTT-Linear + FlowRefiner — val_bpb 1.1347 (README request) Non-record: 10L E2E TTT-Linear + FlowRefiner — val_bpb 1.1335 ± 0.0010 (4-seed mean) Mar 31, 2026