Record: Full GPTQ + Score-First TTT + SLOT — val_bpb 1.1064 (3-seed mean) by andrewbaggio1 · Pull Request #1209 · openai/parameter-golf

andrewbaggio1 · 2026-04-01T04:48:56Z

Summary

3-seed mean val_bpb: 1.1064 +/- 0.0004 | 8xH100 SXM | ~557s eval

Combines two proven, legal eval-time techniques on a Full Hessian GPTQ base:

Score-first chunked TTT (recipe from PR #549, "Record: LeakyReLU² + Legal Score-First TTT + Parallel Muon", val_bpb 1.1194, 3-seed mean): score chunk, then train on it. -0.003 BPB.
SLOT (PR #1176, "Record: QK-Gain 4.0 + XSA-11 + Muon-TTT + SLOT", val_bpb 1.0914, 3-seed mean): per-batch delta vector optimization in hidden space. -0.010 BPB.
All single-pass, score-before-update compliant. No n-gram cache, no multi-pass.
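
The score-then-train ordering can be illustrated with a toy loop. This is a minimal sketch, not the PR's code: `ToyUnigramModel` and `score_first_eval` are hypothetical names, a smoothed unigram counter stands in for the transformer, and count updates stand in for the SGD steps. It also skips training on the final chunk, matching the description under Legality.

```python
import math
from collections import Counter


class ToyUnigramModel:
    """Hypothetical stand-in for the language model: a Laplace-smoothed unigram counter."""

    def __init__(self, vocab_size):
        self.vocab_size = vocab_size
        self.counts = Counter()
        self.total = 0

    def score(self, chunk):
        """Total negative log-likelihood of `chunk` in bits. Does not modify the model."""
        nll = 0.0
        for tok in chunk:
            p = (self.counts[tok] + 1) / (self.total + self.vocab_size)
            nll -= math.log2(p)
        return nll

    def train(self, chunk):
        """Adapt to an already-scored chunk (stand-in for the TTT gradient steps)."""
        self.counts.update(chunk)
        self.total += len(chunk)


def score_first_eval(model, tokens, chunk_size):
    """Single left-to-right pass: each chunk is scored before the model trains on it."""
    chunks = [tokens[i:i + chunk_size] for i in range(0, len(tokens), chunk_size)]
    total_bits = 0.0
    for i, chunk in enumerate(chunks):
        total_bits += model.score(chunk)      # 1) score with a model that never saw this chunk
        if i < len(chunks) - 1:
            model.train(chunk)                # 2) only then adapt; the last chunk is never trained on
    return total_bits / len(tokens)           # bits per token (one byte per token in this toy)


tokens = list(b"abababab" * 8)                # 64 byte-level tokens
adaptive = score_first_eval(ToyUnigramModel(256), tokens, chunk_size=16)
static = score_first_eval(ToyUnigramModel(256), tokens, chunk_size=len(tokens))
print(f"adaptive={adaptive:.3f} static={static:.3f} bits/token")
```

With a single chunk nothing is ever trained on, so `static` is the no-TTT baseline; any gain in `adaptive` comes only from chunks that were scored earlier.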

Results

Seed	Post-SLOT BPB	Eval Time
1337	1.1068	~557s
42	1.1062	~557s
7	1.1061	~557s
Mean	1.1064

Beats verified SOTA (#1019, 1.1147) by 0.0083 BPB (p < 0.01, std=0.0004).
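
The summary statistics can be rechecked from the table above. A small sketch using only the three reported per-seed values; the variable names are illustrative, and the standard deviation is assumed to be the sample (n-1) estimate, which is the variant that reproduces the reported 0.0004.

```python
import statistics

# Post-SLOT BPB per seed, copied from the Results table.
post_slot_bpb = {1337: 1.1068, 42: 1.1062, 7: 1.1061}
prior_sota_bpb = 1.1147                       # PR #1019, as quoted in this PR

values = list(post_slot_bpb.values())
mean_bpb = statistics.mean(values)
std_bpb = statistics.stdev(values)            # sample standard deviation (n-1)
margin = prior_sota_bpb - mean_bpb

print(f"mean={mean_bpb:.4f} std={std_bpb:.4f} margin={margin:.4f}")
# prints: mean=1.1064 std=0.0004 margin=0.0083
```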

Legality

TTT: Score-first chunked (65K tokens/chunk). Each chunk scored under inference_mode before any training. Last chunk never trained on. SGD + cosine LR across chunks.
SLOT: Per-batch delta (shape [1,1,512]) optimized with 8 AdamW steps. Delta re-initialized to zeros for each new batch. Gradients only through compute_logits (linear + softcap), not transformer.
Single left-to-right pass, no rescoring, no min(NLL).
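
The SLOT step described above can be sketched in plain Python on toy dimensions. This is an illustration under stated assumptions, not the PR's implementation: the PR specifies a zero-initialized delta per batch, 8 AdamW steps, and gradients flowing only through the linear head and softcap; the learning rate, weight decay, softcap value, toy sizes, and the choice of averaging the loss over all positions in the batch are assumptions, and `slot_delta` and `loss_and_grad` are hypothetical names.

```python
import math
import random


def loss_and_grad(hidden, targets, W, delta, cap):
    """Mean cross-entropy over positions and its gradient w.r.t. delta.

    `hidden` holds frozen transformer outputs (constants here), `W` is the
    vocab x d linear head, and logits are soft-capped as cap * tanh(z / cap).
    """
    d = len(delta)
    grad = [0.0] * d
    loss = 0.0
    for h, y in zip(hidden, targets):
        x = [hi + di for hi, di in zip(h, delta)]           # delta broadcast over positions
        logits = [cap * math.tanh(sum(w * xi for w, xi in zip(row, x)) / cap) for row in W]
        top = max(logits)
        exps = [math.exp(l - top) for l in logits]
        norm = sum(exps)
        probs = [e / norm for e in exps]
        loss -= math.log(probs[y])
        for v, row in enumerate(W):
            dlogit = probs[v] - (1.0 if v == y else 0.0)
            dz = dlogit * (1.0 - (logits[v] / cap) ** 2)     # softcap derivative: 1 - tanh^2
            for j in range(d):
                grad[j] += dz * row[j]
    n = len(hidden)
    return loss / n, [g / n for g in grad]


def slot_delta(hidden, targets, W, steps=8, lr=0.05, wd=0.01, cap=30.0,
               betas=(0.9, 0.999), eps=1e-8):
    """Optimize a fresh zero delta with AdamW; hyperparameters other than `steps` are assumed."""
    d = len(hidden[0])
    delta, m, v = [0.0] * d, [0.0] * d, [0.0] * d            # re-initialized for every batch
    losses = []
    for t in range(1, steps + 1):
        loss, g = loss_and_grad(hidden, targets, W, delta, cap)
        losses.append(loss)
        for j in range(d):
            delta[j] *= 1.0 - lr * wd                        # decoupled weight decay
            m[j] = betas[0] * m[j] + (1 - betas[0]) * g[j]
            v[j] = betas[1] * v[j] + (1 - betas[1]) * g[j] ** 2
            m_hat = m[j] / (1 - betas[0] ** t)
            v_hat = v[j] / (1 - betas[1] ** t)
            delta[j] -= lr * m_hat / (math.sqrt(v_hat) + eps)
    final_loss, _ = loss_and_grad(hidden, targets, W, delta, cap)
    return delta, losses + [final_loss]


random.seed(0)
d, vocab, T = 8, 16, 32                                      # toy sizes; the PR uses d=512
W = [[random.gauss(0, 0.5) for _ in range(d)] for _ in range(vocab)]
hidden = [[random.gauss(0, 1) for _ in range(d)] for _ in range(T)]
targets = [3 if random.random() < 0.75 else random.randrange(vocab) for _ in range(T)]
delta, losses = slot_delta(hidden, targets, W)
print(f"loss before={losses[0]:.3f} after={losses[-1]:.3f}")
```

Because the hidden states are plain constants, no gradient ever reaches the transformer; only the d-dimensional delta is updated, which is what keeps the per-batch cost small.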

Architecture

PR #1184 stack: 11L LeakyReLU(0.5)^2, d=512, GQA 8/4, MLP 3x, BigramHash(2816,112), SmearGate, XSA4, Partial RoPE, LN Scale, EMA, SWA, Late QAT. Full Hessian GPTQ with actorder + int6 + LZMA.
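
To make the "int6 + LZMA" storage step concrete, here is a deliberately simplified sketch. It uses plain round-to-nearest quantization, which is not GPTQ: the Hessian-based error compensation and actorder column ordering are omitted. The symmetric [-31, 31] range, the single scale per weight list, and the one-byte-per-value packing are assumptions made for illustration, and all function names are hypothetical.

```python
import lzma
import random


def quantize_int6(weights):
    """Symmetric round-to-nearest quantization to integers in [-31, 31] (assumed range)."""
    max_abs = max(abs(w) for w in weights) or 1.0
    scale = max_abs / 31.0
    q = [max(-31, min(31, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    return [qi * scale for qi in q]


def pack_and_compress(q):
    """Store one value per byte (offset to 0..62) and LZMA-compress the result.

    A real artifact would likely pack 6 bits per value; this keeps the sketch short.
    """
    raw = bytes(qi + 31 for qi in q)
    return lzma.compress(raw)


def decompress_and_unpack(blob):
    return [b - 31 for b in lzma.decompress(blob)]


random.seed(0)
weights = [random.gauss(0, 0.02) for _ in range(4096)]       # toy weight row
q, scale = quantize_int6(weights)
blob = pack_and_compress(q)
restored = dequantize(decompress_and_unpack(blob), scale)
max_err = max(abs(a - b) for a, b in zip(weights, restored))
print(f"{len(weights)} weights -> {len(blob)} bytes, max abs error {max_err:.5f}")
```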

Credits

PR #1184 (icryo), PR #1019 (abaybektursun), PR #549 (abaybektursun), PR #1176 (bigbag), PR #461 (mrdavtan)

Test plan

3 seeds verified, all under 1.107
Beats SOTA by 0.0083 > 0.005 minimum
Training < 10 min, eval ~557s < 10 min
All techniques score-before-update compliant
No n-gram cache, no multi-pass, no min(NLL)

🤖 Generated with Claude Code

Combines Full Hessian GPTQ, legal score-first chunked TTT (3 epochs), and SLOT delta optimization (8 AdamW steps per batch). All eval-time techniques are single-pass, score-before-update compliant. 3-seed mean: 1.1064 +/- 0.0004 BPB on 8xH100 SXM. Beats verified SOTA (openai#1019, 1.1147) by 0.0083 BPB.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

andrewbaggio1 wants to merge 1 commit into openai:main from andrewbaggio1:submission/slot-ttt-1.1064
