Record: Full GPTQ + Score-First TTT + SLOT — val_bpb 1.1064 (3-seed mean) #1209

Open
andrewbaggio1 wants to merge 1 commit into openai:main from andrewbaggio1:submission/slot-ttt-1.1064

Summary

3-seed mean val_bpb: 1.1064 +/- 0.0004 | 8xH100 SXM | ~557s eval

Combines three proven legal eval-time techniques on a Full Hessian GPTQ base:

Results

Seed | Post-SLOT BPB | Eval Time
1337 | 1.1068        | ~557s
42   | 1.1062        | ~557s
7    | 1.1061        | ~557s
Mean | 1.1064        |

Beats verified SOTA (#1019, 1.1147) by 0.0083 BPB (p < 0.01, std=0.0004).
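
The headline numbers can be re-derived from the per-seed results. The snippet below is a minimal sketch, not part of the submission. It assumes the reported std is the sample standard deviation (n-1 denominator) and takes the 0.005 minimum margin from the test plan. The variable names are illustrative, and the p-value is not recomputed here.

```python
from statistics import mean, stdev

# Per-seed post-SLOT BPB values copied from the Results table.
seed_bpb = {1337: 1.1068, 42: 1.1062, 7: 1.1061}
prior_sota = 1.1147   # PR #1019, as quoted in this PR
min_margin = 0.005    # minimum improvement quoted in the test plan

values = list(seed_bpb.values())
m = mean(values)
s = stdev(values)     # sample std (n-1); an assumption about how std was computed
margin = prior_sota - m

print(f"mean={m:.4f} std={s:.4f} margin={margin:.4f}")
# prints: mean=1.1064 std=0.0004 margin=0.0083
print("clears minimum margin:", margin > min_margin)
```

With the population standard deviation the spread would round to 0.0003 instead, so the sample form is the one that reproduces the reported figure.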

Legality

  • TTT: Score-first chunked (65K tokens/chunk). Each chunk scored under inference_mode before any training. Last chunk never trained on. SGD + cosine LR across chunks.
  • SLOT: Per-batch delta (shape [1,1,512]) optimized with 8 AdamW steps. Delta re-initialized to zeros for each new batch. Gradients only through compute_logits (linear + softcap), not transformer.
  • Single left-to-right pass, no rescoring, no min(NLL).
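
The score-first ordering described above can be sketched as a control-flow skeleton. This is an illustration under stated assumptions, not the submission's code: `ToyUnigram` is a hypothetical stand-in for the real model (a Laplace-smoothed byte counter instead of a transformer trained with SGD and a cosine LR), and `score_first_eval` is an illustrative name. Only the ordering is the point: each chunk is scored with the model frozen, then used for adaptation, and the last chunk is never trained on.

```python
import math
from collections import Counter


class ToyUnigram:
    """Hypothetical stand-in for the real model: Laplace-smoothed byte counts."""

    def __init__(self, vocab_size=256):
        self.vocab_size = vocab_size
        self.counts = Counter()
        self.total = 0

    def nll_bits(self, tokens):
        """Total negative log-likelihood in bits; does not change model state."""
        denom = self.total + self.vocab_size
        return sum(-math.log2((self.counts[t] + 1) / denom) for t in tokens)

    def train(self, tokens):
        """Adapt on tokens that have already been scored."""
        self.counts.update(tokens)
        self.total += len(tokens)


def score_first_eval(model, tokens, chunk_size):
    """Single left-to-right pass: score each chunk, then train on it."""
    chunks = [tokens[i:i + chunk_size] for i in range(0, len(tokens), chunk_size)]
    total_bits = 0.0
    trained_on = []
    for i, chunk in enumerate(chunks):
        # 1) Score first, using only what was learned from earlier chunks.
        total_bits += model.nll_bits(chunk)
        # 2) Then adapt; the final chunk is scored but never trained on.
        if i < len(chunks) - 1:
            model.train(chunk)
            trained_on.append(i)
    # Tokens are bytes in this toy, so bits per token equals bits per byte.
    return total_bits / len(tokens), trained_on


data = list(b"ab" * 64)  # 128 byte tokens
bpb, trained_on = score_first_eval(ToyUnigram(), data, chunk_size=32)
print(trained_on)        # prints: [0, 1, 2]
print(bpb < 8.0)         # prints: True
```

Because every token's score is fixed before the model sees that token in training, there is no rescoring step and no chance to pick the better of two scores for the same token.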

Architecture

PR #1184 stack: 11L LeakyReLU(0.5)^2, d=512, GQA 8/4, MLP 3x, BigramHash(2816,112), SmearGate, XSA4, Partial RoPE, LN Scale, EMA, SWA, Late QAT. Full Hessian GPTQ with actorder + int6 + LZMA.

Credits

PR #1184 (icryo), PR #1019 (abaybektursun), PR #549 (abaybektursun), PR #1176 (bigbag), PR #461 (mrdavtan)

Test plan

  • 3 seeds verified, all under 1.107
  • Beats SOTA by 0.0083 > 0.005 minimum
  • Training < 10 min, eval ~557s < 10 min
  • All techniques score-before-update compliant
  • No n-gram cache, no multi-pass, no min(NLL)

🤖 Generated with Claude Code

Combines Full Hessian GPTQ, legal score-first chunked TTT (3 epochs),
and SLOT delta optimization (8 AdamW steps per batch). All eval-time
techniques are single-pass, score-before-update compliant.

3-seed mean: 1.1064 +/- 0.0004 BPB on 8xH100 SXM.
Beats verified SOTA (openai#1019, 1.1147) by 0.0083 BPB.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>