Record: SP2048 + 3-Layer Recurrence + SWA + BigramHash + Legal TTT — val_bpb 1.0955 (3-seed mean)#1339

Open
bigbag wants to merge 1 commit into openai:main from bigbag:submission/sp2048-clean

bigbag commented Apr 4, 2026

Summary

  • val_bpb = 1.0955 (3-seed mean, std 0.0004) | ~15.49 MB | 8×H100 SXM
  • First SP2048 submission with SWA + BigramHash + 3-layer depth recurrence + legal TTT
  • No SLOT, no n-gram cache — fully compliant

3-Seed Results

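| Seed | Sliding BPB | TTT BPB | Artifact (bytes) |
|------|-------------|---------|------------------|
| 42   | 1.0965      | 1.0952  | 15,498,155       |
| 314  | 1.0972      | 1.0960  | 15,493,880       |
| 999  | 1.0967      | 1.0954  | 15,474,490       |
| Mean | 1.0968      | 1.0955  | 15,488,842       |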
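
As a quick cross-check of the numbers above, the sketch below recomputes the means and the spread from the three per-seed rows. It assumes the reported std is the sample standard deviation (n-1 denominator); the PR does not say which estimator was used, and the population form would round to 0.0003 instead. The variable names are illustrative only.

```python
from statistics import mean, stdev

# Per-seed values copied from the 3-Seed Results table.
sliding_bpb = {42: 1.0965, 314: 1.0972, 999: 1.0967}
ttt_bpb = {42: 1.0952, 314: 1.0960, 999: 1.0954}
artifact_bytes = {42: 15_498_155, 314: 15_493_880, 999: 15_474_490}

# Size limit taken from the test plan ("All artifacts under 16,000,000 bytes").
SIZE_LIMIT_BYTES = 16_000_000

sliding_mean = mean(sliding_bpb.values())
ttt_mean = mean(ttt_bpb.values())
ttt_std = stdev(ttt_bpb.values())  # assumption: sample std (n-1)
artifact_mean = mean(artifact_bytes.values())
all_under_limit = all(b < SIZE_LIMIT_BYTES for b in artifact_bytes.values())

print(f"sliding mean  {sliding_mean:.4f}")
print(f"TTT mean      {ttt_mean:.4f}  (std {ttt_std:.4f})")
print(f"artifact mean {round(artifact_mean):,} bytes")
print(f"all artifacts under limit: {all_under_limit}")
```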

Key Techniques

  1. SP2048 Vocabulary — 2048-token SentencePiece BPE
  2. 3-Layer Depth Recurrence (layers 3,4,5) — extends PR #1204 (Record: ParallelResiduals + MiniDepthRecurrence, 1.1063 BPB / 1.8679 nats, -0.0072 vs PR #1179, -0.0143 vs merged SOTA) and PR #1331 (Record: MuonEq-R + 3-Layer Recurrence + WD=0.095 + MLR=0.022 + All-Int6 — val_bpb 1.0900 (3-seed mean))
  3. Stochastic Weight Averaging (from frac=0.75)
  4. BigramHash Embeddings (vocab=2048, dim=128)
  5. Legal Score-First TTT (SGD, lr=0.002, 3 epochs) — from PR #1326 (Record: SP4096 + Depth Recurrence + Parallel Residuals + MuonEq-R + Legal TTT — val_bpb 1.0896 (3-seed mean))
  6. Parallel Residuals (from layer 7)
  7. MuonEq-R + QK-Gain 5.0 + WD=0.095 + MLR=0.022
  8. Full GPTQ int6 + Brotli
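
To make three of the items above more concrete, here is a minimal sketch of what depth recurrence over layers 3,4,5, SWA starting at frac=0.75, and a 2048-bucket bigram hash could look like. It is not taken from the submission's code. The total layer count (11), the number of passes through the recurrent block (2), the hash multiplier, and all function names are assumptions made for illustration.

```python
from typing import Dict, List, Sequence


def recurrent_layer_order(num_layers: int, loop_layers: Sequence[int],
                          repeats: int) -> List[int]:
    """Execution order when a contiguous block of layers is run `repeats` times.

    The repeated block reuses the same weights, so depth grows without
    adding parameters.
    """
    first, last = loop_layers[0], loop_layers[-1]
    order = list(range(first))                   # layers before the block
    order += list(loop_layers) * repeats         # shared block, repeated
    order += list(range(last + 1, num_layers))   # layers after the block
    return order


def in_swa_window(step: int, total_steps: int, start_frac: float = 0.75) -> bool:
    """True once training has passed `start_frac` of its steps."""
    return step >= int(start_frac * total_steps)


def swa_update(avg: Dict[str, float], weights: Dict[str, float],
               n_averaged: int) -> Dict[str, float]:
    """Running mean over snapshots; `n_averaged` counts snapshots already in `avg`."""
    return {k: avg[k] + (weights[k] - avg[k]) / (n_averaged + 1) for k in weights}


def bigram_bucket(prev_token: int, token: int, num_buckets: int = 2048) -> int:
    """Map a (previous, current) token pair to an embedding row.

    The multiplier is a hypothetical choice, not the submission's hash.
    """
    return (prev_token * 1_000_003 + token) % num_buckets


# Assumed 11-layer model with two passes through layers 3,4,5.
order = recurrent_layer_order(num_layers=11, loop_layers=[3, 4, 5], repeats=2)
print(order)
```

In this sketch the bucket index would select one row of a 2048 x 128 embedding table (vocab=2048, dim=128 in the list above), and `swa_update` would be called only on steps where `in_swa_window` is true.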

Credits

PR #1326 @aryanbhosale, PR #1331 @dexhunter, PR #1204 @msisovic, PR #1218 @clarkkev, PR #1260 @dexhunter, PR #1217 @bigbag

Test plan

  • 3-seed validation (42, 314, 999)
  • All artifacts under 16,000,000 bytes
  • No SLOT, no n-gram cache
  • Legal score-first TTT
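
The ordering constraint behind "score-first" can be sketched as follows, assuming it means that every chunk of validation tokens is scored with the current weights before the model adapts on that chunk. The PR does not spell out the definition, so treat this reading as an assumption. The toy Laplace-smoothed unigram counter stands in for the real network and its SGD updates, and the result is in bits per token rather than bits per byte; `ToyUnigramModel` and `score_first_ttt` are hypothetical names.

```python
import math
from collections import Counter
from typing import Sequence


class ToyUnigramModel:
    """Stand-in for the real model: a Laplace-smoothed unigram counter."""

    def __init__(self, vocab_size: int) -> None:
        self.vocab_size = vocab_size
        self.counts: Counter = Counter()
        self.total = 0

    def bits(self, token: int) -> float:
        # Cost of one token, in bits, under the current counts.
        p = (self.counts[token] + 1) / (self.total + self.vocab_size)
        return -math.log2(p)

    def adapt(self, chunk: Sequence[int]) -> None:
        # Stand-in for the test-time training step on an already-scored chunk.
        self.counts.update(chunk)
        self.total += len(chunk)


def score_first_ttt(model: ToyUnigramModel,
                    chunks: Sequence[Sequence[int]]) -> float:
    """Average bits per token, scoring each chunk before adapting on it."""
    total_bits = 0.0
    n_tokens = 0
    for chunk in chunks:
        total_bits += sum(model.bits(t) for t in chunk)  # 1) score first
        n_tokens += len(chunk)
        model.adapt(chunk)                               # 2) then adapt
    return total_bits / n_tokens


single = score_first_ttt(ToyUnigramModel(vocab_size=2), [[0, 0]])
double = score_first_ttt(ToyUnigramModel(vocab_size=2), [[0, 0], [0, 0]])
print(single, double)
```

Because the first chunk is always scored by the unadapted model, no token contributes to its own score; only later chunks benefit from adaptation.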

🤖 Generated with Claude Code

…val_bpb 1.0955 (3-seed mean)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>