Record: SP2048 + 3-Layer Recurrence + SWA + BigramHash + Legal TTT — val_bpb 1.0955 (3-seed mean) #1338

Closed
bigbag wants to merge 5 commits into openai:main from bigbag:submission/sp2048-3recur-swa-bigram-ttt

bigbag commented Apr 4, 2026

Summary

  • val_bpb = 1.0955 (3-seed mean, std 0.0004) | ~15.49 MB | 8×H100 SXM
  • First SP2048 submission with SWA + BigramHash + 3-layer depth recurrence + legal TTT
  • No SLOT, no n-gram cache — fully compliant
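For readers unfamiliar with the metric, `val_bpb` is read here as validation bits per byte: the summed cross-entropy over the validation text, converted from nats to bits and divided by the number of raw bytes. The sketch below shows that conversion under this assumption; the function name and the example totals are hypothetical and are not taken from this PR's code.

```python
import math

def bits_per_byte(total_nats, total_bytes):
    """Convert a summed cross-entropy in nats into bits per byte.

    Assumption: the loss is summed over all predicted tokens and
    total_bytes counts the raw UTF-8 bytes those tokens cover.
    """
    return total_nats / (math.log(2) * total_bytes)

# Hypothetical totals chosen so the result is easy to check by hand:
# ln(2) nats per byte is exactly one bit per byte.
example_bpb = bits_per_byte(total_nats=math.log(2) * 1000, total_bytes=1000)
print(example_bpb)
```

Normalizing by bytes rather than tokens is what makes scores comparable across tokenizers with different vocabulary sizes.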

3-Seed Results

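| Seed | Sliding BPB | TTT BPB | Artifact (bytes) |
|------|-------------|---------|------------------|
| 42 | 1.0965 | 1.0952 | 15,498,155 |
| 314 | 1.0972 | 1.0960 | 15,493,880 |
| 999 | 1.0967 | 1.0954 | 15,474,490 |
| Mean | 1.0968 | 1.0955 | 15,488,842 |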
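
The summary figures can be re-derived from the per-seed rows. The following is a small check, not part of the submission: it recomputes the means, the standard deviation of the TTT scores, and the 16,000,000-byte limit named in the test plan. The variable names are illustrative, and the use of the sample standard deviation (n - 1) is an assumption that happens to reproduce the reported 0.0004.

```python
from statistics import mean, stdev

# Per-seed values copied from the results table.
sliding_bpb = [1.0965, 1.0972, 1.0967]
ttt_bpb = [1.0952, 1.0960, 1.0954]
artifact_bytes = [15_498_155, 15_493_880, 15_474_490]

SIZE_LIMIT = 16_000_000  # byte limit stated in the test plan

sliding_mean = round(mean(sliding_bpb), 4)
ttt_mean = round(mean(ttt_bpb), 4)
ttt_std = round(stdev(ttt_bpb), 4)  # sample std (n - 1); an assumption
artifact_mean = round(mean(artifact_bytes))
all_under_limit = all(size < SIZE_LIMIT for size in artifact_bytes)
min_headroom = SIZE_LIMIT - max(artifact_bytes)  # bytes to spare on the largest artifact

print(sliding_mean, ttt_mean, ttt_std, artifact_mean, all_under_limit, min_headroom)
```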

Key Techniques

  1. SP2048 Vocabulary — 2048-token SentencePiece BPE
  2. 3-Layer Depth Recurrence (layers 3,4,5) — extends PR #1204 (Record: ParallelResiduals + MiniDepthRecurrence, 1.1063 BPB / 1.8679 nats, -0.0072 vs PR #1179, -0.0143 vs merged SOTA) and PR #1331 (Record: MuonEq-R + 3-Layer Recurrence + WD=0.095 + MLR=0.022 + All-Int6 — val_bpb 1.0900 (3-seed mean))
  3. Stochastic Weight Averaging (from frac=0.75)
  4. BigramHash Embeddings (vocab=2048, dim=128)
  5. Legal Score-First TTT (SGD, lr=0.002, 3 epochs) — from PR #1326 (Record: SP4096 + Depth Recurrence + Parallel Residuals + MuonEq-R + Legal TTT — val_bpb 1.0896 (3-seed mean))
  6. Parallel Residuals (from layer 7)
  7. MuonEq-R + QK-Gain 5.0 + WD=0.095 + MLR=0.022
  8. Full GPTQ int6 + Brotli
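
To make the BigramHash item more concrete, here is a minimal sketch of the general idea: each (previous token, current token) pair is hashed into a fixed number of buckets, and the bucket selects an extra embedding vector. It assumes that `vocab=2048` is the number of hash buckets and `dim=128` the vector width. The hash function, the padding id, and all names are hypothetical, and the submission's actual implementation may differ.

```python
import random

NUM_BUCKETS = 2048  # "vocab=2048", read here as the hash-table size (assumption)
DIM = 128           # "dim=128", the width of each bigram vector

def bigram_bucket(prev_id, cur_id, num_buckets=NUM_BUCKETS):
    # Hypothetical multiplicative hash; any well-mixed hash would do.
    return (prev_id * 1_000_003 + cur_id) % num_buckets

def bigram_features(token_ids, table, pad_id=0):
    """Return one table row per position, keyed by the (previous, current) pair."""
    feats = []
    prev = pad_id  # hypothetical stand-in for "no previous token"
    for cur in token_ids:
        feats.append(table[bigram_bucket(prev, cur)])
        prev = cur
    return feats

# Randomly initialised table standing in for learned parameters.
rng = random.Random(0)
table = [[rng.gauss(0.0, 0.02) for _ in range(DIM)] for _ in range(NUM_BUCKETS)]

feats = bigram_features([5, 17, 5, 17], table)
print(len(feats), len(feats[0]))
```

Positions 1 and 3 in the example both see the pair (5, 17), so they receive the same vector. Distinct pairs that collide in a bucket also share one, which is the trade-off for keeping the table small.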

Credits

PR #1326 @aryanbhosale, PR #1331 @dexhunter, PR #1204 @msisovic, PR #1218 @clarkkev, PR #1260 @dexhunter, PR #1217 @bigbag

Test plan

  • 3-seed validation (42, 314, 999)
  • All artifacts under 16,000,000 bytes
  • No SLOT, no n-gram cache
  • Legal score-first TTT
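
The "score-first" ordering can be illustrated with a toy example. The sketch below assumes that "legal" means every chunk of validation data is scored before the model takes any gradient step on it, so no reported loss benefits from having seen its own targets. It uses a one-parameter model with squared error in place of the real network and cross-entropy. Only `lr=0.002` and the 3 epochs come from the PR description; everything else is hypothetical.

```python
def score_first_ttt(chunks, w=0.0, lr=0.002, epochs=3):
    """Toy score-first test-time training on a 1-parameter model.

    The model predicts every target as `w`; loss is mean squared error.
    """
    def loss(weight, chunk):
        return sum((weight - y) ** 2 for y in chunk) / len(chunk)

    def grad(weight, chunk):
        return sum(2 * (weight - y) for y in chunk) / len(chunk)

    scores = []
    for chunk in chunks:
        # 1) Score the chunk with weights that have never been trained on it.
        scores.append(loss(w, chunk))
        # 2) Only afterwards adapt on the same chunk with plain SGD.
        for _ in range(epochs):
            w -= lr * grad(w, chunk)
    return scores, w

scores, final_w = score_first_ttt([[1.0, 1.0], [1.0, 1.0]])
print(scores, final_w)
```

The first chunk is scored by the unadapted model, and later chunks benefit only from updates made on earlier ones.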

Pavel Liashkov and others added 5 commits March 22, 2026 23:41
…val_bpb 1.0955 (3-seed mean)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

bigbag commented Apr 4, 2026

Closing — PR contains unrelated files. Will resubmit with clean branch.

bigbag closed this Apr 4, 2026
bigbag deleted the submission/sp2048-3recur-swa-bigram-ttt branch April 4, 2026 11:57