
Record: SP8192 + Systems Optimization — val_bpb 1.0801 (3-seed mean)#1583

Open
codemath3000 wants to merge 1 commit into openai:main from codemath3000:submission/systems-opt-sp8192

Conversation


@codemath3000 codemath3000 commented Apr 13, 2026

Summary

  • val_bpb: 1.0801 (3-seed mean, std 0.0001) | 8xH100 SXM, 600s | Legal TTT
  • Systems-level optimizations on the SOTA stack of PR #1493 (Record: SP8192 + 3-Layer Recurrence + Parallel Residuals + QK-Gain 5.25 + Legal TTT, val_bpb 1.0810, 3-seed mean): fused Muon kernel, batched EMA, superchunk eval
  • Identical ML; faster step time yields extra training steps in the same 600s budget
  • Per Record Criterion 1: "For submissions that improve speed through systems optimization without changing the ML, this requirement [0.005 nats] is waived." This submission changes only systems-level code (kernel fusion, batched ops, memory preallocation) without altering model architecture, optimizer logic, loss function, or any hyperparameter, meaning the 0.005 nats threshold is waived.
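The batched EMA mentioned above can be sketched with PyTorch's multi-tensor `_foreach_` ops. This is a minimal illustration of the technique, not the submission's actual code; it assumes a standard parameter EMA with a scalar decay:

```python
import torch

@torch.no_grad()
def ema_update_batched(ema_params, model_params, decay=0.999):
    """One multi-tensor kernel launch per op instead of a Python
    loop over parameters (the 'batched EMA' systems optimization).
    Illustrative sketch; not the code from this PR."""
    torch._foreach_mul_(ema_params, decay)
    torch._foreach_add_(ema_params, model_params, alpha=1.0 - decay)

@torch.no_grad()
def ema_update_loop(ema_params, model_params, decay=0.999):
    """Reference per-tensor loop: numerically identical, but pays
    Python-loop and per-tensor kernel-launch overhead each step."""
    for e, p in zip(ema_params, model_params):
        e.mul_(decay).add_(p, alpha=1.0 - decay)
```

Because the two variants are numerically identical, swapping the loop for the batched form changes only step time, which is what makes the 0.005-nats waiver applicable.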

Submission series: This PR is one of three related submissions applying the same systems optimizations to different base stacks (PR #1493, PR #1529, PR #1578). We submit against multiple bases so that a ready-to-merge option exists regardless of how the pending PRs are resolved. Judges should feel free to evaluate whichever base(s) they consider valid and disregard the rest.

Results

| Seed | TTT BPB | Artifact (bytes) |
|------|---------|------------------|
| 0    | 1.0799  | 15,993,737       |
| 3141 | 1.0801  | 15,995,437       |
| 42   | 1.0802  | 15,993,201       |
| Mean | 1.0801  | 15,994,125       |
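As a sanity check on the table, the reported means follow directly from the per-seed rows (pure arithmetic, nothing assumed beyond the table itself):

```python
from statistics import mean

# Per-seed results copied from the table above
bpb = {0: 1.0799, 3141: 1.0801, 42: 1.0802}
artifact_bytes = {0: 15_993_737, 3141: 15_995_437, 42: 15_993_201}

mean_bpb = round(mean(bpb.values()), 4)          # 1.0801, as reported
mean_artifact = round(mean(artifact_bytes.values()))  # 15,994,125, as reported
```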

Test plan

  • 3-seed training on 8xH100 SXM (seeds 0, 3141, 42)
  • All artifacts under 16MB
  • All runs under 600s training + 600s eval
  • Round-trip quantization verified
  • Reproducibility to be verified by judges
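The round-trip quantization item above can be sketched as a simple bit-exactness check. The helper below is hypothetical, and it assumes bf16-quantized artifacts; the PR does not specify its actual artifact format:

```python
import torch

def round_trip_ok(params):
    """Hypothetical round-trip check: quantize each tensor to bf16,
    dequantize to fp32, re-quantize, and require bit-exact equality.
    bf16 -> fp32 is lossless, so a correct serializer must pass this."""
    for p in params:
        q = p.to(torch.bfloat16)                      # quantize
        rt = q.to(torch.float32).to(torch.bfloat16)   # round trip
        if not torch.equal(q, rt):
            return False
    return True
```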

🤖 Generated with Claude Code

Systems-level optimizations (fused Muon, EMA foreach, superchunk eval)
on the PR openai#1493 SOTA stack. Identical ML; faster step time yields extra
training steps. 3-seed mean: 1.0801 BPB / 2.7899 nats.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
