Record: 10L Seq2048 TTT LoRA WarmdownQuant (val_bpb=1.1787) #310

Open

vishesh9131 wants to merge 5 commits into openai:main from vishesh9131:vishesh_submission

Conversation

vishesh9131 commented Mar 21, 2026

Summary

  • 10-layer transformer (512 dim, 8 heads, 4 KV heads) with tuned hyperparameters
  • Sequence length 2048, batch size 393K tokens, Muon momentum 0.98
  • Always-decaying warmdown (WD=15000) for tighter weights → reduced int8 quantization penalty
  • Test-time training with batched LoRA adapters (rank 8) on Q, V projections and LM head
  • Overtone spectral embedding init + phase-transition residual mixing
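
The test-time-training bullet applies LoRA adapters of rank 8 to the Q and V projections and the LM head. A minimal NumPy sketch of the low-rank update (function name, `alpha`, and shapes are illustrative; the submission's batched PyTorch adapters will differ):

```python
import numpy as np

def lora_forward(x, W, A, B, alpha=16.0, rank=8):
    # Frozen base projection plus low-rank update:
    #   y = x W^T + (alpha / rank) * x A^T B^T
    # Shapes: x (batch, d_in), W (d_out, d_in), A (rank, d_in), B (d_out, rank).
    # Only A and B are trained at test time; W stays frozen.
    return x @ W.T + (x @ A.T) @ B.T * (alpha / rank)

rng = np.random.default_rng(0)
d_in = d_out = 512  # matches the model's 512-dim residual stream
r = 8               # LoRA rank from the PR
x = rng.standard_normal((4, d_in))
W = rng.standard_normal((d_out, d_in))
A = rng.standard_normal((r, d_in)) * 0.01
B = np.zeros((d_out, r))  # zero-init B: the adapter starts as an exact no-op
assert np.allclose(lora_forward(x, W, A, B), x @ W.T)
```

With `B` initialized to zero the adapted layer is exactly the base layer at step 0, so test-time training can only move away from the pretrained solution as the adapter gradients accumulate.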

Results (Seed 1337, 8xH100 SXM)

| Metric | Value |
| --- | --- |
| val_bpb (pre-quant) | 1.1787 |
| int8+zlib size | 15.56 MB |
| Steps completed | 13,282 / 200,000 |
| Step avg | 45.18 ms |
| Training time | 600 s (wallclock cap) |
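
A sketch of how a size metric like "int8+zlib" could be computed: symmetric per-tensor int8 quantization followed by zlib over the raw bytes. The function name and packing format here are assumptions, not the benchmark's actual code.

```python
import zlib
import numpy as np

def int8_zlib_size_bytes(tensors, level=9):
    # Symmetric per-tensor quantization: map [-max|w|, +max|w|] onto
    # [-127, 127], then compress the int8 bytes with zlib.
    total = 0
    for w in tensors:
        scale = max(float(np.abs(w).max()) / 127.0, 1e-12)
        q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
        total += len(zlib.compress(q.tobytes(), level))
    return total

rng = np.random.default_rng(0)
w = rng.standard_normal((512, 512)).astype(np.float32)
size_mb = int8_zlib_size_bytes([w]) / 2**20
print(f"{size_mb:.2f} MB")
```

Under this kind of metric, "tighter weights" help twice: quantization error shrinks with the per-tensor scale, and low-entropy int8 values compress better under zlib.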

Val BPB progression

| Step | val_bpb |
| --- | --- |
| 1000 | 1.3883 |
| 5000 | 1.2598 |
| 8000 | 1.2333 |
| 10000 | 1.2130 |
| 12000 | 1.1917 |
| 13282 | 1.1787 |

Key changes from baseline

  • num_layers: 9 → 10
  • train_seq_len: 1024 → 2048
  • matrix_lr: 0.04 → 0.03
  • muon_momentum: 0.95 → 0.98
  • warmdown_iters: 1200 → 15000
  • train_batch_tokens: 524288 → 393216
  • grad_clip_norm: 0.0 → 1.0
  • Added TTT LoRA eval (rank 8, chunk 256)
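
One plausible reading of the "always-decaying warmdown" with `warmdown_iters=15000` is a learning-rate multiplier that decays linearly from step 0 rather than only over the final steps; this sketch encodes that interpretation, not the submission's exact schedule:

```python
def warmdown_scale(step, warmdown_iters=15_000):
    # Linear LR multiplier: 1.0 at step 0, 0.0 at warmdown_iters.
    # An interpretation of the PR's "always-decaying warmdown",
    # not a copy of the submission's code.
    return max(0.0, 1.0 - step / warmdown_iters)

# The run's final step (13,282) sits deep in the decay, so the effective
# LR is already small, which keeps weight magnitudes tight going into
# int8 quantization.
print(warmdown_scale(13_282))
```

Note the training run stopped at 13,282 steps under the 600 s wallclock cap, inside the 15,000-step warmdown window, which is consistent with the decay being active for essentially the whole run.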

Test plan

