Record: 10L Seq2048 TTT LoRA WarmdownQuant (val_bpb=1.1787) #310

Open

vishesh9131 wants to merge 5 commits into openai:main from vishesh9131:vishesh_submission

Conversation

vishesh9131 commented Mar 21, 2026

Summary

  • 10-layer transformer (512 dim, 8 heads, 4 KV heads) with tuned hyperparameters
  • Sequence length 2048, batch size 393K tokens, Muon momentum 0.98
  • Always-decaying warmdown (WD=15000) for tighter weights → reduced int8 quantization penalty
  • Test-time training with batched LoRA adapters (rank 8) on Q, V projections and LM head
  • Overtone spectral embedding init + phase-transition residual mixing
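
The test-time-training bullet applies LoRA adapters of rank 8 to the Q and V projections and the LM head. A minimal NumPy sketch of the low-rank update (function name, `alpha`, and shapes are illustrative; the submission's batched PyTorch adapters will differ):

```python
import numpy as np

def lora_forward(x, W, A, B, alpha=16.0, rank=8):
    # Frozen base projection plus low-rank update:
    #   y = x W^T + (alpha / rank) * x A^T B^T
    # Shapes: x (batch, d_in), W (d_out, d_in), A (rank, d_in), B (d_out, rank).
    # Only A and B are trained at test time; W stays frozen.
    return x @ W.T + (x @ A.T) @ B.T * (alpha / rank)

rng = np.random.default_rng(0)
d_in = d_out = 512  # matches the model's 512-dim residual stream
r = 8               # LoRA rank from the PR
x = rng.standard_normal((4, d_in))
W = rng.standard_normal((d_out, d_in))
A = rng.standard_normal((r, d_in)) * 0.01
B = np.zeros((d_out, r))  # zero-init B: the adapter starts as an exact no-op
assert np.allclose(lora_forward(x, W, A, B), x @ W.T)
```

With `B` initialized to zero the adapted layer is exactly the base layer at step 0, so test-time training can only move away from the pretrained solution as the adapter gradients accumulate.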

Results (Seed 1337, 8xH100 SXM)

| Metric | Value |
| --- | --- |
| val_bpb (pre-quant) | 1.1787 |
| int8+zlib size | 15.56 MB |
| Steps completed | 13,282 / 200,000 |
| Step avg | 45.18 ms |
| Training time | 600 s (wallclock cap) |
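
A sketch of how a size metric like "int8+zlib" could be computed: symmetric per-tensor int8 quantization followed by zlib over the raw bytes. The function name and packing format here are assumptions, not the benchmark's actual code.

```python
import zlib
import numpy as np

def int8_zlib_size_bytes(tensors, level=9):
    # Symmetric per-tensor quantization: map [-max|w|, +max|w|] onto
    # [-127, 127], then compress the int8 bytes with zlib.
    total = 0
    for w in tensors:
        scale = max(float(np.abs(w).max()) / 127.0, 1e-12)
        q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
        total += len(zlib.compress(q.tobytes(), level))
    return total

rng = np.random.default_rng(0)
w = rng.standard_normal((512, 512)).astype(np.float32)
size_mb = int8_zlib_size_bytes([w]) / 2**20
print(f"{size_mb:.2f} MB")
```

Under this kind of metric, "tighter weights" help twice: quantization error shrinks with the per-tensor scale, and low-entropy int8 values compress better under zlib.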

Val BPB progression

| Step | val_bpb |
| --- | --- |
| 1000 | 1.3883 |
| 5000 | 1.2598 |
| 8000 | 1.2333 |
| 10000 | 1.2130 |
| 12000 | 1.1917 |
| 13282 | 1.1787 |

Key changes from baseline

  • num_layers: 9 → 10
  • train_seq_len: 1024 → 2048
  • matrix_lr: 0.04 → 0.03
  • muon_momentum: 0.95 → 0.98
  • warmdown_iters: 1200 → 15000
  • train_batch_tokens: 524288 → 393216
  • grad_clip_norm: 0.0 → 1.0
  • Added TTT LoRA eval (rank 8, chunk 256)
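
One plausible reading of the "always-decaying warmdown" with `warmdown_iters=15000` is a learning-rate multiplier that decays linearly from step 0 rather than only over the final steps; this sketch encodes that interpretation, not the submission's exact schedule:

```python
def warmdown_scale(step, warmdown_iters=15_000):
    # Linear LR multiplier: 1.0 at step 0, 0.0 at warmdown_iters.
    # An interpretation of the PR's "always-decaying warmdown",
    # not a copy of the submission's code.
    return max(0.0, 1.0 - step / warmdown_iters)

# The run's final step (13,282) sits deep in the decay, so the effective
# LR is already small, which keeps weight magnitudes tight going into
# int8 quantization.
print(warmdown_scale(13_282))
```

Note the training run stopped at 13,282 steps under the 600 s wallclock cap, inside the 15,000-step warmdown window, which is consistent with the decay being active for essentially the whole run.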

Test plan

