Non-Record: SP1024 + Depth Recurrence + Adaptive Markov Curriculum + Legal TTT — val_bpb 1.1047 #1398

Open
Mertyandimata wants to merge 54 commits into openai:main from Mertyandimata:submission/sp1024-recur-markov-autoqmax

@Mertyandimata

val_bpb: 1.1047 (single seed, SEED=42) | 15.89 MB | 8×H100 SXM

A quick personal note: Our vacation budget went to RunPod this month. My fiancée Virginia was okay with that — I don't come from an ML lab, but she backs the journey. This one's for her.

Key Results

  • Pre-quant val_bpb: 1.1359
  • Post-quant val_bpb: 1.1429
  • Sliding window val_bpb: 1.1065
  • TTT final val_bpb: 1.1047
  • Artifact: 15,888,861 bytes
  • Training: 5,183 steps in 590s
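To make the relationship between these figures easier to read, here is a small arithmetic sketch. It assumes the four val_bpb numbers are successive stages of one evaluation pipeline (quantize, then sliding-window evaluation, then TTT) and that the "16MB" budget means 16,000,000 bytes; neither assumption is stated in this summary, and `BUDGET_BYTES` is a name introduced here for illustration only.

```python
# Figures copied from the Key Results list.
pre_quant, post_quant, sliding, ttt = 1.1359, 1.1429, 1.1065, 1.1047
artifact_bytes, steps, seconds = 15_888_861, 5_183, 590

# Assumption: the "16MB" budget is 16,000,000 bytes (decimal megabytes).
BUDGET_BYTES = 16_000_000

quant_penalty = post_quant - pre_quant   # bpb lost to quantization
sliding_gain = post_quant - sliding      # bpb recovered by sliding-window eval
ttt_gain = sliding - ttt                 # further bpb recovered by TTT
headroom = BUDGET_BYTES - artifact_bytes # unused bytes under the assumed budget
ms_per_step = 1000 * seconds / steps     # average wall-clock time per step

print(f"quantization penalty: {quant_penalty:+.4f} bpb")
print(f"sliding window gain:  {-sliding_gain:+.4f} bpb")
print(f"TTT gain:             {-ttt_gain:+.4f} bpb")
print(f"budget headroom:      {headroom:,} bytes")
print(f"time per step:        {ms_per_step:.1f} ms")
```

Read this way, most of the improvement over the post-quant number would come from sliding-window evaluation, with TTT adding a smaller gain on top.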

What's Different Here

  1. Adaptive Markov Curriculum — bigram-surprise-weighted loss scaling, steering capacity toward tokens that n-gram statistics can't predict
  2. Auto-QMax Budget Search — binary search over clip range to actually fill the 16MB budget instead of leaving megabytes on the table
  3. EMA + SWA Blend — 30/70 blend of both averaging methods instead of choosing one
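The three items above can be sketched in plain Python. This is an illustration under stated assumptions, not the code from this PR: the function names, the add-alpha smoothing, the clipping bounds, the monotone `size_fn`, and the reading of "30/70" as 30% EMA and 70% SWA are all hypothetical choices made here. The actual implementation is described in README.md.

```python
import math
from collections import Counter


def bigram_surprise_weights(tokens, vocab_size, alpha=1.0, floor=0.5, ceil=2.0):
    """Per-target loss weights from bigram surprise (hypothetical form).

    Surprise is -log2 of an add-alpha smoothed bigram probability. Weights are
    surprise divided by its mean, then clipped to [floor, ceil], so targets the
    bigram model predicts poorly receive a larger share of the loss.
    """
    if len(tokens) < 2:
        raise ValueError("need at least two tokens")
    pairs = list(zip(tokens, tokens[1:]))
    pair_counts = Counter(pairs)
    prev_counts = Counter(tokens[:-1])
    surprise = [
        -math.log2((pair_counts[(a, b)] + alpha) / (prev_counts[a] + alpha * vocab_size))
        for a, b in pairs
    ]
    mean = sum(surprise) / len(surprise)
    return [min(ceil, max(floor, s / mean)) for s in surprise]


def largest_qmax_within_budget(size_fn, budget_bytes, lo=1, hi=127):
    """Binary search for the largest integer qmax whose artifact fits the budget.

    Assumes size_fn(qmax) is non-decreasing in qmax. Returns None if even the
    smallest setting exceeds the budget.
    """
    best = None
    while lo <= hi:
        mid = (lo + hi) // 2
        if size_fn(mid) <= budget_bytes:
            best, lo = mid, mid + 1   # fits: try a wider range
        else:
            hi = mid - 1              # too large: narrow the range
    return best


def blend_averages(ema, swa, ema_frac=0.3):
    """Blend two averaged parameter sets (dicts of name -> list of floats).

    Assumption: "30/70" means 30% EMA and 70% SWA; swap ema_frac if not.
    """
    return {
        name: [ema_frac * e + (1.0 - ema_frac) * s for e, s in zip(ema[name], swa[name])]
        for name in ema
    }


if __name__ == "__main__":
    # Toy inputs, chosen only to exercise the functions.
    print(bigram_surprise_weights([0, 1, 0, 1, 0, 2], vocab_size=3))
    print(largest_qmax_within_budget(lambda q: 100_000 * q, 1_550_000))
    print(blend_averages({"w": [1.0, 0.0]}, {"w": [0.0, 1.0]}))
```

In a training loop, weights like these would multiply the per-token cross-entropy before it is averaged. In a real budget search, `size_fn` would quantize and compress the model at each candidate setting, which is why a logarithmic number of probes matters.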

Built on work from PR #1339 (@bigbag), PR #549 (@abaybektursun), PR #287 and #198 (@jfprincz), PR #374 (@signalrush).

Full details in README.md.

@Mertyandimata changed the title from "SP1024 + Depth Recurrence + Markov Curriculum + TTT — val_bpb 1.1047" to "Non-Record: SP1024 + Depth Recurrence + Adaptive Markov Curriculum + Legal TTT — val_bpb 1.1047" on Apr 6, 2026