Non-Record: SP1024 + Depth Recurrence + Adaptive Markov Curriculum + Legal TTT — val_bpb 1.1047 by Mertyandimata · Pull Request #1398 · openai/parameter-golf

Mertyandimata · 2026-04-06T01:15:11Z

val_bpb: 1.1047 (single seed, SEED=42) | 15.89 MB | 8×H100 SXM

A quick personal note: Our vacation budget went to RunPod this month. My fiancée Virginia was okay with that — I don't come from an ML lab, but she backs the journey. This one's for her.

Key Results

Pre-quant val_bpb: 1.1359
Post-quant val_bpb: 1.1429
Sliding window val_bpb: 1.1065
TTT final val_bpb: 1.1047
Artifact: 15,888,861 bytes
Training: 5,183 steps in 590s

What's Different Here

Adaptive Markov Curriculum — bigram-surprise-weighted loss scaling, steering capacity toward tokens that n-gram statistics can't predict
Auto-QMax Budget Search — binary search over clip range to actually fill the 16MB budget instead of leaving megabytes on the table
EMA + SWA Blend — 30/70 blend of both averaging methods instead of choosing one
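For readers skimming the PR without opening README.md, the three ideas above can be sketched roughly as follows. This is an illustrative sketch only, not the code in this submission: the function names (`bigram_surprise_weights`, `largest_qmax_within_budget`, `blend_averages`), the add-one smoothing, the `alpha` knob, the `size_fn` callable, and the `[1, 127]` search range are all assumptions. The PR text also does not say which averaging method gets the 30% share; the sketch assumes EMA gets 0.3 and SWA gets 0.7.

```python
import math
from collections import Counter


def bigram_surprise_weights(tokens, vocab_size, alpha=1.0):
    """Per-token loss weights from bigram surprisal (hypothetical helper).

    Tokens that a smoothed bigram model predicts poorly get weights above 1,
    well-predicted tokens get weights below 1. The weights average to 1.
    """
    if len(tokens) < 2 or vocab_size < 2:
        raise ValueError("need at least 2 tokens and vocab_size >= 2")
    pair_counts = Counter(zip(tokens, tokens[1:]))
    prev_counts = Counter(tokens[:-1])
    surprisals = []
    for prev, cur in zip(tokens, tokens[1:]):
        # Add-one smoothing is an assumption made for this sketch.
        p = (pair_counts[(prev, cur)] + 1) / (prev_counts[prev] + vocab_size)
        surprisals.append(-math.log(p))
    mean = sum(surprisals) / len(surprisals)
    # alpha=0 gives uniform weights; alpha=1 gives surprisal / mean surprisal.
    return [1.0 + alpha * (s / mean - 1.0) for s in surprisals]


def largest_qmax_within_budget(size_fn, budget_bytes, lo=1, hi=127):
    """Binary search for the largest integer qmax whose artifact fits.

    Assumes size_fn(qmax) returns the compressed size in bytes and is
    non-decreasing in qmax. Returns None if even `lo` is over budget.
    """
    best = None
    while lo <= hi:
        mid = (lo + hi) // 2
        if size_fn(mid) <= budget_bytes:
            best = mid      # fits: remember it and try a wider range
            lo = mid + 1
        else:
            hi = mid - 1    # too big: try a narrower range
    return best


def blend_averages(ema, swa, ema_frac=0.3):
    """Convex blend of two parameter dicts that share the same keys."""
    return {k: ema_frac * ema[k] + (1.0 - ema_frac) * swa[k] for k in ema}
```

In a training loop, the weights from `bigram_surprise_weights` would multiply the per-token cross-entropy before averaging, and `size_fn` would wrap the real quantize-and-compress step, which is the expensive part of the search.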

Built on work from PR #1339 (@bigbag), PR #549 (@abaybektursun), PR #287 and #198 (@jfprincz), PR #374 (@signalrush).

Full details in README.md.

Mertyandimata and others added 30 commits April 3, 2026 09:15

Rakı Training v3 - Markov curriculum + BigramHash + EMA

c607e49

Rakı v5 - Full integrated system: Stochastic Depth, TTT, BigramHash e…

9df8a78

…val-only, Coarse-to-Fine gradient scaling, EMA, Markov curriculum

Rakí v5 - hybrid entropy×surprise scoring

9d2c07b

Rakí v5 CUDA - H100 ready

925517d

v6 cuda bugfix

57f6510

v7 all bugs fixed

07379fa

Raki patcher for baseline train_gpt.py

9ba11c1

Delete train_raki_v6.py

20f8a8d

Delete train_raki_v3.py

f57362a

Delete train_raki_v5.py

33f213f

Delete train_raki_v5_cuda.py

36b99fd

Raki Training: OEE-inspired Markov curriculum + EMA

d5e29c4

Yandimata v2: GPU-Markov + sliding window + int6 + zstd + all meta

d902413

Delete train_raki_v7.py

8dcfbaa

Raki V2: INT8 fix + GPTQ-lite + all meta techniques

ff42cd6

Çift Raki: 6B×2rec + trigram + int8

b3dedfc

Çift Raki: 6B×2rec + trigram + int8

b33b017

V2: BigramHash + sliding window + SOTA params

152421b

V2: BigramHash + sliding window + SOTA params

be9fec8

V2: BigramHash + sliding window + SOTA params

00b43f8

V2: add pruning for 16MB fit

60e96dc

V3: mixed int6/int8 + Partial RoPE

d31e17b

V4: Adaptive Markov curriculum

afb1b8d

V4: Adaptive Markov + auto qmax

7d8b4dd

V5: Raki triple role - curriculum + adaptive + logit boost

a5c43de

V3: auto qmax, V5: triple Raki

10ee02e

fix: global→globals() for auto qmax

1b2593c

Delete patch_v2.py

8f37271

Delete patch_v4.py

85678fb

Delete patch_v3.py

fd88fef

Mertyandimata and others added 24 commits April 4, 2026 21:20

Delete patch_raki.py

5707147

Delete patch_v5.py

a5e4ffc

feat: Raki V6 — Hadamard rotation + SVD boost + depth recycling + lay…

cbf0fee

…er-wise qmax

V7: mulaw companding + bigram KL injection

6058dc7

V8: LeakyReLU² + Late QAT + XSA4 + LN Scale + MLP3x

b5d1953

V8 + comparison script

8485317

V5 V7 V8 patches

2851bc4

V8: LeakyReLU² + Late QAT + XSA4 + LN Scale + MLP3x

a7a731b

V10

e022bf9

V10

c885184

V10

d2262f2

V10 fix: GPTQ device mismatch

5b2e1f8

V11: GPTQ fix + Brotli-11 + qTTT + decay prior

98700da

V11: fix all 12 bugs

0858a2f

V11: GPTQ fix + Brotli-11 + qTTT + decay prior

e0e5a19

v12: SLOT-24 + pre-quant TTT

0d582ff

v13

cf5fe59

v14: PR1339 base + Markov curriculum + TurboMuon AOL + EMA-SWA blend

78e36d2

v14

79cfd27

v14.1 auto qmax

4337e26

v14.2 auto qmax + dynamo reset + full audit

e3bfbe2

SP1024 Depth Recurrence Markov Curriculum Auto-QMax val_bpb 1.1047

8aaf36f

SP1024 Depth Recurrence Markov Curriculum Auto-QMax val_bpb 1.1047

1037369

SP1024 Depth Recurrence Markov Curriculum Auto-QMax val_bpb 1.1047

67bc9ca

Mertyandimata changed the title ~~SP1024 + Depth Recurrence + Markov Curriculum + TTT — val_bpb 1.1047~~ Non-Record: SP1024 + Depth Recurrence + Adaptive Markov Curriculum + Legal TTT — val_bpb 1.1047 Apr 6, 2026

Mertyandimata wants to merge 54 commits into openai:main from Mertyandimata:submission/sp1024-recur-markov-autoqmax
