Non-record: 11L NativeFlowMatcher + Legal TTT — val_bpb 1.1199 (3-seed mean no-TTT: 1.1225) by Christopher-Lee-McClendon · Pull Request #1170 · openai/parameter-golf

Christopher-Lee-McClendon · 2026-03-31T04:52:31Z

Summary

Non-record submission exploring NativeFlowMatcher (NFM), a 393K-parameter OT-CFM (Optimal Transport Conditional Flow Matching) velocity network that applies a gated correction to the transformer's hidden states and is trained jointly with the AR objective. The flow-matching module is trained as distribution transport but is used at inference as a small residual correction.
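To make "trained as distribution transport" concrete, here is a minimal sketch of a flow-matching regression target in plain Python. It assumes the common straight-line path `x_t = (1 - t) * x0 + t * x1` with constant target velocity `x1 - x0`. The choice of source and target samples, the helper names (`cfm_pair`, `joint_loss`), and the way the flow-matching loss is combined with the AR loss (`ar_loss + lw * fm_loss`, with `lw=0.1` taken from the loss weight listed in the ablations) are illustrative assumptions, not the submission's code.

```python
import random


def cfm_pair(x0, x1, t):
    """Return the point on the straight-line path at time t and its target velocity."""
    x_t = [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]
    u_t = [b - a for a, b in zip(x0, x1)]  # constant along the straight path
    return x_t, u_t


def mse(pred, target):
    """Mean squared error between a predicted and a target velocity."""
    return sum((p - q) ** 2 for p, q in zip(pred, target)) / len(pred)


def joint_loss(ar_loss, fm_loss, lw=0.1):
    """Assumed combination: AR loss plus a weighted flow-matching loss."""
    return ar_loss + lw * fm_loss


rng = random.Random(0)
x0 = [rng.gauss(0.0, 1.0) for _ in range(4)]  # assumed source sample
x1 = [rng.gauss(0.0, 1.0) for _ in range(4)]  # assumed target sample
x_t, u_t = cfm_pair(x0, x1, t=0.5)

# A zero-velocity prediction stands in for the velocity network's output.
fm_loss = mse([0.0] * len(u_t), u_t)
print(joint_loss(ar_loss=2.0, fm_loss=fm_loss))
```

In a real run the zero prediction would be replaced by the velocity network evaluated at `(x_t, t)`, and `t` would be sampled per example.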

Results

Three-seed reproducibility (training-time sliding window, no TTT):

Seed	SLURM Job	Training val_bpb	Sliding BPB (no TTT)	Artifact Bytes
42	55342820	1.1380	1.12312	15,745,776
1337	55398556	1.1385	1.12367	15,736,933
2025	55398557	1.1359	1.12077	15,745,950
Mean ± Std		1.1375 ± 0.0014	1.12252 ± 0.00151
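The summary row can be re-derived from the per-seed values in the table. The sketch below uses the sample standard deviation (`statistics.stdev`), which is an assumption about how the spread was computed. Because the inputs are the rounded table entries, the recomputed spread may differ from the reported one in the last digit.

```python
from statistics import mean, stdev

# Per-seed values copied from the table above.
training_bpb = {42: 1.1380, 1337: 1.1385, 2025: 1.1359}
sliding_bpb = {42: 1.12312, 1337: 1.12367, 2025: 1.12077}


def summarize(values):
    """Return (mean, sample standard deviation) of the per-seed values."""
    vals = list(values)
    return mean(vals), stdev(vals)


train_mean, train_std = summarize(training_bpb.values())
slide_mean, slide_std = summarize(sliding_bpb.values())
print(f"training val_bpb: {train_mean:.4f} ± {train_std:.4f}")
print(f"sliding BPB:      {slide_mean:.5f} ± {slide_std:.5f}")
```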

Primary (seed=42, with legal TTT):

Evaluation	val_bpb
Sliding window (stride=64), no TTT	1.12312
Sliding window (stride=64), legal TTT	1.11991

Legal TTT gain: −0.00321 BPB
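For readers unfamiliar with sliding-window evaluation, the following sketch shows one way the windows could be laid out so that every token is scored exactly once with nearly full left context. The window length of 2048 is borrowed from the training sequence length and is an assumption for evaluation. The function name and the `(start, end, score_from)` convention are hypothetical.

```python
def sliding_windows(n_tokens, seq_len=2048, stride=64):
    """Yield (start, end, score_from).

    The model sees tokens [start, end) as context, and only tokens in
    [score_from, end) contribute to the reported loss.
    """
    scored_upto = 0
    start = 0
    while scored_upto < n_tokens:
        end = min(start + seq_len, n_tokens)
        yield start, end, scored_upto
        scored_upto = end
        start += stride


windows = list(sliding_windows(5000))
first = windows[0]
second = windows[1]
print(first)   # the first window scores everything it sees
print(second)  # later windows score only their newest `stride` tokens
```

After the first window, each scored token has at least `seq_len - stride` tokens of context, which is why a small stride tends to give a lower BPB than non-overlapping chunks, at the cost of more forward passes.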

Legal TTT evaluation for seeds 1337 and 2025 is pending (SLURM jobs 55411651–55411654).

Architecture

11L/512D/GQA(8H/4KV), 3×MLP, 27.5M params total
NativeFlowMatcher: 256-dim hidden velocity network with sinusoidal time conditioning, gated Euler step at t=1
XSA on all 11 layers, BigramHash(4096,128), LeakyReLU(0.5)², value residual, gated attention
Mixed int6/int5 quantization + zstd-16 compression
Artifact: 15,745,776 bytes (254K headroom under 16MB cap)
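The NativeFlowMatcher line above can be read as "embed the time, predict a velocity, take one gated Euler step". The sketch below illustrates that reading in plain Python. The sigmoid gate, the step size `dt=1.0`, the embedding layout, and every function name are assumptions made for illustration. The toy velocity function stands in for the 256-dim hidden velocity network, whose actual layers are not described here.

```python
import math


def sinusoidal_embedding(t, dim=8, max_period=10000.0):
    """Standard sin/cos features of a scalar time t (layout is an assumption)."""
    half = dim // 2
    freqs = [math.exp(-math.log(max_period) * i / half) for i in range(half)]
    return [math.sin(t * f) for f in freqs] + [math.cos(t * f) for f in freqs]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gated_euler_step(h, velocity_fn, gate_logit, t=1.0, dt=1.0):
    """One Euler step h' = h + sigmoid(gate_logit) * dt * v(h, emb(t))."""
    v = velocity_fn(h, sinusoidal_embedding(t))
    g = sigmoid(gate_logit)
    return [hi + g * dt * vi for hi, vi in zip(h, v)]


def toy_velocity(h, t_emb):
    """Hypothetical stand-in for the velocity network: pull h toward zero."""
    bias = sum(t_emb) / len(t_emb)
    return [-x + 0.01 * bias for x in h]


h = [0.5, -1.0, 2.0]
print(gated_euler_step(h, toy_velocity, gate_logit=-2.0))
```

With a strongly negative gate logit the step leaves `h` almost unchanged, which matches the description of NFM as a small residual correction at inference.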

Training

7,000 steps on 1×A100 PCIe 40GB, ~3.86 hours per seed
Muon + Adam optimizer, 2048 sequence length
Three seeds completed: 42, 1337, 2025

Ablation Studies

2×2 Matrix: NFM × TTT (isolating NFM contribution):

Configuration	Params	No TTT (BPB)	Legal TTT (BPB)	Δ (TTT effect)
Base (no NFM)	27,137,223	1.12087	pending	pending
NFM (hd=256, lw=0.1)	27,530,952	1.12312	1.11991	−0.00321
Δ (NFM effect)	+393,729	+0.00225	pending	—
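The deltas in the matrix follow directly from the four cells. A small sketch that recomputes them and keeps pending cells as `None`, using only the numbers from the table (the dictionary layout and the `delta` helper are illustrative):

```python
runs = {
    "base": {"params": 27_137_223, "no_ttt": 1.12087, "ttt": None},  # TTT pending
    "nfm": {"params": 27_530_952, "no_ttt": 1.12312, "ttt": 1.11991},
}


def delta(a, b):
    """Return b - a, or None when either value is still pending."""
    if a is None or b is None:
        return None
    return b - a


param_delta = delta(runs["base"]["params"], runs["nfm"]["params"])
nfm_effect_no_ttt = delta(runs["base"]["no_ttt"], runs["nfm"]["no_ttt"])
nfm_effect_ttt = delta(runs["base"]["ttt"], runs["nfm"]["ttt"])
ttt_effect_nfm = delta(runs["nfm"]["no_ttt"], runs["nfm"]["ttt"])
ttt_effect_base = delta(runs["base"]["no_ttt"], runs["base"]["ttt"])

print(f"NFM params added: {param_delta:,}")
print(f"NFM effect (no TTT): {nfm_effect_no_ttt:+.5f} BPB")
print(f"TTT effect (NFM):    {ttt_effect_nfm:+.5f} BPB")
```

Positive deltas are worse, since lower BPB is better. The NFM adds roughly 1.45% more parameters than the base.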

Base retraining is running. Loss weight sweep (lw=0.01, 0.05, 0.20) and hidden dim sweep (hd=128, 512) are queued.

Supplementary: E2E TTT + FlowRefiner 7k eval completed: legal TTT BPB = 1.12418.

Limitations

Three-seed reproducibility is established only without TTT: mean sliding BPB = 1.12252 ± 0.00151. Legal TTT evaluation is still pending for seeds 1337 and 2025.
Non-record: this submission documents the NFM idea and its interaction with legal TTT. It is unclear whether NFM is worth its extra compute cost under the 10 min training / 10 min eval budget. The number of training steps was chosen to match that of similar base models (without NFM).
NFM is +0.00225 BPB worse than the matched base (no NFM) at 7k steps, so the extra 393K params do not improve val_bpb. The idea may be more relevant at longer training schedules or when combined with other techniques.

Credits

Base architecture (PR #549, @abaybektursun), Muon (baseline), BigramHash/SmearGate (PR #65, @aquariouserworkman), XSA (PR #187/#265, @Idan3011/@unnir), mixed quant (PR #76), sliding window (PR #50, @mattqlf), legal TTT (PR #77, @samacqua; PR #461, @Christopher-Lee-McClendon), VE/PartialRoPE/LN Scale (PR #315/#374, @jfprincz/@unnir), gated attention/value residual (PR #940), EMA (PR #65, @aquariouserworkman)

Checklist

- NativeFlowMatcher: 393K-param OT-CFM velocity network with gated hidden-state correction
- Legal score-first TTT: SGD lr=0.002, 10 epochs, freeze_blocks=2
- val_bpb: 1.11991 (sliding window stride=64, legal TTT)
- val_bpb: 1.12312 (sliding window stride=64, no TTT)
- Artifact: 15,745,776 bytes (254K headroom)
- Single-seed (42) exploratory submission
- Supplementary: eval logs, SLURM scripts, comparison data

- 2x2 matrix: NFM x TTT with base no-TTT baseline (1.12087)
- Loss weight sweep: 0.01, 0.05, 0.1, 0.2
- Hidden dim sweep: 128, 256, 512
- 13 SLURM jobs submitted (6 train + 7 eval)
- Results pending, will update when jobs complete
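The "score-first" ordering behind legal TTT can be illustrated with a toy model. In the sketch below each chunk is scored with parameters that have never seen it, and only afterwards is the model adapted on that chunk. The add-one-smoothed byte-frequency model and the count update are deliberately swapped in for the transformer and the SGD step (lr=0.002, 10 epochs, freeze_blocks=2) listed above, so this shows only the ordering, not the submission's TTT code. All names are hypothetical.

```python
import math
from collections import Counter


class UnigramModel:
    """Toy stand-in for the language model: add-one-smoothed byte frequencies."""

    def __init__(self):
        self.counts = Counter()
        self.total = 0

    def bits(self, chunk):
        """Total code length of the chunk in bits under the current state."""
        return sum(
            -math.log2((self.counts[b] + 1) / (self.total + 256)) for b in chunk
        )

    def adapt(self, chunk):
        """Stand-in for the TTT update: absorb the chunk's statistics."""
        self.counts.update(chunk)
        self.total += len(chunk)


def score_first_ttt(data, chunk_size=4):
    """Bits per byte when every chunk is scored before the model adapts on it."""
    model = UnigramModel()
    total_bits = 0.0
    for i in range(0, len(data), chunk_size):
        chunk = data[i:i + chunk_size]
        total_bits += model.bits(chunk)  # 1) score with the pre-update model
        model.adapt(chunk)               # 2) only then train on the chunk
    return total_bits / len(data)


def static_eval(data):
    """Bits per byte with no adaptation at all."""
    return UnigramModel().bits(data) / len(data)


sample = b"aaaaaaaabbbbbbbb"
print(static_eval(sample), score_first_ttt(sample))
```

Reversing steps 1 and 2 would let the model train on tokens before they are scored, which is the leak that the score-first ordering avoids.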

Christopher-Lee-McClendon · 2026-03-31T12:08:54Z

Ablation Studies Submitted

13 SLURM jobs have been submitted to run comprehensive ablation studies for this NFM submission:

2×2 Matrix: NFM × Legal TTT

Isolating the individual contributions of NFM and legal TTT at matched 7k steps.

Configuration	Params	No TTT (BPB)	Legal TTT (BPB)
Base (no NFM)	27,137,223	1.12087 ✅	pending (→55398695)
NFM (hd=256, lw=0.1)	27,530,952	1.12312 ✅	1.11991 ✅

NFM Hyperparameter Sweeps

Loss weight sweep (hidden_dim=256, seed=42):

lw=0.01 → jobs 55398696→55398699
lw=0.05 → jobs 55398697→55398700
lw=0.10 (default) → 1.12312 ✅
lw=0.20 → jobs 55398698→55398701

Hidden dim sweep (loss_weight=0.1, seed=42):

hd=128 → jobs 55398702→55398704
hd=256 (default) → 1.12312 ✅
hd=512 → jobs 55398703→55398705
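Both sweeps vary one hyperparameter at a time around the default (hd=256, lw=0.1, seed=42). A short sketch that enumerates the resulting non-default configurations, using the values listed above. The dictionary keys and the function name are hypothetical and do not correspond to the actual training script's flags.

```python
DEFAULT = {"hidden_dim": 256, "loss_weight": 0.10, "seed": 42}
SWEEPS = {
    "loss_weight": [0.01, 0.05, 0.10, 0.20],
    "hidden_dim": [128, 256, 512],
}


def one_factor_configs(default, sweeps):
    """Vary one hyperparameter at a time, keeping the rest at their defaults.

    Values equal to the default are skipped because that run already exists.
    """
    configs = []
    for name, values in sweeps.items():
        for value in values:
            if value == default[name]:
                continue
            cfg = dict(default)
            cfg[name] = value
            configs.append(cfg)
    return configs


for cfg in one_factor_configs(DEFAULT, SWEEPS):
    print(cfg)
```

This yields five new training configurations (three loss weights and two hidden dims), each of which needs a training run followed by an evaluation.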

Also pending

3-seed reproducibility runs (seeds 1337, 2025): jobs 55398556–55398561
E2E TTT+Flow 7k reeval with 5h wallclock: job 55398555

Results will be updated in README as jobs complete.

- Training completed for seeds 42, 1337, 2025 (all 7k steps)
- 3-seed mean sliding BPB (no TTT): 1.12252 ± 0.00151
- Seed 42: 1.12312, Seed 1337: 1.12367, Seed 2025: 1.12077
- Legal TTT eval jobs submitted (SLURM 55411651-55411654)
- Added completed E2E TTT+Flow eval log (SLURM 55398555, BPB=1.12418)
- Added training logs and SLURM scripts for all seed runs
- Updated README with 3-seed results table and training trajectories
- Updated submission.json with per-seed metrics and job IDs

Christopher-Lee-McClendon added 2 commits March 31, 2026 00:51

docs: add ablation studies section to NFM submission

24792dd


Christopher-Lee-McClendon changed the title ~~Non-record: 11L NativeFlowMatcher + Legal TTT — val_bpb 1.1199 (single seed)~~ Non-record: 11L NativeFlowMatcher + Legal TTT — val_bpb 1.1199 (3-seed mean no-TTT: 1.1225) Apr 1, 2026

Christopher-Lee-McClendon wants to merge 3 commits into openai:main from Christopher-Lee-McClendon:submission/11L-nativeflow-legal-ttt
