Record: 1.1140 BPB — ResidLambdas + Split-LR + Train-Budget GPTQ + Coprime Loader (12-seed mean) #1130
Open
Gusanidas wants to merge 2 commits into openai:main
PR openai#549 / KitchenSinkV2 base with:
- Residual lambdas: learnable per-sublayer scaling (init sqrt(1.1), 5x LR)
- Bigram hash: 6144 buckets (up from 2048)
- Value embeddings: dim=196 on layers 5,9,10
- Flash Attention 3 via flash_attn_interface
- Train-data GPTQ int6 calibration within training budget
- Sliding window eval stride=64
- Optuna-tuned LRs: matrix 0.036/0.044, scalar 0.028/0.018

12 seeds: mean 1.1140 bpb (1.8809 nats), std 0.0005
Improvement over leader: 0.0054 bpb / 0.0091 nats
p < 0.0001 for >= 0.005 nats improvement

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
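For readers unfamiliar with the bigram hash item above, here is a minimal sketch of how a (previous token, current token) pair might be folded into one of the 6144 buckets that index an embedding table. The mixing constant, the function names, and the choice of bucket 0 for the first position are assumptions made for illustration; the PR's actual hash is not shown in this conversation.

```python
NUM_BUCKETS = 6144  # bucket count quoted in the commit message


def bigram_bucket(prev_token: int, token: int, num_buckets: int = NUM_BUCKETS) -> int:
    """Map a token pair to a bucket index (hypothetical hash, not the PR's)."""
    # Mix the two ids with an arbitrary odd multiplier, then fold into range.
    mixed = (prev_token * 1_000_003) ^ token
    return mixed % num_buckets


def bigram_buckets(tokens: list[int]) -> list[int]:
    """Return one bucket index per position in `tokens`."""
    if not tokens:
        return []
    # Assumption: the first position has no predecessor, so it uses bucket 0.
    out = [0]
    for prev, cur in zip(tokens, tokens[1:]):
        out.append(bigram_bucket(prev, cur))
    return out
```

Growing the table from 2048 to 6144 buckets lowers the chance that two distinct bigrams collide on the same embedding row, at the cost of more parameters in the artifact.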
haikosys pushed a commit to haikosys/parameter-golf that referenced this pull request Mar 30, 2026

val_bpb: 1.1161 | val_loss: 1.884 nats | ~15.3 MB | 8×H100 SXM | Legal TTT
Seeds: 42=1.1163, 1337=1.1160, 2024=1.1161 | Mean=1.1161, Std=0.0001

Novel contribution: EGGROLL Antithetic Ternary Bin Search, a post-GPTQ quantization refinement that directly optimizes INT6 bin assignments against BPB loss during eval. Zeroth-order, strictly additive (cannot degrade quality), complementary to Hessian-based GPTQ.
Also adds missing TTT call to PR openai#1130's eval pipeline.

Built on PR openai#1130 by @Gusanidas (Kitchen Sink V2)
Foundation: PR openai#549 by @abaybektursun

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
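As a quick sanity check on the per-seed figures in this commit message, the sketch below recomputes the mean and standard deviation from the three quoted values. Which estimator the author used is not stated; the assumption here is the population form (`pstdev`), which reproduces the quoted Std when rounded to four decimals.

```python
from statistics import mean, pstdev, stdev

# Per-seed val_bpb values quoted in the commit message above.
seed_bpb = {42: 1.1163, 1337: 1.1160, 2024: 1.1161}
values = list(seed_bpb.values())

mean_bpb = mean(values)
std_population = pstdev(values)  # divides by n (assumed to match the quoted Std)
std_sample = stdev(values)       # divides by n - 1, shown for comparison

print(f"mean={mean_bpb:.4f} pstdev={std_population:.4f} stdev={std_sample:.4f}")
```

With only three seeds the two estimators differ noticeably, so it is worth stating which one a reported Std refers to.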
icryo added a commit to icryo/parameter-golf that referenced this pull request Mar 30, 2026

… BPB)
ResidLambdas: per-sublayer residual scaling (init sqrt(1.1), 5x scalar_lr, no WD)
Tuned LRs: MATRIX_LR=0.036, SCALAR_LR=0.028, TIED_EMBED_LR=0.022
Bigger VE: dim=196 on layers 5,9,10 (was dim=128 on layers 9,10)
PR openai#1130 achieved 1.1140 (12-seed mean) with these innovations.
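The split learning rates listed in this commit message can be pictured as a routing rule from parameter to optimizer settings. The sketch below is one way such a rule might look; the name patterns (`resid_lambda`, `embed`) and the `ndim` test are hypothetical, and only the numeric values come from the message above.

```python
# Values quoted in the commit message above.
MATRIX_LR = 0.036
SCALAR_LR = 0.028
TIED_EMBED_LR = 0.022
RESID_LAMBDA_LR_MULT = 5.0  # "5x scalar_lr"


def param_group(name: str, ndim: int) -> dict:
    """Pick optimizer settings for one parameter (hypothetical naming scheme)."""
    if "resid_lambda" in name:
        # Residual lambdas: 5x the scalar LR and no weight decay.
        return {"lr": SCALAR_LR * RESID_LAMBDA_LR_MULT, "weight_decay": 0.0}
    if "embed" in name:
        return {"lr": TIED_EMBED_LR}
    if ndim >= 2:
        # Weight matrices.
        return {"lr": MATRIX_LR}
    # Remaining scalars and vectors (gains, biases).
    return {"lr": SCALAR_LR}


if __name__ == "__main__":
    for name, ndim in [("blocks.0.resid_lambda", 0), ("tok_embed.weight", 2),
                       ("blocks.0.attn.qkv.weight", 2), ("blocks.0.norm.gain", 1)]:
        print(name, param_group(name, ndim))
```

In a PyTorch setup, dictionaries like these would typically be merged with a `"params"` list and passed to the optimizer as per-parameter groups.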
haikosys pushed a commit to haikosys/parameter-golf that referenced this pull request Mar 30, 2026

val_bpb: 1.1161 | val_loss: 1.884 nats | ~15.3 MB | 8×H100 SXM | Legal TTT
Seeds: 42=1.1163, 1337=1.1160, 2024=1.1161 | Mean=1.1161, Std=0.0001

Novel: EGGROLL Antithetic Ternary Bin Search, a post-GPTQ bin refinement
Also: adds missing TTT call to PR openai#1130 eval pipeline

Built on PR openai#1130 by @Gusanidas, PR openai#549 by @abaybektursun

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Record: Kitchen Sink V2 — val_bpb 1.1140 (12-seed mean, std 0.0005)
val_bpb: 1.1140 | val_loss: 1.8809 nats | ~15.88 MB | 8×H100 SXM | No TTT
Built on PR #549 by @abaybektursun. 12-seed validation, all artifacts under 16,000,000 bytes, all training under 600s.
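The headline reports the same result in two units, bits per byte and nats. A minimal sketch of the relationship follows, assuming val_loss is the mean cross-entropy in nats per token, so that bpb = (nats per token / ln 2) / (bytes per token). The function names are illustrative, and the bytes-per-token figure is inferred from the two headline numbers rather than stated anywhere in the PR.

```python
import math


def nats_to_bpb(loss_nats_per_token: float, bytes_per_token: float) -> float:
    """Convert mean per-token loss in nats to bits per byte."""
    bits_per_token = loss_nats_per_token / math.log(2)
    return bits_per_token / bytes_per_token


def implied_bytes_per_token(loss_nats_per_token: float, bpb: float) -> float:
    """Back out the average bytes per token implied by a (nats, bpb) pair."""
    return loss_nats_per_token / math.log(2) / bpb


# Headline figures from this PR: 1.8809 nats and 1.1140 bpb.
ratio = implied_bytes_per_token(1.8809, 1.1140)
print(f"implied bytes per token: {ratio:.3f}")
print(f"round trip bpb: {nats_to_bpb(1.8809, ratio):.4f}")
```

Because the conversion factor is fixed by the validation set and tokenizer, a difference quoted in nats and the same difference quoted in bpb should always stand in this same ratio.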
Results (12 seeds, sliding window eval, stride=64)
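For context on the evaluation protocol named in this heading, here is a minimal sketch of how sliding-window evaluation with a fixed stride is commonly arranged: each window sees up to `seq_len` tokens of context, but only the tokens not already scored by an earlier window count toward the loss. The `seq_len` value and the function name are assumptions; only stride=64 comes from the PR.

```python
from typing import Iterator, Tuple


def sliding_windows(n_tokens: int, seq_len: int, stride: int = 64) -> Iterator[Tuple[int, int, int]]:
    """Yield (start, end, score_from) triples.

    The model is run on tokens[start:end]; only tokens[score_from:end]
    contribute to the loss, so every token is scored exactly once.
    """
    if n_tokens <= 0:
        return
    # The first window scores everything it contains.
    end = min(seq_len, n_tokens)
    yield 0, end, 0
    # Later windows advance by `stride` and score only the new tokens.
    while end < n_tokens:
        new_end = min(end + stride, n_tokens)
        start = max(0, new_end - seq_len)
        yield start, new_end, end
        end = new_end


if __name__ == "__main__":
    # seq_len=256 is a hypothetical context length chosen for the demo.
    for window in sliding_windows(n_tokens=1000, seq_len=256, stride=64):
        print(window)
```

A smaller stride gives each scored token more left context on average, which tends to lower measured loss but multiplies the number of forward passes needed for evaluation.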
Statistical significance vs SOTA (PR #549, 1.8843 nats)
What's new (over PR #549)
12 Tuned batch size — TRAIN_BATCH_TOKENS=548,864
Architecture
Timing
Credits