
Record: Curriculum Learning + LeakyReLU(0.9)² + 7-gram Backoff (val_bpb=0.9633) #764

Closed
ndokutovich wants to merge 2 commits into openai:main from ndokutovich:submission-v7-curriculum-ngram

Conversation

@ndokutovich

Summary

val_bpb = 0.9633 (seed 42, additional seeds pending compute grant) | 15.56 MB | 8xH100 SXM, 600s

Built on PR #753 (Podracing II) with two novel additions:

1. Curriculum Learning (Shard Reordering)

Training shards are reordered by model perplexity, hardest shards first, following PR #650 (-0.003 BPB). Zero code changes; controlled by an environment variable only.
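The reordering itself can be sketched as follows (shard names and the perplexity source here are illustrative assumptions; the submission drives this via an environment variable rather than new code):

```python
def curriculum_order(shards, perplexity):
    # Hardest shards (highest model perplexity) first.
    return sorted(shards, key=lambda s: perplexity[s], reverse=True)

shards = ["shard_00", "shard_01", "shard_02"]
ppl = {"shard_00": 12.4, "shard_01": 30.1, "shard_02": 18.7}
print(curriculum_order(shards, ppl))  # -> ['shard_01', 'shard_02', 'shard_00']
```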

2. LeakyReLU(0.9)² Slope Optimization

Following @MatoTeziTanka's controlled sweep (issue #140): slope 0.9 gives -0.013 BPB versus the standard 0.5. A one-parameter change.
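One reading of the LeakyReLU(0.9)² activation (squared LeakyReLU with negative slope 0.9, by analogy with the squared-ReLU activation common in this family of codebases; this interpretation is an assumption, not confirmed from the diff):

```python
def leaky_relu_sq(x, slope=0.9):
    # LeakyReLU with negative slope `slope`, then squared elementwise.
    # Note that squaring maps negative pre-activations to positive outputs.
    y = x if x >= 0.0 else slope * x
    return y * y
```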

Results

| Eval Method | BPB |
| --- | --- |
| Sliding window (stride=64) | 1.1216 |
| Sliding + 7-gram backoff | 0.9633 |
| Legal TTT (score-first, 3ep) | 1.1216 |

Artifact: 15,560,351 bytes (< 16MB)
Steps: 6,647 at 90.3ms/step
GPTQ calibration within training budget (issue #677 compliant)
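For reference, val_bpb is conventionally the summed token negative log-likelihood converted to bits and normalized by the byte length of the validation text (standard definition, sketched here; the actual evaluation harness is not part of this PR):

```python
import math

def bits_per_byte(total_nll_nats, n_bytes):
    # Summed NLL in nats -> bits, normalized by the raw byte count
    # of the validation text.
    return total_nll_nats / (math.log(2) * n_bytes)

# e.g. 1_000 bytes scored at ~693.15 nats total NLL is ~1.0 BPB.
```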

Reproduction

```bash
SEED=42 bash run.sh
```

Acknowledgments

@newjordan (PR #753), @abaybektursun (PR #650), @MatoTeziTanka (slope sweep), @Asukabot0 (n-gram backoff)

Status

1 seed submitted. 2 additional seeds pending OpenAI compute grant.
Previously PR #486 (formerly #2 on leaderboard, TrigramHash originator). $339 personal compute spent.

Test plan

  • 1 seed (42) validated on 8xH100 SXM
  • Seed 1337 (pending compute)
  • Seed 2024 (pending compute)

@MatoTeziTanka

MatoTeziTanka commented Mar 26, 2026

Great work — the curriculum learning via shard reordering is a clever zero-code-change technique, and appreciate the citation on the LeakyReLU(0.9)² sweep.

Just a note: the submission currently has 1 seed with 2 more pending your compute grant. The leaderboard requires 3-seed validation for record claims. Hopefully the grant comes through soon — would be good to see this fully validated.


Disclosure: I use Claude Code CLI, Codex CLI, and Gemini Pro as tools in my workflow. Human first, AI-assisted.

@MatoTeziTanka

Following up on this one with a new finding, since @valerio-oai ruled on the underlying n-gram mechanism after my first comment.

Compliance flag — same disallowed pattern as PR #779.

@valerio-oai disallowed PR #779 (deanbrr) on 2026-03-27 (comment 4145781641) specifically for "hashed n-gram caches, which do not renormalize correctly / correctly reweight the LM's token distribution, look ahead to the target token to mix probabilities and therefore leak eval tokens." Mechanism explanation is in comment 4146407380: hashing the target token into the bucket key only reweights the correct token, and in the hash-collision limit drives P(correct) toward 1 regardless of the data — arbitrarily low BPB without real compression.

The PR body itself documents the smoking-gun signature without needing to fetch code: this submission reports final_int6_sliding_window val_bpb:1.1216 (neural + score-first TTT, no cache) and final_int6_sliding_window val_bpb:0.9633 (with the 7-gram backoff cache enabled). The cache produces the entire −0.1583 BPB delta, and the headline 0.9633 number is downstream of the cache only. The 1.1216 base — same number for both pure sliding-window and legal score-first TTT — is the legally-comparable BPB for this stack on the SP1024 path, and is in the same range as every other 11L SP1024 submission in the cluster.

@ndokutovich — could you confirm whether the 7-gram backoff implementation in train_gpt.py uses the same full_key = ctx_hash ^ (target * primes[k]) construction that PR #779/#770/#797/#798/#808/#825/#909/#940/#761 all share? The titular "Curriculum Learning" and the LeakyReLU(0.9)² ablation are interesting in their own right (I appreciated the credit on the slope sweep in your README) and would carry over cleanly to a resubmission with the n-gram cache replaced by either a context-only key or a full-vocabulary reweighting per @valerio-oai's suggested legal path on #779. The 1.1216 BPB stack is a perfectly reasonable non-record submission for the SP1024 architecture work in its own right.
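To make the distinction concrete, here is a minimal sketch of the two key constructions (PRIMES, MASK, and the rolling context hash are illustrative assumptions; only the `full_key = ctx_hash ^ (target * primes[k])` pattern itself is quoted from the ruling):

```python
PRIMES = [1000003, 10007, 101]
MASK = (1 << 20) - 1  # toy bucket count

def ctx_hash(context):
    # Toy rolling hash over the context tokens.
    h = 0
    for tok in context:
        h = (h * 31 + tok) & MASK
    return h

def disallowed_key(context, target, k=0):
    # Target token folded into the lookup key: a cache hit can only ever
    # reweight the probability of the *true* next token, so the eval
    # label leaks into the prediction (the pattern ruled illegal).
    return (ctx_hash(context) ^ (target * PRIMES[k])) & MASK

def context_only_key(context):
    # Legal alternative: the key depends on the context alone, and the
    # cache must return a full distribution over the vocabulary.
    return ctx_hash(context)
```

With the disallowed key, two different candidate targets map to different buckets, so only the ground-truth token's bucket can boost it; the context-only key is identical for every candidate continuation.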

The procedural seed-count question from my first comment still stands as well — happy to take another look once 3 seeds are filled in, but the n-gram path will need the same fix as the rest of the cluster regardless of how the seed question lands.


Reviewed by @MatoTeziTanka, The Agora. Follow-up to my 2026-03-26 comment, prompted by @valerio-oai's PR #779 ruling and the family-wide audit pass on 2026-04-11. AI tooling: review drafted with Claude Code (Sonnet/Opus); the family-bug pattern was verified in code on the 9 sibling PRs; the pre/post-cache delta on this PR was read directly from the published PR body.

@ndokutovich
Author

Yes, the 7-gram backoff implementation uses the same target-in-key hashing pattern. The cache accounts for the entire −0.1583 BPB delta, as you correctly identified.

Since the March 27 ruling, we've moved to the SP8192 track — the curriculum learning and LeakyReLU(0.9)² sweep were early-stage experiments that informed our later work. Happy to close this PR if it's cleaner for the leaderboard; the 1.1216 base isn't competitive on the current SP1024 frontier anyway.

Thanks for the thorough review and the constructive suggestion.

MatoTeziTanka pushed a commit to MatoTeziTanka/parameter-golf that referenced this pull request Apr 11, 2026
…cluster + CT2038 gauntlet provisioned

Reviewed all 20 highest-priority Tier 1 PRs from openai/parameter-golf.
Two cluster-level findings:

- N-gram family bug (10 PRs CLOSED + 1 already ruled): full_key = ((ctx_hash
  ^ (target * primes[k])) & mask) — target token hashed into the eval-cache
  lookup key, ruled illegal by valerio-oai on PR openai#779. Same verbatim pattern
  in openai#770/openai#798/openai#808/openai#825/openai#786/openai#797/openai#909/openai#940/openai#761 + openai#764 follow-up. Upstream
  parent: lukacf (openai#659/openai#702/openai#727 — task #5 audit queued).

- Standard SLOT cluster (4 HOLD pending openai#1336, 2 CLOSE): per-window
  delta+logit_bias optimized N steps against (per_token_nll * mask) where
  mask = scored positions [s:wlen]. PRs openai#1321/openai#1324/openai#1278/openai#1263 → HOLD;
  openai#1319/openai#1376 → CLOSE.

Clean MERGE-eligible: openai#1420 (token_hint-only post-fix) and openai#1450 (TMA
megakernel triple loop).

Eval-budget gate (openai#915/openai#889 anthony-maio pair): clean ngram code, ~14.9 min
ngram stage on 8xH100 SXM. One @0hq ruling on Issue openai#17 unblocks both PRs
plus ~30 ngram-cache PRs.

Infrastructure: provisioned CT2038 (proteus-engine, 128 GB RAM, 32 cores)
as the dedicated parameter-golf gauntlet host. Installed Triton 3.6.0,
deployed cpu_test.py + flash_attn_stub.py. Re-ran the 4 PRs originally
skipped due to FA3/Triton blockers — all PASS. Edited 4 GitHub comments
via gh api PATCH to add the rerun results. Coverage went from 9/20 to
14/20 fully gauntleted.

Side session handed off via SOW_HF_DATASET_REPUBLISH.md (Scylla 998→1254
fix + SP4096/SP8192/SP12288/SP16384 publish + Cloudflare R2 mirror).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@MatoTeziTanka

Thanks for confirming and for the transparent response. Closing makes sense — the 1.1216 base is solid work on its own and the curriculum learning / LeakyReLU(0.9)² contributions are real. If you revisit on the SP8192 track with a context-only key or full-vocabulary reweighting, happy to take another look.

Good luck with the new direction.

@ndokutovich
Author

Closing as agreed — the 7-gram backoff uses the target-in-key hashing pattern disallowed in the March 27 ruling. The curriculum learning and LeakyReLU(0.9)² contributions live on in our SP8192 work. Thanks @MatoTeziTanka for the review.
