[10min_16mb] 0.9641 BPB: LeakyReLU² + Score-First TTT + N-gram Backoff Cache by skoustav35 · Pull Request #1185 · openai/parameter-golf

skoustav35 · 2026-03-31T16:48:30Z

Submitting a new entry for the 10-minute 16MB track that achieves a 3-seed exact mean of 0.9641 BPB (1.6274 nats).

This improves upon the current merged 1.1147 BPB baseline (PR #1019) by 0.1506 BPB (0.2548 nats), which exceeds the required 0.005 nats threshold by ~51× (Welch t = -328.3, p ≪ 0.01).
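The margin arithmetic above can be re-derived with a short sketch. The BPB and nats figures are the ones quoted in this description; the per-seed lists passed to `welch_t` are hypothetical placeholders (the actual per-seed scores are not listed here), so the resulting statistic is illustrative only and is not the reported t = -328.3.

```python
import math
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va = variance(a) / len(a)  # squared standard error of mean(a)
    vb = variance(b) / len(b)  # squared standard error of mean(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

# Figures quoted in the PR description.
baseline_bpb, new_bpb = 1.1147, 0.9641
delta_nats, threshold_nats = 0.2548, 0.005

delta_bpb = baseline_bpb - new_bpb
margin = delta_nats / threshold_nats
print(f"delta = {delta_bpb:.4f} BPB, margin = {margin:.1f}x the threshold")

# HYPOTHETICAL per-seed scores, used only to exercise welch_t.
new_runs = [0.9640, 0.9641, 0.9642]
base_runs = [1.1146, 1.1147, 1.1148]
t_stat, dof = welch_t(new_runs, base_runs)
print(f"t = {t_stat:.1f}, df = {dof:.1f}")
```

A negative t here simply means the new runs score lower (better) than the baseline runs.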

Techniques Used

Architecture: 11 Layers, 512 dim, GQA = 8H/4KV, MLP 3x, LeakyReLU(0.5)², XSA-5 (layers 6-10), Tied embeddings, Value Residual, Gated Attention, VE(128) on layers 8/9/10, MTP-2, BigramHash 2048.
Eval-time N-gram Backoff Cache:
- Multi-order backoff (orders 2–9), picking the highest matching order.
- Laplace (add-1) smoothing: Ensures the returned probability is a proper normalized distribution over the vocabulary and does not depend on target-oracle knowledge.
- Entropy-adaptive alpha scaling.
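A minimal sketch of how such a cache could work, not the submission's actual code. The class and function names, the `alpha_max` parameter, and the specific entropy rule (mixing weight proportional to the model's normalized entropy) are all assumptions made for illustration; the description only states that alpha is entropy-adaptive.

```python
import math
from collections import defaultdict

class NGramBackoffCache:
    """Hypothetical eval-time cache: counts n-grams of orders 2..9."""

    def __init__(self, vocab_size, min_order=2, max_order=9):
        self.vocab_size = vocab_size
        self.orders = list(range(max_order, min_order - 1, -1))  # highest first
        # counts[order][context tuple][next token] -> count
        self.counts = {n: defaultdict(lambda: defaultdict(int)) for n in self.orders}

    def update(self, history, token):
        """Record `token` after it has been scored; `history` precedes it."""
        for n in self.orders:
            k = n - 1  # an order-n gram has n-1 context tokens
            if len(history) >= k:
                self.counts[n][tuple(history[-k:])][token] += 1

    def predict(self, history):
        """Back off from the highest order to the first one with a match."""
        for n in self.orders:
            k = n - 1
            if len(history) < k:
                continue
            table = self.counts[n].get(tuple(history[-k:]))
            if table:
                # Laplace (add-1) smoothing: a full distribution over the vocab.
                denom = sum(table.values()) + self.vocab_size
                probs = [(table.get(t, 0) + 1) / denom for t in range(self.vocab_size)]
                return probs, n
        return None, 0

def mix_with_model(model_probs, cache_probs, alpha_max=0.5):
    """ASSUMED rule: lean on the cache more when the model is uncertain."""
    entropy = -sum(p * math.log(p) for p in model_probs if p > 0)
    alpha = alpha_max * entropy / math.log(len(model_probs))
    return [(1 - alpha) * m + alpha * c for m, c in zip(model_probs, cache_probs)]

cache = NGramBackoffCache(vocab_size=4)
history = []
for tok in [0, 1, 2, 0, 1]:
    # A real evaluation would score `tok` here, before the cache sees it.
    cache.update(history, tok)
    history.append(tok)

cache_probs, order = cache.predict(history)
mixed = mix_with_model([0.25] * 4, cache_probs)  # uniform model output is made up
```

Because add-1 smoothing assigns mass to every token, the cache output sums to 1 regardless of which token turns out to be the target, and so does the convex mixture with the model distribution.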
Test-Time Training (Legal, Score-First):
- SGD, 3 epochs, 32K token chunks, stride 64.
- Tokens are scored before any update that uses them, so scoring is strictly backward-looking.
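The score-first ordering can be illustrated with a toy loop. This is a sketch under stated assumptions: `ToyUnigramModel` is a stand-in with one learnable logit per token, not the submitted network, and the chunk size, learning rate, and function names are hypothetical. The stride-64 sliding-window evaluation is not modeled.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

class ToyUnigramModel:
    """Stand-in for the real network: one learnable logit per token."""

    def __init__(self, vocab_size):
        self.logits = [0.0] * vocab_size

    def nll(self, token):
        return -math.log(softmax(self.logits)[token])

    def sgd_step(self, token, lr):
        # Gradient of cross-entropy w.r.t. logits is softmax minus one-hot.
        probs = softmax(self.logits)
        for t, p in enumerate(probs):
            self.logits[t] -= lr * (p - (1.0 if t == token else 0.0))

def score_first_ttt(model, tokens, chunk_size, epochs=3, lr=0.1):
    """Return mean NLL where each chunk is scored before it is trained on."""
    total_nll = 0.0
    for start in range(0, len(tokens), chunk_size):
        chunk = tokens[start:start + chunk_size]
        # 1) Score with weights that have never seen this chunk.
        total_nll += sum(model.nll(t) for t in chunk)
        # 2) Only then adapt on the chunk that was just scored.
        for _ in range(epochs):
            for t in chunk:
                model.sgd_step(t, lr)
    return total_nll / len(tokens)

tokens = [0] * 64  # made-up, highly repetitive stream
mean_nll = score_first_ttt(ToyUnigramModel(4), tokens, chunk_size=16)
```

The first chunk is always scored by the unadapted model, so any gain comes only from tokens that were already scored.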
Optimization & Quantization:
- Muon + Adam split.
- Int6 per-row quantization with LZMA compression. Late-stage CROWN-Q penalty.
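As a rough sketch of per-row int6 quantization followed by LZMA, assuming a symmetric range of [-31, 31], one scale per row, and one byte per stored value (the submission's actual packing and the CROWN-Q penalty are not described here and are not modeled). All function names are hypothetical.

```python
import lzma
import random

QMAX = 31  # symmetric int6 range [-31, 31]; an assumption

def quantize_per_row(matrix):
    """Quantize each row with its own scale so the row max maps to QMAX."""
    q_rows, scales = [], []
    for row in matrix:
        scale = max(abs(w) for w in row) / QMAX or 1.0  # avoid zero scale
        q_rows.append([round(w / scale) for w in row])
        scales.append(scale)
    return q_rows, scales

def dequantize_per_row(q_rows, scales):
    return [[v * s for v in row] for row, s in zip(q_rows, scales)]

def pack(q_rows):
    """Store one byte per value (offset to 0..62) and compress with LZMA."""
    payload = bytes(v + QMAX for row in q_rows for v in row)
    return lzma.compress(payload, preset=9)

def unpack(blob, n_cols):
    flat = [b - QMAX for b in lzma.decompress(blob)]
    return [flat[i:i + n_cols] for i in range(0, len(flat), n_cols)]

random.seed(0)
weights = [[random.gauss(0.0, 0.02) for _ in range(16)] for _ in range(4)]  # made-up
q_rows, scales = quantize_per_row(weights)
blob = pack(q_rows)
restored = dequantize_per_row(unpack(blob, 16), scales)
```

With rounding to the nearest level, each restored weight differs from the original by at most half of its row's scale.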

Compliance & Margins

Training Time: Seeds complete in 599,384, 599,761, and 599,618 ms, each under the 600,000 ms (10-minute) limit. Note that the logged train_time excludes initial compilation and 20 warmup steps.
Artifact Size: 15,989,583 bytes max across seeds (well under 16,000,000 B).
N-Gram Cache Legality: This technique builds on the cache method seen in closed PR #727 ("Record: First Legal Sub-1.0 BPB — Multi-order N-gram Backoff + Entropy-Adaptive Alpha (val_bpb=0.9674, 3-seed)"). We explicitly acknowledge the ongoing discussion in issue #677 ("Illegal submissions megathread") regarding eval-time caching methods. This implementation uses zero artifact bytes and is strictly backward-looking.
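The headroom behind the time and size figures above can be checked directly. The numbers are taken from this section; the 600,000 ms limit is inferred from the 10-minute track name and is assumed to apply to the logged train_time.

```python
TIME_LIMIT_MS = 10 * 60 * 1000   # assumed: 10-minute track limit in ms
SIZE_LIMIT_BYTES = 16_000_000    # artifact limit quoted above

train_times_ms = [599_384, 599_761, 599_618]  # per-seed logged train_time
max_artifact_bytes = 15_989_583               # largest artifact across seeds

time_headroom_ms = TIME_LIMIT_MS - max(train_times_ms)
size_headroom_bytes = SIZE_LIMIT_BYTES - max_artifact_bytes

print(f"slowest seed leaves {time_headroom_ms} ms")
print(f"largest artifact leaves {size_headroom_bytes} bytes")
```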

Reproducibility

The script resolves data paths relative to the repo root automatically.

SEED=1337 RUN_ID=seed_1337 VOCAB_SIZE=1024 \
torchrun --standalone --nproc_per_node=8 \
  records/track_10min_16mb/2026-03-31_LeakyReLU2_LegalTTT_NGramCache_XSA/train_gpt.py
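One way a script in a records/ subfolder could locate the repo root is to walk up a fixed number of parent directories from its own location. The sketch below is an assumption about the approach, not the script's actual code; the function names, the `/repo` prefix, and the `data/hypothetical_dataset` path are placeholders.

```python
from pathlib import PurePosixPath

def repo_root_from_script(script_path, levels_up=3):
    """records/<track>/<record>/train_gpt.py sits three directories below the root."""
    return PurePosixPath(script_path).parents[levels_up]

def data_path(script_path, *parts):
    """Build a data path relative to the repo root rather than the working directory."""
    return repo_root_from_script(script_path).joinpath(*parts)

# In a real script the argument would be Path(__file__).resolve().
script = "/repo/records/track_10min_16mb/example_record/train_gpt.py"
root = repo_root_from_script(script)
dataset = data_path(script, "data", "hypothetical_dataset")
```

Resolving paths this way lets the torchrun command be launched from any working directory.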

logs/daily_research.md: append 2026-03-31 research section
- PR openai#771 CLOSED (score-first TTT rule violation)
- PR openai#727 CLOSED (n-gram illegal — no renormalization)
- Merged SOTA: 1.1147 (PR openai#1019, 2026-03-25)
- New PRs: openai#1184 (0.9485 Scylla tokenizer), openai#1185 (0.9641)
- SLOT eval technique, Full GPTQ, QK-Gain 4.0 documented
CLAUDE.md: update Competition Strategy + lessons 21-24
- Merged SOTA updated to 1.1147
- Current Best Path rewritten for 2026-03-31
- Lessons openai#21-24: TTT fix, n-gram risk, Scylla, SLOT
- TTT constraint clarified to score-first protocol
- Version bumped to v9.0
https://claude.ai/code/session_015z6QKyKzDSYzTniW1GPhAe


Snehra AI and others added 4 commits March 31, 2026 11:09

Submit LeakyReLU2 + Legal TTT (Score-First) + N-gram Cache record

2eb387c

Fix stale baseline (PR549->PR609/1.1147), correct nats threshold, add timing caveats

a942395

Fix data paths for records/ subfolder, update baseline to PR openai#1019, cascade code size

1b8d00f

Merge branch 'openai:main' into main

20650e7

notapplica mentioned this pull request Mar 31, 2026

⛳ Parameter Golf Live AI Commentary ⛳ + Analysis / Ideas | every 10 minutes #140

Open

skoustav35 added 2 commits April 1, 2026 09:39

Add opt-in MoD, SquareGLU, EMA distillation, and Grokfast

c577b1c

Merge pull request #1 from skoustav35/codex/review-and-optimize-project-for-golf-challenge

Add opt-in MoD routing, SquareGLU MLP, EMA warmdown distillation, and Grokfast

37fcb26

skoustav35 wants to merge 6 commits into openai:main from skoustav35:main
