[10min_16mb] 0.9641 BPB: LeakyReLU² + Score-First TTT + N-gram Backoff Cache#1185

Open
skoustav35 wants to merge 6 commits into openai:main from skoustav35:main

@skoustav35

Submitting a new entry for the 10-minute 16MB track that achieves a 3-seed exact mean of 0.9641 BPB (1.6274 nats).

This improves upon the current merged 1.1147 BPB baseline (PR #1019) by 0.1506 BPB (0.2548 nats), which exceeds the required 0.005 nats threshold by ~51× (Welch t = -328.3, p ≪ 0.01).
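
The margin figures can be re-derived from the numbers quoted above. A minimal sketch: the variable names are illustrative only, and every value is copied from this description rather than recomputed from logs.

```python
# Values as quoted in this PR description (not recomputed from training logs).
baseline_bpb = 1.1147    # merged baseline, PR #1019
new_bpb = 0.9641         # 3-seed mean of this entry
gain_nats = 0.2548       # reported improvement in nats
threshold_nats = 0.005   # required improvement in nats

gain_bpb = round(baseline_bpb - new_bpb, 4)   # improvement in bits per byte
margin = gain_nats / threshold_nats           # how many times the threshold is cleared

print(f"gain: {gain_bpb} BPB, margin: {margin:.1f}x the threshold")
```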

Techniques Used

  • Architecture: 11 Layers, 512 dim, GQA = 8H/4KV, MLP 3x, LeakyReLU(0.5)², XSA-5 (layers 6-10), Tied embeddings, Value Residual, Gated Attention, VE(128) on layers 8/9/10, MTP-2, BigramHash 2048.
  • Eval-time N-gram Backoff Cache:
    • Multi-order backoff (orders 2–9), picking the highest matching order.
    • Laplace (add-1) smoothing: Ensures the returned probability is a proper normalized distribution over the vocabulary and does not depend on target-oracle knowledge.
    • Entropy-adaptive alpha scaling.
  • Test-Time Training (Legal, Score-First):
    • SGD, 3 epochs, 32K token chunks, stride 64.
    • Every token is scored before the model is updated on it, so scoring only ever looks backward.
  • Optimization & Quantization:
    • Muon + Adam split.
    • Int6 per-row quantization with LZMA compression. Late-stage CROWN-Q penalty.
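
The backoff cache described above could look roughly like the following. This is a toy sketch, not the code in `train_gpt.py`: the class `BackoffNgramCache` and its method names are hypothetical, and the entropy-adaptive mixing with the neural model is left out because the description does not give its formula. The sketch shows only the two stated properties: the highest matching order (2 to 9) wins, and add-1 smoothing yields a normalized distribution over the whole vocabulary without looking at the target token.

```python
from collections import defaultdict


class BackoffNgramCache:
    """Toy eval-time n-gram cache: highest matching order wins, add-1 smoothing."""

    def __init__(self, vocab_size, min_order=2, max_order=9):
        self.vocab_size = vocab_size
        # Longest context first, so predict() backs off from order 9 down to 2.
        self.orders = range(max_order, min_order - 1, -1)
        # counts[order][context_tuple][next_token] -> count
        self.counts = {n: defaultdict(lambda: defaultdict(int)) for n in self.orders}

    def update(self, history, token):
        """Record `token` (already scored) following the tokens in `history`."""
        for n in self.orders:
            ctx_len = n - 1
            if len(history) >= ctx_len:
                ctx = tuple(history[len(history) - ctx_len:])
                self.counts[n][ctx][token] += 1

    def predict(self, history):
        """Return (order, distribution over the vocab), or (None, None) if nothing matches."""
        for n in self.orders:
            ctx_len = n - 1
            if len(history) < ctx_len:
                continue
            ctx = tuple(history[len(history) - ctx_len:])
            table = self.counts[n]
            if ctx in table:
                seen = table[ctx]
                # Add-1 smoothing: one pseudo-count for every vocabulary entry.
                total = sum(seen.values()) + self.vocab_size
                return n, [(seen.get(t, 0) + 1) / total for t in range(self.vocab_size)]
        return None, None


# Usage: fill the cache token by token, then query it with the full history.
cache = BackoffNgramCache(vocab_size=4)
seq = [0, 1, 2, 0, 1, 2, 0, 1]
for i, tok in enumerate(seq):
    cache.update(seq[:i], tok)
order, dist = cache.predict(seq)
```

Because the distribution is built from the context alone and sums to 1, it can be mixed with the model's softmax output without any renormalization step that depends on the target.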
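
The score-first ordering of test-time training can be sketched as a plain loop. This is an illustration under assumptions: `score_first_eval` and `ToyUnigram` are hypothetical names, the toy count-based model stands in for the real network and its SGD updates, and the stride-64 sliding window is omitted. Only the ordering matters here: each chunk is scored with weights that have never seen it, and adaptation on that chunk happens afterwards.

```python
import math
from collections import Counter


def score_first_eval(tokens, chunk_size, score_fn, adapt_fn, epochs=3):
    """Average loss per token under a score-first protocol."""
    total_nll = 0.0
    for start in range(0, len(tokens), chunk_size):
        chunk = tokens[start:start + chunk_size]
        total_nll += score_fn(chunk)   # 1) score with weights that never saw this chunk
        for _ in range(epochs):        # 2) only then adapt on the already-scored chunk
            adapt_fn(chunk)
    return total_nll / len(tokens)


class ToyUnigram:
    """Stand-in model: add-1 smoothed unigram counts instead of a neural network."""

    def __init__(self, vocab_size):
        self.vocab_size = vocab_size
        self.counts = Counter()
        self.total = 0

    def score(self, chunk):
        # Summed negative log-likelihood (nats) of the chunk under current counts.
        return sum(
            -math.log((self.counts[t] + 1) / (self.total + self.vocab_size))
            for t in chunk
        )

    def adapt(self, chunk):
        # "Training" here is just counting; the real entry would take SGD steps.
        self.counts.update(chunk)
        self.total += len(chunk)


model = ToyUnigram(vocab_size=1024)
avg_nll = score_first_eval([5, 5, 5, 5, 7, 5, 5, 5], 4, model.score, model.adapt)
```

In this toy run the first chunk is scored at the uniform rate, and only the second chunk benefits from adaptation, which is the behavior the protocol is meant to guarantee.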
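
For the quantization step, one common way to do per-row int6 quantization followed by LZMA compression is sketched below. The details are assumptions, not taken from the submission: a symmetric range of [-31, 31], one scale per row, and one byte per value before compression. The function names are hypothetical, and the CROWN-Q penalty is not modeled.

```python
import lzma


def quantize_rows_int6(rows):
    """Symmetric per-row quantization to integers in [-31, 31] (assumed scheme)."""
    q_rows, scales = [], []
    for row in rows:
        max_abs = max(abs(w) for w in row)
        scale = max_abs / 31 if max_abs > 0 else 1.0  # avoid dividing by zero
        q_rows.append([max(-31, min(31, round(w / scale))) for w in row])
        scales.append(scale)
    return q_rows, scales


def dequantize_rows(q_rows, scales):
    """Map quantized integers back to floats using each row's scale."""
    return [[q * s for q in row] for row, s in zip(q_rows, scales)]


def pack_and_compress(q_rows):
    """Store one byte per value (shifted to 0..62) and compress with LZMA."""
    raw = bytes(q + 31 for row in q_rows for q in row)
    return lzma.compress(raw)


rows = [[0.5, -1.0, 0.25], [0.0, 0.0, 0.0]]
q_rows, scales = quantize_rows_int6(rows)
restored = dequantize_rows(q_rows, scales)
blob = pack_and_compress(q_rows)
```

Giving each row its own scale keeps the rounding error of a row bounded by half of that row's step size, so rows with small weights are not penalized by a large outlier elsewhere in the matrix.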

Compliance & Margins

Reproducibility

The script resolves data paths relative to the repo root automatically.

SEED=1337 RUN_ID=seed_1337 VOCAB_SIZE=1024 \
torchrun --standalone --nproc_per_node=8 \
  records/track_10min_16mb/2026-03-31_LeakyReLU2_LegalTTT_NGramCache_XSA/train_gpt.py

sunnypatneedi pushed a commit to sunnypatneedi/parameter-golf that referenced this pull request Mar 31, 2026
- logs/daily_research.md: append 2026-03-31 research section
  - PR openai#771 CLOSED (score-first TTT rule violation)
  - PR openai#727 CLOSED (n-gram illegal — no renormalization)
  - Merged SOTA: 1.1147 (PR openai#1019, 2026-03-25)
  - New PRs: openai#1184 (0.9485 Scylla tokenizer), openai#1185 (0.9641)
  - SLOT eval technique, Full GPTQ, QK-Gain 4.0 documented
- CLAUDE.md: update Competition Strategy + lessons 21-24
  - Merged SOTA updated to 1.1147
  - Current Best Path rewritten for 2026-03-31
  - Lessons openai#21-24: TTT fix, n-gram risk, Scylla, SLOT
  - TTT constraint clarified to score-first protocol
  - Version bumped to v9.0

https://claude.ai/code/session_015z6QKyKzDSYzTniW1GPhAe
…ct-for-golf-challenge

Add opt-in MoD routing, SquareGLU MLP, EMA warmdown distillation, and Grokfast