Implement EngramLite + Complementary Training + BackoffNgramMixer stack targeting sub-1.0 BPB #2
Draft
Copilot wants to merge 2 commits into copilot/brainstorm-novel-approaches from
Conversation
…Gram for sub-1.0 BPB
Agent-Logs-Url: https://github.com/kailean/parameter-golf/sessions/205d0581-0f54-4b88-b48f-39c66db315dc
Co-authored-by: kailean <49617037+kailean@users.noreply.github.com>
Copilot changed the title [WIP] Analyze pull request and brainstorm improvements for BPB → Implement EngramLite + Complementary Training + BackoffNgramMixer stack targeting sub-1.0 BPB on Apr 1, 2026
Implements the three highest-EV innovations from the PR #1 analysis into `train_gpt_mlx_kl.py`. Combined estimated improvement: ~0.08–0.18 BPB over the 1.10 baseline, targeting sub-1.0.

**EngramLite (replaces BigramHash by default)**
- `ENGRAM_LITE_ENABLED=1` (new default). Gated multi-head bigram+trigram hash embedding; a minimal sketch follows this list.
- Gates initialized low (`sigmoid(-2) ≈ 0.12`): they grow only when lookups prove trustworthy, suppressing the +0.0049 BPB noise that raw TrigramHash caused (openai/parameter-golf#609, "Non-record: 11L XSA-all + Full GPTQ + Selective Pruning", val_bpb=1.1154, 3-seed).
- Bottleneck projection (`embed_dim=256 → vocab_size`) regularizes vs. a direct mapping.
- New `ENGRAM_HASH_SIZE` (default 8192) knob; `BIGRAM_HASH_SIZE` still controls the plain BigramHash fallback.
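As a rough illustration of the mechanism, here is a minimal NumPy sketch of a gated bigram+trigram hash embedding under the knobs above. The class name, hash mixing constant, and method layout are assumptions for illustration, not the PR's actual code.

```python
import numpy as np


class EngramLiteSketch:
    """Gated bigram+trigram hash lookup (illustrative sketch, not the PR code)."""

    def __init__(self, hash_size=8192, embed_dim=256, vocab_size=256, seed=0):
        rng = np.random.default_rng(seed)
        self.hash_size = hash_size
        # One table per n-gram order: a bigram head and a trigram head.
        self.tables = [rng.normal(0, 0.02, (hash_size, embed_dim)) for _ in (2, 3)]
        # Bottleneck projection embed_dim -> vocab_size: regularizes vs.
        # a direct hash -> vocab mapping.
        self.proj = rng.normal(0, 0.02, (embed_dim, vocab_size))
        # Gate logits start at -2, i.e. sigmoid(-2) ~= 0.12: lookups must
        # earn their influence during training.
        self.gate_logits = np.full(2, -2.0)

    def _hash(self, ids):
        # ids: (T, order). The mixing prime is an arbitrary choice (assumption).
        h = np.zeros(ids.shape[0], dtype=np.int64)
        for k in range(ids.shape[1]):
            h = h * 1000003 + ids[:, k]
        return h % self.hash_size

    def logit_bias(self, ctx):
        """ctx: (T, 3) int array holding the last 3 token ids per position."""
        bias = np.zeros(1)
        for i, order in enumerate((2, 3)):
            idx = self._hash(ctx[:, -order:])
            gate = 1.0 / (1.0 + np.exp(-self.gate_logits[i]))
            bias = bias + gate * (self.tables[i][idx] @ self.proj)
        return bias  # added to the neural model's logits before softmax
```

With the gates near zero at init, the bias term starts as a small perturbation and only scales up where training pushes the gate logits upward.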
**BackoffNgramMixer (eval-only, zero artifact cost)**

- `NGRAM_MIXER_ENABLED=1`. Causal Laplace-smoothed n-gram LM (orders 1–4, configurable), mixed with the neural model at eval time.
- `NLL_mix = -log((1-α)·P_neural + α·P_ngram)`: mathematically valid, since the mixed distribution sums to 1.
- `NGRAM_ALPHA=0.25`, `NGRAM_MAX_ORDER=4` knobs. A toy sketch of the counting and mixing follows this list.
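The PR builds counts causally during eval; the toy Python sketch below separates counting from scoring for clarity, and `build_counts` / `p_ngram` / `mixed_nll` are illustrative names rather than the PR's API.

```python
import numpy as np
from collections import Counter, defaultdict


def build_counts(tokens, max_order=4):
    """Count n-grams of orders 1..max_order over a token-id sequence."""
    counts = {n: defaultdict(Counter) for n in range(1, max_order + 1)}
    for i in range(len(tokens)):
        for n in range(1, max_order + 1):
            if i - (n - 1) < 0:
                continue
            ctx = tuple(tokens[i - n + 1:i])  # the n-1 preceding tokens
            counts[n][ctx][tokens[i]] += 1
    return counts


def p_ngram(counts, ctx, tok, vocab_size, max_order=4):
    """Laplace-smoothed P(tok | ctx), backing off when a context is unseen."""
    for n in range(max_order, 0, -1):
        if len(ctx) < n - 1:
            continue
        key = tuple(ctx[len(ctx) - (n - 1):]) if n > 1 else ()
        bucket = counts[n].get(key)
        if bucket is None and n > 1:
            continue  # back off to a shorter order
        total = sum(bucket.values()) if bucket else 0
        hit = bucket[tok] if bucket else 0
        return (hit + 1) / (total + vocab_size)  # Laplace smoothing


def mixed_nll(p_neural, p_ng, alpha=0.25):
    """NLL of the eval-time mixture; valid since the mixture sums to 1."""
    return -np.log((1 - alpha) * p_neural + alpha * p_ng)
```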
**Complementary Training**

- `COMPLEMENT_ALPHA=0.5`. Down-weights tokens already well-predicted by bigrams, forcing specialization on hard tokens; see the sketch after this list.
- Bigram statistics built with `np.bincount` on the first training shard (~2 s).
- Per-token weight `clip(1 − α·P_bigram(target|ctx), 0.1, 1.0)`, mean-normalized to preserve the effective LR.
- `_COMPLEMENT_BIGRAM_PROBS` / `_COMPLEMENT_ALPHA` captured by `mx.compile` at graph-trace time.
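A minimal NumPy sketch of the weighting, assuming a vocabulary small enough for a dense (vocab × vocab) bigram table; function names are illustrative:

```python
import numpy as np


def bigram_table(tokens, vocab_size, min_prob=1e-45):
    """Row-normalized P(next | prev) via np.bincount (fits small vocabs)."""
    t = tokens.astype(np.int64)
    pairs = t[:-1] * vocab_size + t[1:]
    counts = np.bincount(pairs, minlength=vocab_size * vocab_size)
    counts = counts.reshape(vocab_size, vocab_size).astype(np.float64)
    rows = np.maximum(counts.sum(axis=1, keepdims=True), 1.0)
    return np.maximum(counts / rows, min_prob)  # floor echoes _MIN_PROB


def complement_weights(bigram_probs, inputs, targets, alpha=0.5, min_w=0.1):
    """clip(1 - alpha * P_bigram(target|ctx), min_w, 1), mean-normalized."""
    p = bigram_probs[inputs, targets]         # bigram prob of each target
    w = np.clip(1.0 - alpha * p, min_w, 1.0)  # easy tokens -> small weight
    return w / w.mean()                       # preserve the effective LR
```

The mean-normalization keeps the average per-token weight at 1, so the loss scale (and hence the effective learning rate) is unchanged while the gradient shifts toward hard tokens.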
**SkipGramHashEmbedding (disabled by default)**

- Set `SKIPGRAM_HASH_SIZE=2048` to enable. Hashes non-contiguous positions (t-1, t-3) and (t-1, t-5) with per-pattern learned gating; captures structured repetition (HTML, code, prose templates). A hashing sketch follows.
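A sketch of the skip-gram hashing, with the prime mixing constant and the pad-with-zero convention as assumptions:

```python
import numpy as np


def skipgram_hashes(tokens, hash_size=2048):
    """Hash the non-contiguous patterns (t-1, t-3) and (t-1, t-5).

    Returns one index array per pattern; positions without enough history
    get index 0 (an illustrative convention, not necessarily the PR's).
    """
    t = np.asarray(tokens, dtype=np.int64)
    pos = np.arange(len(t))
    out = []
    for gap in (3, 5):                          # the (t-1, t-gap) patterns
        a, b = np.roll(t, 1), np.roll(t, gap)   # token at t-1 and at t-gap
        h = (a * 1000003 + b) % hash_size       # prime mixer (assumption)
        h[pos < gap] = 0
        out.append(h)
    return out
```

Each pattern then indexes its own embedding table and is scaled by a learned per-pattern gate, mirroring the EngramLite gating above.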
**Infra changes**

- `GPT._apply_hash_biases()` helper shared by `loss()` and `token_losses()`.
- `SplitOptimizers` extended to route `skipgram_hash.*` params (2D → Muon, 1D → Adam); see the routing sketch below.
- `_MIN_COMPLEMENT_WEIGHT = 0.1`, `_MIN_PROB = 1e-45` named constants.
- `EngramLiteEmbedding.__init__`
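For the routing rule, a minimal sketch of the 2D → Muon / 1D → Adam split; the function name and return convention are hypothetical, not `SplitOptimizers`' real interface:

```python
def route_optimizer(param_name: str, ndim: int) -> str:
    """Hypothetical sketch of the routing rule SplitOptimizers applies
    to skipgram_hash.* parameters."""
    if param_name.startswith("skipgram_hash."):
        # Matrices (embedding tables) go to Muon; vectors (gates) to Adam.
        return "muon" if ndim == 2 else "adam"
    return "existing-routing"  # other params keep their current assignment
```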