Add EngramLite, SkipGram, BackoffNgramMixer, and Complementary Training to training stack #4

Draft
Copilot wants to merge 2 commits into copilot/verify-sub-1-0-pbp-results from copilot/continue-verify-and-merge-changes

Conversation


Copilot AI commented Apr 1, 2026

Implements the full moonshot feature stack from PR #3's checklist. All features default OFF — no impact on existing runs.

New components

  • EngramLiteEmbedding — Replaces BigramHash when ENGRAM_LITE_ENABLED=1. Multi-head hashed bigram+trigram logit bias with learned sigmoid gating. Gating is essential: raw trigrams hurt BPB (see the non-record run "11L XSA-all + Full GPTQ + Selective Pruning", val_bpb=1.1154, 3-seed, openai/parameter-golf#609); gating suppresses the collision noise. ~1.2M params vs ~16.7M for BigramHash at BIGRAM_HASH_SIZE=16384.

  • SkipGramHashEmbedding — Additive logit bias from non-adjacent token pairs (e.g. t[-1]×t[-3]). Enabled via SKIPGRAM_HASH_SIZE>0.

  • BackoffNgramMixer — Eval-time causal n-gram LM with linear-interpolation backoff and entropy-adaptive mixing. Zero artifact cost (built from already-scored tokens, never saved). Competition-compliant: full-vocabulary normalized distributions, strictly causal.

  • build_bigram_stats() — Vectorized P(next|prev) precomputation from training shards (Laplace-smoothed). One-time cost at run start; not stored in artifact.
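The core mechanism shared by the components above (a hashed n-gram bias table behind a learned sigmoid gate, as EngramLiteEmbedding is described) can be sketched in plain NumPy. Everything here is illustrative: the function name, the hash-mixing constant, and the gate parameterization are assumptions for exposition, not the PR's implementation.

```python
import numpy as np

def hashed_ngram_bias(tokens, table, gate_w, gate_b, vocab_size, num_buckets):
    """Sketch of a gated hashed-bigram logit bias (EngramLite-style).

    tokens: (T,) int array, a causal token stream.
    table:  (num_buckets, vocab_size) learned bias table.
    gate_w, gate_b: scalars of an assumed learned sigmoid gate.
    Returns (T, vocab_size) additive logit biases.
    """
    T = len(tokens)
    out = np.zeros((T, vocab_size))
    for t in range(1, T):
        # Hash the (t-1, t) bigram into a bucket; the mixing prime is arbitrary.
        h = (tokens[t - 1] * 1000003 + tokens[t]) % num_buckets
        bias = table[h]
        # Sigmoid gate: when it learns to sit near 0 for noisy buckets,
        # hash-collision noise is suppressed instead of leaking into logits.
        gate = 1.0 / (1.0 + np.exp(-(gate_w * bias.max() + gate_b)))
        out[t] = gate * bias
    return out
```

A trigram head would hash (t-2, t-1, t) the same way; multi-head just means several independent (hash, table, gate) triples whose outputs sum.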

Model / training changes

  • GPT._add_logit_biases() — unified helper: EngramLite XOR BigramHash, plus optional SkipGram
  • GPT.token_logits() — returns (B, T, V) raw logits for BackoffNgramMixer
  • GPT.complementary_loss() — down-weights tokens with high bigram probability, forcing neural capacity toward hard tokens
  • SplitOptimizers — engram_lite.* and skipgram_hash.* params are now routed to Muon/Adam correctly
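The complementary-loss idea above reduces to reweighting per-token NLL by how predictable each target already is from the bigram table. A minimal sketch, assuming the weighting form w_t = 1 - alpha * p_bigram(t); the PR's exact formula may differ:

```python
import numpy as np

def complementary_weights(bigram_p, alpha):
    """Down-weight tokens the bigram table already predicts well, so
    gradient signal (and hence neural capacity) concentrates on hard tokens.

    bigram_p: (T,) P(target_t | previous token) from a precomputed table.
    alpha: COMPLEMENT_ALPHA; alpha=0.0 recovers uniform weighting.
    """
    return 1.0 - alpha * np.asarray(bigram_p)

def complementary_loss(token_nll, bigram_p, alpha=0.5):
    """Weighted mean negative log-likelihood under those weights."""
    w = complementary_weights(bigram_p, alpha)
    return float(np.sum(w * token_nll) / np.sum(w))
```

With alpha=0.5, a token the bigram table predicts with certainty contributes half the loss weight of one it cannot predict at all.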

Eval changes

  • eval_val_sliding_ngram() — sliding-window eval with per-token BackoffNgramMixer mixing; activated when NGRAM_MIXER_ENABLED=1
  • Final eval path checks ngram_mixer_enabled before ttt_enabled or plain sliding
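The per-token mixing step in eval_val_sliding_ngram() amounts to blending the model's softmax with the n-gram distribution. A hedged sketch: the entropy-adaptive rule shown (shrink the mixing weight as the n-gram distribution's entropy rises) is a guess at the scheme, not the PR's code.

```python
import numpy as np

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

def mix_with_ngram(neural_logits, ngram_probs, alpha=0.25, mode="fixed"):
    """Mix the model's next-token distribution with a backoff n-gram LM.

    neural_logits: (V,) raw logits, as from GPT.token_logits().
    ngram_probs:   (V,) full-vocabulary, normalized n-gram distribution
                   (full-vocab normalization keeps the mix compliant).
    """
    p_neural = softmax(neural_logits)
    if mode == "entropy_adaptive":
        # Trust the n-gram more when it is confident (low entropy),
        # fading to the pure neural distribution as it approaches uniform.
        V = len(ngram_probs)
        ent = -np.sum(ngram_probs * np.log(ngram_probs + 1e-12))
        alpha = alpha * (1.0 - ent / np.log(V))
    mixed = (1.0 - alpha) * p_neural + alpha * ngram_probs
    return mixed / mixed.sum()
```

Because the mixer only needs the already-computed logits and counts over already-scored tokens, it adds nothing to the saved artifact.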

Env vars

| Flag | Default | Purpose |
| --- | --- | --- |
| ENGRAM_LITE_ENABLED | 0 | Replace BigramHash with EngramLite |
| SKIPGRAM_HASH_SIZE | 0 | Enable skip-gram logit bias |
| COMPLEMENT_ALPHA | 0.0 | Complementary training strength |
| NGRAM_MIXER_ENABLED | 0 | BackoffNgramMixer at eval |
| NGRAM_ALPHA | 0.25 | Fixed mixing weight (entropy-adaptive when mode=entropy_adaptive) |
| NGRAM_MAX_ORDER | 4 | Max n-gram order |

Full moonshot invocation: `ENGRAM_LITE_ENABLED=1 COMPLEMENT_ALPHA=0.5 NGRAM_MIXER_ENABLED=1 NGRAM_ALPHA=0.25 NGRAM_MAX_ORDER=4`
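The build_bigram_stats() precomputation described under New components could look roughly like this: Laplace-smoothed and vectorized, assuming a single flat token array stands in for the actual training shards.

```python
import numpy as np

def build_bigram_stats(tokens, vocab_size, smoothing=1.0):
    """Vectorized Laplace-smoothed P(next | prev) from a training token stream.

    tokens: 1-D int array (e.g. concatenated training shards).
    Returns a (vocab_size, vocab_size) row-stochastic matrix; row i is
    P(next | prev=i). One-time cost at run start, never stored in the artifact.
    """
    counts = np.zeros((vocab_size, vocab_size))
    # Accumulate all adjacent (prev, next) pairs in one unbuffered call.
    np.add.at(counts, (tokens[:-1], tokens[1:]), 1.0)
    counts += smoothing  # Laplace smoothing: no zero-probability transition
    return counts / counts.sum(axis=1, keepdims=True)
```

np.add.at (rather than counts[prev, nxt] += 1 fancy indexing) is what makes repeated pairs accumulate correctly in a single vectorized pass.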

Original prompt

Pull Request: #3 continue and after verification merge and push all changes.

…er, complementary training

Agent-Logs-Url: https://github.com/kailean/parameter-golf/sessions/804473af-ee1a-48f2-a64e-cf855f911984

Co-authored-by: kailean <49617037+kailean@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Continue verification and merge changes" to "Add EngramLite, SkipGram, BackoffNgramMixer, and Complementary Training to training stack" on Apr 1, 2026
Copilot AI requested a review from kailean April 1, 2026 18:48
Copilot AI added a commit that referenced this pull request Apr 3, 2026
- Upgrade train_gpt_mlx_kl.py to feature-complete version from PR #4:
  EngramLite, SkipGram, BackoffNgramMixer, Complementary Training,
  SmearGate, partial RoPE, LN scale, XSA, GPTQ-lite, TTT, sliding eval
- Add pg_novel_ideas.md comprehensive analysis from brainstorm branch
- Update module docstring to list all 17 innovations
- Fix CLAUDE.md venv activation path and add moonshot smoke test command

Agent-Logs-Url: https://github.com/kailean/parameter-golf/sessions/a0c7ea6e-8952-4355-8557-7137e4a94e4c

Co-authored-by: kailean <49617037+kailean@users.noreply.github.com>