Add EngramLite, SkipGram, BackoffNgramMixer, and Complementary Training to training stack #4
Draft
Copilot wants to merge 2 commits into copilot/verify-sub-1-0-pbp-results from
…er, complementary training
Agent-Logs-Url: https://github.com/kailean/parameter-golf/sessions/804473af-ee1a-48f2-a64e-cf855f911984
Co-authored-by: kailean <49617037+kailean@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Continue verification and merge changes" to "Add EngramLite, SkipGram, BackoffNgramMixer, and Complementary Training to training stack" on Apr 1, 2026
kailean approved these changes on Apr 1, 2026
Copilot AI added a commit that referenced this pull request on Apr 3, 2026
- Upgrade train_gpt_mlx_kl.py to feature-complete version from PR #4: EngramLite, SkipGram, BackoffNgramMixer, Complementary Training, SmearGate, partial RoPE, LN scale, XSA, GPTQ-lite, TTT, sliding eval
- Add pg_novel_ideas.md comprehensive analysis from brainstorm branch
- Update module docstring to list all 17 innovations
- Fix CLAUDE.md venv activation path and add moonshot smoke test command

Agent-Logs-Url: https://github.com/kailean/parameter-golf/sessions/a0c7ea6e-8952-4355-8557-7137e4a94e4c
Co-authored-by: kailean <49617037+kailean@users.noreply.github.com>
Implements the full moonshot feature stack from PR #3's checklist. All features default OFF — no impact on existing runs.
New components
- EngramLiteEmbedding: Replaces BigramHash when ENGRAM_LITE_ENABLED=1. Multi-head hashed bigram+trigram logit bias with learned sigmoid gating. Gating is essential: raw trigrams hurt BPB (Non-record: 11L XSA-all + Full GPTQ + Selective Pruning (val_bpb=1.1154, 3-seed), openai/parameter-golf#609); gating suppresses collision noise. ~1.2M params vs ~16.7M for BigramHash at BIGRAM_HASH_SIZE=16384.
- SkipGramHashEmbedding: Additive logit bias from non-adjacent token pairs (e.g. t[-1]×t[-3]). Enabled via SKIPGRAM_HASH_SIZE>0.
- BackoffNgramMixer: Eval-time causal n-gram LM with linear-interpolation backoff and entropy-adaptive mixing. Zero artifact cost (built from already-scored tokens, never saved). Competition-compliant: full-vocabulary normalized distributions, strictly causal.
- build_bigram_stats(): Vectorized P(next|prev) precomputation from training shards (Laplace-smoothed). One-time cost at run start; not stored in artifact.

Model / training changes
- GPT._add_logit_biases(): unified helper that applies EngramLite XOR BigramHash, plus optional SkipGram
- GPT.token_logits(): returns (B, T, V) raw logits for BackoffNgramMixer
- GPT.complementary_loss(): down-weights tokens with high bigram probability, forcing neural capacity toward hard tokens
- SplitOptimizers: engram_lite.* and skipgram_hash.* params now routed to Muon/Adam correctly

Eval changes
- eval_val_sliding_ngram(): sliding-window eval with per-token BackoffNgramMixer mixing; activated when NGRAM_MIXER_ENABLED=1
- ngram_mixer_enabled is checked before ttt_enabled or plain sliding

Env vars
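| Variable | Default | Notes |
|---|---|---|
| ENGRAM_LITE_ENABLED | 0 | |
| SKIPGRAM_HASH_SIZE | 0 | |
| COMPLEMENT_ALPHA | 0.0 | |
| NGRAM_MIXER_ENABLED | 0 | |
| NGRAM_ALPHA | 0.25 | … entropy_adaptive) |
| NGRAM_MAX_ORDER | 4 | |

Full moonshot invocation: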
ENGRAM_LITE_ENABLED=1 COMPLEMENT_ALPHA=0.5 NGRAM_MIXER_ENABLED=1 NGRAM_ALPHA=0.25 NGRAM_MAX_ORDER=4
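
For reviewers who want a concrete picture of how build_bigram_stats() and GPT.complementary_loss() might fit together, here is a minimal plain-Python sketch. It is not the code in this PR: the PR describes a vectorized implementation, while this version uses stdlib counters for readability. The function names build_bigram_probs, complementary_weights, and complementary_loss below are hypothetical, and both the weighting rule (1 - alpha * P(next|prev)) and the normalization by the sum of weights are assumptions about what "down-weights tokens with high bigram probability" could mean.

```python
from collections import Counter


def build_bigram_probs(tokens, vocab_size, smoothing=1.0):
    """Return prob(prev, nxt): Laplace-smoothed P(next | prev) from a token list."""
    pair_counts = Counter(zip(tokens[:-1], tokens[1:]))
    prev_counts = Counter(tokens[:-1])

    def prob(prev, nxt):
        # Add `smoothing` to every (prev, next) count so unseen pairs get mass
        # and each row sums to 1 over the vocabulary.
        return (pair_counts[(prev, nxt)] + smoothing) / (
            prev_counts[prev] + smoothing * vocab_size
        )

    return prob


def complementary_weights(tokens, prob, alpha):
    """Per-target weights; easy (high bigram probability) targets get less weight.

    ASSUMPTION: weight = 1 - alpha * P(next | prev). alpha=0 gives all ones.
    """
    return [1.0 - alpha * prob(p, n) for p, n in zip(tokens[:-1], tokens[1:])]


def complementary_loss(per_token_nll, weights):
    """Weighted mean of per-token losses (ASSUMPTION: normalized by total weight)."""
    return sum(w * l for w, l in zip(weights, per_token_nll)) / sum(weights)


# Toy example: vocabulary of 3 tokens, 5 prediction targets.
tokens = [0, 1, 0, 1, 0, 2]
prob = build_bigram_probs(tokens, vocab_size=3)
weights = complementary_weights(tokens, prob, alpha=0.5)
nll = [0.2, 0.3, 0.2, 0.3, 1.5]  # made-up per-token losses from a neural model
print(complementary_loss(nll, weights))
```

With alpha set to 0.0 (the COMPLEMENT_ALPHA default listed above), every weight is 1 and the result reduces to the ordinary mean loss, which matches the statement that all features default to off.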