
feat: consolidate moonshot stack into clean submission-ready training script#7

Draft
Copilot wants to merge 5 commits into main from copilot/create-clean-submission-ready-pr

Conversation


Copilot AI commented Apr 3, 2026

Multiple open PRs developed the moonshot feature stack in parallel branches that were never merged to main. This PR consolidates the best artifacts from all branches into a single coherent submission.

Changes

train_gpt_mlx_kl.py — upgraded to full moonshot build (1284 → 1848 lines)

Replaces the stripped-down main-branch version with the feature-complete build from PR #4 (copilot/continue-verify-and-merge-changes), adding:

  • EngramLite — gated multi-head bigram+trigram hash logit bias; replaces BigramHash when ENGRAM_LITE_ENABLED=1
  • SkipGramHash — non-adjacent token pair logit bias (SKIPGRAM_HASH_SIZE>0)
  • BackoffNgramMixer — causal Laplace-smoothed n-gram LM mixed at eval time; zero artifact cost, never serialized (NGRAM_MIXER_ENABLED=1)
  • Complementary Training — per-token CE weighting that down-weights bigram-easy tokens, forcing neural capacity toward hard tokens (COMPLEMENT_ALPHA=0.5)
  • GPTQ-lite per-row scale quantization (USE_GPTQ_LITE=1)
  • SmearGate, partial RoPE, depth-aware LN scale 1/√(layer+1), XSA on last N layers
  • Sliding-window eval + LoRA test-time training (TTT) during evaluation
  • All features default OFF — no impact on existing baseline runs
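
Of the features above, the BackoffNgramMixer is the most self-contained. As a rough sketch of the idea (the class name comes from the PR description, but the counting, backoff, and mixing details below are assumptions, not the actual code in train_gpt_mlx_kl.py):

```python
from collections import defaultdict

class BackoffNgramMixer:
    """Sketch: causal, Laplace-smoothed backoff n-gram LM mixed with the
    neural model's probabilities at eval time. Illustrative only."""

    def __init__(self, vocab_size, max_order=4, alpha=0.25, laplace=1.0):
        self.vocab_size = vocab_size
        self.max_order = max_order      # cf. NGRAM_MAX_ORDER
        self.alpha = alpha              # cf. NGRAM_ALPHA: weight on the n-gram term
        self.laplace = laplace
        # counts[n-1][context_tuple][token] -> count, for order-n n-grams
        self.counts = [defaultdict(lambda: defaultdict(int)) for _ in range(max_order)]
        self.totals = [defaultdict(int) for _ in range(max_order)]

    def observe(self, tokens):
        """Update counts causally from a token stream (no future leakage)."""
        for i, tok in enumerate(tokens):
            for n in range(1, self.max_order + 1):
                if i < n - 1:
                    continue
                ctx = tuple(tokens[i - n + 1:i])
                self.counts[n - 1][ctx][tok] += 1
                self.totals[n - 1][ctx] += 1

    def prob(self, context, token):
        """Back off to the highest order whose context has been seen."""
        for n in range(self.max_order, 0, -1):
            ctx = tuple(context[-(n - 1):]) if n > 1 else ()
            total = self.totals[n - 1].get(ctx, 0)
            if total > 0:
                c = self.counts[n - 1][ctx].get(token, 0)
                return (c + self.laplace) / (total + self.laplace * self.vocab_size)
        return 1.0 / self.vocab_size   # nothing observed yet: uniform

    def mix(self, model_prob, context, token):
        """Linear interpolation with the neural model's probability."""
        return (1 - self.alpha) * model_prob + self.alpha * self.prob(context, token)

# Toy usage on a 4-token vocabulary
m = BackoffNgramMixer(vocab_size=4, max_order=2, alpha=0.25)
m.observe([0, 1, 0, 1, 0, 1])
```

Because the counts live only in eval-time memory and are never written to the checkpoint, a mixer like this adds zero artifact cost, consistent with the "never serialized" claim above.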

Moonshot invocation:

ENGRAM_LITE_ENABLED=1 COMPLEMENT_ALPHA=0.5 NGRAM_MIXER_ENABLED=1 \
NGRAM_ALPHA=0.25 NGRAM_MAX_ORDER=4 python3 train_gpt_mlx_kl.py
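
The COMPLEMENT_ALPHA flag in the invocation above controls the Complementary Training weighting. A minimal sketch of one plausible weighting rule (the function name and the exact formula are assumptions; only the idea of down-weighting bigram-easy tokens comes from the PR description):

```python
import numpy as np

def complementary_ce_weights(ce_bigram, alpha=0.5):
    """Sketch: per-token loss weights that shrink for tokens a cheap bigram
    model already predicts well. ce_bigram is the bigram model's per-token
    cross-entropy in nats; exp(-CE) is its probability of the true token.
    Weights are normalized to mean 1 so the overall loss scale is unchanged.
    """
    raw = 1.0 - alpha * np.exp(-ce_bigram)   # easy token (low CE) -> small weight
    return raw / raw.mean()

# Toy example: tokens 0 and 2 are bigram-easy, tokens 1 and 3 are hard
ce_bigram = np.array([0.05, 3.0, 0.1, 4.0])
ce_neural = np.array([0.5, 2.0, 0.4, 2.5])
w = complementary_ce_weights(ce_bigram, alpha=0.5)
weighted_loss = float((w * ce_neural).mean())
```

With alpha=0.5 the easy tokens keep at least half their weight, so the neural model still sees every position; it is simply nudged to spend capacity where the bigram bias cannot help.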

pg_novel_ideas.md — added from PR #1 brainstorm branch

Comprehensive analysis of 8 approaches to sub-1.10 BPB, a ranked idea list, a dead-idea evidence table, and POC stubs. Previously existed only on copilot/brainstorm-novel-approaches.

CLAUDE.md

  • Fixed broken venv activation path (missing /bin/)
  • Added moonshot smoke test and full H100 run commands

- Upgrade train_gpt_mlx_kl.py to feature-complete version from PR #4:
  EngramLite, SkipGram, BackoffNgramMixer, Complementary Training,
  SmearGate, partial RoPE, LN scale, XSA, GPTQ-lite, TTT, sliding eval
- Add pg_novel_ideas.md comprehensive analysis from brainstorm branch
- Update module docstring to list all 17 innovations
- Fix CLAUDE.md venv activation path and add moonshot smoke test command
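
Among the features listed in the commit message, the "GPTQ-lite" per-row scale quantization is simple enough to sketch. The snippet below shows one common per-row scheme (symmetric int8 with one scale per output row); the function names are illustrative and the actual scheme in train_gpt_mlx_kl.py may differ:

```python
import numpy as np

def quantize_per_row(w, bits=8):
    """Sketch: symmetric per-row quantization. Each row of the weight
    matrix gets one float scale; values are rounded to signed integers."""
    qmax = 2 ** (bits - 1) - 1                       # 127 for int8
    scales = np.abs(w).max(axis=1, keepdims=True) / qmax
    scales = np.where(scales == 0, 1.0, scales)      # guard all-zero rows
    q = np.clip(np.round(w / scales), -qmax - 1, qmax).astype(np.int8)
    return q, scales

def dequantize(q, scales):
    return q.astype(np.float32) * scales

w = np.array([[0.5, -1.0, 0.25],
              [2.0,  0.0, -4.0]], dtype=np.float32)
q, s = quantize_per_row(w)
w_hat = dequantize(q, s)
```

Storing one scale per row (rather than per tensor) keeps rows with very different magnitudes from drowning each other out, at the cost of a few extra floats in the artifact.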

Agent-Logs-Url: https://github.com/kailean/parameter-golf/sessions/a0c7ea6e-8952-4355-8557-7137e4a94e4c

Co-authored-by: kailean <49617037+kailean@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Create a clean submission ready pull request" to "feat: consolidate moonshot stack into clean submission-ready training script" on Apr 3, 2026
Copilot AI requested a review from kailean April 3, 2026 21:53
- Fix orphaned clip_grad_tree function body by adding proper def line
- Remove verbose section separator comment blocks (17+ instances)
- Compact 26-line module docstring to 2-line summary
- Trim multi-line docstrings to single lines throughout
- Remove redundant inline comments that restate the code
- Remove unnecessary blank lines within function bodies
- Compact Hyperparameters class by removing section comment headers

All functionality, logic, algorithms, and class/function signatures preserved.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Co-authored-by: kailean <49617037+kailean@users.noreply.github.com>
