Fix kimi yarn settings for draft model #54
Conversation
Pull request overview
Aligns Kimi EAGLE3 draft-model RoPE/YARN configuration with the target model so long-context behavior matches between draft and target (consistent with recommendations from SPEED-Bench).
Changes:
- Wire `rope_theta` into `LlamaYarnRotaryEmbedding` via the `base` parameter.
- Ensure draft model config generation carries over `rope_theta` and `rope_scaling` (with safe copying for non-primitive values).
- Update the Kimi draft-model JSON to include YARN `rope_scaling` and the intended `rope_theta`, and add a unit test validating RoPE wiring.
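The config carry-over described above can be sketched as follows. This is a hedged illustration, not the actual `torchspec/config/utils.py` code: the function name `copy_rope_settings` and the dict-based config shape are assumptions.

```python
import copy

# Keys whose values should follow the target model's RoPE setup.
_ROPE_KEYS = ("rope_theta", "rope_scaling")

def copy_rope_settings(target_cfg: dict, draft_cfg: dict) -> dict:
    """Copy RoPE settings from the target config into the draft config.

    Non-primitive values (e.g. the rope_scaling dict) are deep-copied so
    later mutation of the draft config cannot leak back into the target.
    """
    for key in _ROPE_KEYS:
        if key in target_cfg:
            value = target_cfg[key]
            draft_cfg[key] = copy.deepcopy(value) if isinstance(value, (dict, list)) else value
    return draft_cfg
```

The deep copy matters because `rope_scaling` is a nested dict; a plain reference copy would let a draft-side edit silently corrupt the target config.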
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| torchspec/models/draft/llama3_eagle.py | Passes rope_theta through to YARN rotary embedding initialization. |
| torchspec/config/utils.py | Copies rope_theta/rope_scaling from target to draft configs; adds safe value copying. |
| tests/test_eagle3_loss.py | Adds a test asserting rope_theta and YARN settings are reflected in the model’s rotary embedding. |
| configs/draft_models/kimi_k25_eagle3.json | Updates Kimi draft config to use YARN scaling settings and rope_theta=50000.0. |
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: e187fe522e
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you:
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
Normalize minimal YaRN rope_scaling configs when generating draft model JSONs so copied target settings remain loadable instead of failing later during rotary cache initialization.
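One way to act on this suggestion is to fill in any required YaRN keys at JSON-generation time. The sketch below is hypothetical: `normalize_yarn_scaling` and the default values in `_YARN_DEFAULTS` are assumptions for illustration, and the real loader's required keys may differ.

```python
from typing import Optional

# Hypothetical defaults — illustrative only; the actual keys a rotary-cache
# initializer requires (and their values) depend on the loader in use.
_YARN_DEFAULTS = {
    "beta_fast": 32.0,
    "beta_slow": 1.0,
    "original_max_position_embeddings": 4096,
}

def normalize_yarn_scaling(rope_scaling: Optional[dict]) -> Optional[dict]:
    """Fill in missing YaRN keys so a minimal copied config stays loadable."""
    if not rope_scaling:
        return rope_scaling
    # Some configs use "rope_type", older ones use "type".
    rope_type = rope_scaling.get("rope_type", rope_scaling.get("type"))
    if rope_type != "yarn":
        return rope_scaling
    normalized = dict(rope_scaling)
    for key, default in _YARN_DEFAULTS.items():
        normalized.setdefault(key, default)
    return normalized
```

Running this once when the draft JSON is generated moves the failure (if any) to config-build time rather than deep inside rotary cache initialization.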
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 997ea0fdd0
Summary
In the paper SPEED-Bench: A Unified and Diverse Benchmark for Speculative Decoding, an interesting finding is that EAGLE3 draft-model training with YARN often suffers from a wrong initialization config for max_position_embeddings and the RoPE settings. Most of the time the draft model is trained with short context and does not work well in long context; the authors suggest adding the YARN config back later, at inference time.
TorchSpec doesn't typically suffer from this because we enabled long-context training. Even so, it is still worth aligning the YARN configs between draft and target for the Kimi case.
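The alignment point can be illustrated with a minimal pure-Python sketch (not the torchspec implementation; the dimension and theta values are illustrative):

```python
# Why rope_theta must match between draft and target: the rotary
# inverse-frequency table is a direct function of the base, so different
# bases give the two models different positional geometry at long range.
def rope_inv_freq(dim, base):
    """Standard RoPE inverse frequencies: base^(-2i/dim) for each pair i."""
    return [base ** (-(2 * i) / dim) for i in range(dim // 2)]

target_freqs = rope_inv_freq(128, 50000.0)   # rope_theta from the Kimi draft config
default_freqs = rope_inv_freq(128, 10000.0)  # common library default
# Every band except i = 0 differs, so draft/target attention drifts apart
# at long positions if the draft falls back to the default base.
```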
Testing Done
WIP