
Fix kimi yarn settings for draft model#54

Merged
yubofredwang merged 2 commits into main from ywang/fix-kimi-yarn
Apr 3, 2026
Conversation

@yubofredwang
Collaborator

Summary

In the paper SPEED-Bench: A Unified and Diverse Benchmark for Speculative Decoding, one interesting finding is that EAGLE3 draft-model training with YARN often suffers from a wrong initialization config for max_position_embeddings and the rope settings. Most of the time the draft model is trained with a short context and does not work well at long context; the authors suggest adding the YARN config back later, at inference time.

TorchSpec doesn't typically suffer from this because we enable long-context training. Still, it is worth aligning the YARN configs between the draft and target models for the Kimi case.
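As a rough sketch of what "aligning" means here, the draft config carries the same rope block as the target. Field names follow the Hugging Face-style rope_scaling convention and are assumptions, except rope_theta=50000.0, which matches the value noted in the review of the Kimi draft JSON:

```python
# Hypothetical illustration: keep the draft model's rope settings identical
# to the target's so a short-context-trained draft still matches the target
# at long context. Key names are assumed (Hugging Face-style convention).
target_rope = {
    "rope_theta": 50000.0,
    "rope_scaling": {
        "rope_type": "yarn",
        "factor": 4.0,                             # assumed scaling factor
        "original_max_position_embeddings": 4096,  # assumed pre-scaling length
    },
}

draft_config = {"model_type": "llama"}  # stand-in for the draft JSON
draft_config.update(target_rope)        # align draft with target
```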

Testing Done

WIP

@yubofredwang yubofredwang marked this pull request as ready for review March 27, 2026 18:24
Copilot AI review requested due to automatic review settings March 27, 2026 18:24
Contributor

Copilot AI left a comment


Pull request overview

Aligns Kimi EAGLE3 draft-model RoPE/YARN configuration with the target model so long-context behavior matches between draft and target (consistent with recommendations from SPEED-Bench).

Changes:

  • Wire rope_theta into LlamaYarnRotaryEmbedding via the base parameter.
  • Ensure draft model config generation carries over rope_theta and rope_scaling (with safe copying for non-primitive values).
  • Update Kimi draft-model JSON to include YARN rope_scaling and the intended rope_theta, and add a unit test validating RoPE wiring.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

File Description
torchspec/models/draft/llama3_eagle.py Passes rope_theta through to YARN rotary embedding initialization.
torchspec/config/utils.py Copies rope_theta/rope_scaling from target to draft configs; adds safe value copying.
tests/test_eagle3_loss.py Adds a test asserting rope_theta and YARN settings are reflected in the model’s rotary embedding.
configs/draft_models/kimi_k25_eagle3.json Updates Kimi draft config to use YARN scaling settings and rope_theta=50000.0.
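The rope_theta wiring above amounts to using the target's base when building the rotary embedding's inverse frequencies. A self-contained sketch (pure Python, not the actual LlamaYarnRotaryEmbedding signature):

```python
def rope_inv_freq(dim: int, base: float = 10000.0) -> list[float]:
    """Inverse frequencies for rotary position embeddings.

    `base` plays the role of rope_theta. Forgetting to pass the target's
    value (e.g. 50000.0 for Kimi) silently falls back to the common
    10000.0 default and desynchronizes draft and target positions.
    """
    return [1.0 / (base ** (i / dim)) for i in range(0, dim, 2)]
```

A unit test like the one added here can simply build the embedding and assert its frequencies reflect the configured base rather than the default.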



@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e187fe522e

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Normalize minimal YaRN rope_scaling configs when generating draft model JSONs so copied target settings remain loadable instead of failing later during rotary cache initialization.

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 997ea0fdd0


@yubofredwang yubofredwang merged commit 06a9760 into main Apr 3, 2026
1 check passed
@yubofredwang yubofredwang deleted the ywang/fix-kimi-yarn branch April 3, 2026 06:28