[Bugfix] Resolve MTP > 1 issue when lm head tp > 1 #4254
base: main
Conversation
Code Review
This pull request aims to fix an issue with speculative decoding (MTP) when tensor parallelism is used on the language model head. The core of the fix is to ensure the dummy run correctly simulates the multiple compute_logits calls that occur in a real run. While the fix is correctly applied for MtpProposer, it seems to be incomplete for EagleProposer, which could lead to the same issue in that scenario. Additionally, a refactoring in model_runner_v1.py appears to have introduced an AttributeError by calling a non-existent method on the drafter object. I've provided critical comments and suggestions for both issues.
Diff excerpt (truncated):

    hidden_states[dummy_indices])

    def dummy_drafter_compute_logits(hidden_states):
        return self.drafter.compute_logits(
The dummy_drafter_compute_logits function calls self.drafter.compute_logits, but the compute_logits method is on the model attribute of the drafter object, not on the drafter itself. This will result in an AttributeError. The call should be self.drafter.model.compute_logits.
Suggested change:

    - return self.drafter.compute_logits(
    + return self.drafter.model.compute_logits(
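For illustration only, a minimal sketch of the corrected helper, assuming the drafter exposes its draft model through a model attribute (the exact compute_logits argument list is elided in the diff, so only hidden_states is shown here):

    # Sketch based on the review suggestion; the surrounding context and the
    # real argument list are elided in the diff above.
    def dummy_drafter_compute_logits(hidden_states):
        # compute_logits lives on the drafter's wrapped model, not on the
        # drafter object itself, so route the call through drafter.model.
        return self.drafter.model.compute_logits(hidden_states)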
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run linting and testing checks locally according to Contributing and Testing.
whx-sjtu left a comment
We first fixed the hanging issue when running MTP = 1 with lm head TP in PR #3915. This PR refactors that fix to run dummy_compute_logits inside the drafter's dummy_run and further fixes the MTP > 1 scenario. LGTM.
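As a rough sketch of that wiring (parameter names other than dummy_compute_logits are hypothetical), the runner builds the logits callback once and hands it to the drafter, so the drafter's dummy_run decides how many times the lm-head collective is triggered:

    # Illustrative only: pass the dummy logits callback into the drafter's
    # dummy_run instead of issuing a single fixed compute_logits call in the
    # runner itself.
    self.drafter.dummy_run(
        num_tokens=num_tokens,  # hypothetical parameter name
        dummy_compute_logits=dummy_drafter_compute_logits,
    )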
This pull request has conflicts; please resolve them before we can evaluate the pull request.
I've tested it on DeepSeek and it proves to be useful. Please make CI happy.
Previously, the dummy run executed compute_logits only once, regardless of num_speculative_tokens. This caused execute_model to hang on compute_logits when lm head tensor parallelism exceeded 1. The fix ensures compute_logits executes correctly during dummy run, matching num_speculative_tokens. Signed-off-by: Jade Zheng <[email protected]>
Signed-off-by: Jade Zheng <[email protected]>
Signed-off-by: Jade Zheng <[email protected]>
Force-pushed 28752d2 to 84253da
Signed-off-by: Jade Zheng <[email protected]>
What this PR does / why we need it?
Previously, the dummy run executed compute_logits only once, regardless of num_speculative_tokens. This caused execute_model to hang on compute_logits when lm head tensor parallelism exceeded 1. The fix ensures compute_logits executes correctly during dummy run, matching num_speculative_tokens.
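As a rough sketch of the behaviour described above (the structure and names such as dummy_run and dummy_compute_logits follow this description rather than the actual vllm-ascend code), the dummy run issues one logits computation per speculative token, so every tensor-parallel rank of the lm head joins the same number of collective calls as in a real execute_model:

    def dummy_run(self, hidden_states, num_speculative_tokens,
                  dummy_compute_logits):
        # One draft step per speculative token, each ending in a
        # compute_logits call. With lm head TP > 1 that call is a collective
        # op, so the dummy run must issue it as many times as the real run;
        # otherwise some ranks wait on a collective that never arrives and
        # execute_model hangs.
        for _ in range(num_speculative_tokens):
            dummy_compute_logits(hidden_states)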
Does this PR introduce any user-facing change?
No.
How was this patch tested?