[Feat] Add draft_logprobs for Speculative Decode MTP #4287
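Description:

This PR adds `draft_logprobs` support to the Speculative Decode MTP feature and extends the OpenAI-compatible API accordingly.

Main changes:

- New request parameter `include_draft_logprobs`: on the `/chat/completions` and `/completions` endpoints, this parameter controls whether `draft_logprobs` is returned.
- New response field `draft_logprobs`: when `include_draft_logprobs` is `true`, the response includes `draft_logprobs`, which records the intermediate probability values produced during speculative decoding.

Note on the changes: core decoding logic outside Speculative Decode MTP is unchanged and stays compatible with the existing decoding flow.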
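As a rough illustration of the request-side change, the sketch below builds JSON bodies for both endpoints in Python. The helper names `build_completion_payload` and `build_chat_payload` are hypothetical and not part of this PR; the field names (`logprobs`, `top_logprobs`, `include_draft_logprobs`) are the ones used in this description. The flag is always sent explicitly, so the sketch makes no assumption about the server-side default.

```python
import json


def build_completion_payload(model, prompt, logprobs=None, include_draft_logprobs=False):
    """Build a /v1/completions request body (hypothetical helper, not from the PR)."""
    payload = {"model": model, "prompt": prompt}
    if logprobs is not None:
        payload["logprobs"] = logprobs
    # Send the flag explicitly instead of relying on an assumed server default.
    payload["include_draft_logprobs"] = bool(include_draft_logprobs)
    return payload


def build_chat_payload(model, messages, top_logprobs=None, include_draft_logprobs=False):
    """Build a /v1/chat/completions request body (hypothetical helper, not from the PR)."""
    payload = {"model": model, "messages": list(messages)}
    if top_logprobs is not None:
        payload["logprobs"] = True
        payload["top_logprobs"] = top_logprobs
    payload["include_draft_logprobs"] = bool(include_draft_logprobs)
    return payload


if __name__ == "__main__":
    body = build_completion_payload(
        "your-model", "Hello, world!", logprobs=5, include_draft_logprobs=True
    )
    print(json.dumps(body, indent=2))
```

The resulting dictionaries can be serialized with `json.dumps` and posted with any HTTP client.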
Purpose:

Example requests (curl):

    curl https://{ip}:{port}/v1/completions \
      -H "Content-Type: application/json" \
      -d '{
        "model": "your-model",
        "prompt": "Hello, world!",
        "logprobs": 5,
        "include_draft_logprobs": true
      }'

    curl https://{ip}:{port}/v1/chat/completions \
      -H "Content-Type: application/json" \
      -d '{
        "model": "your-model",
        "messages": [
          {"role": "system", "content": "You are a helpful assistant."},
          {"role": "user", "content": "Hello, world!"}
        ],
        "logprobs": true,
        "top_logprobs": 5,
        "include_draft_logprobs": true
      }'

Example response fragments:

    {
      "id": "cmpl-xxx",
      "object": "text_completion",
      "choices": [
        {
          "text": "Hello",
          "logprobs": [ ... ],
          "draft_logprobs": [ ... ]
        }
      ]
    }

    {
      "id": "chatcmpl-xxx",
      "object": "chat.completion",
      "choices": [
        {
          "message": {
            "role": "assistant",
            "content": "Hello",
            "logprobs": [ ... ],
            "draft_logprobs": [ ... ]
          }
        }
      ]
    }
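On the client side, reading the new field could look like the minimal Python sketch below. The helper name `extract_draft_logprobs` is hypothetical. The sketch assumes only the placement shown in the response fragments: on each choice for `/completions`, and inside `message` for `/chat/completions`. The element structure of `draft_logprobs` is elided in this description, so entries are treated as opaque values.

```python
def extract_draft_logprobs(response):
    """Return one draft_logprobs value per choice, or None where it is absent.

    Hypothetical helper: works on an already-parsed JSON response (a dict).
    """
    extracted = []
    for choice in response.get("choices", []):
        message = choice.get("message") or {}
        if "draft_logprobs" in message:
            # Chat-style placement: the field sits inside the message object.
            extracted.append(message["draft_logprobs"])
        else:
            # Completion-style placement: the field sits on the choice itself.
            extracted.append(choice.get("draft_logprobs"))
    return extracted
```

A `None` entry simply means the field was not present for that choice, for example when `include_draft_logprobs` was not set on the request.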