Multi-turn tool call chat completions conversations#712
jaredoconnell wants to merge 11 commits into vllm-project:main
Conversation
Assisted-by: Cursor AI Signed-off-by: Jared O'Connell <joconnel@redhat.com>
Assisted-by: Cursor AI Signed-off-by: Jared O'Connell <joconnel@redhat.com>
Adds variable size responses for synthetic data, and better handles edge cases for external datasets. Also improves documentation. Assisted-by: Cursor AI Signed-off-by: Jared O'Connell <joconnel@redhat.com>
Signed-off-by: Jared O'Connell <joconnel@redhat.com>
Assisted-by: Cursor AI Signed-off-by: Jared O'Connell <joconnel@redhat.com>
Assisted-by: Cursor AI Signed-off-by: Jared O'Connell <joconnel@redhat.com>
Extracts functionality to new static methods. Assisted-by: Claude Code Sonnet 4.5 Signed-off-by: Jared O'Connell <joconnel@redhat.com>
sjmonson left a comment
Not finished reviewing; will add more comments in a bit. Github is acting up and not letting me add to this review for some reason.
```diff
 # Preserve the original turn index from the column name so
 # that sparse columns (e.g. tools_0, tools_3) stay aligned
 # with the turns they belong to.
 for original_turn, column_name in sorted(turn_columns):
     column_type = cast("GenerativeDatasetColumnType", column_type)
-    mappings[(column_type, turn)].append((index, column_name))
+    mappings[(column_type, original_turn)].append((index, column_name))
```
I'm fine with this change since I debated this behavior initially, but remove the comment and rename the variable back to `turn`, since `original_turn` is meaningless to someone who is not looking at the original code.
```python
stop_conversation: bool = Field(
    default=False,
    description=(
        "When True, the worker cancels all remaining turns in the "
        "conversation after this request completes successfully. "
        "Set by the backend when a turn's result means the conversation "
        "should not continue (e.g. expected tool call not produced)."
    ),
)
```
We already use exceptions to trigger early stops from the backend, so I prefer that route. `error_stop` can throw any exception, and `ignore_stop` might work by throwing `asyncio.CancelledError` (see a later comment for examples).
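A minimal sketch of the exception route being suggested here; the `ToolCallMissingError` name and `handle_missing_tool_call` helper are hypothetical illustrations, not the project's actual code:

```python
import asyncio


class ToolCallMissingError(Exception):
    """Hypothetical backend error: an expected tool call was not produced."""


def handle_missing_tool_call(behavior: str) -> None:
    # error_stop: raise a normal exception so the worker records an error
    # and cancels the remaining turns in the conversation.
    if behavior == "error_stop":
        raise ToolCallMissingError("expected tool call not produced")
    # ignore_stop: raise CancelledError so the remaining turns are dropped
    # without being counted as an error.
    if behavior == "ignore_stop":
        raise asyncio.CancelledError
    # ignore_continue: do nothing and let the next turn proceed.


# Example: error_stop surfaces as a regular exception the worker can catch.
try:
    handle_missing_tool_call("error_stop")
except ToolCallMissingError as exc:
    print(f"stopped with error: {exc}")
```

The appeal of this route is that the worker's existing exception handling does the cancellation, so no new `stop_conversation` field is needed on the request model.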
Don't need these changes with above suggestion.
```python
# Tool calling requires the model to stop naturally after producing
# valid JSON; ignore_eos would force generation past that point and
# break the server's constrained decoding grammar.
body.pop("ignore_eos", None)
body.pop("stop", None)

# On tool-call turns, let the model finish valid JSON naturally;
# max_completion_tokens would truncate output mid-JSON and corrupt
# the arguments sent in conversation history on follow-up turns.
if data.expects_tool_call:
    body.pop("max_completion_tokens", None)
    body.pop("max_tokens", None)
```
This does not seem right. Either all of these key pops belong under `if data.expects_tool_call`, or none of them do. If you pop `ignore_eos` but not `max_tokens`, you end up with the same problem mentioned in the first comment.
It's also a very bad idea to remove these keys unless we absolutely have to, because it could destroy benchmark reproducibility. So if you haven't already, I would suggest you double-check the behavior.
I think it was designed to unconditionally pop some of those when tools were present, in case the model created tool calls anyway. For now I moved them all under the conditional, and added a section to the documentation on why this happens.
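The consolidated version described here might look roughly like this; a sketch only, where the `body` dict and `expects_tool_call` flag mirror the diff rather than the final merged code:

```python
def strip_generation_limits(body: dict, expects_tool_call: bool) -> dict:
    # Only on tool-call turns: the model must stop naturally after emitting
    # valid JSON, so drop every key that would force or truncate generation.
    if expects_tool_call:
        for key in ("ignore_eos", "stop", "max_completion_tokens", "max_tokens"):
            body.pop(key, None)
    return body


# On a non-tool-call turn, the body (and benchmark reproducibility) is untouched.
body = {"model": "m", "ignore_eos": True, "max_tokens": 128}
print(strip_generation_limits(dict(body), expects_tool_call=False))
```

Keeping all four pops behind one conditional avoids the half-and-half state the reviewer flagged, where `ignore_eos` is gone but `max_tokens` still truncates the JSON.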
```python
# Tool calling configuration
@click.option(
    "--tool-choice",
    type=str,
    default=None,
    help=(
        'Tool choice mode: "required", "auto", or "none". '
        "Controls whether the model is forced to produce tool calls on "
        "tool-call turns. Overrides the per-request default (required) set "
        "when tools come from the dataset."
    ),
)
@click.option(
    "--tool-call-missing-behavior",
    type=click.Choice(["ignore_continue", "ignore_stop", "error_stop"]),
    default=None,
    help=(
        "What the worker does when a tool call is expected but the model "
        "does not produce one. ignore_continue: continue to next turn, "
        "ignore_stop: cancel remaining turns, error_stop: error and "
        "cancel remaining turns. Default: error_stop."
    ),
)
```
No new top-level args. Handle this in `--backend-kwargs`.
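The same settings could ride along in the existing backend-kwargs mechanism instead of new flags; a sketch assuming the kwargs arrive as a JSON string, with the key names taken from the options above:

```python
import json

# A hypothetical --backend-kwargs value carrying the tool-call settings.
raw = '{"tool_choice": "required", "tool_call_missing_behavior": "error_stop"}'

backend_kwargs = json.loads(raw)
# The backend reads its settings from the kwargs dict, with the same
# defaults the dedicated CLI options would have used.
tool_choice = backend_kwargs.get("tool_choice", "required")
missing_behavior = backend_kwargs.get("tool_call_missing_behavior", "error_stop")
print(tool_choice, missing_behavior)
```

This keeps the top-level CLI surface stable while still letting benchmarks opt into the tool-call behaviors.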
This should be its own guide, not inserted into the generic Multiturn one.
These are the diffs recommended in the comments. They are untested and require some follow-up changes. Co-authored-by: Samuel Monson <smonson@irbash.net> Signed-off-by: Jared O'Connell <46976761+jaredoconnell@users.noreply.github.com>
This pull request has merge conflicts that must be resolved before it can be merged.
Moves documentation. Switches fully to exceptions to stop conversations early. Also includes info gained from vLLM contributor. Assisted-by: Cursor AI Signed-off-by: Jared O'Connell <joconnel@redhat.com>
```python
]

# Short placeholder used when no tool_response_tokens size is configured.
DEFAULT_SYNTHETIC_TOOL_RESPONSE = '{"status": "ok"}'
```
```diff
@@ -434,6 +449,12 @@ async def _process_next_request(  # noqa: C901
             logger.opt(exception=True).debug(
                 f"Backend exception for request {request_info.request_id}"
             )
+            # Cancel remaining conversation turns on backend error
+            for skip_req, skip_info in conversation:
+                skip_info.error = f"Cancelled: {request_info.error}"
+                skip_info.timings.resolve_end = time.time()
+                self._send_update("cancelled", None, skip_req, skip_info)
+            conversation.clear()
         finally:
             if request_info is not None:
```
What do you think of this design?
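A standalone sketch of the cancellation behavior in that diff, simplified so it runs on its own: `conversation` is a plain list of `(request, info)` pairs, the info objects are dicts, and updates go to a list instead of the real worker's update channel:

```python
import time


def cancel_remaining(conversation: list, error: str, updates: list) -> None:
    # Mirror the diff: mark every queued turn cancelled, stamp an end time,
    # emit a "cancelled" update for it, then drop the whole queue.
    for skip_req, skip_info in conversation:
        skip_info["error"] = f"Cancelled: {error}"
        skip_info["resolve_end"] = time.time()
        updates.append(("cancelled", skip_req))
    conversation.clear()


updates: list = []
conversation = [("req2", {}), ("req3", {})]
cancel_remaining(conversation, "backend error", updates)
print(updates, conversation)
```

One property worth noting in this design: every skipped turn still produces an update, so downstream accounting sees the cancelled turns rather than having them silently vanish.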
Assisted-by: Cursor AI Claude 4.6 Opus High Thinking Signed-off-by: Jared O'Connell <joconnel@redhat.com>
Assisted-by: Cursor AI Claude 4.6 Opus High Thinking Signed-off-by: Jared O'Connell <joconnel@redhat.com>
Summary
This PR adds client-side chat completions conversations to the http backend.
Details
Tool choice can be set to `auto` or `required` for the model, which is good for testing various scenarios; models behave differently depending on the value set, and `required` is best for predictability.
Test Plan
Related Issues
Use of AI
(AI-assisted commits are marked with `## WRITTEN BY AI ##`.)