[Responses API] tool_choice support (auto / required / none) for GPT-OSS#37433
[Responses API] tool_choice support (auto / required / none) for GPT-OSS#37433will-deines wants to merge 12 commits intovllm-project:mainfrom
Conversation
fdb422e to
19f3680
Compare
There was a problem hiding this comment.
Code Review
This pull request enhances structured output and tool usage within the OpenAI API response handling. Key changes include adding a function to inject response format schemas into developer messages, updating sampling parameters to support json_object response formats, and introducing new helper functions to extract and convert structured output constraints. The GptOssReasoningParser has been significantly refactored to dynamically prepare structural tags based on tool_choice, function_tools, and final_content_format, ensuring proper integration of both builtin and function tools, and managing the final output constraints. A potential issue was identified where the tag_with_builtin_funcs function could cause unintended side effects by modifying a shared dictionary, and another where the create_responses function did not explicitly handle a None case for structured_outputs, which could lead to errors.
| @@ -52,8 +53,6 @@ def from_builtin_tool_to_tag(tool: str) -> list[dict]: | |||
|
|
|||
|
|
|||
| def tag_with_builtin_funcs(no_func_reasoning_tag, builtin_tool_list: list[str]) -> dict: | |||
There was a problem hiding this comment.
The tag_with_builtin_funcs function modifies the no_func_reasoning_tag dictionary, which is defined outside the scope of this function. This could lead to unintended side effects if no_func_reasoning_tag is used elsewhere. It's better to create a deep copy within the function to avoid modifying the original dictionary.
def tag_with_builtin_funcs(no_func_reasoning_tag, builtin_tool_list: list[str]) -> dict:
import copy
new_tag = copy.deepcopy(no_func_reasoning_tag)
new_tag["format"]["triggers"].append("<|channel|>commentary to=")There was a problem hiding this comment.
The function already deep-copies the input on the very next line (new_tag = copy.deepcopy(no_func_reasoning_tag)) — the original dict is never mutated. The only change in this PR is moving import copy from inside the function to the module-level import block.
| reasoning_parser.prepare_structured_tag( | ||
| struct_out.structural_tag, | ||
| self.tool_server, | ||
| tool_choice=request.tool_choice, | ||
| function_tools=function_tools_for_parser, | ||
| ) | ||
| ), | ||
| ) | ||
| else: | ||
| # Content constraint present (json, regex, | ||
| # grammar, choice, json_object). Embed it in the | ||
| # final channel tag within the structural tag. | ||
| content_fmt = _constraint_to_content_format(struct_out) | ||
| if content_fmt is not None: |
There was a problem hiding this comment.
The code block checks if struct_out.all_non_structural_tag_constraints_none() is true. If it is, the structural_tag is replaced. However, if struct_out is None, the code proceeds to the else block without handling the case where struct_out is None. This could lead to unexpected behavior or errors later on. It's important to handle the None case explicitly.
| reasoning_parser.prepare_structured_tag( | |
| struct_out.structural_tag, | |
| self.tool_server, | |
| tool_choice=request.tool_choice, | |
| function_tools=function_tools_for_parser, | |
| ) | |
| ), | |
| ) | |
| else: | |
| # Content constraint present (json, regex, | |
| # grammar, choice, json_object). Embed it in the | |
| # final channel tag within the structural tag. | |
| content_fmt = _constraint_to_content_format(struct_out) | |
| if content_fmt is not None: | |
| if isinstance(struct_out, StructuredOutputsParams): | |
| if struct_out.all_non_structural_tag_constraints_none(): | |
| # No content constraint — just apply reasoning | |
| # channel tags + tool_choice + function tools | |
| sampling_params.structured_outputs = replace( | |
| struct_out, | |
| structural_tag=( | |
| reasoning_parser.prepare_structured_tag( | |
| struct_out.structural_tag, | |
| self.tool_server, | |
| tool_choice=request.tool_choice, | |
| function_tools=function_tools_for_parser, | |
| ) | |
| ), | |
| ) | |
| else: |
There was a problem hiding this comment.
The None case is handled — there's an elif struct_out is None: branch further down (line 580 in the current diff) that applies reasoning channel tags + tool_choice even when no structured output is requested. The review snippet cut off before reaching it.
…d tags Extend prepare_structured_tag() to be the single authority for all generation constraints in GPT-OSS Harmony models: channel structure, tool enforcement, argument validation, and content constraints. tool_choice=required support: - New from_function_tool_to_tag() and tag_with_function_tools() helpers - prepare_structured_tag() extended with tool_choice, function_tools params - Channel blocking: omit <|channel|>final trigger to force tool calls - Remove NotImplementedError for non-auto tool_choice in Harmony path Absorbed from upstream PR vllm-project#35904 (structured output + reasoning): - Content constraint embedding in <|channel|>final tag - _constraint_to_content_format() and _extract_response_format_schema() - struct_out is None branch (reasoning tags always applied) - inject_response_formats() for Harmony cookbook compliance - json_object format handling (was silently ignored) - Streaming .model_dump() alias bug fix Signed-off-by: Will Deines <will@garr.io>
…=none When tool_choice=none, the structural tag grammar correctly blocks tool-calling channels, but the system/developer messages still described the tools to the model. The model would then attempt tool calls that leaked through the output parser. Introduce a tools_visible flag that is false when tool_choice=none, suppressing both the commentary channel in the system message and tool descriptions in the developer message. Signed-off-by: Will Deines <will@garr.io>
… constraint triggered_tags allows free text between triggers, so omitting <|channel|>final from the tag list was not sufficient to prevent the model from using it. xgrammar's at_least_one=True forces the grammar to begin with a triggered channel immediately, blocking <|channel|>final and EOS at the token level. With this flag set on tool_choice=required or a named tool, only 2 tokens are allowed at generation start (the <|channel|> special token), and after that only analysis/commentary continuations — not "final" — are valid. Signed-off-by: Will Deines <will@garr.io>
…is from at_least_one scope The previous approach (empty base with pre-set triggers) caused duplicate triggers after tag_with_function_tools ran, crashing xgrammar with 500. New approach: use the normal base tag, then filter out the pure analysis tag (<|channel|>analysis<|message|>) for required/named tool_choice before setting at_least_one=True. The analysis trigger is kept so analysis-to-functions tags remain reachable in triggered_tags_sub. Verified locally: grammar compiles, triggered_tags_first has only tool call options, after <|channel|> only analysis/commentary continuations allowed (11 tokens), final channel blocked. Signed-off-by: Will Deines <will@garr.io>
- ruff: reformat long lines in tests and serving.py - mypy: annotate base_tag as dict[str, Any] in gptoss_reasoning_parser to allow nested dict indexing through copy.deepcopy result - mypy: suppress arg-type on list(name) in abs_reasoning_parsers where TypeIs narrowing is not respected under --python-version 3.10 Signed-off-by: Will Deines <will@garr.io>
8b3f7c8 to
a933921
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: ecfb064f1a
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| response_format_schema = _extract_response_format_schema(request) | ||
| needs_dev_msg = tools_visible or response_format_schema is not None | ||
|
|
||
| if needs_dev_msg: |
There was a problem hiding this comment.
Keep developer instructions when
tool_choice="none" hides tools
When tool_choice="none" is used with custom function tools, tools_visible becomes false here, so needs_dev_msg also becomes false unless a response schema is present. On GPT-OSS the default VLLM_GPT_OSS_HARMONY_SYSTEM_INSTRUCTIONS=0 means request.instructions are only serialized by get_developer_message(), so these requests now drop the user's instructions entirely and run as if no instructions were provided.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Fixed in 34c8b47. needs_dev_msg now includes or request.instructions is not None, so instructions flow through get_developer_message() even when tools_visible=False.
| tools_visible = with_custom_tools and request.tool_choice != "none" | ||
|
|
||
| sys_msg = self._construct_harmony_system_input_message( | ||
| request, with_custom_tools, tool_types | ||
| request, tools_visible, tool_types |
There was a problem hiding this comment.
Actually suppress builtin tool prompts for
tool_choice="none"
This new tools_visible flag does not stop _construct_harmony_system_input_message() from populating browser/python/container descriptions, and get_system_message() does not use with_custom_tools to filter them out. As a result, valid tool_choice="none" requests that include builtin tools still advertise those tools in the system prompt, which defeats the prompt-level half of the new none enforcement for GPT-OSS.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Fixed in 34c8b47. Renamed the parameter to tools_visible and gated all three builtin descriptions (browser_description, python_description, container_description) on it. When tools_visible=False, all descriptions are None and get_system_message() never calls with_tools().
| if tool_choice != "none" and tool_server is not None: | ||
| builtin_tool_list: list[str] = [] | ||
| if tool_server.has_tool("browser"): | ||
| builtin_tool_list.append("browser") | ||
| if tool_server.has_tool("python"): |
There was a problem hiding this comment.
Exclude builtin channels when forcing a named function
A named tool_choice is supposed to leave exactly one callable tool, but builtin tool tags are still added whenever the server exposes them because only tool_choice == "none" skips this block. In any deployment with browser/python/container enabled, tool_choice={"type":"function",...} can satisfy the new at_least_one=True grammar by emitting a builtin call instead of the named function, so the advertised named-function forcing is not deterministic.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Fixed in 34c8b47. Builtin tool tags are now skipped when isinstance(tool_choice, dict), so named function forcing only includes the named function's channels. Added test_named_tool_choice_excludes_builtins to verify.
1. Preserve developer instructions when tool_choice="none": add `request.instructions is not None` to `needs_dev_msg` condition so instructions flow through even when tools are hidden. 2. Suppress builtin tool descriptions for tool_choice="none": rename `with_custom_tools` param to `tools_visible` in `_construct_harmony_system_input_message` and gate browser/python/ container descriptions on it. 3. Exclude builtin channels for named function tool_choice: skip builtin tool tags when `isinstance(tool_choice, dict)` so they cannot satisfy the `at_least_one` grammar constraint instead of the named function. Signed-off-by: Will Deines <will@garr.io>
34c8b47 to
d23f78f
Compare
… fails When tool_choice="required" or a named tool is specified, the grammar layer (at_least_one + final channel removal) is the primary enforcement. But if the parser discards all tokens or the model produces text instead of a tool call, the response silently returns status="completed" with no function call — violating the contract. Add _check_tool_choice_violation() as a post-generation safety net that detects when no ResponseFunctionToolCall is present in output and forces status="incomplete" with a diagnostic warning log. Signed-off-by: Will Deines <will@garr.io>
…-choice-required Signed-off-by: Will Deines <will@garr.io>
Signed-off-by: Will Deines <will@garr.io>
Inject structural tags into sampling_params for the Chat Completions path, mirroring the existing Responses API implementation. Without this, tool_choice="auto" fails for Harmony models because the model stops after analysis reasoning and never emits the actual tool call. Also suppress tools in the prompt when tool_choice="none" for Harmony models, matching the Responses API behavior. Signed-off-by: Will Deines <will@garr.io>
tool_choice="required" now works (added in feat/responses-tool-choice-required), so the test should expect a successful function call instead of InternalServerError. Signed-off-by: Will Deines <will@garrio.com> Signed-off-by: Will Deines <will@garr.io>
|
This pull request has merge conflicts that must be resolved before it can be |
Summary
Implements
tool_choice="auto","required","none", and named-functionforcing for the Responses API on GPT-OSS (Harmony-format) models, using the
triggered_tagsstructural tag grammar introduced by #35904.Three enforcement mechanisms, one per mode:
auto— function-tool channel tags added to the structural tag; modeldecides whether to call a tool based on context.
required—at_least_one=Trueset on the grammar + the pure analysistag removed from
triggered_tags_first, forcing the model's very firsttoken sequence to be a function-call channel (
<|channel|>commentary to=functions.X<|message|>or<|channel|>analysis to=functions.X<|message|>).After the first tag the sub-dispatch loop is permissive.
none— tool descriptions stripped from the system/developer prompt(
tools_visible=False); only the analysis channel is in the structural tag,so no tool-call channel is ever triggered.
Named-function forcing (
tool_choice={"type":"function","name":"X"}) ishandled by filtering
effective_function_toolsto only the named tool,excluding builtin tool channels from the grammar, and applying
requiredenforcement.
Related Issues & PRs
prepare_structured_tag()withtool_choiceandfunction_toolsparamsResilientStreamableParserneeded for reliable parsing of forced tool callsanalysis to=functions.Xchannel tags depend on this fix being liveTriggeredTagsFormatapproach, broader scope (full chat format). This PR is narrower: tool_choice onlytool_choice="auto"TriggeredTagsFormat; this PR uses the sameat_least_onemechanism forrequiredDecisions we made
1.
at_least_one=True+ remove pure-analysis tag forrequiredWhat we chose: Set
at_least_one=Trueon thetriggered_tagsgrammar ANDfilter out the
<|channel|>analysis<|message|>tag from thetagslist beforesetting
at_least_one. The<|channel|>analysistrigger is kept (soanalysis to=functions.Xtags remain reachable intriggered_tags_sub).Alternative considered: Use an empty base tag with only function-call tags.
This caused xgrammar to crash with duplicate triggers because
tag_with_function_toolsadded
<|channel|>commentary to=functions.to an already-populated trigger list.Why we chose this: Surgically removes the problematic option from
triggered_tags_firstwithout changing the sub-dispatch behavior. No duplicatetriggers.
What reviewers might debate: After the first forced tool call, the sub-dispatch
loop allows all tokens (the
triggered_tags_subTagDispatchis fully permissive).This means the model can output text or a second analysis channel after the
first tool call. We accept this as correct behavior (you asked for at least one
tool call; subsequent output is free).
2.
tools_visibleflag fornone(prompt-level + grammar-level)What we chose: Two-layer enforcement for
tool_choice=none:<|channel|>analysisin the structural tag (no function-call channels)tools_visible=Falsestrips tool descriptions from system/developer messagesWhy grammar alone is insufficient:
triggered_tagsallows free text betweentriggers, so a function-call channel sequence is technically still generable as
"free text" even when not listed as a trigger. Prompt suppression prevents the
model from even reasoning about calling tools.
3. Structural tag embedding vs. system-prompt injection for
requiredWhat we chose: Grammar-level enforcement via xgrammar structural tags.
Alternative: Add an instruction to the system prompt ("you must call a tool").
Unreliable — the model reasons away prompts for semantically irrelevant tools.
Why we chose this: Token-level enforcement is deterministic; no prompt
injection needed.
Changes
vllm/reasoning/gptoss_reasoning_parser.pyfrom_function_tool_to_tag(),tag_with_function_tools(); extendprepare_structured_tag()withtool_choice+function_tools;at_least_one+ pure-analysis removal forrequired; exclude builtin tool tags for named functiontool_choicevllm/reasoning/abs_reasoning_parsers.pytool_choice,function_tools,final_content_formatparams to base classprepare_structured_tag()signaturevllm/entrypoints/openai/responses/serving.pyNotImplementedErrorfor non-autotool_choicein Harmony path; wiretool_choice+function_tools_for_parserthroughprepare_structured_tag()calls; addtools_visibleflag gating both system and developer messages;needs_dev_msglogic includesrequest.instructions; embed content constraints in structural tagvllm/entrypoints/openai/responses/protocol.pyresponse_format.type == "json_object"in structured output configtests/reasoning/test_gptoss_reasoning_parser.pytests/entrypoints/openai/responses/test_tool_choice_harmony.pyneeds_dev_msgwith instructions andtools_visiblesuppressing builtin descriptionstests/entrypoints/openai/responses/test_sampling_params.pystructured_outputsfield passthroughtests/entrypoints/openai/responses/test_response_formats.py_extract_response_format_schemawithstructured_outputsTest plan
prepare_structured_taggrammar fortool_choice=auto/required/none/namedat_least_one=Trueblocks pure analysis intriggered_tags_first; only 2 tokens allowed at start (verified with xgrammar GrammarMatcher + bitmask)tools_visible=Falseremoves tools from developer messagetools_visible=Falsesuppresses builtin tool descriptions (browser/python/container) in system messagerequest.instructionspreserved in developer message whentool_choice=nonehides toolstool_choiceexcludes builtin channels from grammartool_choice=autocalls tool for weather questiontool_choice=requiredcalls tool even for semantically-mismatched question (math + weather tool)tool_choice=noneproduces zerofunction_calloutput itemstool_choice=required+ multiple tools selects appropriate tooltool_choice=required+ streaming emits tool call events