[Responses API] tool_choice support (auto / required / none) for GPT-OSS by will-deines · Pull Request #37433 · vllm-project/vllm

will-deines · 2026-03-18T13:13:25Z

Depends on #35904, #35906, #35907 (open; will rebase once merged)

Summary

Implements tool_choice="auto", "required", "none", and named-function
forcing for the Responses API on GPT-OSS (Harmony-format) models, using the
triggered_tags structural tag grammar introduced by #35904.

Three enforcement mechanisms, one per mode:

auto — function-tool channel tags added to the structural tag; model
decides whether to call a tool based on context.
required — at_least_one=True set on the grammar + the pure analysis
tag removed from triggered_tags_first, forcing the model's very first
token sequence to be a function-call channel (<|channel|>commentary to=functions.X<|message|> or <|channel|>analysis to=functions.X<|message|>).
After the first tag the sub-dispatch loop is permissive.
none — tool descriptions stripped from the system/developer prompt
(tools_visible=False); only the analysis channel is in the structural tag,
so no tool-call channel is ever triggered.

Named-function forcing (tool_choice={"type":"function","name":"X"}) is
handled by filtering effective_function_tools to only the named tool,
excluding builtin tool channels from the grammar, and applying required
enforcement.

Related Issues & PRs

#	Title	Status	Relation
#35904	Structured output + reasoning via structural tag embedding	Open	Dependency — this PR extends `prepare_structured_tag()` with `tool_choice` and `function_tools` params
#35906	Sanitize leaked Harmony control tokens	Open	Dependency — `ResilientStreamableParser` needed for reliable parsing of forced tool calls
#35907	Fix analysis-channel tool calls	Open	Dependency — `analysis to=functions.X` channel tags depend on this fix being live
#28148	Chat format GD for tool calling with gpt-oss	Open	Related — same `TriggeredTagsFormat` approach, broader scope (full chat format). This PR is narrower: tool_choice only
#36891	Kimi K2: guided decoding for `tool_choice="auto"`	Open	Validation — first tool parser to use `TriggeredTagsFormat`; this PR uses the same `at_least_one` mechanism for `required`
#33306	Add tool_choice=required support for GPT-OSS Harmony models	Open	Chat Completions counterpart — implements tool_choice=required for the Chat Completions API via EBNF grammar. This PR is the Responses API counterpart using the same structural tag approach
#35936	tool_choice="required" falls back to tool_parser for non-JSON formats	Open	Related — fixes required fallback for non-JSON tool call formats (XML, Qwen3). Different mechanism (tool_parser-based) but same user-facing feature
#37258	Fix crash when tool_choice=required exceeds max_output_tokens	Merged	Validation — recently merged edge case fix for tool_choice=required in Responses API

Decisions we made

1. `at_least_one=True` + remove pure-analysis tag for `required`

What we chose: Set at_least_one=True on the triggered_tags grammar AND
filter out the <|channel|>analysis<|message|> tag from the tags list before
setting at_least_one. The <|channel|>analysis trigger is kept (so
analysis to=functions.X tags remain reachable in triggered_tags_sub).

Alternative considered: Use an empty base tag with only function-call tags.
This caused xgrammar to crash with duplicate triggers because tag_with_function_tools
added <|channel|>commentary to=functions. to an already-populated trigger list.

Why we chose this: Surgically removes the problematic option from
triggered_tags_first without changing the sub-dispatch behavior. No duplicate
triggers.

What reviewers might debate: After the first forced tool call, the sub-dispatch
loop allows all tokens (the triggered_tags_sub TagDispatch is fully permissive).
This means the model can output text or a second analysis channel after the
first tool call. We accept this as correct behavior (you asked for at least one
tool call; subsequent output is free).

2. `tools_visible` flag for `none` (prompt-level + grammar-level)

What we chose: Two-layer enforcement for tool_choice=none:

Grammar: only <|channel|>analysis in the structural tag (no function-call channels)
Prompt: tools_visible=False strips tool descriptions from system/developer messages

Why grammar alone is insufficient: triggered_tags allows free text between
triggers, so a function-call channel sequence is technically still generable as
"free text" even when not listed as a trigger. Prompt suppression prevents the
model from even reasoning about calling tools.

3. Structural tag embedding vs. system-prompt injection for `required`

What we chose: Grammar-level enforcement via xgrammar structural tags.

Alternative: Add an instruction to the system prompt ("you must call a tool").
Unreliable — the model reasons away prompts for semantically irrelevant tools.

Why we chose this: Token-level enforcement is deterministic; no prompt
injection needed.

Changes

File	Change
`vllm/reasoning/gptoss_reasoning_parser.py`	Add `from_function_tool_to_tag()`, `tag_with_function_tools()`; extend `prepare_structured_tag()` with `tool_choice` + `function_tools`; `at_least_one` + pure-analysis removal for `required`; exclude builtin tool tags for named function `tool_choice`
`vllm/reasoning/abs_reasoning_parsers.py`	Add `tool_choice`, `function_tools`, `final_content_format` params to base class `prepare_structured_tag()` signature
`vllm/entrypoints/openai/responses/serving.py`	Remove `NotImplementedError` for non-auto `tool_choice` in Harmony path; wire `tool_choice` + `function_tools_for_parser` through `prepare_structured_tag()` calls; add `tools_visible` flag gating both system and developer messages; `needs_dev_msg` logic includes `request.instructions`; embed content constraints in structural tag
`vllm/entrypoints/openai/responses/protocol.py`	Handle `response_format.type == "json_object"` in structured output config
`tests/reasoning/test_gptoss_reasoning_parser.py`	New test cases: tool_choice grammar for each mode (auto/required/none/named), function tool tag generation, builtin exclusion for named/none tool_choice
`tests/entrypoints/openai/responses/test_tool_choice_harmony.py`	New: unit tests for `needs_dev_msg` with instructions and `tools_visible` suppressing builtin descriptions
`tests/entrypoints/openai/responses/test_sampling_params.py`	Tests for `structured_outputs` field passthrough
`tests/entrypoints/openai/responses/test_response_formats.py`	Tests for `_extract_response_format_schema` with `structured_outputs`

Note: This PR absorbs the #35904 structured output changes. The diff against main also includes harmony_utils.py (inject_response_formats), protocol.py (json_object handling), and additional test files from #35904 — those are described in that PR's body.

Test plan

gemini-code-assist

Code Review

This pull request enhances structured output and tool usage within the OpenAI API response handling. Key changes include adding a function to inject response format schemas into developer messages, updating sampling parameters to support json_object response formats, and introducing new helper functions to extract and convert structured output constraints. The GptOssReasoningParser has been significantly refactored to dynamically prepare structural tags based on tool_choice, function_tools, and final_content_format, ensuring proper integration of both builtin and function tools, and managing the final output constraints. A potential issue was identified where the tag_with_builtin_funcs function could cause unintended side effects by modifying a shared dictionary, and another where the create_responses function did not explicitly handle a None case for structured_outputs, which could lead to errors.

gemini-code-assist · 2026-03-18T13:26:03Z

vllm/reasoning/gptoss_reasoning_parser.py

@@ -52,8 +53,6 @@ def from_builtin_tool_to_tag(tool: str) -> list[dict]:


 def tag_with_builtin_funcs(no_func_reasoning_tag, builtin_tool_list: list[str]) -> dict:


The tag_with_builtin_funcs function modifies the no_func_reasoning_tag dictionary, which is defined outside the scope of this function. This could lead to unintended side effects if no_func_reasoning_tag is used elsewhere. It's better to create a deep copy within the function to avoid modifying the original dictionary.

def tag_with_builtin_funcs(no_func_reasoning_tag, builtin_tool_list: list[str]) -> dict: import copy new_tag = copy.deepcopy(no_func_reasoning_tag) new_tag["format"]["triggers"].append("<|channel|>commentary to=")

The function already deep-copies the input on the very next line (new_tag = copy.deepcopy(no_func_reasoning_tag)) — the original dict is never mutated. The only change in this PR is moving import copy from inside the function to the module-level import block.

gemini-code-assist · 2026-03-18T13:26:03Z

vllm/entrypoints/openai/responses/serving.py

+                                reasoning_parser.prepare_structured_tag(
+                                    struct_out.structural_tag,
+                                    self.tool_server,
+                                    tool_choice=request.tool_choice,
+                                    function_tools=function_tools_for_parser,
+                                )
+                            ),
+                        )
+                    else:
+                        # Content constraint present (json, regex,
+                        # grammar, choice, json_object). Embed it in the
+                        # final channel tag within the structural tag.
+                        content_fmt = _constraint_to_content_format(struct_out)
+                        if content_fmt is not None:


The code block checks if struct_out.all_non_structural_tag_constraints_none() is true. If it is, the structural_tag is replaced. However, if struct_out is None, the code proceeds to the else block without handling the case where struct_out is None. This could lead to unexpected behavior or errors later on. It's important to handle the None case explicitly.

Suggested change

reasoning_parser.prepare_structured_tag(

struct_out.structural_tag,

self.tool_server,

tool_choice=request.tool_choice,

function_tools=function_tools_for_parser,

)

),

)

else:

# Content constraint present (json, regex,

# grammar, choice, json_object). Embed it in the

# final channel tag within the structural tag.

content_fmt = _constraint_to_content_format(struct_out)

if content_fmt is not None:

if isinstance(struct_out, StructuredOutputsParams):

if struct_out.all_non_structural_tag_constraints_none():

# No content constraint — just apply reasoning

# channel tags + tool_choice + function tools

sampling_params.structured_outputs = replace(

struct_out,

structural_tag=(

reasoning_parser.prepare_structured_tag(

struct_out.structural_tag,

self.tool_server,

tool_choice=request.tool_choice,

function_tools=function_tools_for_parser,

)

),

)

else:

The None case is handled — there's an elif struct_out is None: branch further down (line 580 in the current diff) that applies reasoning channel tags + tool_choice even when no structured output is requested. The review snippet cut off before reaching it.

…d tags Extend prepare_structured_tag() to be the single authority for all generation constraints in GPT-OSS Harmony models: channel structure, tool enforcement, argument validation, and content constraints. tool_choice=required support: - New from_function_tool_to_tag() and tag_with_function_tools() helpers - prepare_structured_tag() extended with tool_choice, function_tools params - Channel blocking: omit <|channel|>final trigger to force tool calls - Remove NotImplementedError for non-auto tool_choice in Harmony path Absorbed from upstream PR vllm-project#35904 (structured output + reasoning): - Content constraint embedding in <|channel|>final tag - _constraint_to_content_format() and _extract_response_format_schema() - struct_out is None branch (reasoning tags always applied) - inject_response_formats() for Harmony cookbook compliance - json_object format handling (was silently ignored) - Streaming .model_dump() alias bug fix Signed-off-by: Will Deines <will@garr.io>

…=none When tool_choice=none, the structural tag grammar correctly blocks tool-calling channels, but the system/developer messages still described the tools to the model. The model would then attempt tool calls that leaked through the output parser. Introduce a tools_visible flag that is false when tool_choice=none, suppressing both the commentary channel in the system message and tool descriptions in the developer message. Signed-off-by: Will Deines <will@garr.io>

… constraint triggered_tags allows free text between triggers, so omitting <|channel|>final from the tag list was not sufficient to prevent the model from using it. xgrammar's at_least_one=True forces the grammar to begin with a triggered channel immediately, blocking <|channel|>final and EOS at the token level. With this flag set on tool_choice=required or a named tool, only 2 tokens are allowed at generation start (the <|channel|> special token), and after that only analysis/commentary continuations — not "final" — are valid. Signed-off-by: Will Deines <will@garr.io>

…is from at_least_one scope The previous approach (empty base with pre-set triggers) caused duplicate triggers after tag_with_function_tools ran, crashing xgrammar with 500. New approach: use the normal base tag, then filter out the pure analysis tag (<|channel|>analysis<|message|>) for required/named tool_choice before setting at_least_one=True. The analysis trigger is kept so analysis-to-functions tags remain reachable in triggered_tags_sub. Verified locally: grammar compiles, triggered_tags_first has only tool call options, after <|channel|> only analysis/commentary continuations allowed (11 tokens), final channel blocked. Signed-off-by: Will Deines <will@garr.io>

- ruff: reformat long lines in tests and serving.py - mypy: annotate base_tag as dict[str, Any] in gptoss_reasoning_parser to allow nested dict indexing through copy.deepcopy result - mypy: suppress arg-type on list(name) in abs_reasoning_parsers where TypeIs narrowing is not respected under --python-version 3.10 Signed-off-by: Will Deines <will@garr.io>

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ecfb064f1a

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-03-18T17:13:29Z

vllm/entrypoints/openai/responses/serving.py

+            response_format_schema = _extract_response_format_schema(request)
+            needs_dev_msg = tools_visible or response_format_schema is not None
+
+            if needs_dev_msg:


Keep developer instructions when tool_choice="none" hides tools

When tool_choice="none" is used with custom function tools, tools_visible becomes false here, so needs_dev_msg also becomes false unless a response schema is present. On GPT-OSS the default VLLM_GPT_OSS_HARMONY_SYSTEM_INSTRUCTIONS=0 means request.instructions are only serialized by get_developer_message(), so these requests now drop the user's instructions entirely and run as if no instructions were provided.

Useful? React with 👍 / 👎.

Fixed in 34c8b47. needs_dev_msg now includes or request.instructions is not None, so instructions flow through get_developer_message() even when tools_visible=False.

chatgpt-codex-connector · 2026-03-18T17:13:29Z

vllm/entrypoints/openai/responses/serving.py

+            tools_visible = with_custom_tools and request.tool_choice != "none"
+
            sys_msg = self._construct_harmony_system_input_message(
-                request, with_custom_tools, tool_types
+                request, tools_visible, tool_types


Actually suppress builtin tool prompts for tool_choice="none"

This new tools_visible flag does not stop _construct_harmony_system_input_message() from populating browser/python/container descriptions, and get_system_message() does not use with_custom_tools to filter them out. As a result, valid tool_choice="none" requests that include builtin tools still advertise those tools in the system prompt, which defeats the prompt-level half of the new none enforcement for GPT-OSS.

Useful? React with 👍 / 👎.

Fixed in 34c8b47. Renamed the parameter to tools_visible and gated all three builtin descriptions (browser_description, python_description, container_description) on it. When tools_visible=False, all descriptions are None and get_system_message() never calls with_tools().

chatgpt-codex-connector · 2026-03-18T17:13:29Z

vllm/reasoning/gptoss_reasoning_parser.py

+        if tool_choice != "none" and tool_server is not None:
+            builtin_tool_list: list[str] = []
+            if tool_server.has_tool("browser"):
+                builtin_tool_list.append("browser")
+            if tool_server.has_tool("python"):


Exclude builtin channels when forcing a named function

A named tool_choice is supposed to leave exactly one callable tool, but builtin tool tags are still added whenever the server exposes them because only tool_choice == "none" skips this block. In any deployment with browser/python/container enabled, tool_choice={"type":"function",...} can satisfy the new at_least_one=True grammar by emitting a builtin call instead of the named function, so the advertised named-function forcing is not deterministic.

Useful? React with 👍 / 👎.

Fixed in 34c8b47. Builtin tool tags are now skipped when isinstance(tool_choice, dict), so named function forcing only includes the named function's channels. Added test_named_tool_choice_excludes_builtins to verify.

1. Preserve developer instructions when tool_choice="none": add `request.instructions is not None` to `needs_dev_msg` condition so instructions flow through even when tools are hidden. 2. Suppress builtin tool descriptions for tool_choice="none": rename `with_custom_tools` param to `tools_visible` in `_construct_harmony_system_input_message` and gate browser/python/ container descriptions on it. 3. Exclude builtin channels for named function tool_choice: skip builtin tool tags when `isinstance(tool_choice, dict)` so they cannot satisfy the `at_least_one` grammar constraint instead of the named function. Signed-off-by: Will Deines <will@garr.io>

… fails When tool_choice="required" or a named tool is specified, the grammar layer (at_least_one + final channel removal) is the primary enforcement. But if the parser discards all tokens or the model produces text instead of a tool call, the response silently returns status="completed" with no function call — violating the contract. Add _check_tool_choice_violation() as a post-generation safety net that detects when no ResponseFunctionToolCall is present in output and forces status="incomplete" with a diagnostic warning log. Signed-off-by: Will Deines <will@garr.io>

…-choice-required Signed-off-by: Will Deines <will@garr.io>

Signed-off-by: Will Deines <will@garr.io>

Inject structural tags into sampling_params for the Chat Completions path, mirroring the existing Responses API implementation. Without this, tool_choice="auto" fails for Harmony models because the model stops after analysis reasoning and never emits the actual tool call. Also suppress tools in the prompt when tool_choice="none" for Harmony models, matching the Responses API behavior. Signed-off-by: Will Deines <will@garr.io>

tool_choice="required" now works (added in feat/responses-tool-choice-required), so the test should expect a successful function call instead of InternalServerError. Signed-off-by: Will Deines <will@garrio.com> Signed-off-by: Will Deines <will@garr.io>

mergify · 2026-03-23T11:56:35Z

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @will-deines.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

mergify bot added frontend gpt-oss Related to GPT-OSS models labels Mar 18, 2026

github-project-automation bot added this to gpt-oss Issues & Enhancements Mar 18, 2026

github-project-automation bot moved this to To Triage in gpt-oss Issues & Enhancements Mar 18, 2026

will-deines force-pushed the feat/responses-tool-choice-required branch from fdb422e to 19f3680 Compare March 18, 2026 13:15

gemini-code-assist bot reviewed Mar 18, 2026

View reviewed changes

garrio-1 added 5 commits March 18, 2026 10:08

will-deines force-pushed the feat/responses-tool-choice-required branch from 8b3f7c8 to a933921 Compare March 18, 2026 14:09

Merge branch 'main' into feat/responses-tool-choice-required

ecfb064

will-deines marked this pull request as ready for review March 18, 2026 17:07

will-deines requested review from DarkLight1337, NickLucche, aarnphm, chaunceyjiang, robertgshaw2-redhat and russellb as code owners March 18, 2026 17:07

chatgpt-codex-connector bot reviewed Mar 18, 2026

View reviewed changes

will-deines force-pushed the feat/responses-tool-choice-required branch from 34c8b47 to d23f78f Compare March 18, 2026 18:31

garrio-1 added 3 commits March 19, 2026 16:40

Merge remote-tracking branch 'upstream/main' into feat/responses-tool…

ea10b93

…-choice-required Signed-off-by: Will Deines <will@garr.io>

fix: resolve mypy error for num_output_tokens on ConversationContext ABC

19e4867

Signed-off-by: Will Deines <will@garr.io>

mergify bot added the tool-calling label Mar 20, 2026

github-project-automation bot added this to Tool Calling Mar 20, 2026

will-deines requested a review from njhill as a code owner March 20, 2026 17:20

mergify bot added the needs-rebase label Mar 23, 2026

		@@ -52,8 +53,6 @@ def from_builtin_tool_to_tag(tool: str) -> list[dict]:


		def tag_with_builtin_funcs(no_func_reasoning_tag, builtin_tool_list: list[str]) -> dict:

Uh oh!

Conversation

will-deines commented Mar 18, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Related Issues & PRs

Decisions we made

1. at_least_one=True + remove pure-analysis tag for required

2. tools_visible flag for none (prompt-level + grammar-level)

3. Structural tag embedding vs. system-prompt injection for required

Changes

Test plan

Uh oh!

gemini-code-assist bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist bot Mar 18, 2026

Choose a reason for hiding this comment

Uh oh!

will-deines Mar 18, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot Mar 18, 2026

Choose a reason for hiding this comment

Uh oh!

will-deines Mar 18, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector bot Mar 18, 2026

Choose a reason for hiding this comment

Uh oh!

will-deines Mar 18, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector bot Mar 18, 2026

Choose a reason for hiding this comment

Uh oh!

will-deines Mar 18, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector bot Mar 18, 2026

Choose a reason for hiding this comment

Uh oh!

will-deines Mar 18, 2026

Choose a reason for hiding this comment

Uh oh!

mergify bot commented Mar 23, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

will-deines commented Mar 18, 2026 •

edited

Loading

1. `at_least_one=True` + remove pure-analysis tag for `required`

2. `tools_visible` flag for `none` (prompt-level + grammar-level)

3. Structural tag embedding vs. system-prompt injection for `required`