Skip to content

[Responses API] tool_choice support (auto / required / none) for GPT-OSS#37433

Open
will-deines wants to merge 12 commits intovllm-project:mainfrom
will-deines:feat/responses-tool-choice-required
Open

[Responses API] tool_choice support (auto / required / none) for GPT-OSS#37433
will-deines wants to merge 12 commits intovllm-project:mainfrom
will-deines:feat/responses-tool-choice-required

Conversation

@will-deines
Copy link
Copy Markdown

@will-deines will-deines commented Mar 18, 2026

Depends on #35904, #35906, #35907 (open; will rebase once merged)

Summary

Implements tool_choice="auto", "required", "none", and named-function
forcing for the Responses API on GPT-OSS (Harmony-format) models, using the
triggered_tags structural tag grammar introduced by #35904.

Three enforcement mechanisms, one per mode:

  • auto — function-tool channel tags added to the structural tag; model
    decides whether to call a tool based on context.
  • requiredat_least_one=True set on the grammar + the pure analysis
    tag removed from triggered_tags_first, forcing the model's very first
    token sequence to be a function-call channel (<|channel|>commentary to=functions.X<|message|> or <|channel|>analysis to=functions.X<|message|>).
    After the first tag the sub-dispatch loop is permissive.
  • none — tool descriptions stripped from the system/developer prompt
    (tools_visible=False); only the analysis channel is in the structural tag,
    so no tool-call channel is ever triggered.

Named-function forcing (tool_choice={"type":"function","name":"X"}) is
handled by filtering effective_function_tools to only the named tool,
excluding builtin tool channels from the grammar, and applying required
enforcement.

Related Issues & PRs

# Title Status Relation
#35904 Structured output + reasoning via structural tag embedding Open Dependency — this PR extends prepare_structured_tag() with tool_choice and function_tools params
#35906 Sanitize leaked Harmony control tokens Open DependencyResilientStreamableParser needed for reliable parsing of forced tool calls
#35907 Fix analysis-channel tool calls Open Dependencyanalysis to=functions.X channel tags depend on this fix being live
#28148 Chat format GD for tool calling with gpt-oss Open Related — same TriggeredTagsFormat approach, broader scope (full chat format). This PR is narrower: tool_choice only
#36891 Kimi K2: guided decoding for tool_choice="auto" Open Validation — first tool parser to use TriggeredTagsFormat; this PR uses the same at_least_one mechanism for required
#33306 Add tool_choice=required support for GPT-OSS Harmony models Open Chat Completions counterpart — implements tool_choice=required for the Chat Completions API via EBNF grammar. This PR is the Responses API counterpart using the same structural tag approach
#35936 tool_choice="required" falls back to tool_parser for non-JSON formats Open Related — fixes required fallback for non-JSON tool call formats (XML, Qwen3). Different mechanism (tool_parser-based) but same user-facing feature
#37258 Fix crash when tool_choice=required exceeds max_output_tokens Merged Validation — recently merged edge case fix for tool_choice=required in Responses API

Decisions we made

1. at_least_one=True + remove pure-analysis tag for required

What we chose: Set at_least_one=True on the triggered_tags grammar AND
filter out the <|channel|>analysis<|message|> tag from the tags list before
setting at_least_one. The <|channel|>analysis trigger is kept (so
analysis to=functions.X tags remain reachable in triggered_tags_sub).

Alternative considered: Use an empty base tag with only function-call tags.
This caused xgrammar to crash with duplicate triggers because tag_with_function_tools
added <|channel|>commentary to=functions. to an already-populated trigger list.

Why we chose this: Surgically removes the problematic option from
triggered_tags_first without changing the sub-dispatch behavior. No duplicate
triggers.

What reviewers might debate: After the first forced tool call, the sub-dispatch
loop allows all tokens (the triggered_tags_sub TagDispatch is fully permissive).
This means the model can output text or a second analysis channel after the
first tool call. We accept this as correct behavior (you asked for at least one
tool call; subsequent output is free).

2. tools_visible flag for none (prompt-level + grammar-level)

What we chose: Two-layer enforcement for tool_choice=none:

  1. Grammar: only <|channel|>analysis in the structural tag (no function-call channels)
  2. Prompt: tools_visible=False strips tool descriptions from system/developer messages

Why grammar alone is insufficient: triggered_tags allows free text between
triggers, so a function-call channel sequence is technically still generable as
"free text" even when not listed as a trigger. Prompt suppression prevents the
model from even reasoning about calling tools.

3. Structural tag embedding vs. system-prompt injection for required

What we chose: Grammar-level enforcement via xgrammar structural tags.

Alternative: Add an instruction to the system prompt ("you must call a tool").
Unreliable — the model reasons away prompts for semantically irrelevant tools.

Why we chose this: Token-level enforcement is deterministic; no prompt
injection needed.

Changes

File Change
vllm/reasoning/gptoss_reasoning_parser.py Add from_function_tool_to_tag(), tag_with_function_tools(); extend prepare_structured_tag() with tool_choice + function_tools; at_least_one + pure-analysis removal for required; exclude builtin tool tags for named function tool_choice
vllm/reasoning/abs_reasoning_parsers.py Add tool_choice, function_tools, final_content_format params to base class prepare_structured_tag() signature
vllm/entrypoints/openai/responses/serving.py Remove NotImplementedError for non-auto tool_choice in Harmony path; wire tool_choice + function_tools_for_parser through prepare_structured_tag() calls; add tools_visible flag gating both system and developer messages; needs_dev_msg logic includes request.instructions; embed content constraints in structural tag
vllm/entrypoints/openai/responses/protocol.py Handle response_format.type == "json_object" in structured output config
tests/reasoning/test_gptoss_reasoning_parser.py New test cases: tool_choice grammar for each mode (auto/required/none/named), function tool tag generation, builtin exclusion for named/none tool_choice
tests/entrypoints/openai/responses/test_tool_choice_harmony.py New: unit tests for needs_dev_msg with instructions and tools_visible suppressing builtin descriptions
tests/entrypoints/openai/responses/test_sampling_params.py Tests for structured_outputs field passthrough
tests/entrypoints/openai/responses/test_response_formats.py Tests for _extract_response_format_schema with structured_outputs

Note: This PR absorbs the #35904 structured output changes. The diff against main also includes harmony_utils.py (inject_response_formats), protocol.py (json_object handling), and additional test files from #35904 — those are described in that PR's body.

Test plan

  • Unit: prepare_structured_tag grammar for tool_choice=auto/required/none/named
  • Unit: at_least_one=True blocks pure analysis in triggered_tags_first; only 2 tokens allowed at start (verified with xgrammar GrammarMatcher + bitmask)
  • Unit: tools_visible=False removes tools from developer message
  • Unit: tools_visible=False suppresses builtin tool descriptions (browser/python/container) in system message
  • Unit: request.instructions preserved in developer message when tool_choice=none hides tools
  • Unit: Named function tool_choice excludes builtin channels from grammar
  • E2e (gpt-oss-20b via Modal): tool_choice=auto calls tool for weather question
  • E2e: tool_choice=required calls tool even for semantically-mismatched question (math + weather tool)
  • E2e: tool_choice=none produces zero function_call output items
  • E2e: tool_choice=required + multiple tools selects appropriate tool
  • E2e: tool_choice=required + streaming emits tool call events
  • Integration test with local vLLM server (needs GPU)

@mergify mergify bot added frontend gpt-oss Related to GPT-OSS models labels Mar 18, 2026
@will-deines will-deines force-pushed the feat/responses-tool-choice-required branch from fdb422e to 19f3680 Compare March 18, 2026 13:15
Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request enhances structured output and tool usage within the OpenAI API response handling. Key changes include adding a function to inject response format schemas into developer messages, updating sampling parameters to support json_object response formats, and introducing new helper functions to extract and convert structured output constraints. The GptOssReasoningParser has been significantly refactored to dynamically prepare structural tags based on tool_choice, function_tools, and final_content_format, ensuring proper integration of both builtin and function tools, and managing the final output constraints. A potential issue was identified where the tag_with_builtin_funcs function could cause unintended side effects by modifying a shared dictionary, and another where the create_responses function did not explicitly handle a None case for structured_outputs, which could lead to errors.

@@ -52,8 +53,6 @@ def from_builtin_tool_to_tag(tool: str) -> list[dict]:


def tag_with_builtin_funcs(no_func_reasoning_tag, builtin_tool_list: list[str]) -> dict:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The tag_with_builtin_funcs function modifies the no_func_reasoning_tag dictionary, which is defined outside the scope of this function. This could lead to unintended side effects if no_func_reasoning_tag is used elsewhere. It's better to create a deep copy within the function to avoid modifying the original dictionary.

def tag_with_builtin_funcs(no_func_reasoning_tag, builtin_tool_list: list[str]) -> dict:
    import copy

    new_tag = copy.deepcopy(no_func_reasoning_tag)
    new_tag["format"]["triggers"].append("<|channel|>commentary to=")

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The function already deep-copies the input on the very next line (new_tag = copy.deepcopy(no_func_reasoning_tag)) — the original dict is never mutated. The only change in this PR is moving import copy from inside the function to the module-level import block.

Comment on lines +546 to +559
reasoning_parser.prepare_structured_tag(
struct_out.structural_tag,
self.tool_server,
tool_choice=request.tool_choice,
function_tools=function_tools_for_parser,
)
),
)
else:
# Content constraint present (json, regex,
# grammar, choice, json_object). Embed it in the
# final channel tag within the structural tag.
content_fmt = _constraint_to_content_format(struct_out)
if content_fmt is not None:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The code block checks if struct_out.all_non_structural_tag_constraints_none() is true. If it is, the structural_tag is replaced. However, if struct_out is None, the code proceeds to the else block without handling the case where struct_out is None. This could lead to unexpected behavior or errors later on. It's important to handle the None case explicitly.

Suggested change
reasoning_parser.prepare_structured_tag(
struct_out.structural_tag,
self.tool_server,
tool_choice=request.tool_choice,
function_tools=function_tools_for_parser,
)
),
)
else:
# Content constraint present (json, regex,
# grammar, choice, json_object). Embed it in the
# final channel tag within the structural tag.
content_fmt = _constraint_to_content_format(struct_out)
if content_fmt is not None:
if isinstance(struct_out, StructuredOutputsParams):
if struct_out.all_non_structural_tag_constraints_none():
# No content constraint — just apply reasoning
# channel tags + tool_choice + function tools
sampling_params.structured_outputs = replace(
struct_out,
structural_tag=(
reasoning_parser.prepare_structured_tag(
struct_out.structural_tag,
self.tool_server,
tool_choice=request.tool_choice,
function_tools=function_tools_for_parser,
)
),
)
else:

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The None case is handled — there's an elif struct_out is None: branch further down (line 580 in the current diff) that applies reasoning channel tags + tool_choice even when no structured output is requested. The review snippet cut off before reaching it.

…d tags

Extend prepare_structured_tag() to be the single authority for all
generation constraints in GPT-OSS Harmony models: channel structure,
tool enforcement, argument validation, and content constraints.

tool_choice=required support:
- New from_function_tool_to_tag() and tag_with_function_tools() helpers
- prepare_structured_tag() extended with tool_choice, function_tools params
- Channel blocking: omit <|channel|>final trigger to force tool calls
- Remove NotImplementedError for non-auto tool_choice in Harmony path

Absorbed from upstream PR vllm-project#35904 (structured output + reasoning):
- Content constraint embedding in <|channel|>final tag
- _constraint_to_content_format() and _extract_response_format_schema()
- struct_out is None branch (reasoning tags always applied)
- inject_response_formats() for Harmony cookbook compliance
- json_object format handling (was silently ignored)
- Streaming .model_dump() alias bug fix

Signed-off-by: Will Deines <will@garr.io>
…=none

When tool_choice=none, the structural tag grammar correctly blocks
tool-calling channels, but the system/developer messages still
described the tools to the model. The model would then attempt tool
calls that leaked through the output parser.

Introduce a tools_visible flag that is false when tool_choice=none,
suppressing both the commentary channel in the system message and
tool descriptions in the developer message.

Signed-off-by: Will Deines <will@garr.io>
… constraint

triggered_tags allows free text between triggers, so omitting <|channel|>final
from the tag list was not sufficient to prevent the model from using it.
xgrammar's at_least_one=True forces the grammar to begin with a triggered
channel immediately, blocking <|channel|>final and EOS at the token level.

With this flag set on tool_choice=required or a named tool, only 2 tokens
are allowed at generation start (the <|channel|> special token), and after
that only analysis/commentary continuations — not "final" — are valid.

Signed-off-by: Will Deines <will@garr.io>
…is from at_least_one scope

The previous approach (empty base with pre-set triggers) caused duplicate
triggers after tag_with_function_tools ran, crashing xgrammar with 500.

New approach: use the normal base tag, then filter out the pure analysis
tag (<|channel|>analysis<|message|>) for required/named tool_choice before
setting at_least_one=True. The analysis trigger is kept so
analysis-to-functions tags remain reachable in triggered_tags_sub.

Verified locally: grammar compiles, triggered_tags_first has only tool call
options, after <|channel|> only analysis/commentary continuations allowed
(11 tokens), final channel blocked.

Signed-off-by: Will Deines <will@garr.io>
- ruff: reformat long lines in tests and serving.py
- mypy: annotate base_tag as dict[str, Any] in gptoss_reasoning_parser
  to allow nested dict indexing through copy.deepcopy result
- mypy: suppress arg-type on list(name) in abs_reasoning_parsers where
  TypeIs narrowing is not respected under --python-version 3.10

Signed-off-by: Will Deines <will@garr.io>
@will-deines will-deines force-pushed the feat/responses-tool-choice-required branch from 8b3f7c8 to a933921 Compare March 18, 2026 14:09
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ecfb064f1a

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +1257 to +1260
response_format_schema = _extract_response_format_schema(request)
needs_dev_msg = tools_visible or response_format_schema is not None

if needs_dev_msg:
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Keep developer instructions when tool_choice="none" hides tools

When tool_choice="none" is used with custom function tools, tools_visible becomes false here, so needs_dev_msg also becomes false unless a response schema is present. On GPT-OSS the default VLLM_GPT_OSS_HARMONY_SYSTEM_INSTRUCTIONS=0 means request.instructions are only serialized by get_developer_message(), so these requests now drop the user's instructions entirely and run as if no instructions were provided.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 34c8b47. needs_dev_msg now includes or request.instructions is not None, so instructions flow through get_developer_message() even when tools_visible=False.

Comment on lines +1247 to +1250
tools_visible = with_custom_tools and request.tool_choice != "none"

sys_msg = self._construct_harmony_system_input_message(
request, with_custom_tools, tool_types
request, tools_visible, tool_types
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Actually suppress builtin tool prompts for tool_choice="none"

This new tools_visible flag does not stop _construct_harmony_system_input_message() from populating browser/python/container descriptions, and get_system_message() does not use with_custom_tools to filter them out. As a result, valid tool_choice="none" requests that include builtin tools still advertise those tools in the system prompt, which defeats the prompt-level half of the new none enforcement for GPT-OSS.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 34c8b47. Renamed the parameter to tools_visible and gated all three builtin descriptions (browser_description, python_description, container_description) on it. When tools_visible=False, all descriptions are None and get_system_message() never calls with_tools().

Comment on lines +209 to +213
if tool_choice != "none" and tool_server is not None:
builtin_tool_list: list[str] = []
if tool_server.has_tool("browser"):
builtin_tool_list.append("browser")
if tool_server.has_tool("python"):
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Exclude builtin channels when forcing a named function

A named tool_choice is supposed to leave exactly one callable tool, but builtin tool tags are still added whenever the server exposes them because only tool_choice == "none" skips this block. In any deployment with browser/python/container enabled, tool_choice={"type":"function",...} can satisfy the new at_least_one=True grammar by emitting a builtin call instead of the named function, so the advertised named-function forcing is not deterministic.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 34c8b47. Builtin tool tags are now skipped when isinstance(tool_choice, dict), so named function forcing only includes the named function's channels. Added test_named_tool_choice_excludes_builtins to verify.

1. Preserve developer instructions when tool_choice="none": add
   `request.instructions is not None` to `needs_dev_msg` condition so
   instructions flow through even when tools are hidden.

2. Suppress builtin tool descriptions for tool_choice="none": rename
   `with_custom_tools` param to `tools_visible` in
   `_construct_harmony_system_input_message` and gate browser/python/
   container descriptions on it.

3. Exclude builtin channels for named function tool_choice: skip builtin
   tool tags when `isinstance(tool_choice, dict)` so they cannot satisfy
   the `at_least_one` grammar constraint instead of the named function.

Signed-off-by: Will Deines <will@garr.io>
@will-deines will-deines force-pushed the feat/responses-tool-choice-required branch from 34c8b47 to d23f78f Compare March 18, 2026 18:31
… fails

When tool_choice="required" or a named tool is specified, the grammar
layer (at_least_one + final channel removal) is the primary enforcement.
But if the parser discards all tokens or the model produces text instead
of a tool call, the response silently returns status="completed" with no
function call — violating the contract.

Add _check_tool_choice_violation() as a post-generation safety net that
detects when no ResponseFunctionToolCall is present in output and forces
status="incomplete" with a diagnostic warning log.

Signed-off-by: Will Deines <will@garr.io>
…-choice-required

Signed-off-by: Will Deines <will@garr.io>
@mergify mergify bot added the tool-calling label Mar 20, 2026
Inject structural tags into sampling_params for the Chat Completions
path, mirroring the existing Responses API implementation. Without this,
tool_choice="auto" fails for Harmony models because the model stops
after analysis reasoning and never emits the actual tool call.

Also suppress tools in the prompt when tool_choice="none" for Harmony
models, matching the Responses API behavior.

Signed-off-by: Will Deines <will@garr.io>
@will-deines will-deines requested a review from njhill as a code owner March 20, 2026 17:20
tool_choice="required" now works (added in feat/responses-tool-choice-required),
so the test should expect a successful function call instead of InternalServerError.

Signed-off-by: Will Deines <will@garrio.com>

Signed-off-by: Will Deines <will@garr.io>
@mergify
Copy link
Copy Markdown

mergify bot commented Mar 23, 2026

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @will-deines.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

Status: No status
Status: To Triage

Development

Successfully merging this pull request may close these issues.

2 participants