
Conversation

@qandrew (Contributor) commented Dec 1, 2025

Purpose

This PR is part 2 of the ResponsesParser work: it adds the tool parser for the ResponsesParser and the ability to run an MCP python tool.

Not in this PR

Test Plan

Added unit tests, and tested the following manually:

Minimax M2

VLLM_GPT_OSS_SYSTEM_TOOL_MCP_LABELS=web_search_preview,container,code_interpreter \
VLLM_USE_EXPERIMENTAL_PARSER_CONTEXT=1 \
vllm serve MiniMaxAI/MiniMax-M2 \
  --tensor-parallel-size 4 \
  --tool-call-parser minimax_m2 \
  --reasoning-parser minimax_m2 \
  --enable-auto-tool-choice \
  --trust-remote-code \
  --tool-server=localhost:8081/container,localhost:8081/browser,localhost:8081/python
curl -X POST "http://localhost:8000/v1/responses" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer dummy-api-key" \
  -d '{
        "model": "MiniMaxAI/MiniMax-M2",
        "input": "Multiply 64548*15151 using the python tool.",
        "tools": [
          {
            "type": "mcp",
            "server_label": "code_interpreter",
            "headers": {"test": "test"},
            "server_url": "IGNORED"
          }
        ]
      }'

Kimi K2

VLLM_GPT_OSS_SYSTEM_TOOL_MCP_LABELS=web_search_preview,container,code_interpreter \
VLLM_USE_EXPERIMENTAL_PARSER_CONTEXT=1 \
vllm serve moonshotai/Kimi-K2-Thinking \
  --trust-remote-code \
  --tensor-parallel-size 8 \
  --enable-auto-tool-choice \
  --max-num-batched-tokens 32768 \
  --tool-call-parser kimi_k2 \
  --reasoning-parser kimi_k2 \
  --tool-server=localhost:8081/container,localhost:8081/browser,localhost:8081/python
curl -X POST "http://localhost:8000/v1/responses" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer dummy-api-key" \
  -d '{
        "model": "moonshotai/Kimi-K2-Thinking",
        "input": "Multiply 64548*15151 using the python tool.",
        "tools": [
          {
            "type": "mcp",
            "server_label": "code_interpreter",
            "headers": {"test": "test"},
            "server_url": "IGNORED"
          }
        ]
      }'
{
    "id": "resp_a42bc867864795cd",
    "created_at": 1764137463,
    "incomplete_details": null,
    "instructions": null,
    "metadata": null,
    "model": "moonshotai/Kimi-K2-Thinking",
    "object": "response",
    "output": [
        {
            "id": "rs_a59c0ff3d139f3ad",
            "summary": [],
            "type": "reasoning",
            "content": [
                {
                    "text": " The user wants me to multiply two numbers: 64548 and 15151. I should use the Python tool to compute this accurately.\n\nLet me set up the calculation. I'll use the arithmetic multiplication operator (*) in Python. ",
                    "type": "reasoning_text"
                }
            ],
            "encrypted_content": null,
            "status": null
        },
        {
            "id": "lol",
            "arguments": "{\"code\": \"result = 64548 * 15151\\nresult\", \"restart\": false}",
            "name": "code_interpreter",
            "server_label": "code_interpreter",
            "type": "mcp_call",
            "approval_request_id": null,
            "error": null,
            "output": "977966748\n",
            "status": "completed"
        },
        {
            "id": "rs_818e3eeeb7e9efa7",
            "summary": [],
            "type": "reasoning",
            "content": [
                {
                    "text": " The result of multiplying 64548 by 15151 is **977,966,748**. ",
                    "type": "reasoning_text"
                }
            ],
            "encrypted_content": null,
            "status": null
        },
        {
            "id": "msg_bf62d1a50301381c",
            "content": [
                {
                    "annotations": [],
                    "text": " The result of multiplying 64548 by 15151 is **977,966,748**.",
                    "type": "output_text",
                    "logprobs": null
                }
            ],
            "role": "assistant",
            "status": "completed",
            "type": "message"
        }
    ],
    "parallel_tool_calls": true,
    "temperature": 1.0,
    "tool_choice": "auto",
    "tools": [
        {
            "server_label": "code_interpreter",
            "type": "mcp",
            "allowed_tools": null,
            "authorization": null,
            "connector_id": null,
            "headers": {
                "test": "test"
            },
            "require_approval": null,
            "server_description": null,
            "server_url": "IGNORED"
        }
    ],
    "top_p": 1.0,
    "background": false,
    "max_output_tokens": 261990,
    "max_tool_calls": null,
    "previous_response_id": null,
    "prompt": null,
    "reasoning": null,
    "service_tier": "auto",
    "status": "completed",
    "text": null,
    "top_logprobs": null,
    "truncation": "disabled",
    "usage": {
        "input_tokens": 154,
        "input_tokens_details": {
            "cached_tokens": 64,
            "input_tokens_per_turn": [],
            "cached_tokens_per_turn": []
        },
        "output_tokens": 121,
        "output_tokens_details": {
            "reasoning_tokens": 0,
            "tool_output_tokens": 0,
            "output_tokens_per_turn": [],
            "tool_output_tokens_per_turn": []
        },
        "total_tokens": 275
    },
    "user": null,
    "input_messages": null,
    "output_messages": null
}

Andrew Xia added 2 commits December 2, 2025 19:18
This reverts commit 38558b1.

un-revert some changes

Signed-off-by: Andrew Xia <[email protected]>

fixes and found some more bugs

Signed-off-by: Andrew Xia <[email protected]>
Signed-off-by: Andrew Xia <[email protected]>
Signed-off-by: Andrew Xia <[email protected]>
mergify bot commented Dec 3, 2025

Documentation preview: https://vllm--29798.org.readthedocs.build/en/29798/

mergify bot added the documentation (Improvements or additions to documentation) label Dec 3, 2025
@qandrew changed the title from "[responsesAPI][5] ResponsesParser with tools for full MCP loop" to "[responsesAPI][5] ResponsesParser with tools for full MCP python loop" Dec 3, 2025
@qandrew marked this pull request as ready for review December 3, 2025 06:54
Signed-off-by: Andrew Xia <[email protected]>
@qandrew (Contributor, Author) commented Dec 3, 2025

cc @chaunceyjiang @yeqcharlotte ready for review (:

@chatgpt-codex-connector commented:

💡 Codex Review

import fbvscode
fbvscode.set_trace()

P0: Drop debugger import that halts responses API

create_responses now immediately imports fbvscode and calls set_trace() before any validation. fbvscode is not a declared dependency, so every responses request will either raise ModuleNotFoundError or break into a debugger, preventing the endpoint from serving responses at all. This is a hard blocker for the responses API. (vllm/entrypoints/openai/serving_responses.py:312-314)


message = ResponseFunctionToolCallOutputItem(
    id=f"fco_{random_uuid()}",
    type="function_call_output",
    call_id=f"call_{random_uuid()}",
    output=result_str,

P1: Preserve tool call id when emitting python tool output

call_python_tool creates the ResponseFunctionToolCallOutputItem with a new random call_id instead of reusing the call_id from the preceding function_call. When the next turn is rendered, construct_input_messages uses the output item’s call_id as the tool_call_id (see vllm/entrypoints/responses_utils.py:141-146), so the tool result is associated with an id that does not match the assistant’s tool call. This breaks multi‑turn python/mcp tool conversations under ParsableContext because the model cannot link tool output to the original call. (vllm/entrypoints/context.py:281-285)
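
A minimal sketch of the suggested fix, assuming the originating function_call item is in scope as tool_call (that variable name is hypothetical; the actual fix would need to thread the preceding call into call_python_tool):

message = ResponseFunctionToolCallOutputItem(
    id=f"fco_{random_uuid()}",
    type="function_call_output",
    call_id=tool_call.call_id,  # reuse the assistant's call_id instead of minting a new one
    output=result_str,
)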


Signed-off-by: Andrew Xia <[email protected]>
@heheda12345 (Collaborator) commented:

CC @yeqcharlotte

reasoning_parser_cls: Callable[[AnyTokenizer], ReasoningParser],
response_messages: list[ResponseInputOutputItem],
request: ResponsesRequest,
tool_parser_cls,
Collaborator review comment:

type?

This function converts parsable context types to harmony and
back so we can use GPTOSS demo python tool
"""
from vllm.entrypoints.context import ParsableContext
Collaborator review comment:

Why not import it at the top?

@qandrew (Contributor, Author) replied Dec 4, 2025:

It was similar for HarmonyContext; I think if we move it to the top we get a circular import.

Signed-off-by: Andrew Xia <[email protected]>
@chaunceyjiang (Collaborator) left a comment:

Thanks~

github-project-automation bot moved this from To Triage to Ready in gpt-oss Issues & Enhancements Dec 5, 2025
@chaunceyjiang added the ready (ONLY add when PR is ready to merge/full CI is needed) label Dec 5, 2025
@yeqcharlotte (Collaborator) left a comment:

for all the oai entrypoint logic we add, can we introduce some unit tests? also, how adaptable is it for the anthropic apis?

VLLM_USE_EXPERIMENTAL_PARSER_CONTEXT="1",
# uncomment for tool calling
# PYTHON_EXECUTION_BACKEND="dangerously_use_uv",
PYTHON_EXECUTION_BACKEND="dangerously_use_uv",
Collaborator review comment:

oh why was this commented before? did it have issues with ci?

@qandrew (Contributor, Author) replied:

I left it commented out in a previous PR because we didn't have tool calling yet, so it wasn't necessary. There weren't any CI issues.

Comment on lines 278 to 288
def need_builtin_tool_call(self) -> bool:
    """Return true if the last message is a MCP tool call"""
    last_message = self.parser.response_messages[-1]
    # TODO: figure out which tools are MCP tools
    if (  # noqa: SIM103
        last_message.type == "function_call"
        and last_message.name in ("code_interpreter", "python")
    ):
        return True

    return False
Collaborator review comment:

this format is quite bad lol. let's directly check the condition. also should we hardcode "code_interpreter", "python" here? i remember @alecsolder made the changes to centralize all tools to go through mcp tool type.

if xxxx:
    return True
return False
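
A sketch of the direct-return version being suggested (same condition as the quoted snippet, kept inside the class from the PR):

def need_builtin_tool_call(self) -> bool:
    """Return True if the last message is an MCP tool call."""
    last_message = self.parser.response_messages[-1]
    # TODO: figure out which tools are MCP tools
    return (
        last_message.type == "function_call"
        and last_message.name in ("code_interpreter", "python")
    )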

@qandrew (Contributor, Author) replied:

I was thinking of cleaning up the code in #29989, which will include the browser & container tools, if that's okay? This PR is just to complete the ability to call only the python tool lol

@qandrew (Contributor, Author) commented Dec 5, 2025

for all the oai entrypoint logic we add, can we introduce some unit tests? also, how adaptable is it for the anthropic apis?

I added some unit tests in this PR in tests/entrypoints/openai/test_response_api_parsable_context.py :)
Right now, going from ResponsesAPI <-> ChatCompletions is pretty easy with https://github.com/vllm-project/vllm/blob/main/vllm/entrypoints/responses_utils.py#L44. I think it should be pretty adaptable; one way to do it is to have a ResponsesAPI <-> MessagesAPI converter, or we could write a MessagesParser similar to what we have in ResponsesParser.
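
A toy, self-contained sketch of that converter idea (purely illustrative; the function name and item shapes here are hypothetical, not real vLLM APIs):

def responses_input_to_messages(input_items):
    """Map Responses API input onto Anthropic-style Messages API messages."""
    if isinstance(input_items, str):
        # The Responses API accepts a bare string as user input.
        return [{"role": "user", "content": input_items}]
    messages = []
    for item in input_items:
        if item.get("type") == "message":
            messages.append({"role": item["role"], "content": item["content"]})
    return messages

# e.g. responses_input_to_messages("Multiply 64548*15151 using the python tool.")
# -> [{"role": "user", "content": "Multiply 64548*15151 using the python tool."}]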

@zou3519 merged commit da7bc54 into vllm-project:main Dec 5, 2025
49 checks passed

Labels

documentation (Improvements or additions to documentation), frontend, gpt-oss (Related to GPT-OSS models), ready (ONLY add when PR is ready to merge/full CI is needed)

Projects

Status: Done


5 participants