
Conversation

@qandrew (Contributor) commented Dec 1, 2025

Purpose

This PR is part 2 of the ResponsesParser work: it adds the tool parser for the ResponsesParser and the ability to run an MCP python tool.

Not in this PR

Test Plan

Added unit tests, and tested the following manually:

Minimax M2

VLLM_GPT_OSS_SYSTEM_TOOL_MCP_LABELS=web_search_preview,container,code_interpreter \
VLLM_USE_EXPERIMENTAL_PARSER_CONTEXT=1 \
vllm serve MiniMaxAI/MiniMax-M2 \
  --tensor-parallel-size 4 \
  --tool-call-parser minimax_m2 \
  --reasoning-parser minimax_m2 \
  --enable-auto-tool-choice \
  --trust-remote-code \
  --tool-server=localhost:8081/container,localhost:8081/browser,localhost:8081/python
curl -X POST "http://localhost:8000/v1/responses" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer dummy-api-key" \
  -d '{
        "model": "MiniMaxAI/MiniMax-M2",
        "input": "Multiply 64548*15151 using the python tool.",
        "tools": [
          {
            "type": "mcp",
            "server_label": "code_interpreter",
            "headers": {"test": "test"},
            "server_url": "IGNORED"
          }
        ]
      }'

Kimi K2

VLLM_GPT_OSS_SYSTEM_TOOL_MCP_LABELS=web_search_preview,container,code_interpreter \
VLLM_USE_EXPERIMENTAL_PARSER_CONTEXT=1 \
vllm serve moonshotai/Kimi-K2-Thinking \
  --trust-remote-code \
  --tensor-parallel-size 8 \
  --enable-auto-tool-choice \
  --max-num-batched-tokens 32768 \
  --tool-call-parser kimi_k2 \
  --reasoning-parser kimi_k2 \
  --tool-server=localhost:8081/container,localhost:8081/browser,localhost:8081/python
curl -X POST "http://localhost:8000/v1/responses" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer dummy-api-key" \
  -d '{
        "model": "moonshotai/Kimi-K2-Thinking",
        "input": "Multiply 64548*15151 using the python tool.",
        "tools": [
          {
            "type": "mcp",
            "server_label": "code_interpreter",
            "headers": {"test": "test"},
            "server_url": "IGNORED"
          }
        ]
      }'
{
    "id": "resp_a42bc867864795cd",
    "created_at": 1764137463,
    "incomplete_details": null,
    "instructions": null,
    "metadata": null,
    "model": "moonshotai/Kimi-K2-Thinking",
    "object": "response",
    "output": [
        {
            "id": "rs_a59c0ff3d139f3ad",
            "summary": [],
            "type": "reasoning",
            "content": [
                {
                    "text": " The user wants me to multiply two numbers: 64548 and 15151. I should use the Python tool to compute this accurately.\n\nLet me set up the calculation. I'll use the arithmetic multiplication operator (*) in Python. ",
                    "type": "reasoning_text"
                }
            ],
            "encrypted_content": null,
            "status": null
        },
        {
            "id": "lol",
            "arguments": "{\"code\": \"result = 64548 * 15151\\nresult\", \"restart\": false}",
            "name": "code_interpreter",
            "server_label": "code_interpreter",
            "type": "mcp_call",
            "approval_request_id": null,
            "error": null,
            "output": "977966748\n",
            "status": "completed"
        },
        {
            "id": "rs_818e3eeeb7e9efa7",
            "summary": [],
            "type": "reasoning",
            "content": [
                {
                    "text": " The result of multiplying 64548 by 15151 is **977,966,748**. ",
                    "type": "reasoning_text"
                }
            ],
            "encrypted_content": null,
            "status": null
        },
        {
            "id": "msg_bf62d1a50301381c",
            "content": [
                {
                    "annotations": [],
                    "text": " The result of multiplying 64548 by 15151 is **977,966,748**.",
                    "type": "output_text",
                    "logprobs": null
                }
            ],
            "role": "assistant",
            "status": "completed",
            "type": "message"
        }
    ],
    "parallel_tool_calls": true,
    "temperature": 1.0,
    "tool_choice": "auto",
    "tools": [
        {
            "server_label": "code_interpreter",
            "type": "mcp",
            "allowed_tools": null,
            "authorization": null,
            "connector_id": null,
            "headers": {
                "test": "test"
            },
            "require_approval": null,
            "server_description": null,
            "server_url": "IGNORED"
        }
    ],
    "top_p": 1.0,
    "background": false,
    "max_output_tokens": 261990,
    "max_tool_calls": null,
    "previous_response_id": null,
    "prompt": null,
    "reasoning": null,
    "service_tier": "auto",
    "status": "completed",
    "text": null,
    "top_logprobs": null,
    "truncation": "disabled",
    "usage": {
        "input_tokens": 154,
        "input_tokens_details": {
            "cached_tokens": 64,
            "input_tokens_per_turn": [],
            "cached_tokens_per_turn": []
        },
        "output_tokens": 121,
        "output_tokens_details": {
            "reasoning_tokens": 0,
            "tool_output_tokens": 0,
            "output_tokens_per_turn": [],
            "tool_output_tokens_per_turn": []
        },
        "total_tokens": 275
    },
    "user": null,
    "input_messages": null,
    "output_messages": null
}

Andrew Xia added 2 commits December 2, 2025 19:18
This reverts commit 38558b1.

un-revert some changes

Signed-off-by: Andrew Xia <[email protected]>

fixes and found some more bugs

Signed-off-by: Andrew Xia <[email protected]>
Signed-off-by: Andrew Xia <[email protected]>
Signed-off-by: Andrew Xia <[email protected]>
mergify bot commented Dec 3, 2025

Documentation preview: https://vllm--29798.org.readthedocs.build/en/29798/

mergify bot added the documentation (Improvements or additions to documentation) label Dec 3, 2025
@qandrew changed the title from "[responsesAPI][5] ResponsesParser with tools for full MCP loop" to "[responsesAPI][5] ResponsesParser with tools for full MCP python loop" Dec 3, 2025
@qandrew marked this pull request as ready for review December 3, 2025 06:54
Signed-off-by: Andrew Xia <[email protected]>
@qandrew (Contributor, Author) commented Dec 3, 2025

cc @chaunceyjiang @yeqcharlotte ready for review (:

@chatgpt-codex-connector commented:

💡 Codex Review

import fbvscode
fbvscode.set_trace()

P0: Drop debugger import that halts responses API

create_responses now immediately imports fbvscode and calls set_trace() before any validation. fbvscode is not a declared dependency, so every responses request will either raise ModuleNotFoundError or break into a debugger, preventing the endpoint from serving responses at all. This is a hard blocker for the responses API. (vllm/entrypoints/openai/serving_responses.py:312-314)


message = ResponseFunctionToolCallOutputItem(
    id=f"fco_{random_uuid()}",
    type="function_call_output",
    call_id=f"call_{random_uuid()}",
    output=result_str,

P1: Preserve tool call id when emitting python tool output

call_python_tool creates the ResponseFunctionToolCallOutputItem with a new random call_id instead of reusing the call_id from the preceding function_call. When the next turn is rendered, construct_input_messages uses the output item’s call_id as the tool_call_id (see vllm/entrypoints/responses_utils.py:141-146), so the tool result is associated with an id that does not match the assistant’s tool call. This breaks multi‑turn python/mcp tool conversations under ParsableContext because the model cannot link tool output to the original call. (vllm/entrypoints/context.py:281-285)
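
A minimal sketch of the suggested fix, assuming the originating function_call item is in scope as tool_call (that variable name is hypothetical; the actual fix would need to thread the preceding call into call_python_tool):

message = ResponseFunctionToolCallOutputItem(
    id=f"fco_{random_uuid()}",
    type="function_call_output",
    call_id=tool_call.call_id,  # reuse the assistant's call_id instead of minting a new one
    output=result_str,
)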


Signed-off-by: Andrew Xia <[email protected]>
@heheda12345 (Collaborator) commented:

CC @yeqcharlotte

reasoning_parser_cls: Callable[[AnyTokenizer], ReasoningParser],
response_messages: list[ResponseInputOutputItem],
request: ResponsesRequest,
tool_parser_cls,
Collaborator review comment:

type?

This function converts parsable context types to harmony and
back so we can use GPTOSS demo python tool
"""
from vllm.entrypoints.context import ParsableContext
Collaborator review comment:

Why not import it at the top?

@qandrew (Contributor, Author) replied Dec 4, 2025:

It was similar for HarmonyContext; I think if we move it to the top we get a circular import.

Signed-off-by: Andrew Xia <[email protected]>
@chaunceyjiang (Collaborator) left a comment:

Thanks~

github-project-automation bot moved this from To Triage to Ready in gpt-oss Issues & Enhancements Dec 5, 2025
@chaunceyjiang added the ready (ONLY add when PR is ready to merge/full CI is needed) label Dec 5, 2025
@yeqcharlotte (Collaborator) left a comment:

for all the oai entrypoint logic we add, can we introduce some unit tests? also, how adaptable is it for the anthropic apis?

VLLM_USE_EXPERIMENTAL_PARSER_CONTEXT="1",
# uncomment for tool calling
# PYTHON_EXECUTION_BACKEND="dangerously_use_uv",
PYTHON_EXECUTION_BACKEND="dangerously_use_uv",
Collaborator review comment:

oh why was this commented before? did it have issues with ci?

@qandrew (Contributor, Author) replied:

I left it commented out in a previous PR because we didn't have tool calling yet, so it wasn't necessary. There weren't any CI issues.

Comment on lines 278 to 288
def need_builtin_tool_call(self) -> bool:
    """Return true if the last message is a MCP tool call"""
    last_message = self.parser.response_messages[-1]
    # TODO: figure out which tools are MCP tools
    if (  # noqa: SIM103
        last_message.type == "function_call"
        and last_message.name in ("code_interpreter", "python")
    ):
        return True

    return False
Collaborator review comment:

this format is quite bad lol. let's directly check the condition. also should we hardcode "code_interpreter", "python" here? i remember @alecsolder made the changes to centralize all tools to go through mcp tool type.

if xxxx:
    return True
return False
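
A sketch of the direct-return version being suggested (same condition as the quoted snippet, kept inside the class from the PR):

def need_builtin_tool_call(self) -> bool:
    """Return True if the last message is an MCP tool call."""
    last_message = self.parser.response_messages[-1]
    # TODO: figure out which tools are MCP tools
    return (
        last_message.type == "function_call"
        and last_message.name in ("code_interpreter", "python")
    )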

@qandrew (Contributor, Author) replied:

I was thinking of cleaning up the code in #29989, which will include the browser & container tools, if that's okay? This PR is just to complete the ability to call only the python tool lol

@qandrew (Contributor, Author) commented Dec 5, 2025

for all the oai entrypoint logic we add, can we introduce some unit tests? also, how adaptable is it for the anthropic apis?

I added some unit tests in this PR in tests/entrypoints/openai/test_response_api_parsable_context.py :)
Right now, going from ResponsesAPI <-> ChatCompletions is pretty easy with https://github.com/vllm-project/vllm/blob/main/vllm/entrypoints/responses_utils.py#L44. I think it should be pretty adaptable; one way to do it is to have a ResponsesAPI <-> MessagesAPI converter, or we could write a MessagesParser similar to what we have in ResponsesParser.
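
A toy, self-contained sketch of that converter idea (purely illustrative; the function name and item shapes here are hypothetical, not real vLLM APIs):

def responses_input_to_messages(input_items):
    """Map Responses API input onto Anthropic-style Messages API messages."""
    if isinstance(input_items, str):
        # The Responses API accepts a bare string as user input.
        return [{"role": "user", "content": input_items}]
    messages = []
    for item in input_items:
        if item.get("type") == "message":
            messages.append({"role": item["role"], "content": item["content"]})
    return messages

# e.g. responses_input_to_messages("Multiply 64548*15151 using the python tool.")
# -> [{"role": "user", "content": "Multiply 64548*15151 using the python tool."}]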

@zou3519 merged commit da7bc54 into vllm-project:main Dec 5, 2025
49 checks passed

Labels

documentation (Improvements or additions to documentation), frontend, gpt-oss (Related to GPT-OSS models), ready (ONLY add when PR is ready to merge/full CI is needed)

Projects

Status: Done


5 participants