[Responses API] Sanitize leaked Harmony control tokens in tool names and recipients#35901
will-deines wants to merge 1 commit into vllm-project:main
Conversation
…and recipients

GPT-OSS models generate Harmony protocol control tokens (<|channel|>, <|constrain|>, <|start|>, <|end|>, <|message|>) in unexpected positions during output generation, causing tool name contamination, recipient misrouting, and parser crashes. Three layers of defense:

1. sanitize_harmony_name() — pure string function that strips leaked control token strings from tool/recipient names.
2. ResilientStreamableParser — wrapper around StreamableParser that recovers from missing <|start|> tokens between messages and malformed <|constrain|> tokens in headers.
3. Routing-level fallback — sanitized-to-empty recipients fall through to _parse_message_no_recipient() instead of being misrouted.

Applied at all input parsing, output dispatching, tool routing, and streaming delta extraction sites.
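As a concrete illustration of the first defense layer, here is a minimal sketch of a strip-at-earliest-control-token function. It is reconstructed from the description above, not the PR's actual implementation, and the token list is taken verbatim from the commit message:

```python
# Hypothetical sketch of sanitize_harmony_name(): cut a name at the earliest
# leaked Harmony control token. The real vLLM implementation may differ.
HARMONY_CONTROL_TOKENS = (
    "<|channel|>", "<|constrain|>", "<|start|>", "<|end|>", "<|message|>",
)

def sanitize_harmony_name(name: str) -> str:
    """Return the text before the earliest control token, stripped."""
    cut = len(name)
    for tok in HARMONY_CONTROL_TOKENS:
        idx = name.find(tok)
        if idx != -1:
            cut = min(cut, idx)
    return name[:cut].strip()

# A contaminated tool name is truncated back to the clean prefix; a name
# that is nothing but a control token sanitizes to the empty string.
clean = sanitize_harmony_name("manage_cart<|channel|>commentary")
assert clean == "manage_cart"
assert sanitize_harmony_name("<|constrain|>json") == ""
```

The empty-string case is what the routing-level fallback (layer 3) then converts to None.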
Code Review
This pull request introduces a robust, multi-layered defense mechanism to sanitize leaked Harmony protocol control tokens from tool names and recipients, including a new sanitize_harmony_name utility, a ResilientStreamableParser, and routing fallbacks. While this is a critical security improvement to prevent infinite tool-call loops and protocol misrouting, the current implementation is incomplete. It fails to sanitize these tokens when they are reflected back into the conversation history as author names for tool responses, allowing for protocol smuggling where an attacker can inject protocol delimiters into the conversation state, potentially bypassing security controls in subsequent turns. The implementation is otherwise clean, well-documented, and accompanied by a comprehensive set of unit tests, though there is one suggestion to enhance a test case for complete validation of the error recovery logic.
    if recipient is not None:
        recipient = sanitize_harmony_name(recipient)
        if recipient:
last_msg.recipient is used unsafely to set the Author name in tool responses. Since last_msg.recipient can contain leaked Harmony control tokens (as this PR acknowledges), an attacker can steer the LLM into emitting a contaminated recipient that, when reflected back into the conversation history as an author name, injects protocol delimiters. This enables 'protocol smuggling': a single tool response can be interpreted as multiple messages in subsequent turns, potentially bypassing security controls or misrepresenting the conversation state.
To remediate this, ensure that the recipient is sanitized before being assigned back to the message object or used as an author name.
    if recipient is not None:
        recipient = sanitize_harmony_name(recipient)
        last_msg.recipient = recipient
        if recipient:

    for call in tool_calls:
        func = call.get("function", {})
    -   name = func.get("name", "")
    +   name = sanitize_harmony_name(func.get("name", ""))
In _parse_chat_format_message, when the role is tool, the name is extracted from chat_msg without sanitization (line 108). This contaminated name is then used to create an Author object (line 117), which leads to the same protocol smuggling vulnerability described in other parts of this review.
While line 108 is not directly modified in this diff, the introduction of sanitize_harmony_name in this file (line 97) makes it clear that this sanitization should be applied here as well to ensure consistency and security across all message parsing paths.
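The reviewer's point can be sketched in isolation. Everything below is a stand-in: Author is a plain dataclass, not the real openai_harmony type, and the sanitizer is the minimal reconstruction from the PR description:

```python
# Illustrative only: sanitize a tool name before it becomes an Author in the
# reconstructed conversation, so control tokens never reach the history.
from dataclasses import dataclass

def sanitize_harmony_name(name: str) -> str:
    # Minimal stand-in: repeatedly cut at any control token found.
    for tok in ("<|channel|>", "<|constrain|>", "<|start|>",
                "<|end|>", "<|message|>"):
        idx = name.find(tok)
        if idx != -1:
            name = name[:idx]
    return name.strip()

@dataclass
class Author:
    role: str
    name: str

# A contaminated name from the model output...
raw = "functions.get_weather<|channel|>commentary"
# ...is cleaned before being reflected back as an author name.
author = Author(role="tool", name=sanitize_harmony_name(raw))
assert author.name == "functions.get_weather"
```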
    assert len(parser.messages) == 2
    assert parser.messages[0].content[0].text == "First."
This test verifies that the parser produces two messages, but it only asserts the content of the first message. To ensure the error recovery logic for malformed headers is fully functional, it's important to also assert the content of the second message, which should have been parsed correctly after skipping the garbage tokens.
Suggested change:

        assert len(parser.messages) == 2
        assert parser.messages[0].content[0].text == "First."
    +   assert parser.messages[1].content[0].text == "Second."
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: b3468a33ed
    # Pattern 2: <|constrain|> during HEADER → enter skip mode
    if state == StreamState.HEADER and token_id == _TOK_CONSTRAIN:
        self._skip_until_message_or_end = True
        return
Preserve valid <|constrain|> headers during parsing
<|constrain|> is part of normal Harmony tool-call headers (for example, existing chat tests build calls as ...<|constrain|>json<|message|>...), but this branch unconditionally treats any HEADER-state constrain token as malformed and skips everything until <|message|>/<|end|>. That strips legitimate header metadata (notably content_type) from otherwise valid outputs, so downstream parsing loses type information and can mis-handle non-JSON constrained tool payloads.
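The concern can be made concrete with a toy header scan. The string "tokens" below are illustrative stand-ins — real parsing operates on token IDs, and the header layout is taken from the example in the comment above:

```python
# A legitimate Harmony tool-call header carries <|constrain|>json before
# <|message|>; unconditionally skipping on <|constrain|> would drop it.
header = ["<|start|>", "assistant", "to=functions.get_weather",
          "<|channel|>", "commentary", "<|constrain|>", "json", "<|message|>"]

def parse_content_type(tokens):
    """Naive header scan: the token after <|constrain|> is the content type."""
    for i, tok in enumerate(tokens):
        if tok == "<|constrain|>" and i + 1 < len(tokens):
            return tokens[i + 1]
    return None

# The metadata a skip-on-constrain branch would discard:
assert parse_content_type(header) == "json"
```

A more conservative recovery would skip only when no well-formed content type follows the constrain token.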
Summary

GPT-OSS models leak Harmony protocol control tokens (<|channel|>, <|constrain|>, <|start|>, <|end|>, <|message|>) into tool names and recipient fields during generation. This causes:

- Tool name contamination — e.g. manage_cart<|channel|>commentary instead of manage_cart, corrupting function call routing and causing infinite tool-call loops
- <|constrain|> as recipient — e.g. <|constrain|>json matches no routing pattern, falls through to the MCP handler or raises errors
- Missing <|start|> between channels — the model omits the start token between consecutive outputs, causing StreamableParser to throw HarmonyError
- <|constrain|> in headers — produces garbage in recipient or content_type fields

Three layers of defense

1. sanitize_harmony_name() — pure string function that finds the earliest Harmony control token in a name and returns only the text before it. Applied at all input parsing, output dispatching, tool routing, and streaming delta extraction sites.
2. ResilientStreamableParser — drop-in wrapper around StreamableParser that intercepts two malformed token patterns:
   - <|start|> recovery: when the parser expects <|start|> but gets <|channel|>, inject the missing tokens
   - <|constrain|> in headers: skip tokens until <|message|> or <|end|>
3. Routing-level fallback — after sanitization, if a recipient becomes the empty string, treat it as None so it falls through to _parse_message_no_recipient() (produces a user-visible message instead of a misrouted MCP call).

Related Issues & PRs
- <|constrain|> misrouting
- <|channel|> from recipients

Decisions to debate
- Wrapper vs. monkey-patch for StreamableParser: We chose a wrapper class (ResilientStreamableParser) that delegates all properties to the inner parser, rather than monkey-patching or subclassing. This means get_streamable_parser_for_assistant() returns our wrapper instead of a raw StreamableParser. All existing consumers work unchanged, but isinstance(parser, StreamableParser) checks would fail — we haven't found any such checks in the codebase, but reviewers should flag if they know of one.
- String-level vs. token-level sanitization: sanitize_harmony_name() operates on strings, not token IDs. This is intentional — by the time we have a message.recipient or function_name, it's already a string. Token-level recovery is handled separately by ResilientStreamableParser.process(). The two layers are complementary, not redundant.
- Hardcoded token IDs (200003, 200005–200008): ResilientStreamableParser references specific GPT-OSS encoding token IDs. These are stable across the harmony-gpt-oss encoding but would break if a different encoding were used. We could look these up dynamically from the encoding, but the IDs are well-established constants and dynamic lookup adds complexity for no current benefit.
- Sanitization applied broadly (defense in depth): We sanitize at input parsing, output dispatch, tool routing, AND streaming — even though the ResilientStreamableParser should catch most issues at the token level. This is intentional defense-in-depth: if a code path bypasses the resilient parser (e.g. direct Message construction in tests or from previous_input_messages), the string-level sanitization still catches leaked tokens.
- Empty-after-sanitization → None fallback: When sanitizing a recipient produces an empty string, we convert it to None rather than raising an error. This causes the message to be treated as a "no-recipient" message (preamble), which is the safest fallback — the user sees the text content rather than getting a routing error. This is a design choice that could mask other bugs; an alternative would be to log a warning.

Files changed
- vllm/entrypoints/openai/parser/harmony_utils.py — sanitize_harmony_name(), ResilientStreamableParser, wrap get_streamable_parser_for_assistant(), sanitize input parsing
- vllm/entrypoints/openai/responses/harmony.py
- vllm/entrypoints/openai/responses/context.py
- vllm/entrypoints/openai/chat_completion/stream_harmony.py
- tests/entrypoints/openai/parser/test_harmony_utils.py — sanitize_harmony_name + ResilientStreamableParser
- tests/entrypoints/openai/responses/test_harmony_utils.py

Test plan
- TestSanitizeHarmonyName — 7 cases: clean passthrough, <|channel|> stripping, <|constrain|> stripping, pure token → empty, multiple tokens → earliest wins, empty input, trailing whitespace
- TestResilientStreamableParser — 3 cases: normal sequence unchanged, missing <|start|> recovery, <|constrain|> in header skip
- TestHarmonyOutputSanitization — 2 cases: <|constrain|>json recipient → message output, contaminated function name → cleaned
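The missing-<|start|> recovery exercised by TestResilientStreamableParser can be sketched as a toy model. String "tokens" and a trivial strict inner parser stand in for real token IDs and openai_harmony's StreamableParser; none of this is the PR's actual code:

```python
# Toy inner parser: raises (like HarmonyError) if a message does not begin
# with <|start|>, which is exactly the failure mode the wrapper recovers from.
class ToyParser:
    def __init__(self):
        self.tokens = []
        self.expects_start = True  # True between messages

    def process(self, tok):
        if self.expects_start:
            if tok != "<|start|>":
                raise ValueError("expected <|start|>")
            self.expects_start = False
        self.tokens.append(tok)
        if tok == "<|end|>":
            self.expects_start = True

class ResilientToyParser:
    """Wrapper sketch: inject a missing <|start|> instead of crashing."""
    def __init__(self):
        self.inner = ToyParser()

    def process(self, tok):
        # Recovery pattern: parser expects <|start|> but sees <|channel|>.
        if self.inner.expects_start and tok == "<|channel|>":
            self.inner.process("<|start|>")
        self.inner.process(tok)

# The second message omits <|start|>; the wrapper repairs the stream.
p = ResilientToyParser()
for tok in ["<|start|>", "hello", "<|end|>",
            "<|channel|>", "final", "<|end|>"]:
    p.process(tok)
assert p.inner.tokens.count("<|start|>") == 2
```

A raw ToyParser fed the same sequence would raise on the <|channel|> token, mirroring the StreamableParser crash the PR describes.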