[Responses API] Sanitize leaked Harmony control tokens in tool names and recipients#35901
will-deines wants to merge 1 commit into vllm-project:main
Conversation
…and recipients

GPT-OSS models generate Harmony protocol control tokens (<|channel|>, <|constrain|>, <|start|>, <|end|>, <|message|>) in unexpected positions during output generation, causing tool name contamination, recipient misrouting, and parser crashes. Three layers of defense:

1. sanitize_harmony_name() — pure string function that strips leaked control token strings from tool/recipient names.
2. ResilientStreamableParser — wrapper around StreamableParser that recovers from missing <|start|> tokens between messages and malformed <|constrain|> tokens in headers.
3. Routing-level fallback — sanitized-to-empty recipients fall through to _parse_message_no_recipient() instead of being misrouted.

Applied at all input parsing, output dispatching, tool routing, and streaming delta extraction sites.
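As a concrete illustration of the first defense layer, here is a minimal sketch of a strip-at-earliest-control-token function. It is reconstructed from the description above, not the PR's actual implementation, and the token list is taken verbatim from the commit message:

```python
# Hypothetical sketch of sanitize_harmony_name(): cut a name at the earliest
# leaked Harmony control token. The real vLLM implementation may differ.
HARMONY_CONTROL_TOKENS = (
    "<|channel|>", "<|constrain|>", "<|start|>", "<|end|>", "<|message|>",
)

def sanitize_harmony_name(name: str) -> str:
    """Return the text before the earliest control token, stripped."""
    cut = len(name)
    for tok in HARMONY_CONTROL_TOKENS:
        idx = name.find(tok)
        if idx != -1:
            cut = min(cut, idx)
    return name[:cut].strip()

# A contaminated tool name is truncated back to the clean prefix; a name
# that is nothing but a control token sanitizes to the empty string.
clean = sanitize_harmony_name("manage_cart<|channel|>commentary")
assert clean == "manage_cart"
assert sanitize_harmony_name("<|constrain|>json") == ""
```

The empty-string case is what the routing-level fallback (layer 3) then converts to None.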
Code Review
This pull request introduces a robust, multi-layered defense mechanism to sanitize leaked Harmony protocol control tokens from tool names and recipients, including a new sanitize_harmony_name utility, a ResilientStreamableParser, and routing fallbacks. While this is a critical security improvement to prevent infinite tool-call loops and protocol misrouting, the current implementation is incomplete. It fails to sanitize these tokens when they are reflected back into the conversation history as author names for tool responses, allowing for protocol smuggling where an attacker can inject protocol delimiters into the conversation state, potentially bypassing security controls in subsequent turns. The implementation is otherwise clean, well-documented, and accompanied by a comprehensive set of unit tests, though there is one suggestion to enhance a test case for complete validation of the error recovery logic.
    if recipient is not None:
        recipient = sanitize_harmony_name(recipient)
        if recipient:
last_msg.recipient is used unsafely to set the Author name in tool responses. Since last_msg.recipient can contain leaked Harmony control tokens (as this PR acknowledges), an attacker can steer the LLM into emitting a contaminated recipient that, when reflected back into the conversation history as an author name, injects protocol delimiters. This enables 'protocol smuggling': a single tool response can be interpreted as multiple messages in subsequent turns, potentially bypassing security controls or misrepresenting the conversation state.
To remediate this, ensure that the recipient is sanitized before being assigned back to the message object or used as an author name.
    if recipient is not None:
        recipient = sanitize_harmony_name(recipient)
        last_msg.recipient = recipient
        if recipient:

    for call in tool_calls:
        func = call.get("function", {})
    -   name = func.get("name", "")
    +   name = sanitize_harmony_name(func.get("name", ""))
In _parse_chat_format_message, when the role is tool, the name is extracted from chat_msg without sanitization (line 108). This contaminated name is then used to create an Author object (line 117), which leads to the same protocol smuggling vulnerability described in other parts of this review.
While line 108 is not directly modified in this diff, the introduction of sanitize_harmony_name in this file (line 97) makes it clear that this sanitization should be applied here as well to ensure consistency and security across all message parsing paths.
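The reviewer's point can be sketched in isolation. Everything below is a stand-in: Author is a plain dataclass, not the real openai_harmony type, and the sanitizer is the minimal reconstruction from the PR description:

```python
# Illustrative only: sanitize a tool name before it becomes an Author in the
# reconstructed conversation, so control tokens never reach the history.
from dataclasses import dataclass

def sanitize_harmony_name(name: str) -> str:
    # Minimal stand-in: repeatedly cut at any control token found.
    for tok in ("<|channel|>", "<|constrain|>", "<|start|>",
                "<|end|>", "<|message|>"):
        idx = name.find(tok)
        if idx != -1:
            name = name[:idx]
    return name.strip()

@dataclass
class Author:
    role: str
    name: str

# A contaminated name from the model output...
raw = "functions.get_weather<|channel|>commentary"
# ...is cleaned before being reflected back as an author name.
author = Author(role="tool", name=sanitize_harmony_name(raw))
assert author.name == "functions.get_weather"
```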
    assert len(parser.messages) == 2
    assert parser.messages[0].content[0].text == "First."
This test verifies that the parser produces two messages, but it only asserts the content of the first message. To ensure the error recovery logic for malformed headers is fully functional, it's important to also assert the content of the second message, which should have been parsed correctly after skipping the garbage tokens.
Suggested change:

        assert len(parser.messages) == 2
        assert parser.messages[0].content[0].text == "First."
    +   assert parser.messages[1].content[0].text == "Second."
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: b3468a33ed
    # Pattern 2: <|constrain|> during HEADER → enter skip mode
    if state == StreamState.HEADER and token_id == _TOK_CONSTRAIN:
        self._skip_until_message_or_end = True
        return
Preserve valid <|constrain|> headers during parsing
<|constrain|> is part of normal Harmony tool-call headers (for example, existing chat tests build calls as ...<|constrain|>json<|message|>...), but this branch unconditionally treats any HEADER-state constrain token as malformed and skips everything until <|message|>/<|end|>. That strips legitimate header metadata (notably content_type) from otherwise valid outputs, so downstream parsing loses type information and can mis-handle non-JSON constrained tool payloads.
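The concern can be made concrete with a toy header scan. The string "tokens" below are illustrative stand-ins — real parsing operates on token IDs, and the header layout is taken from the example in the comment above:

```python
# A legitimate Harmony tool-call header carries <|constrain|>json before
# <|message|>; unconditionally skipping on <|constrain|> would drop it.
header = ["<|start|>", "assistant", "to=functions.get_weather",
          "<|channel|>", "commentary", "<|constrain|>", "json", "<|message|>"]

def parse_content_type(tokens):
    """Naive header scan: the token after <|constrain|> is the content type."""
    for i, tok in enumerate(tokens):
        if tok == "<|constrain|>" and i + 1 < len(tokens):
            return tokens[i + 1]
    return None

# The metadata a skip-on-constrain branch would discard:
assert parse_content_type(header) == "json"
```

A more conservative recovery would skip only when no well-formed content type follows the constrain token.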
Summary

GPT-OSS models leak Harmony protocol control tokens (<|channel|>, <|constrain|>, <|start|>, <|end|>, <|message|>) into tool names and recipient fields during generation. This causes:

- Tool name contamination — e.g. manage_cart<|channel|>commentary instead of manage_cart, corrupting function call routing and causing infinite tool-call loops
- <|constrain|> as recipient — e.g. <|constrain|>json matches no routing pattern, falls through to the MCP handler or raises errors
- Missing <|start|> between channels — the model omits the start token between consecutive outputs, causing StreamableParser to throw HarmonyError
- <|constrain|> in headers — produces garbage in recipient or content_type fields

Three layers of defense

1. sanitize_harmony_name() — pure string function that finds the earliest Harmony control token in a name and returns only the text before it. Applied at all input parsing, output dispatching, tool routing, and streaming delta extraction sites.
2. ResilientStreamableParser — drop-in wrapper around StreamableParser that intercepts two malformed token patterns:
   - <|start|> recovery: when the parser expects <|start|> but gets <|channel|>, inject the missing tokens
   - <|constrain|> in headers: skip tokens until <|message|> or <|end|>
3. Routing-level fallback — after sanitization, if a recipient becomes the empty string, treat it as None so it falls through to _parse_message_no_recipient() (produces a user-visible message instead of a misrouted MCP call).

Related Issues & PRs
- <|constrain|> misrouting
- <|channel|> from recipients

Decisions to debate
- Wrapper vs. monkey-patch for StreamableParser: We chose a wrapper class (ResilientStreamableParser) that delegates all properties to the inner parser, rather than monkey-patching or subclassing. This means get_streamable_parser_for_assistant() returns our wrapper instead of a raw StreamableParser. All existing consumers work unchanged, but isinstance(parser, StreamableParser) checks would fail — we haven't found any such checks in the codebase, but reviewers should flag if they know of one.
- String-level vs. token-level sanitization: sanitize_harmony_name() operates on strings, not token IDs. This is intentional — by the time we have a message.recipient or function_name, it's already a string. Token-level recovery is handled separately by ResilientStreamableParser.process(). The two layers are complementary, not redundant.
- Hardcoded token IDs (200003, 200005–200008): ResilientStreamableParser references specific GPT-OSS encoding token IDs. These are stable across the harmony-gpt-oss encoding but would break if a different encoding were used. We could look these up dynamically from the encoding, but the IDs are well-established constants and dynamic lookup adds complexity for no current benefit.
- Sanitization applied broadly (defense in depth): We sanitize at input parsing, output dispatch, tool routing, AND streaming — even though the ResilientStreamableParser should catch most issues at the token level. This is intentional defense-in-depth: if a code path bypasses the resilient parser (e.g. direct Message construction in tests or from previous_input_messages), the string-level sanitization still catches leaked tokens.
- Empty-after-sanitization → None fallback: When sanitizing a recipient produces an empty string, we convert it to None rather than raising an error. This causes the message to be treated as a "no-recipient" message (preamble), which is the safest fallback — the user sees the text content rather than getting a routing error. This is a design choice that could mask other bugs; an alternative would be to log a warning.

Files changed
- vllm/entrypoints/openai/parser/harmony_utils.py — sanitize_harmony_name(), ResilientStreamableParser, wrap get_streamable_parser_for_assistant(), sanitize input parsing
- vllm/entrypoints/openai/responses/harmony.py
- vllm/entrypoints/openai/responses/context.py
- vllm/entrypoints/openai/chat_completion/stream_harmony.py
- tests/entrypoints/openai/parser/test_harmony_utils.py — sanitize_harmony_name + ResilientStreamableParser
- tests/entrypoints/openai/responses/test_harmony_utils.py

Test plan
- TestSanitizeHarmonyName — 7 cases: clean passthrough, <|channel|> stripping, <|constrain|> stripping, pure token → empty, multiple tokens → earliest wins, empty input, trailing whitespace
- TestResilientStreamableParser — 3 cases: normal sequence unchanged, missing <|start|> recovery, <|constrain|> in header skip
- TestHarmonyOutputSanitization — 2 cases: <|constrain|>json recipient → message output, contaminated function name → cleaned
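The missing-<|start|> recovery exercised by TestResilientStreamableParser can be sketched as a toy model. String "tokens" and a trivial strict inner parser stand in for real token IDs and openai_harmony's StreamableParser; none of this is the PR's actual code:

```python
# Toy inner parser: raises (like HarmonyError) if a message does not begin
# with <|start|>, which is exactly the failure mode the wrapper recovers from.
class ToyParser:
    def __init__(self):
        self.tokens = []
        self.expects_start = True  # True between messages

    def process(self, tok):
        if self.expects_start:
            if tok != "<|start|>":
                raise ValueError("expected <|start|>")
            self.expects_start = False
        self.tokens.append(tok)
        if tok == "<|end|>":
            self.expects_start = True

class ResilientToyParser:
    """Wrapper sketch: inject a missing <|start|> instead of crashing."""
    def __init__(self):
        self.inner = ToyParser()

    def process(self, tok):
        # Recovery pattern: parser expects <|start|> but sees <|channel|>.
        if self.inner.expects_start and tok == "<|channel|>":
            self.inner.process("<|start|>")
        self.inner.process(tok)

# The second message omits <|start|>; the wrapper repairs the stream.
p = ResilientToyParser()
for tok in ["<|start|>", "hello", "<|end|>",
            "<|channel|>", "final", "<|end|>"]:
    p.process(tok)
assert p.inner.tokens.count("<|start|>") == 2
```

A raw ToyParser fed the same sequence would raise on the <|channel|> token, mirroring the StreamableParser crash the PR describes.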