
[Responses API] Sanitize leaked Harmony control tokens in tool names and recipients#35901

Closed
will-deines wants to merge 1 commit into vllm-project:main from will-deines:harmony-token-sanitization

Conversation

@will-deines

Recreated from #35881, which was closed when the fork was temporarily made private.

Summary

GPT-OSS models leak Harmony protocol control tokens (<|channel|>, <|constrain|>, <|start|>, <|end|>, <|message|>) into tool names and recipient fields during generation. This causes:

  • Tool name contamination — e.g. manage_cart<|channel|>commentary instead of manage_cart, corrupting function call routing and causing infinite tool-call loops
  • <|constrain|> as recipient — e.g. <|constrain|>json matches no routing pattern, falls through to MCP handler or raises errors
  • Missing <|start|> between channels — model omits start token between consecutive outputs, causing StreamableParser to throw HarmonyError
  • Malformed <|constrain|> in headers — produces garbage in recipient or content_type fields

Three layers of defense

  1. sanitize_harmony_name() — Pure string function that finds the earliest Harmony control token in a name and returns only the text before it. Applied at all input parsing, output dispatching, tool routing, and streaming delta extraction sites.

  2. ResilientStreamableParser — Drop-in wrapper around StreamableParser that intercepts two malformed token patterns:

    • Missing <|start|> recovery: when parser expects <|start|> but gets <|channel|>, inject the missing tokens
    • Malformed <|constrain|> in headers: skip tokens until <|message|> or <|end|>
  3. Routing-level fallback — After sanitization, if a recipient becomes empty string, treat it as None so it falls through to _parse_message_no_recipient() (produces a user-visible message instead of a misrouted MCP call).
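The first layer can be sketched as a pure function. This is a hedged reconstruction from the description above — the token list and behavior follow the summary, not the actual diff in `harmony_utils.py`:

```python
# Hypothetical reconstruction of sanitize_harmony_name() from the PR
# description; the real implementation lives in harmony_utils.py.
HARMONY_CONTROL_TOKENS = (
    "<|channel|>", "<|constrain|>", "<|start|>", "<|end|>", "<|message|>",
)

def sanitize_harmony_name(name: str) -> str:
    """Return the text before the earliest leaked Harmony control token."""
    cut = len(name)
    for token in HARMONY_CONTROL_TOKENS:
        idx = name.find(token)
        if 0 <= idx < cut:
            cut = idx
    return name[:cut]

print(sanitize_harmony_name("manage_cart<|channel|>commentary"))  # manage_cart
print(sanitize_harmony_name("<|constrain|>json"))                 # empty string
```

Note that "earliest token wins": for `"a<|end|>b<|channel|>c"` the cut happens at the `<|end|>`, yielding `"a"`.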

Related Issues & PRs

| # | Title | Relation |
|---|-------|----------|
| #32587 | Special tokens leak into tool names | Primary bug report for tool name contamination |
| #30372 | Distorted tool names + infinite tool-call loop | Consequence of tool name contamination |
| #23567 | HarmonyError: unexpected tokens in message header | Parser crash from malformed sequences |
| #28262 | Incorrect input/output handling in Responses API | Channel metadata loss causing <|constrain|> misrouting |
| #31677 | Sanitize malformed tool call recipients (stale PR) | Strips <|channel|> from recipients |
| #32633 | Fix token leaks in tool names and streaming (stale PR) | Defines sanitize + strip functions |
| #28303 | Parse gpt-oss refusals w/ non-strict mode (stale PR) | Different approach via openai-harmony library |
| #29236 | Fix gpt oss tool parser v2 (stale PR) | Also addresses tag sanitization |
| #34857 | Responses API & Tool Calling H1 2026 roadmap | Lists "guided decode and structured outputs" as focus area |

Decisions to debate

  1. Wrapper vs. monkey-patch for StreamableParser: We chose a wrapper class (ResilientStreamableParser) that delegates all properties to the inner parser, rather than monkey-patching or subclassing. This means get_streamable_parser_for_assistant() returns our wrapper instead of a raw StreamableParser. All existing consumers work unchanged, but isinstance(parser, StreamableParser) checks would fail — we haven't found any such checks in the codebase, but reviewers should flag if they know of one.
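The delegation pattern in decision 1 can be illustrated with a minimal stand-in. `InnerParser` below is a placeholder for openai-harmony's `StreamableParser`, which is not reproduced here:

```python
# Minimal illustration of the wrapper-with-delegation pattern; InnerParser
# is a stand-in for openai-harmony's StreamableParser.
class InnerParser:
    def __init__(self) -> None:
        self.tokens: list[int] = []

    def process(self, token_id: int) -> None:
        self.tokens.append(token_id)


class ResilientWrapper:
    """Intercept process(); delegate every other attribute to the inner parser."""

    def __init__(self, inner: InnerParser) -> None:
        self._inner = inner

    def __getattr__(self, attr: str):
        # Only called for attributes not found on the wrapper itself,
        # so properties like .tokens fall through to the inner parser.
        return getattr(self._inner, attr)

    def process(self, token_id: int) -> None:
        # Recovery logic would run here before delegating.
        self._inner.process(token_id)


parser = ResilientWrapper(InnerParser())
parser.process(42)
print(parser.tokens)                    # [42]
print(isinstance(parser, InnerParser))  # False — the isinstance caveat above
```

The last line demonstrates exactly the caveat flagged in the decision: attribute access delegates transparently, but `isinstance` checks against the inner type fail.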

  2. String-level vs. token-level sanitization: sanitize_harmony_name() operates on strings, not token IDs. This is intentional — by the time we have a message.recipient or function_name, it's already a string. Token-level recovery is handled separately by ResilientStreamableParser.process(). The two layers are complementary, not redundant.

  3. Hardcoded token IDs (200003, 200005–200008): The ResilientStreamableParser references specific GPT-OSS encoding token IDs. These are stable across the harmony-gpt-oss encoding but would break if a different encoding were used. We could look these up dynamically from the encoding, but the IDs are well-established constants and dynamic lookup adds complexity for no current benefit.

  4. Sanitization applied broadly (defense in depth): We sanitize at input parsing, output dispatch, tool routing, AND streaming — even though the ResilientStreamableParser should catch most issues at the token level. This is intentional defense-in-depth: if a code path bypasses the resilient parser (e.g. direct Message construction in tests or from previous_input_messages), the string-level sanitization still catches leaked tokens.

  5. Empty-after-sanitization → None fallback: When sanitizing a recipient produces an empty string, we convert it to None rather than raising an error. This causes the message to be treated as a "no-recipient" message (preamble), which is the safest fallback — the user sees the text content rather than getting a routing error. This is a design choice that could mask other bugs; an alternative would be to log a warning.

Files changed

| File | Change |
|------|--------|
| vllm/entrypoints/openai/parser/harmony_utils.py | Add sanitize_harmony_name(), ResilientStreamableParser, wrap get_streamable_parser_for_assistant(), sanitize input parsing |
| vllm/entrypoints/openai/responses/harmony.py | Sanitize recipients in output dispatch + input parsing functions |
| vllm/entrypoints/openai/responses/context.py | Sanitize recipients in tool routing |
| vllm/entrypoints/openai/chat_completion/stream_harmony.py | Sanitize tool names in streaming delta extraction |
| tests/entrypoints/openai/parser/test_harmony_utils.py | Unit tests for sanitize_harmony_name + ResilientStreamableParser |
| tests/entrypoints/openai/responses/test_harmony_utils.py | Unit tests for output sanitization (contaminated recipients + tool names) |

Test plan

  • TestSanitizeHarmonyName — 7 cases: clean passthrough, <|channel|> stripping, <|constrain|> stripping, pure token → empty, multiple tokens → earliest wins, empty input, trailing whitespace
  • TestResilientStreamableParser — 3 cases: normal sequence unchanged, missing <|start|> recovery, <|constrain|> in header skip
  • TestHarmonyOutputSanitization — 2 cases: <|constrain|>json recipient → message output, contaminated function name → cleaned
  • All existing parser and responses unit tests pass (90 total, 0 regressions)
  • Integration test with live GPT-OSS model (needs model access)
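The missing-`<|start|>` recovery case in TestResilientStreamableParser can be pictured with a toy state machine. Everything below — the token IDs, `ToyParser`, `ResilientToyParser` — is a stand-in for illustration, not the real openai-harmony parser:

```python
# Toy model of "missing <|start|> recovery": when the inner parser expects
# <|start|> but the stream delivers <|channel|>, the wrapper injects the
# missing token instead of letting the parser raise. Token IDs are
# placeholders, not the real GPT-OSS encoding IDs.
TOK_START, TOK_CHANNEL = 1001, 1002

class ToyParser:
    def __init__(self) -> None:
        self.expect_start = True
        self.seen: list[int] = []

    def process(self, token_id: int) -> None:
        if self.expect_start and token_id != TOK_START:
            raise ValueError("unexpected token in header")  # HarmonyError analogue
        self.seen.append(token_id)
        if token_id == TOK_START:
            self.expect_start = False

class ResilientToyParser:
    def __init__(self, inner: ToyParser) -> None:
        self._inner = inner

    def process(self, token_id: int) -> None:
        # Pattern 1: expecting <|start|> but got <|channel|> → inject it.
        if self._inner.expect_start and token_id == TOK_CHANNEL:
            self._inner.process(TOK_START)
        self._inner.process(token_id)

parser = ResilientToyParser(ToyParser())
parser.process(TOK_CHANNEL)  # would raise on a bare ToyParser
print(parser._inner.seen)    # start token injected before the channel token
```

A bare `ToyParser` raises on the same input, which mirrors the `HarmonyError` crash described in #23567.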

[Responses API] Sanitize leaked Harmony control tokens in tool names and recipients

GPT-OSS models generate Harmony protocol control tokens (<|channel|>,
<|constrain|>, <|start|>, <|end|>, <|message|>) in unexpected positions
during output generation, causing tool name contamination, recipient
misrouting, and parser crashes.

Three layers of defense:

1. sanitize_harmony_name() — pure string function that strips leaked
   control token strings from tool/recipient names.

2. ResilientStreamableParser — wrapper around StreamableParser that
   recovers from missing <|start|> tokens between messages and
   malformed <|constrain|> tokens in headers.

3. Routing-level fallback — sanitized-to-empty recipients fall through
   to _parse_message_no_recipient() instead of being misrouted.

Applied at all input parsing, output dispatching, tool routing, and
streaming delta extraction sites.

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a robust, multi-layered defense mechanism to sanitize leaked Harmony protocol control tokens from tool names and recipients, including a new sanitize_harmony_name utility, a ResilientStreamableParser, and routing fallbacks. While this is a critical security improvement to prevent infinite tool-call loops and protocol misrouting, the current implementation is incomplete. It fails to sanitize these tokens when they are reflected back into the conversation history as author names for tool responses, allowing for protocol smuggling where an attacker can inject protocol delimiters into the conversation state, potentially bypassing security controls in subsequent turns. The implementation is otherwise clean, well-documented, and accompanied by a comprehensive set of unit tests, though there is one suggestion to enhance a test case for complete validation of the error recovery logic.

Comment on lines 690 to +692

```python
        if recipient is not None:
            recipient = sanitize_harmony_name(recipient)
            if recipient:
```

security-high

The last_msg.recipient is used unsafely to set the Author name in tool responses. Since last_msg.recipient can contain leaked Harmony control tokens (as acknowledged by this PR), an attacker can manipulate the LLM to output a contaminated recipient that, when reflected back in the conversation history as an author name, injects protocol delimiters. This allows for 'protocol smuggling' where a single tool response can be interpreted as multiple messages in subsequent turns, potentially bypassing security controls or misrepresenting the conversation state.

To remediate this, ensure that the recipient is sanitized before being assigned back to the message object or used as an author name.

```python
        if recipient is not None:
            recipient = sanitize_harmony_name(recipient)
            last_msg.recipient = recipient
        if recipient:
```

```diff
 for call in tool_calls:
     func = call.get("function", {})
-    name = func.get("name", "")
+    name = sanitize_harmony_name(func.get("name", ""))
```

security-high

In _parse_chat_format_message, when the role is tool, the name is extracted from chat_msg without sanitization (line 108). This contaminated name is then used to create an Author object (line 117), which leads to the same protocol smuggling vulnerability described in other parts of this review.

While line 108 is not directly modified in this diff, the introduction of sanitize_harmony_name in this file (line 97) makes it clear that this sanitization should be applied here as well to ensure consistency and security across all message parsing paths.

Comment on lines +959 to +960

```python
assert len(parser.messages) == 2
assert parser.messages[0].content[0].text == "First."
```

high

This test verifies that the parser produces two messages, but it only asserts the content of the first message. To ensure the error recovery logic for malformed headers is fully functional, it's important to also assert the content of the second message, which should have been parsed correctly after skipping the garbage tokens.

Suggested change

```diff
 assert len(parser.messages) == 2
 assert parser.messages[0].content[0].text == "First."
+assert parser.messages[1].content[0].text == "Second."
```


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b3468a33ed


Comment on lines +109 to +112

```python
# Pattern 2: <|constrain|> during HEADER → enter skip mode
if state == StreamState.HEADER and token_id == _TOK_CONSTRAIN:
    self._skip_until_message_or_end = True
    return
```

P2: Preserve valid <|constrain|> headers during parsing

<|constrain|> is part of normal Harmony tool-call headers (for example, existing chat tests build calls as ...<|constrain|>json<|message|>...), but this branch unconditionally treats any HEADER-state constrain token as malformed and skips everything until <|message|>/<|end|>. That strips legitimate header metadata (notably content_type) from otherwise valid outputs, so downstream parsing loses type information and can mis-handle non-JSON constrained tool payloads.



Labels

frontend gpt-oss Related to GPT-OSS models

Projects

Status: Done


2 participants