[Responses API] Sanitize leaked Harmony control tokens in tool names and recipients#35906

Open
will-deines wants to merge 9 commits into vllm-project:main from will-deines:harmony-token-sanitization

Conversation


@will-deines will-deines commented Mar 3, 2026

Recreated from #35881, which was closed when the fork was temporarily made private.

Summary

GPT-OSS models leak Harmony protocol control tokens (<|channel|>, <|constrain|>, <|start|>, <|end|>, <|message|>) into tool names and recipient fields during generation. This causes:

  • Tool name contamination — e.g. manage_cart<|channel|>commentary instead of manage_cart, corrupting function call routing and causing infinite tool-call loops
  • <|constrain|> as recipient — e.g. <|constrain|>json matches no routing pattern, falls through to MCP handler or raises errors
  • Missing <|start|> between channels — model omits start token between consecutive outputs, causing StreamableParser to throw HarmonyError
  • Malformed <|constrain|> in headers — produces garbage in recipient or content_type fields
  • Free text between channel messages — model outputs trailing plain text after <|end|> before starting the next channel message, causing HarmonyError in EXPECT_START state

Three layers of defense

  1. sanitize_harmony_name() / sanitize_harmony_recipient() — Pure string functions that strip leaked control tokens. sanitize_harmony_name() finds the earliest control token and returns only the text before it. sanitize_harmony_recipient() extends this to dotted recipients (e.g. browser.search) by splitting on ., sanitizing each part individually, and rejoining — preserving the dotted structure while cleaning each component (e.g. browser<|channel|>.search → browser.search). If any dotted component is entirely consumed by control tokens (e.g. functions.<|constrain|>json), the whole recipient is considered corrupt and an empty string is returned, triggering the safe no-recipient fallback rather than misrouting. Applied at all input parsing, output dispatching, tool routing, and streaming delta extraction sites.

  2. ResilientStreamableParser — Drop-in wrapper around StreamableParser that intercepts three malformed token patterns:

    • Pattern 1: Missing <|start|> recovery — when parser expects <|start|> but gets <|channel|>, inject the missing <|start|> + role tokens. Role is tracked dynamically from self._inner.current_role during processing (not hardcoded), so it works correctly for any role.
    • Pattern 2: Malformed <|constrain|> in headers — skip tokens until <|message|> or <|end|>
    • Pattern 3: Free text between messages — silently discard any token in EXPECT_START state that is not <|start|>. The triggered_tags grammar allows free tokens in the sub-dispatch loop, so the model may generate trailing text after a <|end|> that isn't part of any channel. Discarding these tokens preserves all completed messages while ignoring inter-message garbage.
    • last_consumed_token tracking — tracks which tokens were actually forwarded to the inner parser, so that callers (e.g. StreamingHarmonyContext.append_output()) don't record discarded tokens in last_tok. Without this, render_for_completion() could IndexError searching for a token that was never rendered.
  3. Routing-level fallback — After sanitization, if a recipient becomes empty string, treat it as None so it falls through to _parse_message_no_recipient() (produces a user-visible message instead of a misrouted MCP call).
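
The string-level layer (layer 1) can be sketched as follows. This is a hypothetical reconstruction from the description above — the real implementations live in `vllm/entrypoints/openai/parser/harmony_utils.py` and may differ in detail:

```python
# Sketch of the two string-level sanitizers; names match the PR, bodies are
# reconstructed from the described behavior, not copied from the source.
_CONTROL_TOKENS = ("<|channel|>", "<|constrain|>", "<|start|>", "<|end|>", "<|message|>")


def sanitize_harmony_name(name: str) -> str:
    """Return only the text before the earliest leaked control token."""
    cut = len(name)
    for tok in _CONTROL_TOKENS:
        idx = name.find(tok)
        if idx != -1:
            cut = min(cut, idx)
    return name[:cut].strip()


def sanitize_harmony_recipient(recipient: str) -> str:
    """Sanitize each dotted component; return "" if any component is fully consumed."""
    parts = [sanitize_harmony_name(p) for p in recipient.split(".")]
    if not all(parts):
        return ""  # fully-corrupt component -> safe no-recipient fallback
    return ".".join(parts)


assert sanitize_harmony_name("manage_cart<|channel|>commentary") == "manage_cart"
assert sanitize_harmony_recipient("browser<|channel|>.search") == "browser.search"
assert sanitize_harmony_recipient("functions.<|constrain|>json") == ""
```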

Related Issues & PRs

| # | Title | Status | Relation |
| --- | --- | --- | --- |
| #32587 | Special tokens leak into tool names | Open | Primary bug report for tool name contamination |
| #30372 | Distorted tool names + infinite tool-call loop | Open | Consequence of tool name contamination |
| #23567 | HarmonyError: unexpected tokens in message header | Open | Parser crash from malformed sequences — Pattern 2 + 3 address this |
| #28262 | Incorrect input/output handling in Responses API | Open | Channel metadata loss causing <|constrain|> misrouting |
| #31607 | Add SM 12.1 support + Fix GPT-OSS Harmony garbled reasoning and HarmonyError crashes | Open | Same class of HarmonyError crashes; our ResilientStreamableParser handles the parser-level recovery |
| #31677 | Sanitize malformed tool call recipients (stale PR) | Open | Strips <|channel|> from recipients; our approach is broader (all token types, structured recipients) |
| #32633 | Fix token leaks in tool names and streaming (stale PR) | Open | Defines sanitize + strip functions; our approach unifies string-level + token-level recovery |
| #28303 | Parse gpt-oss refusals w/ non-strict mode (stale PR) | Open | Different approach via openai-harmony library |
| #29236 | Fix gpt oss tool parser v2 (stale PR) | Open | Also addresses tag sanitization |
| #34857 | Responses API & Tool Calling H1 2026 roadmap | Open | Lists "guided decode and structured outputs" as focus area |
| #37433 | [Responses API] tool_choice support for GPT-OSS | Draft (ours) | Downstream dependent — depends on ResilientStreamableParser for reliable parsing of forced tool calls |

Decisions to debate

  1. Wrapper vs. monkey-patch for StreamableParser: We chose a wrapper class (ResilientStreamableParser) that delegates all properties to the inner parser, rather than monkey-patching or subclassing. This means get_streamable_parser_for_assistant() returns our wrapper instead of a raw StreamableParser. All existing consumers work unchanged, but isinstance(parser, StreamableParser) checks would fail — we haven't found any such checks in the codebase, but reviewers should flag if they know of one.

  2. String-level vs. token-level sanitization: sanitize_harmony_name() operates on strings, not token IDs. This is intentional — by the time we have a message.recipient or function_name, it's already a string. Token-level recovery is handled separately by ResilientStreamableParser.process(). The two layers are complementary, not redundant.

  3. Hardcoded token IDs (200003, 200005–200008): The ResilientStreamableParser references specific GPT-OSS encoding token IDs. These are stable across the harmony-gpt-oss encoding but would break if a different encoding were used. We could look these up dynamically from the encoding, but the IDs are well-established constants and dynamic lookup adds complexity for no current benefit.

  4. Sanitization applied broadly (defense in depth): We sanitize at input parsing, output dispatch, tool routing, AND streaming — even though the ResilientStreamableParser should catch most issues at the token level. This is intentional defense-in-depth: if a code path bypasses the resilient parser (e.g. direct Message construction in tests or from previous_input_messages), the string-level sanitization still catches leaked tokens.

  5. Empty-after-sanitization → None fallback: When sanitizing a recipient produces an empty string, we convert it to None rather than raising an error. This causes the message to be treated as a "no-recipient" message (preamble), which is the safest fallback — the user sees the text content rather than getting a routing error. This is a design choice that could mask other bugs; an alternative would be to log a warning.

  6. Structured recipient sanitization (sanitize_harmony_recipient): Rather than applying sanitize_harmony_name() to the full dotted string (which would truncate functions.get_weather<|channel|>commentary to just functions), we split on ., sanitize each part, and rejoin. If any component sanitizes to empty (e.g. functions.<|constrain|>json where the second component is entirely a control token), the whole recipient is treated as corrupt and returns empty string — this prevents partial recipients like "functions" from failing startswith("functions.") checks and falling through to incorrect routing. The downside is slightly more complexity; the upside is that dotted names like browser.search or container.exec survive contamination in any component, while fully-corrupt recipients safely fall back to no-recipient handling.

Files changed

| File | Change |
| --- | --- |
| vllm/entrypoints/openai/parser/harmony_utils.py | Add sanitize_harmony_name(), sanitize_harmony_recipient() (returns empty when any component is fully consumed), ResilientStreamableParser (3 patterns + dynamic role tracking + last_consumed_token property), wrap get_streamable_parser_for_assistant(), sanitize input parsing |
| vllm/entrypoints/openai/responses/harmony.py | Sanitize recipients in output dispatch (harmony_to_response_output) + input parsing (response_input_to_harmony, _parse_chat_format_message) + function name extraction (_parse_function_call) |
| vllm/entrypoints/openai/responses/context.py | Sanitize recipients in tool routing (need_builtin_tool_call, call_tool, call_search_tool, call_container_tool); fix append_output() to only update last_tok from consumed tokens; add bounds check in render_for_completion() to prevent IndexError |
| vllm/entrypoints/openai/chat_completion/stream_harmony.py | Sanitize tool names in streaming delta extraction (extract_harmony_streaming_delta) |
| tests/entrypoints/openai/parser/test_harmony_utils.py | TestSanitizeHarmonyName (7 cases), TestSanitizeHarmonyRecipient (10 cases), TestResilientStreamableParser (7 cases incl. message recipient sanitization + last_consumed_token tracking) |
| tests/entrypoints/openai/responses/test_harmony_utils.py | TestHarmonyOutputSanitization (2 cases: contaminated recipient → message, contaminated function name → cleaned) |

Test plan

  • TestSanitizeHarmonyName — 7 cases: clean passthrough, <|channel|> stripping, <|constrain|> stripping, pure token → empty, multiple tokens → earliest wins, empty input, trailing whitespace
  • TestSanitizeHarmonyRecipient — 10 cases: clean dotted, clean simple, contaminated first part, contaminated second part, pure control token, functions dotted contaminated, empty string, container dotted contaminated, full component contamination returns empty, container full component contamination returns empty
  • TestResilientStreamableParser — 7 cases: normal sequence unchanged, missing <|start|> recovery, <|constrain|> in header skip, message recipients sanitized, last_consumed_token tracks normal processing, Pattern 3 discarded tokens not in last_consumed_token, Pattern 2 skip mode discarded tokens not in last_consumed_token
  • TestHarmonyOutputSanitization — 2 cases: <|constrain|>json recipient → message output, contaminated function name → cleaned
  • All existing parser and responses unit tests pass (no regressions)
  • Integration test with live GPT-OSS model (needs model access)

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a robust, multi-layered defense mechanism to sanitize leaked Harmony protocol control tokens, which is a significant improvement for handling malformed outputs from GPT-OSS models. The introduction of sanitize_harmony_name and the ResilientStreamableParser are well-designed solutions. My review includes a couple of suggestions to further enhance the robustness of the new parser and ensure the completeness of the test suite.

if state == StreamState.EXPECT_START and token_id == _TOK_CHANNEL:
# Inject <|start|> + assistant role token
self._inner.process(_TOK_START)
role_tokens = self._encoding.encode("assistant", allowed_special="all")
Contributor


Severity: high

In ResilientStreamableParser.process, the role "assistant" is hardcoded when injecting a missing <|start|> token. This makes the wrapper less reusable and tightly coupled to being used for the assistant role only. To improve robustness, you should use the role attribute from the wrapped StreamableParser instance (self._inner.role).

Suggested change
role_tokens = self._encoding.encode("assistant", allowed_special="all")
role_tokens = self._encoding.encode(self._inner.role, allowed_special="all")

Author


Good point — now uses self._inner.role instead of the hardcoded string. Fixed in 6ef1b2f.

Comment on lines +959 to +960
assert len(parser.messages) == 2
assert parser.messages[0].content[0].text == "First."
Contributor


Severity: high

The test test_constrain_in_header_skipped verifies that two messages are produced after recovering from a malformed sequence, but it only asserts the content of the first message. To ensure the recovery logic is fully correct and the second message is parsed as expected, you should also add an assertion for the content of the second message.

Suggested change
assert len(parser.messages) == 2
assert parser.messages[0].content[0].text == "First."
assert len(parser.messages) == 2
assert parser.messages[0].content[0].text == "First."
assert parser.messages[1].content[0].text == "Second."

Author


Added the missing assertion for the second message content. Fixed in 6ef1b2f.


mergify bot commented Mar 18, 2026

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @will-deines.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Mar 18, 2026
@will-deines force-pushed the harmony-token-sanitization branch from 0968fea to 80c897a on March 18, 2026 at 13:15
will-deines pushed a commit to will-deines/vllm that referenced this pull request Mar 18, 2026
…rmony_name

- ruff: reformat harmony.py (auto-fixed)
- Add sanitize_harmony_name() and _HARMONY_SPECIAL_TOKEN_STRS to
  harmony_utils.py so the import in harmony.py resolves. This duplicates
  the definition that PR vllm-project#35906 (harmony-token-sanitization) adds; the
  duplicate will be removed on rebase once that PR merges.
- Add sanitize_harmony_name to harmony.py import block

Signed-off-by: Will Deines <will@garr.io>
…and recipients

GPT-OSS models generate Harmony protocol control tokens (<|channel|>,
<|constrain|>, <|start|>, <|end|>, <|message|>) in unexpected positions
during output generation, causing tool name contamination, recipient
misrouting, and parser crashes.

Three layers of defense:

1. sanitize_harmony_name() — pure string function that strips leaked
   control token strings from tool/recipient names.

2. ResilientStreamableParser — wrapper around StreamableParser that
   recovers from missing <|start|> tokens between messages and
   malformed <|constrain|> tokens in headers.

3. Routing-level fallback — sanitized-to-empty recipients fall through
   to _parse_message_no_recipient() instead of being misrouted.

Applied at all input parsing, output dispatching, tool routing, and
streaming delta extraction sites.

Signed-off-by: Will Deines <will@garr.io>
…ents, remove redundancy

- Add sanitize_harmony_recipient() that splits on '.', sanitizes each
  part, and rejoins to preserve dotted structure (e.g. browser<|channel|>.search
  becomes browser.search instead of being truncated to browser)
- Sanitize recipients on messages returned by ResilientStreamableParser.messages
  to prevent control token injection in multi-turn conversation history
- Remove redundant sanitization in parser_state_to_response_output since
  ResilientStreamableParser.current_recipient already handles it
- Use sanitize_harmony_recipient for full recipient strings in context.py
  and harmony.py routing logic

Signed-off-by: Will Deines <will@garr.io>
…line

Signed-off-by: Will Deines <will@garr.io>
… test assertion

- Use self._inner.role instead of hardcoded "assistant" in
  ResilientStreamableParser.process for correctness with non-assistant roles
- Add assertion for second message content in test_constrain_in_header_skipped

Signed-off-by: Will Deines <will@garr.io>
The triggered_tags grammar's sub-dispatch loop allows all tokens between
triggered tags. The model can generate trailing text after a <|end|> before
EOS (e.g. restating the answer as plain text after a tool call). These
free-text tokens arrive in EXPECT_START state, causing HarmonyError.

Add Pattern 3 to ResilientStreamableParser: silently discard any token
in EXPECT_START state that is not <|start|>. This preserves all completed
messages while ignoring inter-message garbage tokens.

Signed-off-by: Will Deines <will@garr.io>
@will-deines force-pushed the harmony-token-sanitization branch from 80c897a to 714ad90 on March 18, 2026 at 14:01
@mergify mergify bot removed the needs-rebase label Mar 18, 2026
will-deines and others added 2 commits March 18, 2026 10:58
…ttribute

Track inner parser's current_role during process() calls and use the
cached value for Pattern 1 recovery, fixing the broken self._inner.role
reference. Pattern 1 only fires after <|end|> (EXPECT_START state), so
at least one message has been processed and _last_known_role is
guaranteed non-None.

Signed-off-by: Will Deines <will@garr.io>

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b871dfc6e6


Comment on lines +140 to +141
if state == StreamState.EXPECT_START and token_id != _TOK_START:
return


P1: Avoid dropping EXPECT_START garbage without fixing last_tok

When a streaming Harmony response emits stray plain text after a completed message, this branch discards the token so it never becomes part of parser.messages, but StreamingHarmonyContext.append_output() still records that raw token as self.last_tok. On the next built-in-tool turn, vllm/entrypoints/openai/responses/context.py:920-924 walks backward through the re-rendered prompt until it finds last_tok; because the dropped token is no longer present, that loop runs off the front of rendered_tokens and raises IndexError. So the new Pattern 3 recovery path can crash exactly the malformed outputs it is supposed to tolerate.


Author


Good catch — validated and fixed in 8c100e4.

P1 fix: ResilientStreamableParser now tracks _last_consumed_token, updated only when a token is actually forwarded to the inner parser (not on Pattern 2/3 discards). StreamingHarmonyContext.append_output() reads parser.last_consumed_token instead of the raw loop variable. Also added a bounds check in render_for_completion() as a safety net. Three new tests cover normal tracking, Pattern 3 discard, and Pattern 2 skip mode.
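
The shape of the fix can be illustrated with a minimal stand-in (all names here are simplified hypotheticals — the real wrapper delegates to openai-harmony's StreamableParser and discards tokens per Patterns 2/3, not via a fixed set):

```python
# Minimal stand-in: discarded garbage tokens must not update the marker that
# append_output() later records as last_tok.
class FakeInnerParser:
    """Records every token actually forwarded to it."""

    def __init__(self):
        self.tokens = []

    def process(self, tok):
        self.tokens.append(tok)


class ResilientWrapper:
    GARBAGE = {999}  # pretend 999 is free text arriving in EXPECT_START

    def __init__(self):
        self._inner = FakeInnerParser()
        self._last_consumed_token = None

    @property
    def last_consumed_token(self):
        return self._last_consumed_token

    def process(self, tok):
        if tok in self.GARBAGE:
            return  # discarded: marker unchanged, token never reaches inner parser
        self._inner.process(tok)
        self._last_consumed_token = tok


parser = ResilientWrapper()
for tok in (1, 2, 999):
    parser.process(tok)
assert parser.last_consumed_token == 2  # 999 was discarded, not recorded
```

Because `last_consumed_token` only advances on forwarded tokens, a later backward search through re-rendered tokens always finds it, avoiding the IndexError described above.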

Comment on lines +77 to +78
sanitized_parts = [p for p in sanitized_parts if p]
return ".".join(sanitized_parts)


P2: Return empty recipient when a dotted component sanitizes away

If a leaked control token wipes out an entire dotted component, for example functions.<|constrain|>json or container.<|channel|>commentary, filtering empty parts here turns the recipient into functions or container instead of an empty string. Downstream code then treats those as real recipients: harmony_to_response_output() falls through to _parse_mcp_call for functions, while HarmonyContext.need_builtin_tool_call() no longer recognizes container as a tool invocation. That means malformed tool calls are silently misrouted instead of hitting the intended empty-recipient fallback.


Author


Good catch — validated and fixed in 8c100e4.

P2 fix: sanitize_harmony_recipient() now returns "" when any dotted component sanitizes to empty, instead of filtering out empty parts and rejoining. This ensures functions.<|constrain|>json becomes "" (not "functions"), correctly triggering the no-recipient fallback instead of misrouting to _parse_mcp_call(). Two new tests cover functions.<|constrain|>json and container.<|channel|>commentary.

…x recipient misrouting

Bug 1: ResilientStreamableParser.process() silently discards tokens in
Pattern 2 (skip mode) and Pattern 3 (free text in EXPECT_START), but
StreamingHarmonyContext.append_output() unconditionally set last_tok to
the most recent token. If that token was discarded, render_for_completion()
would fail with IndexError searching for it. Now track last_consumed_token
in the parser and only update last_tok when a token was actually forwarded.
Also add a bounds check in render_for_completion() as a safety net.

Bug 2: sanitize_harmony_recipient() filtered out empty parts after
sanitization, collapsing e.g. "functions.<|constrain|>json" to "functions"
(bare), which failed startswith("functions.") checks and fell through to
incorrect routing. Now return empty string when any component sanitizes to
empty, triggering the safe no-recipient fallback.

Signed-off-by: Will Deines <will@garr.io>
…ization

Signed-off-by: Will Deines <will@garr.io>

Labels

frontend gpt-oss Related to GPT-OSS models

Projects

Status: To Triage
