[Responses API] Sanitize leaked Harmony control tokens in tool names and recipients #35906

will-deines wants to merge 9 commits into vllm-project:main from
Conversation
Code Review
This pull request introduces a robust, multi-layered defense mechanism to sanitize leaked Harmony protocol control tokens, which is a significant improvement for handling malformed outputs from GPT-OSS models. The introduction of sanitize_harmony_name and the ResilientStreamableParser are well-designed solutions. My review includes a couple of suggestions to further enhance the robustness of the new parser and ensure the completeness of the test suite.
```python
if state == StreamState.EXPECT_START and token_id == _TOK_CHANNEL:
    # Inject <|start|> + assistant role token
    self._inner.process(_TOK_START)
    role_tokens = self._encoding.encode("assistant", allowed_special="all")
```
In ResilientStreamableParser.process, the role "assistant" is hardcoded when injecting a missing <|start|> token. This makes the wrapper less reusable and tightly coupled to being used for the assistant role only. To improve robustness, you should use the role attribute from the wrapped StreamableParser instance (self._inner.role).
Suggested change:
```diff
-role_tokens = self._encoding.encode("assistant", allowed_special="all")
+role_tokens = self._encoding.encode(self._inner.role, allowed_special="all")
```
Good point — now uses self._inner.role instead of the hardcoded string. Fixed in 6ef1b2f.
```python
assert len(parser.messages) == 2
assert parser.messages[0].content[0].text == "First."
```
The test test_constrain_in_header_skipped verifies that two messages are produced after recovering from a malformed sequence, but it only asserts the content of the first message. To ensure the recovery logic is fully correct and the second message is parsed as expected, you should also add an assertion for the content of the second message.
Suggested change:
```diff
 assert len(parser.messages) == 2
 assert parser.messages[0].content[0].text == "First."
+assert parser.messages[1].content[0].text == "Second."
```
Added the missing assertion for the second message content. Fixed in 6ef1b2f.
Force-pushed 6ef1b2f to 1f6f5f7
This pull request has merge conflicts that must be resolved before it can be merged.
Force-pushed 0968fea to 80c897a
…rmony_name

- ruff: reformat harmony.py (auto-fixed)
- Add sanitize_harmony_name() and _HARMONY_SPECIAL_TOKEN_STRS to harmony_utils.py so the import in harmony.py resolves. This duplicates the definition that PR vllm-project#35906 (harmony-token-sanitization) adds; the duplicate will be removed on rebase once that PR merges.
- Add sanitize_harmony_name to harmony.py import block

Signed-off-by: Will Deines <will@garr.io>
…and recipients

GPT-OSS models generate Harmony protocol control tokens (<|channel|>, <|constrain|>, <|start|>, <|end|>, <|message|>) in unexpected positions during output generation, causing tool name contamination, recipient misrouting, and parser crashes.

Three layers of defense:
1. sanitize_harmony_name() — pure string function that strips leaked control token strings from tool/recipient names.
2. ResilientStreamableParser — wrapper around StreamableParser that recovers from missing <|start|> tokens between messages and malformed <|constrain|> tokens in headers.
3. Routing-level fallback — sanitized-to-empty recipients fall through to _parse_message_no_recipient() instead of being misrouted.

Applied at all input parsing, output dispatching, tool routing, and streaming delta extraction sites.

Signed-off-by: Will Deines <will@garr.io>
…ents, remove redundancy

- Add sanitize_harmony_recipient() that splits on '.', sanitizes each part, and rejoins to preserve dotted structure (e.g. browser<|channel|>.search becomes browser.search instead of being truncated to browser)
- Sanitize recipients on messages returned by ResilientStreamableParser.messages to prevent control token injection in multi-turn conversation history
- Remove redundant sanitization in parser_state_to_response_output since ResilientStreamableParser.current_recipient already handles it
- Use sanitize_harmony_recipient for full recipient strings in context.py and harmony.py routing logic

Signed-off-by: Will Deines <will@garr.io>
…line Signed-off-by: Will Deines <will@garr.io>
… test assertion

- Use self._inner.role instead of hardcoded "assistant" in ResilientStreamableParser.process for correctness with non-assistant roles
- Add assertion for second message content in test_constrain_in_header_skipped

Signed-off-by: Will Deines <will@garr.io>
The triggered_tags grammar's sub-dispatch loop allows all tokens between triggered tags. The model can generate trailing text after a <|end|> before EOS (e.g. restating the answer as plain text after a tool call). These free-text tokens arrive in EXPECT_START state, causing HarmonyError.

Add Pattern 3 to ResilientStreamableParser: silently discard any token in EXPECT_START state that is not <|start|>. This preserves all completed messages while ignoring inter-message garbage tokens.

Signed-off-by: Will Deines <will@garr.io>
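The discard rule above can be illustrated with a toy state machine. Token IDs and state names are assumed stand-ins for the GPT-OSS control IDs and parser states; the real recovery lives in ResilientStreamableParser wrapping openai_harmony's StreamableParser.

```python
EXPECT_START, IN_MESSAGE = "EXPECT_START", "IN_MESSAGE"
TOK_START, TOK_END = 200006, 200007  # assumed <|start|> / <|end|> ids

class Pattern3Sketch:
    def __init__(self):
        self.state = EXPECT_START
        self.consumed = []

    def process(self, tok):
        if self.state == EXPECT_START and tok != TOK_START:
            return  # Pattern 3: inter-message garbage, silently dropped
        self.consumed.append(tok)
        if tok == TOK_START:
            self.state = IN_MESSAGE
        elif tok == TOK_END:
            self.state = EXPECT_START

p = Pattern3Sketch()
# Two well-formed messages with free-text tokens 99, 98 leaked in between.
for tok in [TOK_START, 11, 12, TOK_END, 99, 98, TOK_START, 13, TOK_END]:
    p.process(tok)
# p.consumed retains both complete messages; 99 and 98 are gone.
```

Both completed messages survive intact while the stray tokens between them never reach the inner parser.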
Force-pushed 80c897a to 714ad90
…ttribute

Track the inner parser's current_role during process() calls and use the cached value for Pattern 1 recovery, fixing the broken self._inner.role reference. Pattern 1 only fires after <|end|> (EXPECT_START state), so at least one message has been processed and _last_known_role is guaranteed non-None.

Signed-off-by: Will Deines <will@garr.io>
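A toy sketch of the role caching this commit describes. The classes are stand-ins, not the real StreamableParser or vLLM wrapper.

```python
class InnerStub:
    """Stand-in for StreamableParser: exposes current_role while a
    message header is being parsed, None once <|end|> resets state."""
    def __init__(self):
        self.current_role = None

class RoleTrackerSketch:
    def __init__(self, inner):
        self._inner = inner
        self._last_known_role = None  # survives inner-state resets

    def observe(self):
        # Called on every process() step: cache the role whenever the
        # inner parser exposes one, so Pattern 1 can replay it after
        # <|end|> has cleared inner.current_role.
        if self._inner.current_role is not None:
            self._last_known_role = self._inner.current_role

inner = InnerStub()
tracker = RoleTrackerSketch(inner)
inner.current_role = "assistant"   # role header parsed
tracker.observe()
inner.current_role = None          # <|end|> resets inner state
tracker.observe()
# tracker._last_known_role still holds "assistant" for Pattern 1 injection.
```

The cached value is what makes Pattern 1 safe: it only fires after at least one message has been fully parsed, so the cache is guaranteed populated.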
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: b871dfc6e6
```python
if state == StreamState.EXPECT_START and token_id != _TOK_START:
    return
```
Avoid dropping EXPECT_START garbage without fixing `last_tok`
When a streaming Harmony response emits stray plain text after a completed message, this branch discards the token so it never becomes part of parser.messages, but StreamingHarmonyContext.append_output() still records that raw token as self.last_tok. On the next built-in-tool turn, vllm/entrypoints/openai/responses/context.py:920-924 walks backward through the re-rendered prompt until it finds last_tok; because the dropped token is no longer present, that loop runs off the front of rendered_tokens and raises IndexError. So the new Pattern 3 recovery path can crash exactly the malformed outputs it is supposed to tolerate.
Good catch — validated and fixed in 8c100e4.
P1 fix: ResilientStreamableParser now tracks _last_consumed_token, updated only when a token is actually forwarded to the inner parser (not on Pattern 2/3 discards). StreamingHarmonyContext.append_output() reads parser.last_consumed_token instead of the raw loop variable. Also added a bounds check in render_for_completion() as a safety net. Three new tests cover normal tracking, Pattern 3 discard, and Pattern 2 skip mode.
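A toy sketch of the P1 fix's token tracking as described above. The parser classes here are simplified stand-ins, not the actual vllm or openai_harmony types.

```python
class ToyInnerParser:
    """Stand-in for the wrapped StreamableParser."""
    def __init__(self):
        self.tokens = []

    def process(self, tok):
        self.tokens.append(tok)

class ResilientWrapperSketch:
    def __init__(self, inner, discard=frozenset()):
        self._inner = inner
        self._discard = discard          # tokens dropped by Pattern 2/3
        self.last_consumed_token = None  # updated only on real forwards

    def process(self, tok):
        if tok in self._discard:
            return                       # discarded: tracker is NOT updated
        self._inner.process(tok)
        self.last_consumed_token = tok

parser = ResilientWrapperSketch(ToyInnerParser(), discard={999})
for tok in [1, 2, 999, 3]:
    parser.process(tok)
# append_output() would read parser.last_consumed_token instead of the raw
# loop variable, so the discarded 999 can never become last_tok.
```

Because the tracker only advances when a token is actually forwarded, the backward search in render_for_completion() is guaranteed to look for a token that really appears in the rendered prompt.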
```python
sanitized_parts = [p for p in sanitized_parts if p]
return ".".join(sanitized_parts)
```
Return empty recipient when a dotted component sanitizes away
If a leaked control token wipes out an entire dotted component, for example functions.<|constrain|>json or container.<|channel|>commentary, filtering empty parts here turns the recipient into functions or container instead of an empty string. Downstream code then treats those as real recipients: harmony_to_response_output() falls through to _parse_mcp_call for functions, while HarmonyContext.need_builtin_tool_call() no longer recognizes container as a tool invocation. That means malformed tool calls are silently misrouted instead of hitting the intended empty-recipient fallback.
Good catch — validated and fixed in 8c100e4.
P2 fix: sanitize_harmony_recipient() now returns "" when any dotted component sanitizes to empty, instead of filtering out empty parts and rejoining. This ensures functions.<|constrain|>json becomes "" (not "functions"), correctly triggering the no-recipient fallback instead of misrouting to _parse_mcp_call(). Two new tests cover functions.<|constrain|>json and container.<|channel|>commentary.
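A minimal sketch of the P2 behavior described in this thread. The function names mirror the PR's description, but the bodies are illustrations under assumed semantics, not the actual harmony_utils.py code.

```python
CONTROL_TOKENS = ("<|channel|>", "<|constrain|>", "<|start|>", "<|end|>", "<|message|>")

def sanitize_harmony_name(name: str) -> str:
    # Keep only the text before the earliest leaked control token.
    cut = min((i for t in CONTROL_TOKENS if (i := name.find(t)) != -1),
              default=len(name))
    return name[:cut].strip()

def sanitize_harmony_recipient(recipient: str) -> str:
    # Sanitize each dotted component separately; if any component is
    # wiped out entirely, treat the whole recipient as corrupt so that
    # callers hit the no-recipient fallback instead of misrouting.
    parts = [sanitize_harmony_name(p) for p in recipient.split(".")]
    if any(not p for p in parts):
        return ""
    return ".".join(parts)
```

Returning "" for any fully-consumed component is the key change: `functions.<|constrain|>json` no longer collapses to the misleading bare `functions`.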
…x recipient misrouting
Bug 1: ResilientStreamableParser.process() silently discards tokens in
Pattern 2 (skip mode) and Pattern 3 (free text in EXPECT_START), but
StreamingHarmonyContext.append_output() unconditionally set last_tok to
the most recent token. If that token was discarded, render_for_completion()
would fail with IndexError searching for it. Now track last_consumed_token
in the parser and only update last_tok when a token was actually forwarded.
Also add a bounds check in render_for_completion() as a safety net.
Bug 2: sanitize_harmony_recipient() filtered out empty parts after
sanitization, collapsing e.g. "functions.<|constrain|>json" to "functions"
(bare), which failed startswith("functions.") checks and fell through to
incorrect routing. Now return empty string when any component sanitizes to
empty, triggering the safe no-recipient fallback.
Signed-off-by: Will Deines <will@garr.io>
Force-pushed 8c100e4 to 3ce8294
…ization Signed-off-by: Will Deines <will@garr.io>
Summary
GPT-OSS models leak Harmony protocol control tokens (`<|channel|>`, `<|constrain|>`, `<|start|>`, `<|end|>`, `<|message|>`) into tool names and recipient fields during generation. This causes:

- Contaminated tool names — e.g. `manage_cart<|channel|>commentary` instead of `manage_cart`, corrupting function call routing and causing infinite tool-call loops
- `<|constrain|>` as recipient — e.g. `<|constrain|>json` matches no routing pattern and falls through to the MCP handler or raises errors
- Missing `<|start|>` between channels — the model omits the start token between consecutive outputs, causing `StreamableParser` to throw `HarmonyError`
- `<|constrain|>` in headers — produces garbage in recipient or content_type fields
- Trailing free text after `<|end|>` before starting the next channel message, causing `HarmonyError` in EXPECT_START state

Three layers of defense
1. `sanitize_harmony_name()` / `sanitize_harmony_recipient()` — pure string functions that strip leaked control tokens. `sanitize_harmony_name()` finds the earliest control token and returns only the text before it. `sanitize_harmony_recipient()` extends this for dotted recipients (e.g. `browser.search`) by splitting on `.`, sanitizing each part individually, and rejoining — preserving the dotted structure while cleaning each component (e.g. `browser<|channel|>.search` → `browser.search`). If any dotted component is entirely consumed by control tokens (e.g. `functions.<|constrain|>json`), the whole recipient is considered corrupt and returns the empty string, triggering the safe no-recipient fallback rather than misrouting. Applied at all input parsing, output dispatching, tool routing, and streaming delta extraction sites.
2. `ResilientStreamableParser` — drop-in wrapper around `StreamableParser` that intercepts three malformed token patterns:
   - Missing `<|start|>` recovery — when the parser expects `<|start|>` but gets `<|channel|>`, inject the missing `<|start|>` + role tokens. The role is tracked dynamically from `self._inner.current_role` during processing (not hardcoded), so it works correctly for any role.
   - `<|constrain|>` in headers — skip tokens until `<|message|>` or `<|end|>`.
   - Free text in EXPECT_START state — silently discard any token that is not `<|start|>`. The `triggered_tags` grammar allows free tokens in the sub-dispatch loop, so the model may generate trailing text after a `<|end|>` that isn't part of any channel. Discarding these tokens preserves all completed messages while ignoring inter-message garbage.
   The wrapper also provides `last_consumed_token` tracking — it records which tokens were actually forwarded to the inner parser, so that callers (e.g. `StreamingHarmonyContext.append_output()`) don't record discarded tokens in `last_tok`. Without this, `render_for_completion()` could `IndexError` searching for a token that was never rendered.
3. Routing-level fallback — after sanitization, if a recipient becomes an empty string, treat it as `None` so it falls through to `_parse_message_no_recipient()` (produces a user-visible message instead of a misrouted MCP call).

Related Issues & PRs
- `<|constrain|>` misrouting — `ResilientStreamableParser` handles the parser-level recovery
- A related change strips `<|channel|>` from recipients; our approach is broader (all token types, structured recipients)
- `ResilientStreamableParser` for reliable parsing of forced tool calls

Decisions to debate
Wrapper vs. monkey-patch for `StreamableParser`: We chose a wrapper class (`ResilientStreamableParser`) that delegates all properties to the inner parser, rather than monkey-patching or subclassing. This means `get_streamable_parser_for_assistant()` returns our wrapper instead of a raw `StreamableParser`. All existing consumers work unchanged, but `isinstance(parser, StreamableParser)` checks would fail — we haven't found any such checks in the codebase, but reviewers should flag if they know of one.

String-level vs. token-level sanitization: `sanitize_harmony_name()` operates on strings, not token IDs. This is intentional — by the time we have a `message.recipient` or `function_name`, it's already a string. Token-level recovery is handled separately by `ResilientStreamableParser.process()`. The two layers are complementary, not redundant.

Hardcoded token IDs (200003, 200005–200008): The `ResilientStreamableParser` references specific GPT-OSS encoding token IDs. These are stable across the `harmony-gpt-oss` encoding but would break if a different encoding were used. We could look these up dynamically from the encoding, but the IDs are well-established constants and dynamic lookup adds complexity for no current benefit.

Sanitization applied broadly (defense in depth): We sanitize at input parsing, output dispatch, tool routing, AND streaming — even though the `ResilientStreamableParser` should catch most issues at the token level. This is intentional defense-in-depth: if a code path bypasses the resilient parser (e.g. direct `Message` construction in tests or from `previous_input_messages`), the string-level sanitization still catches leaked tokens.

Empty-after-sanitization → `None` fallback: When sanitizing a recipient produces an empty string, we convert it to `None` rather than raising an error. This causes the message to be treated as a "no-recipient" message (preamble), which is the safest fallback — the user sees the text content rather than getting a routing error. This is a design choice that could mask other bugs; an alternative would be to log a warning.

Structured recipient sanitization (`sanitize_harmony_recipient`): Rather than applying `sanitize_harmony_name()` to the full dotted string (which would truncate `functions.get_weather<|channel|>commentary` to just `functions`), we split on `.`, sanitize each part, and rejoin. If any component sanitizes to empty (e.g. `functions.<|constrain|>json`, where the second component is entirely a control token), the whole recipient is treated as corrupt and returns the empty string — this prevents partial recipients like `"functions"` from failing `startswith("functions.")` checks and falling through to incorrect routing. The downside is slightly more complexity; the upside is that dotted names like `browser.search` or `container.exec` survive contamination in any component, while fully-corrupt recipients safely fall back to no-recipient handling.

Files changed
- `vllm/entrypoints/openai/parser/harmony_utils.py` — `sanitize_harmony_name()`, `sanitize_harmony_recipient()` (returns empty when any component is fully consumed), `ResilientStreamableParser` (3 patterns + dynamic role tracking + `last_consumed_token` property), wrap `get_streamable_parser_for_assistant()`, sanitize input parsing
- `vllm/entrypoints/openai/responses/harmony.py` — output dispatch (`harmony_to_response_output`), input parsing (`response_input_to_harmony`, `_parse_chat_format_message`), function name extraction (`_parse_function_call`)
- `vllm/entrypoints/openai/responses/context.py` — tool routing (`need_builtin_tool_call`, `call_tool`, `call_search_tool`, `call_container_tool`); fix `append_output()` to only update `last_tok` from consumed tokens; add bounds check in `render_for_completion()` to prevent `IndexError`
- `vllm/entrypoints/openai/chat_completion/stream_harmony.py` — streaming delta extraction (`extract_harmony_streaming_delta`)
- `tests/entrypoints/openai/parser/test_harmony_utils.py` — `TestSanitizeHarmonyName` (7 cases), `TestSanitizeHarmonyRecipient` (10 cases), `TestResilientStreamableParser` (7 cases incl. message recipient sanitization + `last_consumed_token` tracking)
- `tests/entrypoints/openai/responses/test_harmony_utils.py` — `TestHarmonyOutputSanitization` (2 cases: contaminated recipient → message, contaminated function name → cleaned)

Test plan
- `TestSanitizeHarmonyName` — 7 cases: clean passthrough, `<|channel|>` stripping, `<|constrain|>` stripping, pure token → empty, multiple tokens → earliest wins, empty input, trailing whitespace
- `TestSanitizeHarmonyRecipient` — 10 cases: clean dotted, clean simple, contaminated first part, contaminated second part, pure control token, functions dotted contaminated, empty string, container dotted contaminated, full component contamination returns empty, container full component contamination returns empty
- `TestResilientStreamableParser` — 7 cases: normal sequence unchanged, missing `<|start|>` recovery, `<|constrain|>` in header skip, message recipients sanitized, `last_consumed_token` tracks normal processing, Pattern 3 discarded tokens not in `last_consumed_token`, Pattern 2 skip-mode discarded tokens not in `last_consumed_token`
- `TestHarmonyOutputSanitization` — 2 cases: `<|constrain|>json` recipient → message output, contaminated function name → cleaned
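The first `TestHarmonyOutputSanitization` case can be sketched end-to-end as follows. The helper names and routing labels here are illustrative stand-ins, not the vllm API.

```python
CONTROL_TOKENS = ("<|channel|>", "<|constrain|>", "<|start|>", "<|end|>", "<|message|>")

def sanitize(name: str) -> str:
    # Keep only the text before the earliest leaked control token.
    cut = min((i for t in CONTROL_TOKENS if (i := name.find(t)) != -1),
              default=len(name))
    return name[:cut].strip()

def route(recipient):
    cleaned = sanitize(recipient) if recipient else ""
    if not cleaned:
        # Empty after sanitization is treated like None: the message falls
        # through to the no-recipient path and stays user-visible.
        return "message"
    return f"tool:{cleaned}"

assert route("<|constrain|>json") == "message"
assert route("functions.get_weather") == "tool:functions.get_weather"
```

A fully corrupt recipient surfaces as a plain message instead of a misrouted tool call, which is exactly the fallback the routing layer is meant to guarantee.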