Commit 714ad90
fix(harmony): discard free text between harmony channel messages
The triggered_tags grammar's sub-dispatch loop allows all tokens between
triggered tags, so the model can generate trailing text after an <|end|>
before EOS (e.g. restating the answer as plain text after a tool call).
These free-text tokens arrive in the EXPECT_START state, causing a
HarmonyError.

Add Pattern 3 to ResilientStreamableParser: silently discard any token
seen in the EXPECT_START state that is not <|start|>. This preserves all
completed messages while ignoring inter-message garbage tokens.

Signed-off-by: Will Deines <will@garr.io>
1 parent 599f35c commit 714ad90

1 file changed

Lines changed: 8 additions & 0 deletions

vllm/entrypoints/openai/parser/harmony_utils.py
@@ -122,6 +122,14 @@ def process(self, token_id: int) -> None:
             self._inner.process(token_id)
             return
 
+        # Pattern 3: free text between harmony messages (e.g. model outputs plain
+        # text after a <|end|> before starting the next channel message).
+        # The triggered_tags grammar allows free tokens in the sub-dispatch loop,
+        # so the model may generate trailing text that isn't part of any channel.
+        # Silently discard these tokens rather than crashing with HarmonyError.
+        if state == StreamState.EXPECT_START and token_id != _TOK_START:
+            return
+
         # Pattern 2: <|constrain|> during HEADER → enter skip mode
         if state == StreamState.HEADER and token_id == _TOK_CONSTRAIN:
             self._skip_until_message_or_end = True
