Commit 714ad90
fix(harmony): discard free text between harmony channel messages
The triggered_tags grammar's sub-dispatch loop allows all tokens between
triggered tags, so the model can generate trailing text after an <|end|>
before EOS (e.g. restating the answer as plain text after a tool call).
These free-text tokens arrive in the EXPECT_START state, causing a
HarmonyError.

Add Pattern 3 to ResilientStreamableParser: silently discard any token
seen in the EXPECT_START state that is not <|start|>. This preserves all
completed messages while ignoring inter-message garbage tokens.

Signed-off-by: Will Deines <will@garr.io>
1 parent 599f35c commit 714ad90

1 file changed

Lines changed: 8 additions & 0 deletions

vllm/entrypoints/openai/parser/harmony_utils.py
@@ -122,6 +122,14 @@ def process(self, token_id: int) -> None:
             self._inner.process(token_id)
             return
 
+        # Pattern 3: free text between harmony messages (e.g. model outputs plain
+        # text after a <|end|> before starting the next channel message).
+        # The triggered_tags grammar allows free tokens in the sub-dispatch loop,
+        # so the model may generate trailing text that isn't part of any channel.
+        # Silently discard these tokens rather than crashing with HarmonyError.
+        if state == StreamState.EXPECT_START and token_id != _TOK_START:
+            return
+
         # Pattern 2: <|constrain|> during HEADER → enter skip mode
         if state == StreamState.HEADER and token_id == _TOK_CONSTRAIN:
             self._skip_until_message_or_end = True
