[Bugfix] Sanitize malformed tool call recipients in Harmony parser#31677
[Bugfix] Sanitize malformed tool call recipients in Harmony parser#31677eous wants to merge 1 commit intovllm-project:mainfrom
Conversation
Some GPT-OSS base models occasionally generate malformed Harmony format sequences like `to=functions.bash<|channel|>commentary` instead of the correct `to=functions.bash <|constrain|>json`. This causes the function name to be parsed incorrectly as `bash<|channel|>commentary` instead of `bash`. This fix sanitizes the recipient string by stripping `<|channel|>` and everything after it before extracting the function name. The fix is applied in three locations to cover all API endpoints: - `harmony_utils.py`: /v1/responses (non-streaming) - `openai_tool_parser.py`: /v1/chat/completions (non-streaming) - `serving_chat_stream_harmony.py`: /v1/chat/completions (streaming) The /v1/responses streaming endpoint already worked correctly because it captures the recipient before malformed tokens can corrupt it. - Before: ~35% failure rate - After: 0% failure rate
There was a problem hiding this comment.
Code Review
This pull request addresses a bug where malformed tool call recipients from certain models were causing parsing failures. The fix involves sanitizing the recipient string in three different locations to handle these malformed sequences. The changes are correct and a test case has been added to verify the fix. My main feedback is to refactor the duplicated sanitization logic into a single shared utility function to improve code maintainability. I've left comments in the relevant files with suggestions on how to achieve this.
| # Sanitize recipient: the model sometimes outputs malformed sequences | ||
| # like "to=functions.bash<|channel|>commentary" instead of the correct | ||
| # "to=functions.bash <|constrain|>json". Strip the malformed part. | ||
| if "<|channel|>" in recipient: | ||
| recipient = recipient.split("<|channel|>")[0].strip() |
There was a problem hiding this comment.
This sanitization logic is duplicated in vllm/tool_parsers/openai_tool_parser.py and vllm/entrypoints/openai/serving_chat_stream_harmony.py. To improve maintainability and avoid code duplication, consider extracting this logic into a shared utility function. You could add a sanitize_recipient function in this file and then use it in all three places.
For example, you could add:
def sanitize_recipient(recipient: str) -> str:
"""Sanitizes a malformed tool call recipient by stripping `<|channel|>` and anything after it."""
if "<|channel|>" in recipient:
return recipient.split("<|channel|>")[0].strip()
return recipientAnd then replace this block with recipient = sanitize_recipient(recipient).
| # Sanitize recipient: the model sometimes outputs malformed sequences | ||
| # like "functions.bash<|channel|>commentary" instead of "functions.bash". | ||
| # Strip the malformed part. | ||
| sanitized_recipient = cur_recipient | ||
| if "<|channel|>" in sanitized_recipient: | ||
| sanitized_recipient = sanitized_recipient.split("<|channel|>")[0].strip() |
| # Sanitize recipient: the model sometimes outputs malformed | ||
| # sequences like "functions.bash<|channel|>commentary" | ||
| # instead of "functions.bash". Strip the malformed part. | ||
| recipient = msg.recipient | ||
| if "<|channel|>" in recipient: | ||
| recipient = recipient.split("<|channel|>")[0].strip() |
|
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs would not trigger full CI run by default. Instead, it would only run You ask your reviewers to trigger select CI tests on top of Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can either: Add If you have any questions, please reach out to us on Slack at https://slack.vllm.ai. 🚀 |
|
Hi @eous, the pre-commit checks have failed. Please run: uv pip install pre-commit
pre-commit install
pre-commit run --all-filesThen, commit the changes and push to your branch. For future commits, Tip Is
|
|
I have seen this in the wild specifically with I agree with the Gemini review bot suggestion to centralize the logic to sanitize the recipient instead of doing it in three separate places. And, longer-term, this may be something we want to try to push down to the https://github.com/openai/harmony library instead of doing it directly in vLLM, as ultimately it's that library that is parsing wrong here and ended up with the channel tokens as part of the recipient. |
|
This pull request has merge conflicts that must be resolved before it can be |
|
Is there any possibility of merging this into the main branch? I’m experiencing the same issues and would appreciate being able to remove my workarounds. |
Sorry I gave up on this route and just ended up doing a full sft of gpt oss 20b to fix the issue and improve tool use in general (https://huggingface.co/eousphoros/persona_theta_20b_131k). If there is interest in this still I can clean it up and rebase but really OpenAI should just fix their base model to adhere to their own specification for harmony. |
|
I opened openai/harmony#97 to attempt to fix this in openai/harmony library itself. That would take a new release of that library and a bump of it to get this fix in vLLM. So, it could be reasonable to do something in the interim directly in vLLM. I regularly see this type of failure with gpt-oss-20b models in many deep multi-turn tool calling scenario. It often doesn't show up until at least 10 turns in with various tool calls. As far as I can tell so far, the model is being prompted properly and it's just struggling to not emit additional channel markers when it's constructing recipients. |
Some GPT-OSS base models occasionally generate malformed Harmony format sequences like
to=functions.bash<|channel|>commentaryinstead of the correctto=functions.bash <|constrain|>json. This causes the function name to be parsed incorrectly asbash<|channel|>commentaryinstead ofbash.This fix sanitizes the recipient string by stripping
<|channel|>and everything after it before extracting the function name. The fix is applied in three locations to cover all API endpoints:harmony_utils.py: /v1/responses (non-streaming)openai_tool_parser.py: /v1/chat/completions (non-streaming)serving_chat_stream_harmony.py: /v1/chat/completions (streaming)The /v1/responses streaming endpoint already worked correctly because it captures the recipient before malformed tokens can corrupt it.
Purpose
Improve tool usage success rate with gpt-oss-20b
Test Plan
Test Result
Essential Elements of an Effective PR Description Checklist
supported_models.mdandexamplesfor a new model.