Remove empty text content for tool calls & support kimi-k2-thinking #1093
Conversation
```python
        # tool call keys are added in to_chat_dict to centralize behavior
        return message_dict

    def _normalize_tool_call_content(self, content: Any) -> Any | None:
```
Let's move it to a util function outside the LLM class.
Also, let's tighten the return type annotation: should it be `str | list[dict[str, str]] | None`?
I see that point, I thought so too, but it's actually in Message... 🤔
More importantly, I seem to recall we've been here before... Now we remove the content if it's empty, for all LLMs. I don't remember exactly, but I have a bad feeling that some LLMs may fail without content 😢
ah, good point @enyst

> I have a bad feeling that some LLMs may fail without content 😢

In that case, maybe we just want to:
- add kimi-k2-thinking to the regression tests
- when another LLM breaks on this, fix it and add it to the integration tests as well
@OpenHands can you add kimi-k2-thinking to the regression test workflow too? Should be litellm_proxy/kimi-k2-thinking
I'm on it! xingyaoww can track my progress at all-hands.dev
Summary

I've added kimi-k2-thinking to the regression test workflow as requested.

Changes Made:

File modified: .github/workflows/integration-runner.yml

Added a new job configuration to the integration test matrix:

```yaml
- name: Kimi K2 Thinking
  run-suffix: kimi_k2_run
  llm-config:
    model: litellm_proxy/kimi-k2-thinking
```

This configuration follows the same pattern as the existing models (Claude Sonnet 4.5, GPT-5 Mini, and Deepseek Chat) and will run the regression tests against the kimi-k2-thinking model via the LiteLLM proxy.
Commit Details:
- Commit: a1d7f6d
- Message: "Add kimi-k2-thinking to regression test workflow"
- Branch: ht/support-kimi-k2-thinking (already pushed to remote)
The kimi-k2-thinking model will now be included in:
- Scheduled nightly integration test runs
- Integration tests triggered via workflow_dispatch
- Integration tests triggered when the 'integration-test' label is added to a PR
Co-authored-by: openhands <[email protected]>
Hi! I started running the integration tests on your PR. You will receive a comment with the results shortly.

🧪 Integration Tests Results

Overall Success Rate: 100.0%

📁 Detailed Logs & Artifacts: click the links below for detailed agent/LLM logs showing the complete reasoning process for each model. On the GitHub Actions page, scroll down to the 'Artifacts' section to download the logs.

📋 Detailed Results
- litellm_proxy_gpt_5_mini_2025_08_07
- litellm_proxy_claude_sonnet_4_5_20250929
- litellm_proxy_deepseek_deepseek_chat
```python
    if self.role == "assistant" and self.tool_calls:
        message_dict["tool_calls"] = [tc.to_chat_dict() for tc in self.tool_calls]
        if "content" in message_dict:
            normalized_content = self._normalize_tool_call_content(
```
@OpenHands Read this PR diff carefully. First, I don't really like the name "normalize": it means everything, so it means nothing. Name it remove_empty_content_strings or remove_content_if_empty, so it says what it does.
Second, look at the whole file message.py and what we do; maybe it's better to refactor a little bit so we do the removal inside the helper method?
I'm on it! enyst can track my progress at all-hands.dev
Summary of changes applied to address your comment on openhands-sdk/openhands/sdk/llm/message.py:

What I changed
- Renamed the helper to a clearer, explicit name: _normalize_tool_call_content is now _remove_content_if_empty
- Refactored so the removal happens inside the helper:
  - The helper now mutates message_dict in place, removing:
    - content if it's an empty or whitespace-only string
    - empty "text" items from a list, dropping content entirely if the list becomes empty
  - It is called from to_chat_dict only when role == "assistant" and tool_calls are present, keeping other flows unchanged
Why
- The new name is explicit (“remove_content_if_empty”) and avoids vague terminology
- Encapsulating the removal logic in a helper keeps to_chat_dict simpler and centralizes behavior around the tool-calls case
Checklist
- Rename is explicit and aligned with your suggestion ✔
- Removal happens inside the helper, called from to_chat_dict ✔
- Behavior unchanged for non-tool-call messages ✔
- Changes are minimal and scoped to a single file ✔
- Pre-commit (ruff format/lint, pycodestyle, pyright) passed on the file ✔
- Changes pushed to branch ht/support-kimi-k2-thinking ✔
Notes
- File changed: openhands-sdk/openhands/sdk/llm/message.py
- If you prefer the alternative name remove_empty_content_strings, I can switch it; let me know your preference.
Commit message:

> Rename helper to _remove_content_if_empty (more explicit than 'normalize'). Do removal in-place inside the helper and call it from to_chat_dict. This keeps provider compatibility (e.g., kimi-k2-thinking rejects empty text blocks) while leaving other message flows unchanged.
>
> Co-authored-by: openhands <[email protected]>
Hi! I started running the integration tests on your PR. You will receive a comment with the results shortly.

🧪 Integration Tests Results

Overall Success Rate: 95.2%

📋 Detailed Results
- litellm_proxy_gpt_5_mini_2025_08_07 (failed tests)
- litellm_proxy_claude_sonnet_4_5_20250929
- litellm_proxy_deepseek_deepseek_chat
Looks like there are a few issues preventing this PR from being merged! If you'd like me to help, just leave a comment. Feel free to include any additional details that might help me get this PR into a better state.
@xingyaoww I think it's a bit tricky to test this PR via labels due to forked-PR support, so I ran it manually in the Actions tab and the results LGTM! https://github.com/OpenHands/software-agent-sdk/actions/runs/19239903002
xingyaoww
left a comment
Overall lgtm
Commit message:

> Instead of silently converting non-string text values to strings, raise a ValueError when a text content item has a non-string text value. This ensures we catch invalid message states early rather than attempting to handle them gracefully.
>
> Co-authored-by: openhands <[email protected]>
xingyaoww
left a comment
LGTM

This PR removes empty text content blocks, which cause 400 errors for kimi-k2-thinking.

Issue: #1053
Agent Server images for this PR
- GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server

Variants & Base Images
- eclipse-temurin:17-jdk
- nikolaik/python-nodejs:python3.12-nodejs22
- golang:1.21-bookworm

Pull (multi-arch manifest):

```shell
# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:0eb1d8c-python
```

All tags pushed for this build

About Multi-Architecture Support
- Each variant tag (e.g. 0eb1d8c-python) is a multi-arch manifest supporting both amd64 and arm64
- Architecture-specific tags (e.g. 0eb1d8c-python-amd64) are also available if needed