Conversation

@ryanhoangt ryanhoangt commented Nov 7, 2025

This PR removes the empty text content block that causes 400 errors for kimi-k2-thinking:

litellm.BadRequestError: Error code: 400 - {'error': {'message': 'litellm.BadRequestError: MoonshotException - Invalid request: text content is empty. Received Model Group=moonshot/kimi-k2-thinking\nAvailable Model Group Fallbacks=None', 'type': 'invalid_request_error', 'param': None, 'code': '400'}}

Issue: #1053
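For context, the rejected payload is an assistant tool-call message whose text content is empty. A minimal sketch of the shape and of the fix taken in this PR (the tool-call fields below are hypothetical, for illustration only):

```python
# An assistant tool-call message whose content is an empty text block --
# the shape Moonshot rejects with the 400 above. The tool call shown is
# a made-up example, not taken from the SDK.
bad_message = {
    "role": "assistant",
    "content": [{"type": "text", "text": ""}],  # empty text block -> 400
    "tool_calls": [
        {
            "id": "call_1",
            "type": "function",
            "function": {"name": "execute_bash", "arguments": "{}"},
        }
    ],
}

# The fix in this PR: drop the empty content entirely before sending,
# leaving the tool calls intact.
fixed_message = {k: v for k, v in bad_message.items() if k != "content"}
```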


Agent Server images for this PR

GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server

Variants & Base Images

Variant Architectures Base Image Docs / Tags
java amd64, arm64 eclipse-temurin:17-jdk Link
python amd64, arm64 nikolaik/python-nodejs:python3.12-nodejs22 Link
golang amd64, arm64 golang:1.21-bookworm Link

Pull (multi-arch manifest)

# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:0eb1d8c-python

Run

docker run -it --rm \
  -p 8000:8000 \
  --name agent-server-0eb1d8c-python \
  ghcr.io/openhands/agent-server:0eb1d8c-python

All tags pushed for this build

ghcr.io/openhands/agent-server:0eb1d8c-golang-amd64
ghcr.io/openhands/agent-server:0eb1d8c-golang_tag_1.21-bookworm-amd64
ghcr.io/openhands/agent-server:0eb1d8c-golang-arm64
ghcr.io/openhands/agent-server:0eb1d8c-golang_tag_1.21-bookworm-arm64
ghcr.io/openhands/agent-server:0eb1d8c-java-amd64
ghcr.io/openhands/agent-server:0eb1d8c-eclipse-temurin_tag_17-jdk-amd64
ghcr.io/openhands/agent-server:0eb1d8c-java-arm64
ghcr.io/openhands/agent-server:0eb1d8c-eclipse-temurin_tag_17-jdk-arm64
ghcr.io/openhands/agent-server:0eb1d8c-python-amd64
ghcr.io/openhands/agent-server:0eb1d8c-nikolaik_s_python-nodejs_tag_python3.12-nodejs22-amd64
ghcr.io/openhands/agent-server:0eb1d8c-python-arm64
ghcr.io/openhands/agent-server:0eb1d8c-nikolaik_s_python-nodejs_tag_python3.12-nodejs22-arm64
ghcr.io/openhands/agent-server:0eb1d8c-golang
ghcr.io/openhands/agent-server:0eb1d8c-java
ghcr.io/openhands/agent-server:0eb1d8c-python

About Multi-Architecture Support

  • Each variant tag (e.g., 0eb1d8c-python) is a multi-arch manifest supporting both amd64 and arm64
  • Docker automatically pulls the correct architecture for your platform
  • Individual architecture tags (e.g., 0eb1d8c-python-amd64) are also available if needed

@ryanhoangt ryanhoangt requested a review from xingyaoww November 7, 2025 09:42
github-actions bot commented Nov 7, 2025

Coverage

Coverage Report

File Stmts Miss Cover Missing
openhands-sdk/openhands/sdk/llm/message.py 272 119 56% 43, 48, 50, 67–69, 71–74, 76, 97, 99, 104, 177, 181, 198–203, 240, 247, 250, 267, 286, 289, 292, 306–307, 323, 344, 349–351, 357–358, 364, 366, 376, 385–391, 405, 407–408, 410–417, 420, 433, 435, 437–442, 450–451, 453–454, 464–465, 469–473, 476–479, 481–482, 485, 490–492, 499, 501, 518, 532, 548, 573–575, 577–578, 582–585, 589–591, 594–598, 600–602, 604, 612–613, 630–631
TOTAL 11932 5528 53%

# tool call keys are added in to_chat_dict to centralize behavior
return message_dict

def _normalize_tool_call_content(self, content: Any) -> Any | None:
Let's move it to a util function outside the LLM class

Also let's improve the return type annotation, should be str | list[dict[str, str]] | None?


I see that point, I thought so too, but it's actually in Message... 🤔

More importantly, I seem to recall we've been here before... Now we remove the content if it's empty, for all LLMs. I don't remember exactly, but I have a bad feeling that some LLMs may fail without content 😢


ah good point @enyst

> I have a bad feeling that some LLMs may fail without content 😢

In this case, maybe we just want to:

  1. add kimi k2 thinking to regression test
  2. and when other LLM breaks this, we fix them and add them to integration tests as well


@OpenHands can you add kimi-k2-thinking to the regression test workflow too? Should be litellm_proxy/kimi-k2-thinking


I'm on it! xingyaoww can track my progress at all-hands.dev


Summary

I've successfully added kimi-k2-thinking to the regression test workflow as requested!

Changes Made:

File Modified: .github/workflows/integration-runner.yml

Added a new job configuration to the integration test matrix:

- name: Kimi K2 Thinking
  run-suffix: kimi_k2_run
  llm-config:
    model: litellm_proxy/kimi-k2-thinking

This configuration follows the same pattern as the existing models (Claude Sonnet 4.5, GPT-5 Mini, and Deepseek Chat) and will now run the regression tests against the kimi-k2-thinking model via the LiteLLM proxy.

Commit Details:

  • Commit: a1d7f6d
  • Message: "Add kimi-k2-thinking to regression test workflow"
  • Branch: ht/support-kimi-k2-thinking (already pushed to remote)

The kimi-k2-thinking model will now be included in:

  • Scheduled nightly integration test runs
  • Integration tests triggered via workflow_dispatch
  • Integration tests triggered when the 'integration-test' label is added to a PR


@xingyaoww xingyaoww added the integration-test Runs the integration tests and comments the results label Nov 10, 2025
@github-actions

Hi! I started running the integration tests on your PR. You will receive a comment with the results shortly.

@github-actions

🧪 Integration Tests Results

Overall Success Rate: 100.0%
Total Cost: $0.42
Models Tested: 3
Timestamp: 2025-11-10 15:02:50 UTC

📁 Detailed Logs & Artifacts

Click the links below to access detailed agent/LLM logs showing the complete reasoning process for each model. On the GitHub Actions page, scroll down to the 'Artifacts' section to download the logs.

📊 Summary

Model Success Rate Tests Passed Total Tests Cost
litellm_proxy_gpt_5_mini_2025_08_07 100.0% 7/7 7 $0.04
litellm_proxy_claude_sonnet_4_5_20250929 100.0% 7/7 7 $0.37
litellm_proxy_deepseek_deepseek_chat 100.0% 7/7 7 $0.02

📋 Detailed Results

litellm_proxy_gpt_5_mini_2025_08_07

  • Success Rate: 100.0% (7/7)
  • Total Cost: $0.04
  • Run Suffix: litellm_proxy_gpt_5_mini_2025_08_07_a1d7f6d_gpt5_mini_run_N7_20251110_145839

litellm_proxy_claude_sonnet_4_5_20250929

  • Success Rate: 100.0% (7/7)
  • Total Cost: $0.37
  • Run Suffix: litellm_proxy_claude_sonnet_4_5_20250929_a1d7f6d_sonnet_run_N7_20251110_145841

litellm_proxy_deepseek_deepseek_chat

  • Success Rate: 100.0% (7/7)
  • Total Cost: $0.02
  • Run Suffix: litellm_proxy_deepseek_deepseek_chat_a1d7f6d_deepseek_run_N7_20251110_145843

if self.role == "assistant" and self.tool_calls:
    message_dict["tool_calls"] = [tc.to_chat_dict() for tc in self.tool_calls]
    if "content" in message_dict:
        normalized_content = self._normalize_tool_call_content(

@OpenHands Read this PR diff carefully. First, I don't really like the name "normalize", it means everything so nothing, name this... remove_empty_content_strings or remove_content_if_empty, so it says what it does.

Second, look at the whole file message.py and what we do, maybe it's better to refactor a little bit so we do the removal inside the helper method?


I'm on it! enyst can track my progress at all-hands.dev


Summary of changes applied to address your comment on openhands-sdk/openhands/sdk/llm/message.py:

What I changed

  • Renamed the helper to a clearer, explicit name:
    • _normalize_tool_call_content → _remove_content_if_empty
  • Refactored to perform the removal inside the helper:
    • The helper now mutates message_dict in-place, removing:
      • content if it’s an empty/whitespace-only string
      • empty “text” items from a list; drops content entirely if the list becomes empty
    • Called from to_chat_dict only when role == "assistant" and tool_calls are present, keeping other flows unchanged

Why

  • The new name is explicit (“remove_content_if_empty”) and avoids vague terminology
  • Encapsulating the removal logic in a helper keeps to_chat_dict simpler and centralizes behavior around the tool-calls case

Checklist

  • Rename is explicit and aligned with your suggestion ✔
  • Removal happens inside the helper, called from to_chat_dict ✔
  • Behavior unchanged for non-tool-call messages ✔
  • Changes are minimal and scoped to a single file ✔
  • Pre-commit (ruff format/lint, pycodestyle, pyright) passed on the file ✔
  • Changes pushed to branch ht/support-kimi-k2-thinking ✔

Notes

  • File changed: openhands-sdk/openhands/sdk/llm/message.py
  • If you prefer the alternative name remove_empty_content_strings, I can switch it—let me know your preference.


- Rename helper to _remove_content_if_empty (more explicit than 'normalize')
- Do removal in-place inside helper and call it from to_chat_dict

This keeps provider compatibility (e.g., kimi-k2-thinking rejects empty text blocks) while leaving other message flows unchanged.

Co-authored-by: openhands <[email protected]>
@ryanhoangt ryanhoangt added integration-test Runs the integration tests and comments the results and removed integration-test Runs the integration tests and comments the results labels Nov 10, 2025
@github-actions

Hi! I started running the integration tests on your PR. You will receive a comment with the results shortly.

@github-actions

🧪 Integration Tests Results

Overall Success Rate: 100.0%
Total Cost: $0.40
Models Tested: 3
Timestamp: 2025-11-10 15:41:19 UTC


📊 Summary

Model Success Rate Tests Passed Total Tests Cost
litellm_proxy_gpt_5_mini_2025_08_07 100.0% 7/7 7 $0.04
litellm_proxy_deepseek_deepseek_chat 100.0% 7/7 7 $0.02
litellm_proxy_claude_sonnet_4_5_20250929 100.0% 7/7 7 $0.34

📋 Detailed Results

litellm_proxy_gpt_5_mini_2025_08_07

  • Success Rate: 100.0% (7/7)
  • Total Cost: $0.04
  • Run Suffix: litellm_proxy_gpt_5_mini_2025_08_07_5f413c9_gpt5_mini_run_N7_20251110_153848

litellm_proxy_deepseek_deepseek_chat

  • Success Rate: 100.0% (7/7)
  • Total Cost: $0.02
  • Run Suffix: litellm_proxy_deepseek_deepseek_chat_5f413c9_deepseek_run_N7_20251110_153840

litellm_proxy_claude_sonnet_4_5_20250929

  • Success Rate: 100.0% (7/7)
  • Total Cost: $0.34
  • Run Suffix: litellm_proxy_claude_sonnet_4_5_20250929_5f413c9_sonnet_run_N7_20251110_153840

@ryanhoangt ryanhoangt added integration-test Runs the integration tests and comments the results and removed integration-test Runs the integration tests and comments the results labels Nov 10, 2025
@github-actions

Hi! I started running the integration tests on your PR. You will receive a comment with the results shortly.

@github-actions

🧪 Integration Tests Results

Overall Success Rate: 95.2%
Total Cost: $0.40
Models Tested: 3
Timestamp: 2025-11-10 15:51:33 UTC


📊 Summary

Model Success Rate Tests Passed Total Tests Cost
litellm_proxy_gpt_5_mini_2025_08_07 85.7% 6/7 7 $0.04
litellm_proxy_claude_sonnet_4_5_20250929 100.0% 7/7 7 $0.34
litellm_proxy_deepseek_deepseek_chat 100.0% 7/7 7 $0.02

📋 Detailed Results

litellm_proxy_gpt_5_mini_2025_08_07

  • Success Rate: 85.7% (6/7)
  • Total Cost: $0.04
  • Run Suffix: litellm_proxy_gpt_5_mini_2025_08_07_c3ec892_gpt5_mini_run_N7_20251110_154902

Failed Tests:

  • t02_add_bash_hello: Shell script is not executable (Cost: $0.0031)

litellm_proxy_claude_sonnet_4_5_20250929

  • Success Rate: 100.0% (7/7)
  • Total Cost: $0.34
  • Run Suffix: litellm_proxy_claude_sonnet_4_5_20250929_c3ec892_sonnet_run_N7_20251110_154901

litellm_proxy_deepseek_deepseek_chat

  • Success Rate: 100.0% (7/7)
  • Total Cost: $0.02
  • Run Suffix: litellm_proxy_deepseek_deepseek_chat_c3ec892_deepseek_run_N7_20251110_154901

@ryanhoangt ryanhoangt added integration-test Runs the integration tests and comments the results and removed integration-test Runs the integration tests and comments the results labels Nov 10, 2025
@github-actions

Hi! I started running the integration tests on your PR. You will receive a comment with the results shortly.

@github-actions

🧪 Integration Tests Results

Overall Success Rate: 100.0%
Total Cost: $0.42
Models Tested: 3
Timestamp: 2025-11-10 16:01:02 UTC


📊 Summary

Model Success Rate Tests Passed Total Tests Cost
litellm_proxy_claude_sonnet_4_5_20250929 100.0% 7/7 7 $0.30
litellm_proxy_deepseek_deepseek_chat 100.0% 7/7 7 $0.08
litellm_proxy_gpt_5_mini_2025_08_07 100.0% 7/7 7 $0.04

📋 Detailed Results

litellm_proxy_claude_sonnet_4_5_20250929

  • Success Rate: 100.0% (7/7)
  • Total Cost: $0.30
  • Run Suffix: litellm_proxy_claude_sonnet_4_5_20250929_c187634_sonnet_run_N7_20251110_155549

litellm_proxy_deepseek_deepseek_chat

  • Success Rate: 100.0% (7/7)
  • Total Cost: $0.08
  • Run Suffix: litellm_proxy_deepseek_deepseek_chat_c187634_deepseek_run_N7_20251110_155557

litellm_proxy_gpt_5_mini_2025_08_07

  • Success Rate: 100.0% (7/7)
  • Total Cost: $0.04
  • Run Suffix: litellm_proxy_gpt_5_mini_2025_08_07_c187634_gpt5_mini_run_N7_20251110_155547

@ryanhoangt ryanhoangt added integration-test Runs the integration tests and comments the results and removed integration-test Runs the integration tests and comments the results labels Nov 10, 2025
@github-actions

Hi! I started running the integration tests on your PR. You will receive a comment with the results shortly.

@github-actions

🧪 Integration Tests Results

Overall Success Rate: 100.0%
Total Cost: $0.40
Models Tested: 3
Timestamp: 2025-11-10 16:31:05 UTC


📊 Summary

Model Success Rate Tests Passed Total Tests Cost
litellm_proxy_deepseek_deepseek_chat 100.0% 7/7 7 $0.02
litellm_proxy_gpt_5_mini_2025_08_07 100.0% 7/7 7 $0.05
litellm_proxy_claude_sonnet_4_5_20250929 100.0% 7/7 7 $0.33

📋 Detailed Results

litellm_proxy_deepseek_deepseek_chat

  • Success Rate: 100.0% (7/7)
  • Total Cost: $0.02
  • Run Suffix: litellm_proxy_deepseek_deepseek_chat_c4a7a75_deepseek_run_N7_20251110_162806

litellm_proxy_gpt_5_mini_2025_08_07

  • Success Rate: 100.0% (7/7)
  • Total Cost: $0.05
  • Run Suffix: litellm_proxy_gpt_5_mini_2025_08_07_c4a7a75_gpt5_mini_run_N7_20251110_162805

litellm_proxy_claude_sonnet_4_5_20250929

  • Success Rate: 100.0% (7/7)
  • Total Cost: $0.33
  • Run Suffix: litellm_proxy_claude_sonnet_4_5_20250929_c4a7a75_sonnet_run_N7_20251110_162809

@ryanhoangt ryanhoangt removed the integration-test Runs the integration tests and comments the results label Nov 10, 2025
@ryanhoangt ryanhoangt added the integration-test Runs the integration tests and comments the results label Nov 10, 2025
@github-actions

Hi! I started running the integration tests on your PR. You will receive a comment with the results shortly.

@github-actions

🧪 Integration Tests Results

Overall Success Rate: 100.0%
Total Cost: $0.45
Models Tested: 3
Timestamp: 2025-11-10 17:01:26 UTC


📊 Summary

Model Success Rate Tests Passed Total Tests Cost
litellm_proxy_deepseek_deepseek_chat 100.0% 7/7 7 $0.03
litellm_proxy_claude_sonnet_4_5_20250929 100.0% 7/7 7 $0.38
litellm_proxy_gpt_5_mini_2025_08_07 100.0% 7/7 7 $0.04

📋 Detailed Results

litellm_proxy_deepseek_deepseek_chat

  • Success Rate: 100.0% (7/7)
  • Total Cost: $0.03
  • Run Suffix: litellm_proxy_deepseek_deepseek_chat_0bc52b2_deepseek_run_N7_20251110_165759

litellm_proxy_claude_sonnet_4_5_20250929

  • Success Rate: 100.0% (7/7)
  • Total Cost: $0.38
  • Run Suffix: litellm_proxy_claude_sonnet_4_5_20250929_0bc52b2_sonnet_run_N7_20251110_165754

litellm_proxy_gpt_5_mini_2025_08_07

  • Success Rate: 100.0% (7/7)
  • Total Cost: $0.04
  • Run Suffix: litellm_proxy_gpt_5_mini_2025_08_07_0bc52b2_gpt5_mini_run_N7_20251110_165800

@ryanhoangt ryanhoangt added integration-test Runs the integration tests and comments the results and removed integration-test Runs the integration tests and comments the results labels Nov 10, 2025
@github-actions

Hi! I started running the integration tests on your PR. You will receive a comment with the results shortly.

@github-actions

🧪 Integration Tests Results

Overall Success Rate: 100.0%
Total Cost: $0.40
Models Tested: 3
Timestamp: 2025-11-10 17:11:25 UTC


📊 Summary

Model Success Rate Tests Passed Total Tests Cost
litellm_proxy_claude_sonnet_4_5_20250929 100.0% 7/7 7 $0.34
litellm_proxy_deepseek_deepseek_chat 100.0% 7/7 7 $0.02
litellm_proxy_gpt_5_mini_2025_08_07 100.0% 7/7 7 $0.04

📋 Detailed Results

litellm_proxy_claude_sonnet_4_5_20250929

  • Success Rate: 100.0% (7/7)
  • Total Cost: $0.34
  • Run Suffix: litellm_proxy_claude_sonnet_4_5_20250929_2fb7339_sonnet_run_N7_20251110_170831

litellm_proxy_deepseek_deepseek_chat

  • Success Rate: 100.0% (7/7)
  • Total Cost: $0.02
  • Run Suffix: litellm_proxy_deepseek_deepseek_chat_2fb7339_deepseek_run_N7_20251110_170836

litellm_proxy_gpt_5_mini_2025_08_07

  • Success Rate: 100.0% (7/7)
  • Total Cost: $0.04
  • Run Suffix: litellm_proxy_gpt_5_mini_2025_08_07_2fb7339_gpt5_mini_run_N7_20251110_170836

@openhands-ai

openhands-ai bot commented Nov 10, 2025

Looks like there are a few issues preventing this PR from being merged!

  • GitHub Actions are failing:
    • Agent Server

If you'd like me to help, just leave a comment, like

@OpenHands please fix the failing actions on PR #1093 at branch `ht/support-kimi-k2-thinking`

Feel free to include any additional details that might help me get this PR into a better state.


@ryanhoangt
Collaborator Author

ryanhoangt commented Nov 10, 2025

@xingyaoww I think it's a bit tricky to test this PR via labels due to limited support for forked PRs, so I ran it manually in the Actions tab and the results LGTM!

https://github.com/OpenHands/software-agent-sdk/actions/runs/19239903002

(screenshot of the successful workflow run)

@xingyaoww xingyaoww left a comment

Overall lgtm

openhands-agent and others added 2 commits November 10, 2025 19:53
Instead of silently converting non-string text values to strings,
raise a ValueError when a text content item has a non-string text value.
This ensures we catch invalid message states early rather than
attempting to handle them gracefully.

Co-authored-by: openhands <[email protected]>
@xingyaoww xingyaoww left a comment

LGTM

@xingyaoww xingyaoww enabled auto-merge (squash) November 10, 2025 20:08
@xingyaoww xingyaoww merged commit 6b2b671 into main Nov 10, 2025
15 checks passed
@xingyaoww xingyaoww deleted the ht/support-kimi-k2-thinking branch November 10, 2025 20:11

Labels

integration-test Runs the integration tests and comments the results

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants