-
Notifications
You must be signed in to change notification settings - Fork 53
Add tool validator to enable tool calls for glm-4.6 #1017
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
@OpenHands /codereview-roasted read the diff and tell me if _decode_json_strings can be improved |
|
I'm on it! simonrosenberg can track my progress at all-hands.dev |
SummaryI've completed a critical code review of the Review SummaryOverall Assessment: 🟡 Acceptable - The function solves a real problem (GLM-4 returning JSON-encoded strings instead of native arrays/objects) but has some complexity issues that could be improved. Key FindingsCritical Issues:
Improvement Opportunities:
Core InsightThe function does too much type introspection to detect unions. Instead of asking "is this a union?" and branching, it should just ask "what types could this field accept?" using Verdict: ✅ Worth merging after addressing the Union type handling brittleness and removing the redundant isinstance check after json.loads(). |
- Remove fragile Union type detection that checked typing.Union and types.UnionType separately - Use get_args() uniformly for both union and simple types, eliminating branching logic - Consolidate early exit conditions into single check for better readability - Remove unnecessary list() copy when iterating over data.items() - Remove unused imports: types module and Union from typing - Improve code comments to clarify json.loads() return types - Reduce function from 46 lines to 34 lines (-26%) while maintaining functionality This addresses code review feedback about treating normal cases as special cases and eliminates brittleness around Python version differences in union handling. Co-authored-by: openhands <[email protected]>
…correctly This commit addresses two critical issues identified in code review: 1. **Annotated type handling**: The validator now properly unwraps Annotated types before checking for list/dict, ensuring fields like `Annotated[list[str], Field(...)]` are correctly decoded from JSON strings. 2. **Field alias support**: Changed from using __annotations__ to model_fields, which properly handles field aliases and inherited fields. The validator now checks both field names and their aliases when looking up values in the input data. 3. **Union type detection fix**: Fixed a critical bug where we were checking the origin of type arguments instead of the type itself. For example, with `list[str]`, we were incorrectly extracting `str` instead of `list`. Changes: - Import Annotated, Union, and types module - Use cls.model_fields instead of cls.__annotations__ - Add logic to unwrap Annotated types before type checking - Fix Union type detection to check the type's origin, not its args - Properly handle both old-style Union and new-style (|) union syntax The validator now correctly handles: - `Annotated[list[str], Field(description='...')]` - Fields with aliases: `my_list: list[int] = Field(alias='myList')` - Union types: `list[str] | None` - Inherited fields from parent classes Co-authored-by: openhands <[email protected]>
|
Integrations tests ran locally for glm-4.6 |
|
Hi! I started running the integration tests on your PR. You will receive a comment with the results shortly. |
🧪 Integration Tests ResultsOverall Success Rate: 100.0% 📁 Detailed Logs & ArtifactsClick the links below to access detailed agent/LLM logs showing the complete reasoning process for each model. On the GitHub Actions page, scroll down to the 'Artifacts' section to download the logs.
📊 Summary
📋 Detailed Resultslitellm_proxy_deepseek_deepseek_chat
litellm_proxy_gpt_5_mini_2025_08_07
litellm_proxy_claude_sonnet_4_5_20250929
|
|
@neubig And the MiniMax-M2 integration tests ran locally |
neubig
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that it'd be good to have a unit test to make sure we don't accidentally regress this later.
…ator This commit adds 16 unit tests to verify the _decode_json_strings validator works correctly for all edge cases: **Basic functionality:** - JSON strings decoded to native lists/dicts - Native lists/dicts pass through unchanged - Regular string fields not affected by JSON decoding **Advanced type handling:** - Annotated types (e.g., Annotated[list[str], Field(...)]) - Field aliases (e.g., Field(alias='myList')) - Optional/Union types (e.g., list[str] | None) - Nested structures (e.g., list[list[int]]) **Error handling:** - Invalid JSON strings rejected with ValidationError - JSON strings with wrong types rejected - Empty collections handled correctly **Edge cases:** - Unicode characters in JSON strings - Extra whitespace in JSON strings - Mixed native and JSON string values in same model All tests pass and validate that the validator correctly handles both JSON-encoded strings from models like GLM-4 and regular native objects from other models. Co-authored-by: openhands <[email protected]>
|
@OpenHands /codereview-roasted read the diff. I wonder if we can do this in tool.action_from_arguments inside agent.py instead this low-level data structure? |
|
I'm on it! simonrosenberg can track my progress at all-hands.dev |
|
Looks like there are a few issues preventing this PR from being merged!
If you'd like me to help, just leave a comment, like Feel free to include any additional details that might help me get this PR into a better state. You can manage your notification settings |
SummaryI've completed a comprehensive architectural code review of PR #1017 as requested. No code changes were made - this was a pure analysis task. Your QuestionYou asked whether the JSON string decoding fix should be:
My AnswerThe current low-level solution in Key Reasoning:✅ Current Solution (Schema validator) PROS:
❌ Alternative Solution (Agent preprocessing) CONS:
Addressing Your ConcernYour concern that "making the change here would potentially affect other part of the system in unintended ways" is valid, but mitigated by:
The "Good Taste" PerspectiveThe current solution eliminates special cases rather than creating them. Instead of scattering "LLM quirk handling" across multiple locations, it solves the data normalization problem once in the data validation layer where it belongs. Recommendation: Keep the current implementation. The broader scope is a feature, not a bug - it ensures consistent behavior across the entire SDK. |
@xingyaoww wdyt? |
|
@simonrosenberg, how about we create a helper function and use it in one place (i.e., when you json.loads the result returned by the model)? then the json.loads results are then passed to action_from_arguments -- in this way we still only need to modify one place. @OpenHands what do you think? /codereview-roasted |
|
I'm on it! xingyaoww can track my progress at all-hands.dev |
SummaryI've completed a code review analysis of PR #1017 in response to @simonrosenberg's question about whether the JSON string decoding logic should be moved from the low-level My Answer: Keep the Current Pydantic Validator ApproachKey Reasoning:
Addressing the "Unintended Impact" Concern: The validator only affects Schema creation from dicts, which is exactly where this LLM quirk manifests. It's actually beneficial that tests and other integrations automatically get this fix if they encounter the same issue. Key InsightThe Pydantic validator approach is data-structure-driven validation (elegant, type-safe), while the helper function approach would be heuristic-driven transformation (fragile, duplicates type knowledge). The validator follows the principle that data structures should validate themselves at boundaries. No code changes were made - this was a review-only analysis providing technical feedback on architectural decisions. |
Didn't we have a We did in V0 a looong time ago. (V0.00... 😅) |
|
I'm on it! enyst can track my progress at all-hands.dev |
|
Summary of my work and recommendation
Comparison of approaches (brief):
Implementation sketch (if you want me to proceed):
No code changes were made; this was an analysis-only response. If you agree with the direction, I can implement the helper and refactor the call site(s). |
|
[Automatic Post]: This PR seems to be currently waiting for review. @neubig, @enyst, @xingyaoww, could you please take a look when you have a chance? |
This PR adds a Tool Pydantic validator that handles cases where LLMs (like GLM-4) return array/object values
as JSON strings instead of native JSON arrays/objects i.e.
<parameter=view_range>"[1, 100]" instead of
<parameter=view_range>[1, 100].
Agent Server images for this PR
• GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server
Variants & Base Images
eclipse-temurin:17-jdknikolaik/python-nodejs:python3.12-nodejs22golang:1.21-bookwormPull (multi-arch manifest)
# Each variant is a multi-arch manifest supporting both amd64 and arm64 docker pull ghcr.io/openhands/agent-server:3444f22-pythonRun
All tags pushed for this build
About Multi-Architecture Support
3444f22-python) is a multi-arch manifest supporting both amd64 and arm643444f22-python-amd64) are also available if needed