@enyst enyst commented Nov 4, 2025

Summary

  • Extend telemetry `_safe_json` to serialize any Pydantic `BaseModel` via `model_dump(mode="json", exclude_none=True)`. This avoids circular-reference errors and Pydantic serialization warnings when logging complex objects (e.g., tools/specs) that can appear in the log context in remote agent/server scenarios
  • Add targeted test ensuring logging succeeds when the log context includes a self-referencing Pydantic model with excluded runtime-only fields

Problem
When running the agent remotely (Docker/remote runtime), enabling `log_completions` caused telemetry logging to fail:

  • If the default log directory wasn't present inside the container: `log_dir does not exist: logs/completions`
  • After creating the directory, log files were created but left empty, with warnings like `Circular reference detected`.
    This happened because `Telemetry.log_llm_call` attempted to JSON-serialize arbitrary objects from the logging context (e.g., Pydantic-based tool/config objects) using generic fallbacks (`dict`), which can include self-references or non-serializable fields (such as executors), leading to circular references and failed writes.

Changes

  • telemetry.py: update `_safe_json`
    • Handle `ModelResponse` and `ResponsesAPIResponse` via `model_dump(mode="json", exclude_none=True)` (unchanged)
    • NEW: handle any Pydantic `BaseModel` via `model_dump(mode="json", exclude_none=True)` to ensure safe, cycle-free serialization that respects field exclusions
    • Fallback to `dict` or `str(obj)` remains for plain objects
  • tests/sdk/llm/test_llm_telemetry.py: add test_log_completion_with_pydantic_objects_in_context
    • Simulates a Pydantic model with a self-referencing, excluded field (executor)
    • Verifies no circular reference warnings and that a valid JSON log is produced

Compatibility / Impact

  • Backward compatible: only improves robustness of telemetry serialization. Existing behavior for non-Pydantic objects remains unchanged.
  • No public API changes.

Testing

  • Ran pre-commit hooks on changed files
  • Ran telemetry tests: all 35 pass, including the new one
  • Existing integration tests for log_completions also pass locally

Docs

  • No docs changes required; this is an internal robustness fix

Fixes #1027



Agent Server images for this PR

GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server

Variants & Base Images

| Variant | Architectures | Base Image | Docs / Tags |
| --- | --- | --- | --- |
| java | amd64, arm64 | eclipse-temurin:17-jdk | Link |
| python | amd64, arm64 | nikolaik/python-nodejs:python3.12-nodejs22 | Link |
| golang | amd64, arm64 | golang:1.21-bookworm | Link |

Pull (multi-arch manifest)

```shell
# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:523e655-python
```

Run

```shell
docker run -it --rm \
  -p 8000:8000 \
  --name agent-server-523e655-python \
  ghcr.io/openhands/agent-server:523e655-python
```

All tags pushed for this build

```
ghcr.io/openhands/agent-server:523e655-golang-amd64
ghcr.io/openhands/agent-server:v1.0.0a5_golang_tag_1.21-bookworm_binary-amd64
ghcr.io/openhands/agent-server:523e655-golang-arm64
ghcr.io/openhands/agent-server:v1.0.0a5_golang_tag_1.21-bookworm_binary-arm64
ghcr.io/openhands/agent-server:523e655-java-amd64
ghcr.io/openhands/agent-server:v1.0.0a5_eclipse-temurin_tag_17-jdk_binary-amd64
ghcr.io/openhands/agent-server:523e655-java-arm64
ghcr.io/openhands/agent-server:v1.0.0a5_eclipse-temurin_tag_17-jdk_binary-arm64
ghcr.io/openhands/agent-server:523e655-python-amd64
ghcr.io/openhands/agent-server:v1.0.0a5_nikolaik_s_python-nodejs_tag_python3.12-nodejs22_binary-amd64
ghcr.io/openhands/agent-server:523e655-python-arm64
ghcr.io/openhands/agent-server:v1.0.0a5_nikolaik_s_python-nodejs_tag_python3.12-nodejs22_binary-arm64
ghcr.io/openhands/agent-server:523e655-golang
ghcr.io/openhands/agent-server:523e655-java
ghcr.io/openhands/agent-server:523e655-python
```

About Multi-Architecture Support

  • Each variant tag (e.g., 523e655-python) is a multi-arch manifest supporting both amd64 and arm64
  • Docker automatically pulls the correct architecture for your platform
  • Individual architecture tags (e.g., 523e655-python-amd64) are also available if needed

- Extend _safe_json to handle any Pydantic BaseModel via model_dump(mode="json", exclude_none=True)
  to avoid circular references when logging tool/context objects in remote agent/server scenarios
- Add test that simulates a self-referencing Pydantic object in log context to ensure no warnings

Fixes #1027

Co-authored-by: openhands <[email protected]>
github-actions bot commented Nov 4, 2025

Coverage Report

| File | Stmts | Miss | Cover | Missing |
| --- | ---: | ---: | ---: | --- |
| openhands-sdk/openhands/sdk/llm/utils/telemetry.py | 135 | 57 | 57% | 87, 93, 102, 106, 109, 114–115, 150, 157, 179, 183–184, 192–194, 200, 217–219, 222–225, 227, 235–236, 240–242, 245–248, 253, 258, 261, 266, 269, 275, 281, 283, 286–287, 292, 297–301, 308–309, 312, 314, 317–320 |
| TOTAL | 11664 | 5407 | 53% | |

enyst and others added 3 commits November 4, 2025 21:46
- Remove fallback/local import of BaseModel; rely on top-level import
- Keep robust serialization for Pydantic models via model_dump(mode="json", exclude_none=True)

Co-authored-by: openhands <[email protected]>
- Import BaseModel and Field at module top
- Remove mid-function import

Co-authored-by: openhands <[email protected]>
@blacksmith-sh blacksmith-sh bot requested a review from xingyaoww November 7, 2025 12:59
blacksmith-sh bot commented Nov 7, 2025

[Automatic Post]: I have assigned @xingyaoww as a reviewer based on git blame information. Thanks in advance for the help!

@enyst enyst changed the title Fix: robust LLM telemetry logging for remote agent/server runs Fix: LLM telemetry logging Nov 8, 2025
@enyst enyst merged commit 572ea21 into main Nov 8, 2025
21 checks passed
@enyst enyst deleted the openhands/fix-remote-llm-telemetry-logging branch November 8, 2025 19:10


Development

Successfully merging this pull request may close these issues.

LLM logging does not work when running agent remotely
