
Conversation

@enyst (Collaborator) commented Dec 6, 2025

Summary
This PR addresses review feedback on #912 (token-budget–aware LLMSummarizingCondenser) by extracting the repeated token-budget and counting logic into reusable helpers on CondenserBase, and refactoring LLMSummarizingCondenser to use them. This keeps the PR’s goal intact (token-aware condensation) while improving maintainability for future condensers.

Changes

  • Add to CondenserBase:
    • compute_token_budget(llm, token_margin_ratio): derive usable token budget from llm.max_input_tokens / max_output_tokens with margin
    • estimate_token_count(llm, events): convert events to messages and count tokens via llm.get_token_count
    • max_tail_within_budget(view, llm, keep_first, budget): binary-search longest tail under budget
  • Update LLMSummarizingCondenser to use these helpers in should_condense and get_condensation
  • No behavior change when model limits are unknown; maintains fallback to event-count logic

Why

The token-budget and counting logic was previously repeated inside LLMSummarizingCondenser. Centralizing it on CondenserBase keeps the goal of #912 (token-aware condensation) intact while letting future condensers reuse the same helpers instead of reimplementing them.
Compatibility

  • Backward compatible: if budget computation/counting fails, behavior falls back to the existing event-count logic (max_size and unhandled_condensation_request handling unchanged).
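The fallback described above can be sketched as a small decision function. This is a hypothetical shape, not the real `should_condense` signature; the budget and counting callables are passed in only to keep the sketch self-contained.

```python
def should_condense(view, llm, max_size, compute_token_budget, estimate_token_count) -> bool:
    """Token-aware condensation check with an event-count fallback (sketch)."""
    try:
        budget = compute_token_budget(llm)
        if budget is not None:
            # Model limits are known: compare tokenized view against the budget.
            return estimate_token_count(llm, view) > budget
    except Exception:
        pass  # budget computation/counting failed: fall through
    # Limits unknown or counting failed: existing event-count heuristic.
    return len(view) > max_size
```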

Testing

  • Pre-commit hooks (ruff format/lint, pyright, etc.) pass locally.
  • Existing condenser tests continue to pass.

Open questions

  • Should the token margin default (10%) be made configurable globally or on Agent presets?

Closes #1340 by making the condenser path more robust and future-proof through token-aware logic (together with the exception mapping already in place).

Co-authored-by: openhands <[email protected]>



Agent Server images for this PR

GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server

Variants & Base Images

Variant | Architectures | Base Image
java    | amd64, arm64  | eclipse-temurin:17-jdk
python  | amd64, arm64  | nikolaik/python-nodejs:python3.12-nodejs22
golang  | amd64, arm64  | golang:1.21-bookworm

Pull (multi-arch manifest)

# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:76c408b-python

Run

docker run -it --rm \
  -p 8000:8000 \
  --name agent-server-76c408b-python \
  ghcr.io/openhands/agent-server:76c408b-python

All tags pushed for this build

ghcr.io/openhands/agent-server:76c408b-golang-amd64
ghcr.io/openhands/agent-server:76c408b-golang_tag_1.21-bookworm-amd64
ghcr.io/openhands/agent-server:76c408b-golang-arm64
ghcr.io/openhands/agent-server:76c408b-golang_tag_1.21-bookworm-arm64
ghcr.io/openhands/agent-server:76c408b-java-amd64
ghcr.io/openhands/agent-server:76c408b-eclipse-temurin_tag_17-jdk-amd64
ghcr.io/openhands/agent-server:76c408b-java-arm64
ghcr.io/openhands/agent-server:76c408b-eclipse-temurin_tag_17-jdk-arm64
ghcr.io/openhands/agent-server:76c408b-python-amd64
ghcr.io/openhands/agent-server:76c408b-nikolaik_s_python-nodejs_tag_python3.12-nodejs22-amd64
ghcr.io/openhands/agent-server:76c408b-python-arm64
ghcr.io/openhands/agent-server:76c408b-nikolaik_s_python-nodejs_tag_python3.12-nodejs22-arm64
ghcr.io/openhands/agent-server:76c408b-golang
ghcr.io/openhands/agent-server:76c408b-java
ghcr.io/openhands/agent-server:76c408b-python

About Multi-Architecture Support

  • Each variant tag (e.g., 76c408b-python) is a multi-arch manifest supporting both amd64 and arm64
  • Docker automatically pulls the correct architecture for your platform
  • Individual architecture tags (e.g., 76c408b-python-amd64) are also available if needed

enyst and others added 4 commits October 26, 2025 20:31
…nput_tokens

- Add token-aware should_condense that compares tokenized messages against a budget derived from llm.max_input_tokens, llm.max_output_tokens, and a configurable token_margin_ratio
- Choose tail size via binary search to keep as much recent context as fits, falling back to event-count heuristic when limits are unknown
- Preserve backward compatibility; default event-count behavior remains when model limits are absent

Co-authored-by: openhands <[email protected]>
…licts, preserving token-budget aware logic and unhandled request sizing.

Co-authored-by: openhands <[email protected]>
… in LLMSummarizingCondenser (addresses PR #912 review: reuse/block extract).

- Add compute_token_budget, estimate_token_count, max_tail_within_budget
- Refactor LLMSummarizingCondenser to call shared helpers

Co-authored-by: openhands <[email protected]>
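The counting helper referenced in these commits converts events to chat messages before counting. A minimal sketch, assuming a `to_llm_message()` method on events and a `get_token_count(messages)` method on the LLM wrapper as described in the PR summary; the real agent-sdk names and signatures may differ.

```python
def estimate_token_count(llm, events) -> int:
    """Estimate the token cost of a list of events (hypothetical sketch).

    Converts each event to a chat message, then delegates counting to the
    LLM wrapper so the estimate matches the model's actual tokenizer.
    """
    messages = [event.to_llm_message() for event in events]
    return llm.get_token_count(messages)
```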
Linked issue: ContextWindowExceededError is not handled properly by the resolver