
Conversation

@enyst (Collaborator) commented Dec 6, 2025

Summary
This PR addresses review feedback on #912 (token-budget–aware LLMSummarizingCondenser) by extracting the repeated token-budget and counting logic into reusable helpers on CondenserBase, and refactoring LLMSummarizingCondenser to use them. This keeps the PR’s goal intact (token-aware condensation) while improving maintainability for future condensers.

Changes

  • Add to CondenserBase:
    • compute_token_budget(llm, token_margin_ratio): derive usable token budget from llm.max_input_tokens / max_output_tokens with margin
    • estimate_token_count(llm, events): convert events to messages and count tokens via llm.get_token_count
    • max_tail_within_budget(view, llm, keep_first, budget): binary-search longest tail under budget
  • Update LLMSummarizingCondenser to use these helpers in should_condense and get_condensation
  • No behavior change when model limits are unknown; maintains fallback to event-count logic

Why

The token-budget and counting logic was previously repeated inside LLMSummarizingCondenser. Centralizing it on CondenserBase keeps the goal of #912 (token-aware condensation) intact while letting future condensers reuse the same helpers instead of reimplementing them.
Compatibility

  • Backward compatible: if budget computation/counting fails, behavior falls back to the existing event-count logic (max_size and unhandled_condensation_request handling unchanged).
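The fallback described above can be sketched as a small decision function. This is a hypothetical shape, not the real `should_condense` signature; the budget and counting callables are passed in only to keep the sketch self-contained.

```python
def should_condense(view, llm, max_size, compute_token_budget, estimate_token_count) -> bool:
    """Token-aware condensation check with an event-count fallback (sketch)."""
    try:
        budget = compute_token_budget(llm)
        if budget is not None:
            # Model limits are known: compare tokenized view against the budget.
            return estimate_token_count(llm, view) > budget
    except Exception:
        pass  # budget computation/counting failed: fall through
    # Limits unknown or counting failed: existing event-count heuristic.
    return len(view) > max_size
```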

Testing

  • Pre-commit hooks (ruff format/lint, pyright, etc.) pass locally.
  • Existing condenser tests continue to pass.

Open questions

  • Should the token margin default (10%) be made configurable globally or on Agent presets?

Closes #1340 by making the condenser path more robust and future-proof through token-aware logic (together with the exception mapping already in place).

Co-authored-by: openhands <[email protected]>



Agent Server images for this PR

GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server

Variants & Base Images

Variant | Architectures | Base Image
java    | amd64, arm64  | eclipse-temurin:17-jdk
python  | amd64, arm64  | nikolaik/python-nodejs:python3.12-nodejs22
golang  | amd64, arm64  | golang:1.21-bookworm

Pull (multi-arch manifest)

# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:76c408b-python

Run

docker run -it --rm \
  -p 8000:8000 \
  --name agent-server-76c408b-python \
  ghcr.io/openhands/agent-server:76c408b-python

All tags pushed for this build

ghcr.io/openhands/agent-server:76c408b-golang-amd64
ghcr.io/openhands/agent-server:76c408b-golang_tag_1.21-bookworm-amd64
ghcr.io/openhands/agent-server:76c408b-golang-arm64
ghcr.io/openhands/agent-server:76c408b-golang_tag_1.21-bookworm-arm64
ghcr.io/openhands/agent-server:76c408b-java-amd64
ghcr.io/openhands/agent-server:76c408b-eclipse-temurin_tag_17-jdk-amd64
ghcr.io/openhands/agent-server:76c408b-java-arm64
ghcr.io/openhands/agent-server:76c408b-eclipse-temurin_tag_17-jdk-arm64
ghcr.io/openhands/agent-server:76c408b-python-amd64
ghcr.io/openhands/agent-server:76c408b-nikolaik_s_python-nodejs_tag_python3.12-nodejs22-amd64
ghcr.io/openhands/agent-server:76c408b-python-arm64
ghcr.io/openhands/agent-server:76c408b-nikolaik_s_python-nodejs_tag_python3.12-nodejs22-arm64
ghcr.io/openhands/agent-server:76c408b-golang
ghcr.io/openhands/agent-server:76c408b-java
ghcr.io/openhands/agent-server:76c408b-python

About Multi-Architecture Support

  • Each variant tag (e.g., 76c408b-python) is a multi-arch manifest supporting both amd64 and arm64
  • Docker automatically pulls the correct architecture for your platform
  • Individual architecture tags (e.g., 76c408b-python-amd64) are also available if needed

enyst and others added 4 commits October 26, 2025 20:31
…nput_tokens

- Add token-aware should_condense that compares tokenized messages against a budget derived from llm.max_input_tokens, llm.max_output_tokens, and a configurable token_margin_ratio
- Choose tail size via binary search to keep as much recent context as fits, falling back to event-count heuristic when limits are unknown
- Preserve backward compatibility; default event-count behavior remains when model limits are absent

Co-authored-by: openhands <[email protected]>
…licts, preserving token-budget aware logic and unhandled request sizing.

Co-authored-by: openhands <[email protected]>
… in LLMSummarizingCondenser (addresses PR #912 review: reuse/block extract).

- Add compute_token_budget, estimate_token_count, max_tail_within_budget
- Refactor LLMSummarizingCondenser to call shared helpers

Co-authored-by: openhands <[email protected]>
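The counting helper referenced in these commits converts events to chat messages before counting. A minimal sketch, assuming a `to_llm_message()` method on events and a `get_token_count(messages)` method on the LLM wrapper as described in the PR summary; the real agent-sdk names and signatures may differ.

```python
def estimate_token_count(llm, events) -> int:
    """Estimate the token cost of a list of events (hypothetical sketch).

    Converts each event to a chat message, then delegates counting to the
    LLM wrapper so the estimate matches the model's actual tokenizer.
    """
    messages = [event.to_llm_message() for event in events]
    return llm.get_token_count(messages)
```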
Linked issue: ContextWindowExceededError is not handled properly by the resolver