fix(condenser): When condensation is triggered by the unhandled_condensation_request condition, it will result in empty condensation. #1034
Conversation
… based on the view size Signed-off-by: CLFutureX <[email protected]>
@csmith49 hey, PTAL

@xingyaoww @enyst hey, PTAL, thanks

@OpenHands please make build, then run pre-commit on all files
I'm on it! enyst can track my progress at all-hands.dev |
```python
if view.unhandled_condensation_request:
    # Condensation triggered by a condensation request
    # should be calculated based on the view size.
    target_size = len(view) // 2
```
@CLFutureX can you please tell when this happens? Is it when the context limit may have been exceeded?
Sure.
path: openhands-sdk/openhands/sdk/agent/agent.py
```python
except LLMContextWindowExceedError:
    # If condenser is available and handles requests, trigger condensation
    if (
        self.condenser is not None
        and self.condenser.handles_condensation_requests()
    ):
        logger.warning(
            "LLM raised context window exceeded error, triggering condensation"
        )
        on_event(CondensationRequest())
        return
```
If the LLM call goes over the window limit, it raises an LLMContextWindowExceedError, which triggers a CondensationRequest. After that, in the view file (openhands-sdk/openhands/sdk/context/view.py), the presence of a CondensationRequest sets unhandled_condensation_request to True. So the next time should_condense is called it returns True, and by then the view usually holds far fewer events than max_size.
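A minimal sketch of that view-side logic (the field names and flag semantics here are assumptions based on this discussion, not the actual SDK code):

```python
# Minimal sketch, NOT the actual SDK code: field names and flag
# semantics are assumed from the discussion above.
from dataclasses import dataclass, field


@dataclass
class CondensationRequest:
    """Marker event emitted after an LLMContextWindowExceedError."""


@dataclass
class View:
    events: list = field(default_factory=list)
    max_size: int = 120

    @property
    def unhandled_condensation_request(self) -> bool:
        # Assumed: the flag is set when a CondensationRequest event is
        # present and no condensation has handled it yet.
        return any(isinstance(e, CondensationRequest) for e in self.events)

    def should_condense(self) -> bool:
        # An explicit request forces condensation even though the view
        # typically holds far fewer than max_size events at this point.
        return self.unhandled_condensation_request or len(self.events) > self.max_size
```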
@OpenHands See this discussion. I think @CLFutureX is correct: if the context limit was exceeded, we shouldn't assume the condenser will work as if max_size of events was reached; it almost certainly wasn't. What do you think about the solution, or what quick reasonable alternatives do we have here?
I'm on it! enyst can track my progress at all-hands.dev
Summary of my response to the PR comment
Answer to the question
- I agree with the premise: when LLMContextWindowExceedError occurs, event-count heuristics based on max_size are the wrong proxy. Basing target_size on the current view size for unhandled_condensation_request is directionally correct.
- However, as implemented, it can still yield an empty “forgotten” slice in small views (e.g., len(view) ≤ 2*(keep_first+1)), so condensation does nothing and doesn’t relieve token pressure.
Quick, reasonable improvements
- Minimal fix on top of the PR (recommended now; see the runnable sketch after this list):
- Keep the PR’s change: if view.unhandled_condensation_request, set target_size = len(view) // 2.
- Add a simple clamp to ensure we always forget something when structurally possible:
- head = view[:keep_first]
- events_from_tail = target_size - len(head) - 1
- available_tail = max(0, len(view) - keep_first - 1)
- tail_keep = min(max(events_from_tail, 1), available_tail)
- forgotten = view[keep_first : len(view) - tail_keep]
- If available_tail == 0, there’s no tail space; optionally force-drop the first middle event if present to guarantee progress.
- Quick alternatives (follow-ups):
- Token-aware condensation: compute tokens with llm.get_token_count(messages) and select how many tail events to keep (e.g., via binary search) so head + summary + tail fits a safe fraction of max_input_tokens. This directly addresses cases where a few long messages blow the window.
- Adaptive keep_first: if events_from_tail ≤ 0, temporarily reduce keep_first (bounded to 0) until at least one tail event can be kept and at least one middle event can be forgotten.
- Safety fallback: if the computed “forgotten” slice is empty, force forgetting at least the first middle event (index keep_first) and keep one tail event to ensure progress.
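A runnable sketch of the clamping guard from the minimal fix above. The names view, keep_first, and target_size follow the discussion; the function name and return shape are hypothetical, not the condenser's actual API:

```python
# Sketch of the clamping guard; everything beyond the names discussed
# above (view, keep_first, target_size) is a hypothetical illustration.
def select_forgotten(view: list, keep_first: int, target_size: int) -> list:
    head = view[:keep_first]
    # Reserve one slot for the summary event that replaces the forgotten span.
    events_from_tail = target_size - len(head) - 1
    available_tail = max(0, len(view) - keep_first - 1)
    # Clamp: keep at least one tail event when possible, never more than exist.
    tail_keep = min(max(events_from_tail, 1), available_tail)
    forgotten = view[keep_first : len(view) - tail_keep]
    if not forgotten and len(view) > keep_first:
        # Safety fallback: force progress by forgetting the first middle event.
        forgotten = view[keep_first : keep_first + 1]
    return forgotten


# Example: a 10-event view with keep_first=2 and target_size=len(view)//2=5
# forgets events 2..7, keeping 2 head + 2 tail events plus the summary slot.
print(select_forgotten(list(range(10)), keep_first=2, target_size=5))
```

With this guard, condensation on request can only be a no-op when the view is structurally too small to forget anything (len(view) <= keep_first).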
Recommendation
- Short-term: accept the PR’s intent and add the clamping guard so condensation on request cannot be a no-op.
- Medium-term: consider the token-aware approach so condensation responds to real token pressure rather than event counts.
enyst left a comment:
Thank you! I think PR 912 should also help here…
…enser.py to satisfy CI Co-authored-by: openhands <[email protected]>
That's right, this is a good way to optimize.
Background: When condensation is triggered by the unhandled_condensation_request condition, if the number of events in the current view is less than half of max_size, the condensation will be empty.
For example (illustrative numbers): with max_size = 120, the old target_size is 60, but after a context-window error the view may hold only 40 events; keeping 60 of 40 events forgets nothing, so the condensation is a no-op.
Optimization: When condensation is triggered by the unhandled_condensation_request condition, adjust target_size to len(view) // 2.
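A tiny arithmetic demonstration of why the view-based target makes progress where the max_size-based one does not (all numbers are illustrative assumptions, not values from the SDK):

```python
# Illustrative numbers only; max_size and the post-error view length
# are assumptions, not values from the SDK.
max_size = 120
view_len = 40  # typical view size right after a context-window error

old_target = max_size // 2  # 60: asked to keep 60 of 40 events -> nothing forgotten
new_target = view_len // 2  # 20: asked to keep 20 of 40 events -> 20 events condensed

assert old_target >= view_len  # the max_size-based heuristic is a no-op here
assert new_target < view_len   # the view-based target guarantees progress
```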