feat(stuck): Add group-based loop detection for multi-action LLM calls #1240
Background
The existing stuck detection strategies in StuckDetector are primarily designed for single-action-per-LLM-call scenarios. However, in real-world usage, LLMs may generate multiple actions in one call (e.g., batch tool executions). The original methods fail to adapt to this scenario because:
They rely on strict 1:1 action-observation pairing, which breaks when multiple actions are generated in a single LLM response.
They count individual actions/observations instead of grouping by LLM calls, leading to false negatives in multi-action loops.
Changes
Added a new detection method: _is_stuck_repeating_in_recent_group
Groups events by llm_response_id (each group represents one LLM call and its corresponding actions/observations).
Detects two types of loops for multi-action scenarios:
Action-Error Loop: Checks if the same core actions and errors repeat ≥3 times across the latest 3 LLM call groups.
Action-Observation Loop: Checks if the same core actions and observations repeat ≥4 times across the latest 4 LLM call groups.
Added trigger condition in is_stuck():
The new method runs only when there are at least 12 events, which gives it enough data for reliable detection and skips the extra work on short histories.
Handles multi-action semantics:
When group=3 and len(repeat_err_action_counts) > 3: Indicates at least one LLM call in the latest 3 groups generated multiple actions.
When group=4 and len(repeat_action_counts) > 4: Indicates at least one LLM call in the latest 4 groups generated multiple actions.
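The grouping and the two loop checks described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the `Event` dataclass, its `kind` and `content` fields, and the helper names are hypothetical stand-ins, and "same core actions" is simplified here to exact equality of each group's contents.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """Hypothetical stand-in for an agent event; not the project's real class."""
    llm_response_id: str
    kind: str      # "action", "observation", or "error"
    content: str   # simplified "core" content used for comparison


def group_by_llm_call(events):
    """Return one list of events per LLM call, in order of first appearance."""
    groups = {}
    for event in events:
        groups.setdefault(event.llm_response_id, []).append(event)
    return list(groups.values())


def signature(group):
    """Reduce a group to a comparable tuple, ignoring the response id itself."""
    return tuple((e.kind, e.content) for e in group)


def latest_groups_repeat(events, n_groups):
    """True if the latest n_groups LLM-call groups have identical signatures."""
    groups = group_by_llm_call(events)
    if len(groups) < n_groups:
        return False
    recent = [signature(g) for g in groups[-n_groups:]]
    return all(sig == recent[0] for sig in recent)


def is_stuck_in_recent_groups(events):
    """Assumed combination of the two checks: 3 groups for errors, 4 otherwise."""
    groups = group_by_llm_call(events)
    # Action-error loop: the latest 3 groups are identical and contain an error.
    if latest_groups_repeat(events, 3) and any(e.kind == "error" for e in groups[-1]):
        return True
    # Action-observation loop: the latest 4 groups are identical.
    return latest_groups_repeat(events, 4)
```

Because comparison happens per group rather than per event, an LLM call that emits two actions and receives two errors is treated as a single repeated unit, which is the case the 1:1 pairing strategies miss.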
Grouping by llm_response_id: Aligns with the Agent's "LLM decision → multiple actions → multiple observations" workflow.
Conservative trigger threshold: Requires ≥12 events to avoid false positives from incomplete interaction sequences.
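The trigger threshold can be illustrated with a small gate like the one below. It is only a sketch under assumptions: `run_group_check_if_ready` and `MIN_EVENTS_FOR_GROUP_CHECK` are hypothetical names, and the group check is passed in as a callable rather than taken from `StuckDetector`.

```python
from typing import Callable, Sequence

MIN_EVENTS_FOR_GROUP_CHECK = 12  # threshold taken from the description above


def run_group_check_if_ready(
    events: Sequence[object],
    group_check: Callable[[Sequence[object]], bool],
    min_events: int = MIN_EVENTS_FOR_GROUP_CHECK,
) -> bool:
    """Run the group-based check only when enough events exist."""
    if len(events) < min_events:
        # History is too short: skip the check rather than risk a false positive.
        return False
    return group_check(events)
```

Keeping the threshold as a parameter makes it easy to exercise the gate in tests with shorter histories.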