
Conversation

@CLFutureX
Contributor

Background

The existing stuck detection strategies in StuckDetector are primarily designed for single-action-per-LLM-call scenarios. However, in real-world usage, LLMs may generate multiple actions in one call (e.g., batch tool executions). The original methods fail to adapt to this scenario because:
They rely on strict 1:1 action-observation pairing, which breaks when multiple actions are generated in a single LLM response.
They count individual actions/observations instead of grouping by LLM calls, leading to false negatives in multi-action loops.

Changes

Added a new detection method: _is_stuck_repeating_in_recent_group
Groups events by llm_response_id (each group represents one LLM call and its corresponding actions/observations).

Detects two types of loops for multi-action scenarios:

Action-Error Loop: Checks if the same core actions and errors repeat ≥3 times across the latest 3 LLM call groups.
Action-Observation Loop: Checks if the same core actions and observations repeat ≥4 times across the latest 4 LLM call groups.
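The grouping-and-repeat idea above can be sketched as follows. This is a minimal illustration, not the PR's actual code: the function names, the event shape (dicts with `llm_response_id`, `kind`, and a hashable `payload` summarizing the core action/observation), and the equality-based comparison are all assumptions for the example.

```python
def group_by_llm_response_id(events):
    """Group events by LLM call (illustrative sketch, not the PR's code).

    Each event is assumed to be a dict with keys 'llm_response_id',
    'kind' ('action' or 'observation'), and 'payload' (a hashable
    summary of the core content). Python dicts preserve insertion
    order, so groups come back in call order.
    """
    groups = {}
    for ev in events:
        groups.setdefault(ev["llm_response_id"], []).append(
            (ev["kind"], ev["payload"])
        )
    return list(groups.values())


def recent_groups_repeat(groups, window):
    """True if the latest `window` LLM-call groups are identical,
    i.e. the same batch of core actions/observations keeps recurring."""
    if len(groups) < window:
        return False
    recent = groups[-window:]
    return all(g == recent[0] for g in recent[1:])
```

Under this sketch, the action-error loop would correspond to `recent_groups_repeat(groups, 3)` over error-bearing groups, and the action-observation loop to `recent_groups_repeat(groups, 4)`.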

Added trigger condition in is_stuck():

The new method is activated only when the number of events is at least 12, which ensures sufficient data for reliable detection without adding performance overhead.

Handles multi-action semantics:

When group = 3 and len(repeat_err_action_counts) > 3: at least one LLM call among the latest 3 groups generated multiple actions.
When group = 4 and len(repeat_action_counts) > 4: at least one LLM call among the latest 4 groups generated multiple actions.
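The comparison behind these checks is a simple pigeonhole argument: if strictly one action were emitted per LLM call, N groups would yield exactly N entries, so any surplus implies batching. A minimal sketch (the function name and the per-group action lists are illustrative assumptions, not the PR's actual identifiers):

```python
def has_batched_call(groups):
    """groups: list of per-LLM-call core-action lists (latest N calls).

    Returns True if at least one call emitted more than one action:
    with one action per call the total would equal len(groups), so a
    larger total means some group was batched.
    """
    total_actions = sum(len(actions) for actions in groups)
    return total_actions > len(groups)
```

For example, three groups holding 1, 1, and 2 actions give a total of 4 > 3, so a batched call is present.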

Grouping by llm_response_id: Aligns with the Agent's "LLM decision → multiple actions → multiple observations" workflow.
Conservative trigger threshold: Requires ≥12 events to avoid false positives from incomplete interaction sequences.

@CLFutureX
Contributor Author

@ryanhoangt @xingyaoww hey, PTAL, thanks

@enyst
Collaborator


Could you please show a pattern that this stops, and what exactly it looks like in each of the two scenarios (with error, or with observation)?

Have you encountered an infinite loop with multiple tool calls that wasn't stopped? Could you say on which LLM?

@enyst
Collaborator

enyst commented Nov 24, 2025

The reason I ask is that while you're correct, we don't really take into account multiple tool calls here, I haven't seen the LLMs with multiple tool calls ... get stuck, like practically never? I just can't remember a report.

Not saying it couldn't happen, it's just that the existing rules are empirical, we really encountered these cases in reality in the past, and everything about them is purely from experience. Even the hardcoded values are approximations of what used to hurt more or less or when.

In the past ~6 months to a year, LLM providers have become really good at this, and SOTA LLMs hardly get stuck anymore. In this context, IMHO we should continue to add rules as they are needed in practice. If an infinite loop happened, could you perhaps post a log of how it happened?

@CLFutureX
Contributor Author

> The reason I ask is that while you're correct, we don't really take into account multiple tool calls here, I haven't seen the LLMs with multiple tool calls ... get stuck, like practically never? I just can't remember a report.
>
> Not saying it couldn't happen, it's just that the existing rules are empirical, we really encountered these cases in reality in the past, and everything about them is purely from experience. Even the hardcoded values are approximations of what used to hurt more or less or when.
>
> In the past ~6 months to a year, LLM providers have become really good at this, and SOTA LLMs hardly get stuck anymore. In this context, IMHO we should continue to add rules as they are needed in practice. If an infinite loop happened, could you perhaps post a log of how it happened?

Thanks for your review.
I'm submitting this PR based on the existing logic, where the LLM can generate multiple action calls in one go. As for concrete scenarios, I think this issue might occur when tool calls time out; I'll find time to test it later.

@enyst
Collaborator

enyst commented Nov 28, 2025

If they time out, I think we send the error (the timeout) back to the LLM. So it can see and won't necessarily do it again, it can change its approach? 🤔

@blacksmith-sh
Contributor

blacksmith-sh bot commented Dec 5, 2025

[Automatic Post]: It has been a while since there was any activity on this PR. @CLFutureX, are you still working on it? If so, please go ahead, if not then please request review, close it, or request that someone else follow up.

