feat(stuck): Add group-based loop detection for multi-action LLM calls #1240

CLFutureX · 2025-11-24T09:41:31Z

Background

The existing stuck detection strategies in StuckDetector are primarily designed for single-action-per-LLM-call scenarios. However, in real-world usage, LLMs may generate multiple actions in one call (e.g., batch tool executions). The original methods fail to adapt to this scenario because:
They rely on strict 1:1 action-observation pairing, which breaks when multiple actions are generated in a single LLM response.
They count individual actions/observations instead of grouping by LLM calls, leading to false negatives in multi-action loops.

Changes

Added a new detection method: _is_stuck_repeating_in_recent_group
Groups events by llm_response_id (each group represents one LLM call and its corresponding actions/observations).
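As a rough illustration of the grouping step, here is a minimal sketch. The `Event` dataclass and `group_by_llm_response` are hypothetical stand-ins, not the SDK's actual classes, and the sketch assumes every event (including observations) can be attributed to an `llm_response_id`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    # Hypothetical, simplified event; the real event classes are not shown in this PR text.
    kind: str             # "action", "observation", or "error"
    llm_response_id: str  # id of the LLM call this event is attributed to (assumption)
    payload: str          # stand-in for the "core" content that gets compared


def group_by_llm_response(events: list[Event]) -> list[list[Event]]:
    """Split a chronological event list into one group per LLM call.

    Dicts preserve insertion order, so groups come out in the order
    in which each llm_response_id was first seen.
    """
    groups: dict[str, list[Event]] = {}
    for event in events:
        groups.setdefault(event.llm_response_id, []).append(event)
    return list(groups.values())
```

With this shape, one LLM call that emits two actions yields a single group containing both actions and their observations, rather than two separate action-observation pairs.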

Detects two types of loops for multi-action scenarios:

Action-Error Loop: Checks if the same core actions and errors repeat ≥3 times across the latest 3 LLM call groups.
Action-Observation Loop: Checks if the same core actions and observations repeat ≥4 times across the latest 4 LLM call groups.
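The two checks could look roughly like the sketch below. It is a simplification under stated assumptions: each group is modelled as a list of `(kind, payload)` tuples, "same core actions" is reduced to plain equality of whole groups, and the function names are hypothetical. The thresholds 3 and 4 are the ones given in the description.

```python
from typing import Optional

Group = list[tuple[str, str]]  # (kind, payload); hypothetical simplified shape


def last_groups_identical(groups: list[Group], n: int) -> bool:
    """True if there are at least n groups and the latest n are all equal."""
    if len(groups) < n:
        return False
    recent = groups[-n:]
    return all(group == recent[0] for group in recent[1:])


def detect_group_loop(groups: list[Group]) -> Optional[str]:
    """Return the loop type found in the most recent groups, or None."""
    # Action-error loop: the latest 3 LLM calls repeat the same actions and errors.
    if last_groups_identical(groups, 3) and any(
        kind == "error" for kind, _ in groups[-1]
    ):
        return "action-error"
    # Action-observation loop: the latest 4 LLM calls repeat the same
    # actions and observations.
    if last_groups_identical(groups, 4):
        return "action-observation"
    return None
```

For example, three identical groups that each end in a timeout error are reported as `"action-error"`, while three identical error-free groups are not reported until a fourth identical group arrives.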

Added trigger condition in is_stuck():

The new method runs only when there are at least 12 events (this ensures enough data for reliable detection and avoids the cost of running the check on short histories).
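A minimal sketch of that gating, assuming the existing checks run first and the group-based check is consulted only afterwards. The function signature and the `MIN_EVENTS_FOR_GROUP_CHECK` name are hypothetical; only the value 12 comes from the description.

```python
from typing import Callable, Sequence

MIN_EVENTS_FOR_GROUP_CHECK = 12  # threshold stated in the PR description

Check = Callable[[Sequence[object]], bool]


def is_stuck(
    events: Sequence[object],
    single_action_checks: Sequence[Check],
    group_check: Check,
) -> bool:
    """Run the existing checks, then the group-based check if history is long enough."""
    if any(check(events) for check in single_action_checks):
        return True
    # Short histories skip the group-based check entirely.
    if len(events) >= MIN_EVENTS_FOR_GROUP_CHECK:
        return group_check(events)
    return False
```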

Handles multi-action semantics:

When group=3 and len(repeat_err_action_counts) > 3: Indicates at least one LLM call in the latest 3 groups generated multiple actions.
When group=4 and len(repeat_action_counts) > 4: Indicates at least one LLM call in the latest 4 groups generated multiple actions.
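The reasoning behind these two conditions is a pigeonhole argument: if the latest N groups contain more than N actions, at least one group must hold more than one action. The sketch below illustrates this with a flat list of `llm_response_id` values, one per action. The exact type of `repeat_err_action_counts` and `repeat_action_counts` is not shown here, so this shape and both function names are assumptions.

```python
from collections import Counter


def contains_multi_action_call(action_response_ids: list[str], group_count: int) -> bool:
    """More actions than groups implies some LLM call produced several actions."""
    return len(action_response_ids) > group_count


def largest_call_size(action_response_ids: list[str]) -> int:
    """Number of actions in the biggest single LLM call (0 for an empty list)."""
    return max(Counter(action_response_ids).values(), default=0)
```

With ids `["r1", "r1", "r2", "r3"]` and three groups, the first function returns `True` and the largest call holds two actions; with one action per call it returns `False`.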

Grouping by llm_response_id: Aligns with the Agent's "LLM decision → multiple actions → multiple observations" workflow.
Conservative trigger threshold: Requires ≥12 events to avoid false positives from incomplete interaction sequences.

Signed-off-by: CLFutureX <[email protected]>

CLFutureX · 2025-11-24T09:42:50Z

@ryanhoangt @xingyaoww hey, PTAL, thanks

enyst

Could you please show a pattern that this is stopping, and what exactly it looks like in each of the two scenarios (with an error, or with an observation)?

Have you encountered an infinite loop with multiple tool calls that wasn’t stopped? Could you tell on which LLM?

enyst · 2025-11-24T17:22:22Z

The reason I ask is that while you're correct, we don't really take into account multiple tool calls here, I haven't seen the LLMs with multiple tool calls ... get stuck, like practically never? I just can't remember a report.

Not saying it couldn't happen, it's just that the existing rules are empirical, we really encountered these cases in reality in the past, and everything about them is purely from experience. Even the hardcoded values are approximations of what used to hurt more or less or when.

In the past ~6 months to a year, LLM providers have become really good at this, and SOTA LLMs hardly get stuck anymore. In this context, IMHO we should continue to add rules that are needed in practice. If an infinite loop happened, could you perhaps post a log of how it happened?

CLFutureX · 2025-11-28T04:56:02Z

> The reason I ask is that while you're correct, we don't really take into account multiple tool calls here, I haven't seen the LLMs with multiple tool calls ... get stuck, like practically never? I just can't remember a report.
>
> Not saying it couldn't happen, it's just that the existing rules are empirical, we really encountered these cases in reality in the past, and everything about them is purely from experience. Even the hardcoded values are approximations of what used to hurt more or less or when.
>
> In the past ~6 months to a year, LLM providers have become really good at this, and SOTA LLMs hardly get stuck anymore. In this context, IMHO we should continue to add rules that are needed in practice. If an infinite loop happened, could you perhaps post a log of how it happened?

Thanks for your review.
I'm submitting this PR based on the existing logic (where the LLM generates multiple action calls in one go). As for specific scenarios, I think this issue might occur when tool calls time out. I'll find time to test it later.

enyst · 2025-11-28T16:09:09Z

If they time out, I think we send the error (the timeout) back to the LLM. So it can see the failure and won't necessarily repeat the call; it can change its approach? 🤔

blacksmith-sh · 2025-12-05T13:01:20Z

[Automatic Post]: It has been a while since there was any activity on this PR. @CLFutureX, are you still working on it? If so, please go ahead, if not then please request review, close it, or request that someone else follow up.

add repeating check in recent group

2cb880d

Signed-off-by: CLFutureX <[email protected]>

enyst requested changes Nov 24, 2025
