15 tasks stuck in exploration loops: 30 turns of reading without editing #129

@greynewell

Problem

In our SWE-bench Verified evaluation, 15 tasks spent all 30 iterations on exploration (reading files, calling symbol_context, grepping) without making a single edit. The agent gets stuck in analysis paralysis.

Data

  • 15 tasks with 30 iterations and 0 file edits
  • These tasks have tool call patterns like: symbol_context → Read → Grep → Read → symbol_context → Read → ... (repeating for 30 turns)
  • The agent keeps gathering more context but never transitions to the "fix" phase
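
A minimal sketch of how such tasks could be flagged from run logs. It assumes each task's trajectory is available as a list of tool-call names with one call per iteration, and that `Edit` and `Write` are the file-modifying tools. Those tool names, the `find_stuck_tasks` helper, and the example data are hypothetical, not taken from the evaluation harness.

```python
from typing import Dict, List

# Assumption: these are the tool names that modify files; adjust to the real harness.
EDIT_TOOLS = {"Edit", "Write"}
MAX_ITERATIONS = 30  # iteration limit from the evaluation described above


def find_stuck_tasks(trajectories: Dict[str, List[str]]) -> List[str]:
    """Return IDs of tasks that used every iteration without a single edit."""
    stuck = []
    for task_id, calls in trajectories.items():
        used_all = len(calls) >= MAX_ITERATIONS
        edited = any(name in EDIT_TOOLS for name in calls)
        if used_all and not edited:
            stuck.append(task_id)
    return sorted(stuck)


# Hypothetical example data: one exploration loop, one task that edits early.
example = {
    "task-a": ["symbol_context", "Read", "Grep"] * 10,
    "task-b": ["Read", "Grep", "Edit"],
}
print(find_stuck_tasks(example))
```

Running the same check after each prompt or server change would show whether the count of zero-edit tasks actually moves.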

Root Cause

Two contributing factors:

  1. No iteration budget awareness: The agent doesn't know it has a 30-iteration limit, so it doesn't pace itself
  2. Rich exploration tools encourage over-exploration: When symbol_context returns detailed context with callers, callees, and related symbols, the agent follows every lead instead of focusing
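
To illustrate the first factor, here is a rough sketch of a per-turn budget reminder that a harness could prepend to the agent's context. It assumes the harness knows the current iteration and whether any edit has been made. The `budget_notice` function, its wording, and the 10-turn exploration cap are illustrative assumptions.

```python
def budget_notice(iteration: int, has_edited: bool,
                  limit: int = 30, explore_cap: int = 10) -> str:
    """Build a one-line reminder of the remaining turn budget."""
    remaining = max(limit - iteration, 0)
    notice = f"Turn {iteration} of {limit}; {remaining} turns remaining."
    # Only nudge toward editing if the agent has explored past the cap without an edit.
    if not has_edited and iteration >= explore_cap:
        notice += " Exploration budget spent: make your first edit now."
    return notice


print(budget_notice(3, has_edited=False))
print(budget_notice(12, has_edited=False))
```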

Impact

These 15 tasks are guaranteed losses, since a run that never edits a file cannot resolve its task. If even half of them transitioned to editing, that could mean an estimated 3-4 additional resolves.

Recommended Fixes

  1. Add budget awareness to instructions: "You have a limited number of turns. Spend no more than 10 turns exploring before making your first edit."
  2. Exploration cap hint: After the agent has made 8-10 symbol_context/Read calls without editing, include a hint in the next response: "Consider making your edit now — you've gathered significant context."
  3. Progressive brevity: Make tool responses progressively shorter as iteration count increases (if iteration count is available to the server)
  4. Structured workflow in instructions: "Phase 1 (turns 1-5): Understand the problem. Phase 2 (turns 6-20): Implement the fix. Phase 3 (turns 21-30): Test and refine."
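
Fixes 2 and 3 could be sketched on the server side roughly as follows. This assumes the server can see each tool call by name and can count iterations itself. The `ExplorationTracker` class, the `Edit`/`Write` tool names, the threshold of 8, and the character budgets are all hypothetical choices for illustration, and the hint wording is adapted from fix 2.

```python
from dataclasses import dataclass

# Assumptions: tool names and thresholds are illustrative, not the server's real config.
EXPLORATION_TOOLS = {"symbol_context", "Read", "Grep"}
EDIT_TOOLS = {"Edit", "Write"}
HINT_THRESHOLD = 8
HINT = "Consider making your edit now. You've gathered significant context."


@dataclass
class ExplorationTracker:
    calls_since_edit: int = 0  # exploration calls since the last edit
    iteration: int = 0         # total tool calls seen so far

    def record(self, tool_name: str) -> None:
        """Update counters for one tool call; an edit resets the exploration streak."""
        self.iteration += 1
        if tool_name in EDIT_TOOLS:
            self.calls_since_edit = 0
        elif tool_name in EXPLORATION_TOOLS:
            self.calls_since_edit += 1

    def max_chars(self, base: int = 4000, floor: int = 500, limit: int = 30) -> int:
        """Shrink the response budget linearly from base to floor over the run."""
        frac = min(self.iteration, limit) / limit
        return int(base - (base - floor) * frac)

    def decorate(self, response: str) -> str:
        """Trim the response to the current budget and append the hint if due."""
        body = response[: self.max_chars()]
        if self.calls_since_edit >= HINT_THRESHOLD:
            body += "\n\n" + HINT
        return body


tracker = ExplorationTracker()
for name in ["symbol_context", "Read", "Grep", "Read"] * 2:
    tracker.record(name)
out = tracker.decorate("x" * 10000)
print(len(out), out.endswith(HINT))
```

Plain character truncation is only a placeholder here. A real implementation would more likely drop lower-priority sections of the symbol_context output (for example, distant callers) before trimming the core definition.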

Labels

performance, swe-bench
