Infrastructure crashes: 17 zero-iteration tasks and 29 sympy stack overflows #126

@greynewell

Problem

In our SWE-bench Verified evaluation, many tasks fail for infrastructure reasons: some crash before the agent can begin working, and others crash mid-execution when they hit resource limits.

Data

Zero-iteration crashes (17 tasks):
Tasks where the agent completed 0 iterations because the MCP server or harness crashed before any work began. These are pure infrastructure losses that the baseline doesn't suffer from.

Sympy stack overflows (29 tasks):
All 29 sympy repository tasks trigger a Python RecursionError (stack overflow) during the overview tool's graph construction phase. The sympy codebase has deeply recursive module structures that exceed Python's default recursion limit during static analysis.
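To illustrate the failure mode, here is a minimal sketch (not the overview tool's actual code) that contrasts a recursive graph walk with an iterative one using an explicit stack. The function names and the adjacency-dict graph shape are assumptions made for this example.

```python
from typing import Dict, Hashable, List, Optional, Set


def reachable_recursive(graph: Dict[Hashable, List[Hashable]],
                        node: Hashable,
                        seen: Optional[Set[Hashable]] = None) -> Set[Hashable]:
    """Recursive DFS: uses one Python stack frame per level of depth."""
    if seen is None:
        seen = set()
    if node in seen:
        return seen
    seen.add(node)
    for child in graph.get(node, ()):
        reachable_recursive(graph, child, seen)
    return seen


def reachable_iterative(graph: Dict[Hashable, List[Hashable]],
                        root: Hashable) -> List[Hashable]:
    """Iterative DFS (preorder): depth is bounded by heap memory, not the call stack."""
    seen: Set[Hashable] = set()
    order: List[Hashable] = []
    stack = [root]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        order.append(node)
        # Push children reversed so they are visited in their listed order.
        stack.extend(reversed(graph.get(node, ())))
    return order


# Hypothetical worst case: a single dependency chain far deeper than
# Python's default recursion limit (1000 frames in CPython).
deep_chain = {i: [i + 1] for i in range(50_000)}
```

On `deep_chain`, `reachable_recursive` raises `RecursionError` under the default limit, while `reachable_iterative` completes. Raising the limit with `sys.setrecursionlimit` is the quicker fix, but a limit set too high can exhaust the C stack and crash the interpreter outright, so the iterative form is the more robust option.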

Django OOM (3 tasks):
Three django tasks exhausted memory during precache/overview generation due to the large codebase size.
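One way to bound memory in this stage is to walk the repository lazily and downgrade very large files to a cheaper analysis. The sketch below assumes a hypothetical two-mode analyzer ("detailed" and "shallow") and a hypothetical size threshold; neither comes from the server's real configuration.

```python
import os
from typing import Iterator, Tuple

# Hypothetical threshold: files larger than this get shallow analysis only.
MAX_DETAILED_BYTES = 256 * 1024


def iter_analysis_plan(root: str,
                       max_bytes: int = MAX_DETAILED_BYTES) -> Iterator[Tuple[str, str]]:
    """Lazily yield (path, mode) for each Python file under root.

    Being a generator, it never holds the full file list in memory;
    the caller can analyze and release one file at a time.
    """
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if not name.endswith(".py"):
                continue
            path = os.path.join(dirpath, name)
            try:
                size = os.path.getsize(path)
            except OSError:
                # File vanished or is unreadable: skip it rather than crash.
                continue
            yield path, ("detailed" if size <= max_bytes else "shallow")
```

A caller would iterate with `for path, mode in iter_analysis_plan(repo_root): ...` and write each result to disk before moving on, so peak memory tracks the largest single file rather than the whole codebase.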

Impact

  • 17 zero-iteration tasks = 17 guaranteed losses (baseline gets ~50% of these = ~8-9 free resolves)
  • 29 sympy tasks all fail at overview stage (baseline resolves ~30% of sympy = ~9 resolves lost)
  • Estimated: +10-15 additional resolves from fixing infrastructure reliability
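The arithmetic behind the first two bullets can be checked with a short script. The task counts and baseline rates are the approximate figures quoted above; the helper name is made up for this example, and treating the two groups as disjoint is an assumption the issue does not confirm.

```python
def expected_lost_resolves(task_count: int, baseline_rate: float) -> float:
    """Expected resolves forfeited when every task in a group crashes."""
    return task_count * baseline_rate


zero_iteration = expected_lost_resolves(17, 0.50)  # baseline resolves ~50% of these
sympy_overflow = expected_lost_resolves(29, 0.30)  # baseline resolves ~30% of sympy

# Upper bound only: assumes no task is counted in both groups.
combined_if_disjoint = zero_iteration + sympy_overflow

print(f"zero-iteration: {zero_iteration:.1f}")
print(f"sympy overflow: {sympy_overflow:.1f}")
print(f"combined (if disjoint): {combined_if_disjoint:.1f}")
```

This gives roughly 8.5 and 8.7 expected resolves for the two groups. Their sum is higher than the +10-15 estimate, which would be consistent with some overlap between the groups or with only partial recovery after a fix, though the issue does not say which.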

Recommended Fixes

  1. Sympy stack overflow: Increase Python recursion limit during graph construction, or implement iterative (non-recursive) graph traversal for large repos
  2. Graceful degradation: If overview/precache fails, the MCP server should still start and offer symbol_context with on-demand analysis rather than crashing entirely
  3. OOM protection: Implement memory limits and streaming for large repos; skip detailed analysis for modules above a size threshold
  4. Retry logic: For transient crashes, mcpbr or the server should attempt a restart
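Fixes 2 and 4 above could look roughly like the following sketch. It is illustrative only: `build_or_none` and `with_retries` are hypothetical helpers, not part of mcpbr or the MCP server, and the retry count and delays are placeholder values.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def build_or_none(build: Callable[[], T]) -> Optional[T]:
    """Graceful degradation: return None instead of crashing when the
    overview/precache step hits a resource limit, so the server can still
    start and serve on-demand analysis."""
    try:
        return build()
    except (RecursionError, MemoryError) as exc:
        print(f"overview unavailable, falling back to on-demand analysis: {exc!r}")
        return None


def with_retries(start: Callable[[], T],
                 attempts: int = 3,
                 base_delay: float = 0.5,
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Retry a start-up callable with exponential backoff between attempts.

    The last exception is re-raised once all attempts are used up.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return start()
        except Exception:
            if attempt == attempts - 1:
                raise
            # Delays grow as base_delay, 2 * base_delay, 4 * base_delay, ...
            sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")
```

A harness might call `with_retries(start_server)` and, inside start-up, `overview = build_or_none(build_overview)`, where both callables are placeholders for whatever the real entry points are. Retrying only helps with transient crashes; deterministic failures such as the sympy overflow need the degradation path or a real fix.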

Labels

bug, reliability
