Infrastructure crashes: 17 zero-iteration tasks and 29 sympy stack overflows #126
Description
Problem
In our SWE-bench-verified evaluation, a significant number of tasks fail due to infrastructure issues before the agent can even begin working, or crash mid-execution due to resource limits.
Data
Zero-iteration crashes (17 tasks):
Tasks where the agent completed 0 iterations: the MCP server or harness crashed before any work began. These are pure infrastructure losses that the baseline does not suffer.
Sympy stack overflows (29 tasks):
All sympy repository tasks trigger Python RecursionError / stack overflow during the overview tool's graph construction phase. The sympy codebase has deeply recursive module structures that exceed Python's default recursion limit during static analysis.
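To illustrate the failure mode, here is a minimal sketch of a recursive versus an iterative graph walk. It is not the overview tool's actual code: `reachable_iterative`, `reachable_recursive`, and the chain-shaped `chain` graph are hypothetical stand-ins for whatever the graph construction phase really traverses.

```python
def reachable_iterative(graph, start):
    """Return nodes reachable from `start` using an explicit stack (no recursion)."""
    seen = {start}
    order = []
    stack = [start]
    while stack:
        node = stack.pop()
        order.append(node)
        # The stack is an ordinary list, so depth is bounded by available
        # memory rather than by sys.getrecursionlimit().
        for neighbour in graph.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                stack.append(neighbour)
    return order


def reachable_recursive(graph, node, seen=None):
    """Naive recursive version, shown only for contrast."""
    if seen is None:
        seen = set()
    seen.add(node)
    for neighbour in graph.get(node, ()):
        if neighbour not in seen:
            reachable_recursive(graph, neighbour, seen)
    return seen


# Hypothetical worst case: a 50,000-node chain, far deeper than the default limit.
DEPTH = 50_000
chain = {i: [i + 1] for i in range(DEPTH)}

visited = reachable_iterative(chain, 0)

try:
    reachable_recursive(chain, 0)
    recursive_failed = False
except RecursionError:
    recursive_failed = True
```

Raising the limit with `sys.setrecursionlimit` would be the smaller change, but a high limit can still exhaust the C stack on deep inputs, so the explicit-stack form is likely the more robust of the two options.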
Django OOM (3 tasks):
Three django tasks exhausted memory during precache/overview generation due to the large codebase size.
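One way to bound memory on a large repo is to walk files lazily and route oversized modules to a cheaper analysis path. The sketch below assumes file size on disk is an acceptable proxy for analysis cost; `iter_python_files`, `plan_analysis`, and the `max_bytes` threshold are hypothetical names, not part of the existing precache code.

```python
import os
import tempfile


def iter_python_files(root):
    """Yield .py paths one at a time instead of materialising the whole tree."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if name.endswith(".py"):
                yield os.path.join(dirpath, name)


def plan_analysis(root, max_bytes):
    """Split files into (detailed, shallow) based on a size threshold in bytes."""
    detailed, shallow = [], []
    for path in iter_python_files(root):
        if os.path.getsize(path) > max_bytes:
            shallow.append(path)   # too large: skip detailed analysis
        else:
            detailed.append(path)  # small enough for full analysis
    return detailed, shallow


# Tiny demonstration on a throwaway directory.
with tempfile.TemporaryDirectory() as root:
    with open(os.path.join(root, "small.py"), "w") as fh:
        fh.write("x = 1\n")
    with open(os.path.join(root, "huge.py"), "w") as fh:
        fh.write("y = 0\n" * 10_000)
    detailed, shallow = plan_analysis(root, max_bytes=1_000)
    detailed_names = [os.path.basename(p) for p in detailed]
    shallow_names = [os.path.basename(p) for p in shallow]
```

The right threshold would have to be tuned against the django tasks that currently run out of memory.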
Impact
- 17 zero-iteration tasks = 17 guaranteed losses (baseline gets ~50% of these = ~8-9 free resolves)
- 29 sympy tasks all fail at overview stage (baseline resolves ~30% of sympy = ~9 resolves lost)
- Estimated: +10-15 additional resolves from fixing infrastructure reliability
Recommended Fixes
- Sympy stack overflow: Increase Python recursion limit during graph construction, or implement iterative (non-recursive) graph traversal for large repos
- Graceful degradation: If overview/precache fails, the MCP server should still start and offer symbol_context with on-demand analysis rather than crashing entirely
- OOM protection: Implement memory limits and streaming for large repos; skip detailed analysis for modules above a size threshold
- Retry logic: For transient crashes, mcpbr or the server should attempt restart
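The graceful-degradation and retry items could be combined roughly as follows. This is a sketch under assumptions, not the server's real startup path: `start_server`, the `mode` values, and the choice to treat `RecursionError` and `MemoryError` as non-retryable are all hypothetical.

```python
import time


def start_server(build_overview, attempts=3, delay_s=0.0):
    """Try to precompute the overview; fall back to on-demand mode on failure."""
    for attempt in range(1, attempts + 1):
        try:
            overview = build_overview()
            return {"mode": "full", "overview": overview, "attempts": attempt}
        except (RecursionError, MemoryError):
            # Resource failures are assumed deterministic: retrying the same
            # repo would fail the same way, so degrade immediately.
            break
        except Exception:
            # Anything else is treated as transient and retried.
            if attempt < attempts:
                time.sleep(delay_s)
    # The server still starts, but without a precomputed overview.
    return {"mode": "on_demand", "overview": None, "attempts": attempt}


# Hypothetical builders used only to exercise both paths.
calls = {"n": 0}


def flaky_build():
    calls["n"] += 1
    if calls["n"] == 1:
        raise ConnectionError("transient failure")
    return {"modules": 3}


def overflowing_build():
    raise RecursionError("maximum recursion depth exceeded")


recovered = start_server(flaky_build)
degraded = start_server(overflowing_build)
```

With this shape, a sympy-style overflow costs one failed attempt and leaves the agent with on-demand tools, instead of producing a zero-iteration task.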
Labels
bug, reliability