
SWE-bench Pro 119: paused at 35/119, resume harness in internal/swebench-119-resume/ #174

Description

@asklokesh

Status: PAUSED (2026-06-12), resumable

The internal SWE-bench Pro 119-instance evaluation run is paused at 35 of 119 instances, with a fully reboot-proof resume bundle in place. This issue is the public record of that state, so the work can be resumed cold later.

Where everything lives (gitignored, internal-only)

  • Resume bundle: internal/swebench-119-resume/ (self-contained: harness scripts + ledger + patches + dataset/helper_code + official eval script).
  • Full resume instructions: internal/swebench-119-resume/RESUME.md - the single authoritative doc (exact resume command, prerequisites, hardcoded paths, pacing rule, grading procedure, cost decision).
  • Survival backup: internal/swebench-119-ledger-backup/ - auto-refreshed each window close.

Why paused

Running the remaining instances is projected to push total spend over the host cap. The run also follows a deliberately paced cadence (spend is recorded per window) rather than a single burst. Pausing and documenting is the correct call versus overshooting the cap.
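The projection behind that decision can be sketched as a simple linear extrapolation from the spend recorded so far. This is only an illustration: the class name, the flat per-instance average, and every dollar figure below are assumptions made up for the example, not the harness's actual accounting or any real benchmark cost.

```python
from dataclasses import dataclass


@dataclass
class SpendProjection:
    """Hypothetical helper: extrapolate total spend from completed instances."""

    spent_so_far: float  # spend recorded in the ledger so far (made-up units)
    done: int            # instances already recorded
    total: int           # instances in the full run

    def per_instance(self) -> float:
        if self.done <= 0:
            raise ValueError("need at least one completed instance to project")
        return self.spent_so_far / self.done

    def projected_total(self) -> float:
        # Assumes remaining instances cost the same on average as finished ones.
        return self.per_instance() * self.total

    def max_instances_within(self, cap: float) -> int:
        # Largest N (capped at `total`) whose projected spend stays under `cap`.
        return min(self.total, int(cap // self.per_instance()))


# Illustrative numbers only: 35/119 done, with a made-up spend and cap.
projection = SpendProjection(spent_so_far=70.0, done=35, total=119)
cap = 200.0
print(projection.projected_total() > cap)    # would the full run overshoot?
print(projection.max_instances_within(cap))  # a possible "partial-N" target
```

A flat average is the crudest possible model; if per-instance cost varies a lot, a per-window figure from the ledger would give a better estimate.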

To resume

Follow internal/swebench-119-resume/RESUME.md -> "TL;DR RESUME COMMAND". The driver is idempotent (it skips already-recorded instances), so resuming never double-charges. A cost decision (raise the cap / run a partial N / keep the paced cadence) is required before resuming to completion; see the doc.
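For readers without access to the internal bundle, the idempotency property can be sketched as follows. This is not the actual driver: the JSON-lines ledger format, the `instance_id` field, and the `run_one` callback are assumptions chosen for illustration.

```python
import json
import os
from pathlib import Path
from typing import Callable, Iterable


def recorded_ids(ledger: Path) -> set[str]:
    """Return the instance ids already present in a (hypothetical) JSONL ledger."""
    if not ledger.exists():
        return set()
    ids: set[str] = set()
    with ledger.open() as fh:
        for line in fh:
            line = line.strip()
            if line:
                ids.add(json.loads(line)["instance_id"])
    return ids


def run_pending(
    instance_ids: Iterable[str],
    ledger: Path,
    run_one: Callable[[str], dict],
) -> list[str]:
    """Run only instances missing from the ledger; return the ids actually run."""
    done = recorded_ids(ledger)
    ran: list[str] = []
    for iid in instance_ids:
        if iid in done:
            continue  # already recorded (and paid for): skip
        result = run_one(iid)
        # Append and fsync right away so a crash or reboot loses at most
        # the instance that was in flight.
        with ledger.open("a") as fh:
            fh.write(json.dumps({"instance_id": iid, **result}) + "\n")
            fh.flush()
            os.fsync(fh.fileno())
        done.add(iid)
        ran.append(iid)
    return ran
```

Calling `run_pending` a second time with the same instance list and ledger runs nothing, which is the behavior that makes a cold resume safe.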

Reboot-proofing

Two non-tmp anchors under internal/ (the resume bundle and the ledger backup listed above) survive a /tmp wipe. The large per-instance work dirs in /tmp are disposable and are re-extracted per instance.
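One way to keep a backup anchor like this consistent is to refresh it with an atomic replace, so an interrupted refresh never leaves a half-written copy. The sketch below is an assumption about how such a refresh could work, not the harness's actual mechanism; the function name and paths are hypothetical.

```python
import os
import shutil
import tempfile
from pathlib import Path


def backup_ledger(ledger: Path, backup_dir: Path) -> Path:
    """Copy `ledger` into `backup_dir`, replacing any older backup atomically."""
    backup_dir.mkdir(parents=True, exist_ok=True)
    target = backup_dir / ledger.name
    # Stage the copy in the destination directory so the final rename stays
    # on one filesystem, where os.replace is atomic.
    fd, tmp_name = tempfile.mkstemp(dir=backup_dir, prefix=ledger.name + ".")
    os.close(fd)
    try:
        shutil.copy2(ledger, tmp_name)
        os.replace(tmp_name, target)
    except BaseException:
        # Clean up the staging file if the copy or rename failed.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
    return target
```

Called at each window close, this would leave the backup directory holding either the previous complete ledger or the new one, never a partial file.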

Benchmark numbers and provenance are INTERNAL pending founder publication approval and are intentionally not in this public issue.

Labels: documentation
