Fix stale external claim recovery #19

Open
JMRussas wants to merge 1 commit into main from fix/stale-claim-recovery

@JMRussas JMRussas commented Mar 6, 2026

Summary

  • EXTERNAL_CLAIM_TIMEOUT_SECONDS was defined in config but never enforced, so a crashed external executor left its claimed tasks stuck in RUNNING forever
  • Added _recover_stale_external_claims() to the executor tick loop: it resets timed-out external claims to PENDING, increments retry_count, and pushes an SSE event
  • Added 4 unit tests covering the stale, fresh, internal, and multiple-claim scenarios
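
For readers without the diff open, the recovery step described above might look roughly like the following. This is a minimal in-memory sketch, not the code in this PR: the `Task` dataclass, the `claimed_by` / `claimed_at` fields, the 600-second timeout value, and the event payload are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed value for illustration; the real setting lives in the project's config.
EXTERNAL_CLAIM_TIMEOUT_SECONDS = 600


@dataclass
class Task:
    # Hypothetical task shape; field names are not taken from the PR.
    id: str
    status: str                         # "PENDING" or "RUNNING" in this sketch
    claimed_by: Optional[str] = None    # "external" or "internal"
    claimed_at: Optional[float] = None  # seconds since epoch
    retry_count: int = 0


def recover_stale_external_claims(tasks, now, timeout=EXTERNAL_CLAIM_TIMEOUT_SECONDS):
    """Reset timed-out external claims to PENDING; return the events to push."""
    events = []
    for task in tasks:
        # Only RUNNING tasks held by an external executor are candidates.
        if task.status != "RUNNING" or task.claimed_by != "external":
            continue
        # Skip claims with no timestamp and claims still inside the timeout window.
        if task.claimed_at is None or now - task.claimed_at <= timeout:
            continue
        task.status = "PENDING"
        task.claimed_by = None
        task.claimed_at = None
        task.retry_count += 1
        # Stand-in for the SSE push; the event name is made up.
        events.append({"type": "task_claim_expired", "task_id": task.id})
    return events
```

Called once per tick with the current time, a function of this shape leaves fresh external claims and internal tasks alone, which matches the scenarios the new unit tests are described as covering.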

Test plan

  • python -m ruff check — clean
  • python -m pytest tests/unit/test_stale_claims.py — 4 passed
  • python -m pytest tests/ -m "not slow" — 783 passed, 0 failed
  • CI passes

Generated by Claude Code · Claude Opus 4.6

EXTERNAL_CLAIM_TIMEOUT_SECONDS was defined in config but never enforced.
If an external executor crashes after claiming a task, the task stays
RUNNING forever, blocking its wave and project. Now _tick() calls
_recover_stale_external_claims() each cycle to reset timed-out claims
back to PENDING with retry_count incremented and an SSE event pushed.
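
If the task state lives in a SQL table, the reset described in this commit message could be expressed as a select followed by a guarded update. The sketch below uses the standard-library `sqlite3` module; the `tasks` schema, column names, and the `reset_stale_claims_sql` helper are hypothetical and may not match the project's actual storage layer.

```python
import sqlite3

# Hypothetical schema, for illustration only.
SCHEMA = """
CREATE TABLE tasks (
    id          TEXT PRIMARY KEY,
    status      TEXT NOT NULL,
    claimed_by  TEXT,
    claimed_at  REAL,
    retry_count INTEGER NOT NULL DEFAULT 0
)
"""


def reset_stale_claims_sql(conn, now, timeout_seconds):
    """Reset external claims older than the timeout; return the affected task ids."""
    cutoff = now - timeout_seconds
    rows = conn.execute(
        "SELECT id FROM tasks "
        "WHERE status = 'RUNNING' AND claimed_by = 'external' AND claimed_at < ? "
        "ORDER BY id",
        (cutoff,),
    ).fetchall()
    stale_ids = [row[0] for row in rows]
    # The status guard keeps the update from touching a task that finished
    # between the select and the update.
    conn.executemany(
        "UPDATE tasks SET status = 'PENDING', claimed_by = NULL, claimed_at = NULL, "
        "retry_count = retry_count + 1 "
        "WHERE id = ? AND status = 'RUNNING'",
        [(task_id,) for task_id in stale_ids],
    )
    conn.commit()
    return stale_ids
```

Returning the ids gives the caller what it needs to push one event per recovered task after the commit.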

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>