Fix early-cancel live stream race #3476
Reviewed the cancel-fix commit (

Summary
The actual fix is good and correctly scoped to #3475. The previous

Code reference
The core change in

    with streams_lock:
        stream_present = stream_id in streams
        if stream_present:
            q = streams.get(stream_id)
        else:
            with _live_config.ACTIVE_RUNS_LOCK:
                active_run_entry = dict((_live_config.ACTIVE_RUNS or {}).get(stream_id) or {})
            if not active_run_entry:
                return False
            active_run_session_id = str(active_run_entry.get("session_id") or "").strip() or None

and the agent recovery:

    agent = agent_instances.get(stream_id)
    if agent is None and active_run_session_id:
        with _live_config.SESSION_AGENT_CACHE_LOCK:
            cached = _live_config.SESSION_AGENT_CACHE.get(active_run_session_id)
        if cached and _cached_agent_matches_session(cached[0], active_run_session_id):
            agent = cached[0]

The

Verification
I confirmed

One blocker before merge
The branch (

As is, this can't merge without also pulling in #3401's redesign. Since the cancel fix doesn't depend on #3401 (verified above), I'd recommend rebasing
|
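For readers following the review, the fallback order discussed above can be sketched in isolation. This is a minimal sketch, not the PR's code: the module-level dictionaries, `FakeAgent`, and `cancel_stream_sketch` are hypothetical stand-ins for the registries named in the snippets, and the session match is reduced to a plain attribute comparison.

```python
import threading

# Hypothetical in-memory registries standing in for the ones named in the review.
STREAMS = {}
STREAMS_LOCK = threading.Lock()
ACTIVE_RUNS = {}
ACTIVE_RUNS_LOCK = threading.Lock()
SESSION_AGENT_CACHE = {}
SESSION_AGENT_CACHE_LOCK = threading.Lock()
AGENT_INSTANCES = {}


class FakeAgent:
    """Illustrative agent that only records the interrupt reason."""

    def __init__(self, session_id):
        self.session_id = session_id
        self.interrupted_with = None

    def interrupt(self, reason):
        self.interrupted_with = reason


def cancel_stream_sketch(stream_id):
    """Return False only when neither registry knows the stream."""
    session_id = None
    with STREAMS_LOCK:
        stream_present = stream_id in STREAMS
    if not stream_present:
        # The stream queue has already detached: consult the active-run registry.
        with ACTIVE_RUNS_LOCK:
            entry = dict(ACTIVE_RUNS.get(stream_id) or {})
        if not entry:
            return False
        session_id = str(entry.get("session_id") or "").strip() or None

    agent = AGENT_INSTANCES.get(stream_id)
    if agent is None and session_id:
        # Recover the agent through the per-session cache.
        with SESSION_AGENT_CACHE_LOCK:
            cached = SESSION_AGENT_CACHE.get(session_id)
        if cached and getattr(cached[0], "session_id", None) == session_id:
            agent = cached[0]

    if agent is not None:
        agent.interrupt("Cancelled by user")
    return True
```

In this sketch each lock is held only long enough to read its registry, so no two locks are ever held at once. Whether the real `cancel_stream()` nests the locks is not something this sketch claims to reproduce.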
cd23dfa to 079cc9d
|
Rebased this PR onto a clean

Current state:
|
|
Confirmed the rebase. I re-fetched

No #3401 live-to-final commits remain, so the narrow cancel fix can now be reviewed and landed on its own merits.

What survived the rebase (verified against origin/master)
The core

The routes.py half is also preserved, and slightly cleaner than the description suggested:

    active_stream_ids = _active_stream_ids()
    try:
        compact_session = s.compact(
            include_runtime=True,
            active_stream_ids=active_stream_ids,
        )
    except TypeError:
        compact_session = s.compact()
    ...
    journal_active = bool(original_stream_id in active_stream_ids)

I confirmed the symbols this leans on all exist on

State
|
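The routes.py pattern quoted above can be exercised on its own. The following is a minimal sketch under stated assumptions: `NewSession`, `LegacySession`, `compact_with_fallback`, and `journal_is_active` are hypothetical names invented for illustration, and the payload keys are made up rather than taken from the real `/api/session` response.

```python
class NewSession:
    """Hypothetical session whose compact() accepts runtime arguments."""

    def __init__(self, persisted_stream_id):
        self.active_stream_id = persisted_stream_id

    def compact(self, include_runtime=False, active_stream_ids=None):
        payload = {"active_stream_id": self.active_stream_id}
        if include_runtime:
            live = active_stream_ids or set()
            payload["runtime_active"] = self.active_stream_id in live
        return payload


class LegacySession:
    """Hypothetical older session whose compact() takes no arguments."""

    def __init__(self, persisted_stream_id):
        self.active_stream_id = persisted_stream_id

    def compact(self):
        return {"active_stream_id": self.active_stream_id}


def compact_with_fallback(session, active_stream_ids):
    # Try the richer signature first; fall back if it is not supported.
    try:
        return session.compact(
            include_runtime=True,
            active_stream_ids=active_stream_ids,
        )
    except TypeError:
        return session.compact()


def journal_is_active(original_stream_id, active_stream_ids):
    # A persisted stream id counts as active only if the live registry lists it.
    return bool(original_stream_id in active_stream_ids)
```

One trade-off worth noting: catching `TypeError` around the call also swallows a `TypeError` raised inside `compact()` itself, so the fallback can hide an unrelated bug. Checking the signature with `inspect.signature` is a stricter alternative if that matters.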
|
Shipped in v0.51.237 (Release HE) via release PR #3492. Thank you @franksong2702! 🙏 Your early-cancel reconciliation fix (fall back to

Two adjustments absorbed on the way in (both surfaced by the Codex regression gate, fixed + regression-tested):

Both are pinned by regression tests verified to fail against the buggy versions (

Authorship preserved via
|
Closes #3475
Thinking Path
What Changed
- api/streaming.py: cancel_stream() now falls back to ACTIVE_RUNS and the session agent cache when STREAMS has already detached, so the worker still receives interrupt("Cancelled by user") and the session is cleaned up.
- api/routes.py: /api/session now reports run-journal active state from the live active-run registry instead of treating any persisted active_stream_id as proof that the worker is still alive.
- tests/test_cancel_interrupt.py: adds a regression for canceling after STREAMS has been removed but the worker is still registered as active.
- tests/test_run_journal_routes.py: locks the session payload invariant for the runtime journal active flag.
- CHANGELOG.md: records the user-visible behavior fix.

Why It Matters
Verification
- ./.venv/bin/python -m pytest -q tests/test_cancel_interrupt.py tests/test_run_journal_routes.py
- git diff --check
- 8787 where cancel was pressed immediately after send.

Risks / Follow-ups
Contract Routing
Task type: bugfix for live-stream cancellation / session-state reconciliation
Touched areas: api/streaming.py, api/routes.py, tests/test_cancel_interrupt.py, tests/test_run_journal_routes.py, CHANGELOG.md
Relevant public docs: AGENTS.md, CONTRIBUTING.md, docs/CONTRACTS.md, docs/rfcs/webui-run-state-consistency-contract.md
Scope boundaries: startup-race cancel/state reconciliation only; no live-stream redesign, no double-streaming transport change
Evidence needed before claiming done: targeted regression tests, diff check, and the user-visible cancel path on the running session
Model Used
gh, and targeted pytest to validate the fix.