fix(runtime): make cancelStream() owner-aware and close its SSE source#3345
franksong2702 wants to merge 6 commits into
cancelStream() in static/boot.js cleared S.activeStreamId, setBusy(false),
and the composer status unconditionally after issuing /api/chat/cancel.
It did not call closeLiveStream(sid, streamId), did not read the cancel
response, and did not guard the local-state clear against the original
streamId, so a new turn that started while the cancel request was in
flight could be wiped.
This change brings cancelStream() in line with cancelSessionStream():
- snapshot (sid, streamId) at entry
- read r.json() into respBody (ignore non-JSON)
- closeLiveStream(sid, streamId) for the snapshot pair, even on
cancelled:false and on network error
- only clear local busy state when S.activeStreamId still matches
the snapshot (owner guard)
- short toast 'Stream is no longer active' on cancelled:false
No backend or contract change; this is defensive conformance with
docs/rfcs/webui-run-state-consistency-contract.md (Invariants #2 and
|
| Filename | Overview |
|---|---|
| static/boot.js | cancelStream() rewritten to defer local cleanup to the SSE cancel event on the happy path (cancelled:true), and clear state directly only when cancelled:false; logic is sound but the inversion is non-obvious |
| static/messages.js | Adds _bailOutOfTerminalEventsFromStaleStream() guard to terminal SSE events and owner guards in _clearOwnerInflightState/_restoreSettledSession/_handleStreamError; tightly coupled to boot.js not clearing S.activeStreamId on cancelled:true |
| tests/test_cancel_stream_owner_guard.py | 13 tests (7 structural regex + 6 runtime via node subprocess) covering key behavioral paths; some structural assertions are weak (bare 'catch' presence check) |
| CHANGELOG.md | New Unreleased Fixed entry accurately describes the keep-SSE-open behavior, but the PR description's What Changed bullets about closeLiveStream for the snapshot pair are inaccurate |
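The stale-stream guard named in the static/messages.js row can be pictured roughly as follows. This is a minimal sketch, not the code from the PR: only the helper name `_bailOutOfTerminalEventsFromStaleStream` and `S.activeStreamId` appear in the summary above. The helper's signature, its exact staleness rule, the `busy` field, and the `onCancelEvent` handler are assumptions made for illustration.

```javascript
// Hypothetical UI state; only activeStreamId is named in the PR.
const S = { activeStreamId: 'stream-2', busy: true };

// Assumed rule: a terminal event is stale when the stream that emitted it
// no longer owns the UI.
function _bailOutOfTerminalEventsFromStaleStream(streamId) {
  return S.activeStreamId !== streamId;
}

// Hypothetical terminal-event handler: settle the UI only for the owner.
function onCancelEvent(streamId) {
  if (_bailOutOfTerminalEventsFromStaleStream(streamId)) return false;
  S.activeStreamId = null;
  S.busy = false;
  return true;
}

// A late cancel event from the previous stream must not touch the new turn.
const staleHandled = onCancelEvent('stream-1');
const afterStale = { ...S };

// The cancel event from the current owner settles the UI.
const ownerHandled = onCancelEvent('stream-2');

console.log({ staleHandled, afterStale, ownerHandled, finalState: S });
```

Under this assumed rule the guard only lets the owner's event through if `S.activeStreamId` is still set when the event arrives, which would be consistent with the coupling to boot.js noted in the table.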
Sequence Diagram
sequenceDiagram
participant User
participant cancelStream as cancelStream() [boot.js]
participant Backend as /api/chat/cancel
participant SSE as SSE EventSource
participant handlers as attachLiveStream handlers [messages.js]
User->>cancelStream: clicks Stop
note over cancelStream: snapshot sid + streamId
cancelStream->>Backend: "fetch /api/chat/cancel?stream_id=X"
Backend-->>cancelStream: cancelled:true
note over cancelStream: do nothing locally — S.activeStreamId stays set
Backend->>SSE: emits cancel terminal event
SSE->>handlers: cancel event fires
note over handlers: _bailOutOfTerminalEventsFromStaleStream false — proceed
handlers->>handlers: _clearOwnerInflightState clears INFLIGHT
handlers->>handlers: "S.activeStreamId=null, setBusy(false)"
alt SSE dies before cancel event
SSE->>handlers: error event fires
handlers->>handlers: _handleStreamError — cleanup + Connection interrupted
end
alt Owner rotated during fetch
note over cancelStream: S.activeStreamId !== streamId
cancelStream->>handlers: closeLiveStream(sid, streamId)
end
alt cancelled:false
Backend-->>cancelStream: cancelled:false
note over cancelStream: S.activeStreamId=null, setBusy(false), showToast
end
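The branches in the diagram can be sketched in code as below. This is an illustrative approximation under stated assumptions, not the shipped static/boot.js: the dependency-injected `makeCancelStream` factory, the `S.sessionId` field, the behaviour on a network error, and the early return on owner rotation are all hypothetical. The endpoint, `closeLiveStream`, `setBusy`, `showToast`, and the toast text come from the PR.

```javascript
// Hypothetical factory so the sketch can run with mocks; the real function
// reads globals in static/boot.js.
function makeCancelStream({ S, fetch, closeLiveStream, setBusy, showToast }) {
  return async function cancelStream() {
    // Snapshot the pair at entry so later checks compare against the
    // stream that was active when Stop was clicked.
    const sid = S.sessionId; // hypothetical field name
    const streamId = S.activeStreamId;
    if (!streamId) return;

    let body = null;
    try {
      const r = await fetch(`/api/chat/cancel?stream_id=${encodeURIComponent(streamId)}`);
      body = await r.json().catch(() => null); // ignore non-JSON bodies
    } catch (err) {
      // Assumption: on a network error, leave cleanup to the SSE handlers.
      console.warn('cancel request failed', err);
    }

    if (S.activeStreamId !== streamId) {
      // Owner rotated during the fetch: close only the old source and
      // leave the new turn's state alone.
      closeLiveStream(sid, streamId);
      return;
    }
    if (body && body.cancelled === false) {
      // Nothing will arrive over SSE, so settle locally.
      S.activeStreamId = null;
      setBusy(false);
      showToast('Stream is no longer active');
    }
    // cancelled:true: do nothing here; the SSE cancel event settles the UI.
  };
}

// Usage: a new turn takes ownership while the cancel request is in flight.
const S = { sessionId: 'sess-1', activeStreamId: 'stream-1' };
const closed = [];
const cancelStream = makeCancelStream({
  S,
  fetch: async () => {
    S.activeStreamId = 'stream-2';
    return { json: async () => ({ ok: true, cancelled: true, stream_id: 'stream-1' }) };
  },
  closeLiveStream: (sid, streamId) => closed.push(`${sid}/${streamId}`),
  setBusy: () => {},
  showToast: () => {},
});
cancelStream().then(() => console.log(S.activeStreamId, closed));
```

Passing the collaborators in as arguments is only a convenience for the sketch; it makes each branch easy to exercise with mocks.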
Reviews (4): Last reviewed commit: "Let active cancel settle over SSE"
|
Updated with merge commit 4718fd7 to bring the branch onto latest master and resolve conflicts. Also clarified the conditional EventSource close comment and replaced the silent cancel request catch with debug logging. Local verification: git diff --check; node --check static/boot.js static/commands.js static/i18n.js static/panels.js static/terminal.js; pytest tests/test_sprint36.py -q (11 passed). |
|
Follow-up commit 3427f2c fixes the CI regression from the previous debug-log change: cancelStream() now uses console.warn instead of console.debug so Node runtime tests keep JSON on stdout. Local verification: git diff --check; node --check static/boot.js; pytest tests/test_cancel_stream_owner_guard.py tests/test_sprint36.py -q (24 passed). |
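The stdout concern behind that switch can be reproduced in isolation. In Node.js, `console.debug` is an alias of `console.log` and writes to stdout, while `console.warn` is an alias of `console.error` and writes to stderr. The sketch below is not taken from the PR; the `capture` and `parses` helpers and the JSON payload are hypothetical stand-ins for a runtime test that parses a subprocess's stdout.

```javascript
// Record what a callback writes to stdout and stderr by temporarily
// replacing the stream write methods (illustrative helper only).
function capture(fn) {
  const out = [];
  const err = [];
  const origOut = process.stdout.write;
  const origErr = process.stderr.write;
  process.stdout.write = (chunk) => { out.push(String(chunk)); return true; };
  process.stderr.write = (chunk) => { err.push(String(chunk)); return true; };
  try {
    fn();
  } finally {
    process.stdout.write = origOut;
    process.stderr.write = origErr;
  }
  return { stdout: out.join(''), stderr: err.join('') };
}

// True when the whole text is valid JSON, as a test parsing stdout expects.
function parses(text) {
  try { JSON.parse(text); return true; } catch { return false; }
}

// Diagnostic via console.debug lands on stdout next to the JSON result.
const withDebug = capture(() => {
  console.debug('cancel request failed');
  console.log(JSON.stringify({ ok: true }));
});

// Diagnostic via console.warn goes to stderr; stdout stays pure JSON.
const withWarn = capture(() => {
  console.warn('cancel request failed');
  console.log(JSON.stringify({ ok: true }));
});

console.log('debug keeps stdout parseable:', parses(withDebug.stdout));
console.log('warn keeps stdout parseable:', parses(withWarn.stdout));
```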
|
Thanks @franksong2702. The owner-guard intent is right and the diagnosis (cancelStream() was the odd one out vs cancelSessionStream()) is correct. Our pre-merge review (Codex regression gate, which I verified against the code) found two things; one is a quick fix, the other needs a rethink, so holding:
1. Owner guard has a null-window clobber (small, but pin it)
2. Closing the SSE on the ACTIVE session skips the cancel settle/render (the bigger one)
This is the part that needs a rethink before merge. So mirroring
Suggested direction
For the active session ( Marking |
|
Pushed follow-up What changed:
Verification:
AI assistance: Codex coordinated the fix with a sub-agent, found and corrected the active SSE settle regression during review, then reran focused verification before pushing. |
|
Shipped in v0.51.265 (Release IG) — thank you @franksong2702! 🙏 The owner-aware + terminal-settle rework resolved both findings from the earlier holds (null-window clobber + active-session SSE-settle). Codex confirmed the backend reliably emits the terminal cancel event the new design relies on. Closing as merged-via-release-stage. |
Closes #3344
Thinking Path
- no build step; the cancel/stop path is one of the few places where the browser talks back to the runtime and also has to keep its local state in sync with the worker.
- cancelStream() in static/boot.js reads S.activeStreamId, fires /api/chat/cancel, and then unconditionally clears S.activeStreamId, calls setBusy(false), and resets the composer status. It does not look at the response, does not call closeLiveStream(sid, streamId), and does not guard the clear against the original streamId. cancelSessionStream() (used by the sidebar) already does owner-guarded cleanup and explicitly closes the SSE source. The user-facing path was the odd one out.
- cancelStream() in line with cancelSessionStream() while staying purely frontend. The backend (api/streaming.py:6486 cancel_stream()) already returns {ok, cancelled, stream_id}; the only thing missing is a frontend that reads it and acts on it.
- is still running, and the old SSE source cannot leak tokens into a subsequent turn's transcript.
What Changed
static/boot.js - cancelStream() rewritten:
- Snapshot (sid, streamId) at entry.
- Read r.json() into respBody; ignore non-JSON bodies.
- Call closeLiveStream(sid, streamId) for the snapshot pair (mirrors cancelSessionStream()).
- Only clear local busy state when S.activeStreamId is still the snapshot value (owner guard). If a new turn has rotated S.activeStreamId during the cancel fetch, the new turn's state is left alone.
- When respBody.cancelled === false, show a short toast Stream is no longer active (2 s, generic; the response shape cannot distinguish reasons).
- run-state consistency RFC.
tests/test_cancel_stream_owner_guard.py - new file. 13 tests:
- snapshot / read-response / closeLiveStream / owner-guard / cancelled-false-toast / network-error-doesn't-throw / no-op-on-empty invariants are present in source.
- eval the function with mocked S, fetch, closeLiveStream, setBusy, setComposerStatus, showToast via a node --input-type=module subprocess, and verify:
  - Happy path (cancelled:true): state cleared AND old SSE closed.
  - cancelled:false: toast surfaced AND old SSE closed AND local state cleared.
  - SSE still closed.
  - S.activeStreamId rotates to a new turn during fetch: old SSE is still closed for the original (sid, streamId), but the new turn's S.activeStreamId and busy state are NOT cleared.
- test_cancel_stream_owner_guard.py; if reviewers prefer the test_issueNNNN_* convention, a follow-up rename is fine.
CHANGELOG.md - [Unreleased] / Fixed entry describing the three behaviours added (close SSE, owner guard, cancelled-false toast).
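The runtime tests listed above can be approximated with a small mock harness. The sketch below follows the behaviour as described in this section (always close the snapshot pair, clear state only for the owner, toast on cancelled:false); the review thread records that the active-session close was later reworked. The `env` argument, the `S.sessionId` field, the `makeEnv` helper, the stream ids, and the `showToast(msg, ms)` signature are assumptions; the real tests eval the function from static/boot.js instead.

```javascript
// Sketch of cancelStream() as described in "What Changed"; collaborators are
// passed in through a hypothetical env object so they can be mocked.
async function cancelStream(env) {
  const { S, fetch, closeLiveStream, setBusy, setComposerStatus, showToast } = env;
  const sid = S.sessionId; // hypothetical field name
  const streamId = S.activeStreamId; // snapshot at entry
  if (!streamId) return;

  let respBody = null;
  try {
    const r = await fetch(`/api/chat/cancel?stream_id=${encodeURIComponent(streamId)}`);
    try { respBody = await r.json(); } catch (_) { /* ignore non-JSON bodies */ }
  } catch (_) {
    // Network error: fall through so the old source is still closed.
  }

  closeLiveStream(sid, streamId); // always for the snapshot pair

  if (S.activeStreamId === streamId) { // owner guard
    S.activeStreamId = null;
    setBusy(false);
    setComposerStatus('');
  }
  if (respBody && respBody.cancelled === false) {
    showToast('Stream is no longer active', 2000);
  }
}

// Build mocked collaborators and a log of what they were asked to do.
function makeEnv({ cancelled, rotateTo = null }) {
  const log = { closed: [], busy: [], toasts: [] };
  const S = { sessionId: 'sess-1', activeStreamId: 'stream-1' };
  const env = {
    S,
    fetch: async () => {
      if (rotateTo) S.activeStreamId = rotateTo; // new turn starts mid-flight
      return { json: async () => ({ ok: true, cancelled, stream_id: 'stream-1' }) };
    },
    closeLiveStream: (sid, streamId) => log.closed.push([sid, streamId]),
    setBusy: (value) => log.busy.push(value),
    setComposerStatus: () => {},
    showToast: (msg) => log.toasts.push(msg),
  };
  return { env, log };
}

// Owner-rotation case: the old source is closed, the new turn is untouched.
async function runRotationCase() {
  const { env, log } = makeEnv({ cancelled: true, rotateTo: 'stream-2' });
  await cancelStream(env);
  return { activeStreamId: env.S.activeStreamId, ...log };
}

runRotationCase().then((result) => console.log(result));
```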
Why It Matters
- even if the cancel was silently skipped on the backend (e.g. a new turn had already started in the same session). The new turn would then run with no busy state, so the user had no way to tell the worker was still doing work.
- EventSource was left in LIVE_STREAMS[sid]. A subsequent turn in the same session could see its metering/token events for one tick, then flip back, then have a stale close fire later. Closing the source on the cancel path makes the contract between cancelStream() and the SSE registry symmetric.
- cancelled:false (already finalised / stale writeback / session rotated), the user now sees a short toast. They used to see nothing.
Verification
Run on the clean worktree based on origin/master@1fcd81e3:
- node --check static/boot.js: passes.
- pytest tests/test_cancel_stream_owner_guard.py -v: 13 passed (7 structural + 6 runtime).
- pytest tests/test_sprint36.py tests/test_1466_sidebar_cancel_clarify.py tests/test_issue1298_cancel_and_activity.py tests/test_cancelled_turn_status.py tests/test_real_steer.py tests/test_issue2157_sessions_list_stale_stream_state.py -q: 60 passed, 0 failed.
- pytest tests/ -q --timeout=30 (full suite, excluding two pre-existing failures unrelated to this PR):
  - tests/test_issue1144_session_time_sync.py::test_message_footer_timestamp_uses_server_tz: pre-existing locale failure on this macOS Python build (3月29日 10:00 instead of 10:00 AM); reproduces on unchanged master with no diff in boot.js. Not caused by this PR.
  - tests/test_ctl_script.py::test_start_can_ignore_repo_dotenv_for_authoritative_test_env: pre-existing test-ordering flake; passes in isolation, also fails on unchanged master when run after test_cancel_stream_owner_guard.py in the full suite.
  - xpassed that are unrelated.
Not captured as a video because the fix is a control-plane change; the only user-visible delta is a 2 s toast on the rare cancelled:false path. No before/after images are attached because the happy-path UI is byte-identical to before; the only visual change is the rare toast (and, in the near-impossible race where the cancel was silently dropped, the second turn's spinner no longer disappears for ~200 ms).
Contract Routing
docs/rfcs/webui-run-state-consistency-contract.md - Invariants #2 (Live stream / SSE) and #4 (Live UI scene / cache).
- is closed when that turn is no longer the active turn." Before the fix, cancelStream() violated this when the user clicked Stop without first switching to another session. After the fix, the function explicitly calls closeLiveStream(sid, streamId) for the snapshot pair.
- requires owning the active stream id." Before the fix, cancelStream() cleared unconditionally. After the fix, the clear is guarded by S.activeStreamId === snapshotStreamId.
- /api/chat/cancel response shape change. No new public contract introduced.
Risks / Follow-ups
- tests/test_sprint36.py reads a 400-character window from async function cancelStream() to find the catch block. This PR happens to keep the function compact enough that the outer catch still falls inside that window, but a follow-up that adds more than ~5 lines of preamble at the top of the function will break that test. Bumping the window to 1200 (and the matching brace-finding) is a one-line follow-up; not bundled here to keep the PR scope minimal.
- /api/chat/cancel response only exposes cancelled:bool. When it is false, the toast says "no longer active" generically; we cannot tell the user whether the stream was already finalised, the session rotated, or the writeback was stale. A follow-up that adds a status/reason field would let the UI be more specific. Not in scope for this PR.
- LIVE_STREAMS cross-session leak (P0 in the streaming-audit skill) is unrelated and is being addressed in a separate track.
- only adds frontend bookkeeping; it does not change auth, path handling, uploads, streaming-protocol parsing, or environment handling.
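The fixed-window fragility described in the first risk above can be illustrated with a small analogue. The actual check lives in the Python test tests/test_sprint36.py and is not reproduced here; the `hasCatchWithinWindow` helper, the sample sources, and `doCancel` below are hypothetical and only show why a 400-character window breaks once enough preamble is added.

```javascript
// Hypothetical analogue of a structural test that looks for "catch" within
// a fixed number of characters after the function header.
function hasCatchWithinWindow(source, marker, windowSize) {
  const start = source.indexOf(marker);
  if (start === -1) return false;
  return source.slice(start, start + windowSize).includes('catch');
}

const marker = 'async function cancelStream()';
const body = '  try { await doCancel(); } catch (e) { console.warn(e); }\n}';

// Compact function: the catch sits close to the header.
const compact = `${marker}{\n${body}`;

// Same function with 30 preamble lines pushed in front of the try/catch.
const padded = `${marker}{\n${'  // preamble line\n'.repeat(30)}${body}`;

console.log('compact, 400 chars:', hasCatchWithinWindow(compact, marker, 400));
console.log('padded, 400 chars:', hasCatchWithinWindow(padded, marker, 400));
console.log('padded, 1200 chars:', hasCatchWithinWindow(padded, marker, 1200));
```

Widening the window only moves the threshold; a check that locates the function's matching closing brace would not depend on the function's length.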
Model Used
- custom (Hermes self-hosted M3)
- MiniMax-M3
- separate image-generation / browser tools used in this PR.
- worktree based on origin/master@1fcd81e3, validated against the existing test suite, and the PR description was drafted by the same agent. Final review is human.