fix(stream): replay active run activity on reconnect #3371
|
Thanks for tackling the empty-Activity-on-late-attach problem (#1584 follow-up). Replaying journaled run activity for a browser that attaches to an already-running stream is the right idea. I held this from the v0.51.210 batch because, as written, it can deliver the same event twice to an active stream, producing visibly duplicated Activity (and, for
Root cause. For a still-active WebUI-owned stream,
Any event that arrived during the no-subscriber gap is therefore in both places: the journal (replayed in step 2) and the offline buffer (drained by the live loop). The browser renders it twice. Tool events append unconditionally on the front end, so the duplicate is user-visible.
Empirical repro against this branch (a single tool event that is both journaled and offline-buffered):
append_run_event(session_id, stream_id, "tool", {"name":"terminal","preview":"terminal: pytest","tid":"call-1"})
stream = create_stream_channel()
stream.put_nowait(("tool", {"name":"terminal","preview":"terminal: pytest","tid":"call-1"})) # -> offline_buffer
STREAMS[stream_id] = stream
# attach a late subscriber via _handle_sse_stream, then stream_end
Result on this branch:
What a safe version needs: make the active-stream replay single-sourced / cursor-safe so journal replay and the offline-buffer drain can't both emit the same event. A couple of viable shapes:
The regression test should assert that an event present in both the journal and the offline buffer is delivered exactly once to a late subscriber, and that a second concurrent tab reconnecting mid-run doesn't re-render the already-streamed turn. Marking |
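To make the exactly-once expectation concrete, here is a minimal toy model of the attach path. It is a sketch, not the code in this PR: `ToyChannel`, `attach_late_subscriber`, `run`, and the `single_source` / `preload_offline` flags are hypothetical names, and it assumes every offline-buffered event is also present in the journal.

```python
from collections import deque


class ToyChannel:
    """Hypothetical stand-in for a stream channel with an offline buffer."""

    def __init__(self):
        self.offline_buffer = deque()  # events put while nobody is subscribed
        self.subscribers = []

    def put_nowait(self, item):
        if not self.subscribers:
            self.offline_buffer.append(item)
            return
        for q in self.subscribers:
            q.append(item)

    def subscribe(self, preload_offline=True):
        q = deque()
        if preload_offline:
            q.extend(self.offline_buffer)
        # Buffered events are consumed (or dropped) once someone attaches.
        self.offline_buffer.clear()
        self.subscribers.append(q)
        return q


def attach_late_subscriber(journal, channel, single_source):
    """Return the frames a late subscriber would receive."""
    q = channel.subscribe(preload_offline=not single_source)
    delivered = list(journal)  # replay the journaled events
    delivered.extend(q)        # then drain whatever the live queue holds
    return delivered


def run(single_source):
    event = ("tool", {"name": "terminal", "preview": "terminal: pytest", "tid": "call-1"})
    journal = [event]          # the event was journaled...
    channel = ToyChannel()
    channel.put_nowait(event)  # ...and also landed in the offline buffer
    return attach_late_subscriber(journal, channel, single_source)
```

With `single_source=False` the toy subscriber receives the tool event twice; with `single_source=True` it receives it once. Dropping the buffered copy is only safe under the stated assumption that the journal already contains every buffered event.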
|
Thanks — fixed the duplicate active reattach path. Active stream subscribers now skip the StreamChannel offline-buffer preload when the handler is replaying the run journal, so a gap event that is both journaled and buffered is emitted once. Reconnect EventSource URLs now include the journal cursor params for active streams too. Verification:
|
00c3fe3 to e0fa178
|
Thanks for the quick turnaround on the dup-event blocker, @AJV20, the
The residual race (CORE)
The active path subscribes (
Repro: a journaled
Why the "obvious" fix is blocked, and what I tried
The clean fix is per-frame-id dedup: skip a live frame whose journal
I tried a contract-preserving content-identity dedup (snapshot
So content-identity + qsize can't close it deterministically. It needs the actual per-frame id.
Suggested design (satisfies the gate AND the 2-tuple contract)
Carry the per-frame journal id out-of-band, not in the queue tuple; keep
A per-stream FIFO id side-channel cleaned up in the worker's
Tests to add
Happy to review as soon as you push — this is close, it's just the last bit of getting the dedup keyed on the real per-frame id while keeping the 2-tuple queue contract. The full-suite + both gates will need to come back clean (Codex SAFE) before it ships. |
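For illustration, the out-of-band id idea could look roughly like the sketch below. All names here (`SideChannelStream`, `frame_ids`, `deliver`, `after_id`) are hypothetical and not taken from the PR; the sketch assumes journal ids increase monotonically per stream and leaves out the cleanup step.

```python
import queue
import threading
from collections import deque


class SideChannelStream:
    """Hypothetical stream: queue items stay (event, data); ids travel separately."""

    def __init__(self):
        self.q = queue.Queue()
        self.frame_ids = deque()  # FIFO of journal ids, parallel to self.q
        self._lock = threading.Lock()

    def put(self, event, data, journal_id):
        # Hold the lock so the id FIFO and the queue stay in the same order
        # even with several producer threads.
        with self._lock:
            self.frame_ids.append(journal_id)
            self.q.put_nowait((event, data))  # 2-tuple contract preserved

    def get(self):
        event, data = self.q.get_nowait()  # raises queue.Empty when drained
        return self.frame_ids.popleft(), event, data


def deliver(journal, stream, after_id=0):
    """Replay journal entries newer than after_id, then drain live frames,
    skipping any frame whose id the replay already covered."""
    out = []
    cursor = after_id
    for journal_id, event, data in journal:
        if journal_id > cursor:
            out.append((event, data))
            cursor = journal_id
    while True:
        try:
            journal_id, event, data = stream.get()
        except queue.Empty:
            break
        if journal_id <= cursor:
            continue  # already emitted by the journal replay
        out.append((event, data))
        cursor = journal_id
    return out
```

Because the decision is keyed on the id rather than on content, two consecutive frames with identical payloads (for example the same token text twice) are both delivered as long as their ids differ, which a content-identity check cannot guarantee.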
|
Keeping this on
The race (verified by tracing the producer path)
On active-stream reconnect,
But the producers (
→ the late subscriber sees that tool/token frame twice. The frontend doesn't save it from this:
The two dedupe tests in the PR cover the already-buffered-before-subscribe case, but not this produced-during-the-replay-window case.
Suggested fix (cross-file: why this is a kick-back, not an inline tweak)
(Don't lean on
When picking this back up: note that a straight cherry-pick of the current branch onto fresh master silently drops the |
Summary
Tests
python3 -m pytest tests/test_issue_1584_multitab_sse.py tests/test_webui_gateway_chat_backend.py tests/test_gateway_sse_reconnect_dedupe.py -q
python3 -m py_compile api/routes.py api/run_journal.py api/gateway_chat.py
git diff --check origin/master...HEAD