fix: reattach SSE on session-switch return and preserve live progress (closes #2924) #3005

Closed
franksong2702 wants to merge 22 commits into nesquena:master from franksong2702:fix/sse-reattach-on-session-switch

franksong2702 commented May 27, 2026

Thinking Path

  • Refs Live SSE stream does not reattach after session switch — user sees no live tokens until completion #2924. Related: fix: reattach SSE on session-switch return + close leaked stream connections #2925 and Restore visible WebUI progress contract #3015.
  • Live Stream Running Session needs two separate guarantees: the UI must preserve and replay the visible timeline correctly, and the model must emit visible progress text often enough for that timeline to be useful.
  • This PR is the UI / replay half. It does not strengthen the model prompt; that belongs in #3015 ("Restore visible WebUI progress contract").
  • The old failure was not a single bug. Session switching could lose the live text buffers, replay could start from the wrong cursor, active replay could duplicate events, and tool Activity groups could lose their anchor to the progress text that triggered them.
  • Follow-up review found several remaining blockers: the frontend was fast-forwarding replay from /api/session.runtime_journal even though that is not an applied-SSE cursor; the backend could skip buffered queue items when replay=1 was requested but no journal replay was available; duplicate interim boundaries with no new visible text could create Activity burst ids that had no text anchor; and final settled rendering could drop live burst metadata when final messages carried their own tool metadata.
  • The current fix keeps live assistant text, reasoning text, journal cursors, and Activity burst ids aligned across stream close, reattach, active replay, message-windowed rendering, and final settled rendering.
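
The replay cutoff rule from the blockers above can be sketched in a few lines. This is a minimal Python illustration under stated assumptions, not the code in api/routes.py: the function name `events_for_reattach`, the event shape (a dict with a `seq` field), and the use of `None` to mean "no journal replay was available" are all hypothetical.

```python
from typing import Optional


def events_for_reattach(
    journal: Optional[list],
    buffered: list,
    after_seq: int = 0,
) -> list:
    """Return the events a reattaching client should receive, in order.

    journal:   journal snapshot, or None when no replay is available (assumed).
    buffered:  live queue items that accumulated while the client was away.
    after_seq: last seq the client reports having applied.
    """
    if journal is None:
        # No journal replay happened, so there is no cutoff to apply.
        # Buffered live items must stay deliverable.
        return list(buffered)

    # Replay only the gap the client is missing.
    replayed = [e for e in journal if e["seq"] > after_seq]

    # The cutoff is the highest seq the replay covered.
    cutoff = max([after_seq] + [e["seq"] for e in journal])

    # Skip live items the replay already delivered.
    live = [e for e in buffered if e["seq"] > cutoff]
    return replayed + live
```

The point of the sketch is the early return: the cutoff only exists when replay actually ran, so a `replay=1` request without a journal falls back to plain queue delivery instead of silently dropping buffered items.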

Contract Routing

Task type: runtime / streaming / recovery fix with chat-rendering UI behavior.

Touched areas:

  • Browser INFLIGHT recovery state in static/sessions.js and static/messages.js.
  • SSE replay / run journal fallback behavior in api/routes.py.
  • Live and settled Activity placement in static/ui.js.
  • Runtime diagnostics, config prompt, and regression coverage for replay / Activity invariants.

Relevant public docs:

  • AGENTS.md
  • CONTRIBUTING.md
  • docs/CONTRACTS.md
  • docs/rfcs/README.md
  • docs/rfcs/webui-run-state-consistency-contract.md
  • docs/rfcs/turn-journal.md
  • docs/UIUX-GUIDE.md
  • DESIGN.md

Contract families exercised:

  • Runtime / streaming / state: active SSE reattach, run-journal cursoring, replay idempotence, structured degradation when replay is unavailable.
  • UI/UX: visible progress remains prose-first timeline content; Activity stays quiet metadata bound to the progress segment that produced it.
  • General PR body / verification: single logical problem, explicit state layers, tests and CI listed below.
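
Replay idempotence and run-journal cursoring, as named in the list above, can be pictured with a small model of the client-side state. The sketch below is Python for consistency with the test suite, while the real state lives in the browser JavaScript; the class `InflightState`, its fields, and the `seq`/`text` event shape are hypothetical names chosen for illustration.

```python
class InflightState:
    """Hypothetical stand-in for the browser's per-session recovery state."""

    def __init__(self) -> None:
        self.last_applied_seq = 0    # cursor: highest seq actually applied
        self.text = ""               # accumulated live assistant text
        self.server_seq_hint = None  # diagnostic only, never used as a cursor

    def apply(self, event: dict) -> bool:
        """Apply one event; return False if it was already applied."""
        seq = event["seq"]
        if seq <= self.last_applied_seq:
            return False  # replayed duplicate, ignore it
        self.text += event.get("text", "")
        self.last_applied_seq = seq
        return True

    def note_server_summary(self, summary_seq: int) -> None:
        # A server-side summary seq does not prove this client applied
        # every event up to it, so the cursor is deliberately left alone.
        self.server_seq_hint = summary_seq
```

Because the cursor moves only inside `apply`, replaying an overlapping range is harmless, and a summary value that runs ahead of the client cannot cause a gap on the next reattach.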

Scope boundaries:

  • Does not change the model prompt; #3015 ("Restore visible WebUI progress contract") is the prompt-side companion.
  • Does not implement a new runtime adapter or a new turn-journal design slice.
  • Does not claim final UI readiness without manual 8787 dogfood and before/after visual evidence.

Evidence needed before Ready for review:

  • Green CI on the current head.
  • Manual long-running 8787 session-switch dogfood showing progress text and Activity remain interleaved.
  • Screenshot or short video evidence for the live Activity timeline after manual validation.

What Changed

  • Persist and restore running-stream state across session switches:
    • live assistant text
    • live reasoning text
    • applied run-journal replay cursor
    • Activity burst id and text anchors
  • Reattach active streams with run-journal replay parameters so reconnects can request only the missing event gap.
  • Make active SSE replay deliver the journal snapshot before subscribing to new live events, then skip live events already covered by the replay cutoff.
  • Only apply that replay cutoff when journal replay actually succeeded; if no journal is available, buffered live queue items remain deliverable.
  • Keep stale Response interrupted diagnostics out of active stream replay while preserving dead-stream behavior.
  • Bind live tool Activity groups to the progress segment that ended immediately before the tool burst.
  • Normalize live and settled Activity placement so text and Activity remain interleaved instead of collapsing all Activity at the top or bottom.
  • Fix the burst-id anchor mismatch where recordActivityBoundary() incremented the next tool burst id but left the current text DOM stamped with the old id.
  • Prevent duplicate or replayed interim boundaries with no new visible text from creating empty Activity bursts.
  • During reattach projection, alias any persisted empty Activity burst back to the previous visible text segment so existing localStorage state can recover instead of leaving tool groups anchorless.
  • Preserve live activityBurstId, duration, and started_at as auxiliary metadata when the final message owns the settled tool metadata, without violating message-windowing ownership.
  • Fold empty assistant messages that contain only tool_calls back to the previous visible assistant progress segment, so final settled rendering and session-switch reattach do not split one tool burst into multiple orphaned Activity groups.
  • Resume reconnecting multi-segment live turns from the full assistant accumulator, not the latest projected segment, so repeated session switches do not truncate older process-text anchors and push earlier Activity groups to the tail.
  • Persist interim progress boundaries even when the stream event arrives during an inactive-pane session switch, so tools produced in that window keep a valid progress-text burst anchor on reattach.
  • Keep reconnect rendering scoped to the live tail segment: seed segmentStart from the last recorded burst anchor, select the last projected live segment, and force a fresh segment when reconnect resumes exactly at a boundary with no tail yet.
  • Do not advance INFLIGHT.lastRunJournalSeq from /api/session.runtime_journal; that summary is diagnostic state, not proof that the frontend applied every SSE event through that seq.
  • Update static/regression tests to lock the new replay queue shape, anchored Activity restore invariants, empty-burst normalization, message-windowing compatibility, final settled metadata merge, and replay-cursor/cutoff boundaries.
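
Two of the Activity invariants above (no empty bursts from duplicate boundaries, and aliasing persisted empty bursts back to the previous visible text) can be sketched as follows. This is an illustrative Python model, not the implementation in static/ui.js or static/messages.js: `BurstTracker`, `record_boundary`, `alias_empty_bursts`, and the `id`/`text` record shape are hypothetical.

```python
class BurstTracker:
    """Hypothetical model of burst ids anchored to visible progress text."""

    def __init__(self) -> None:
        self.burst_id = 0
        self.anchors = {}        # burst id -> text segment that preceded it
        self._pending_text = ""  # visible text since the last boundary

    def add_text(self, text: str) -> None:
        self._pending_text += text

    def record_boundary(self) -> int:
        """Close the current text segment and return the burst id for tools."""
        if not self._pending_text.strip():
            # Duplicate or replayed boundary with no new visible text:
            # reuse the current burst instead of creating an empty one.
            return self.burst_id
        # Advance the id and stamp the anchor in the same step, so the
        # text segment and the following tool burst share one id.
        self.burst_id += 1
        self.anchors[self.burst_id] = self._pending_text
        self._pending_text = ""
        return self.burst_id


def alias_empty_bursts(bursts: list) -> dict:
    """Map each burst id to the id of the nearest earlier burst with text."""
    alias = {}
    last_visible = None
    for burst in bursts:
        if burst["text"].strip():
            last_visible = burst["id"]
        # Empty bursts inherit the previous visible anchor (None if absent).
        alias[burst["id"]] = last_visible
    return alias
```

In this model a tool group is never left without an anchor as long as some visible text came before it, which is the property the reattach projection needs when it restores older persisted state.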

Why It Matters

A long-running WebUI turn should keep the same visible timeline while the user switches sessions: progress text, the tools that followed that text, more progress text, then later tools. Reattaching should not blank the interim text, duplicate it, skip not-yet-consumed journal events, or pile all Activity groups into one unrelated block.

Verification

  • node --check static/ui.js static/messages.js static/sessions.js
  • pytest -q tests/test_live_activity_timeline.py tests/test_tool_call_persistence.py tests/test_streaming_markdown.py tests/test_inflight_stream_reuse.py tests/test_run_journal_routes.py tests/test_ui_tool_call_cleanup.py tests/test_issue734_message_windowing.py — local clean head e15f8270: 131 passed.
  • python3 -m pytest tests/test_live_activity_timeline.py tests/test_inflight_stream_reuse.py tests/test_streaming_race_fix.py -q — local clean head 1a8379d9: 41 passed.
  • python3 -m pytest tests/test_live_activity_timeline.py tests/test_inflight_stream_reuse.py tests/test_streaming_race_fix.py -q — local clean head 5c2c83a2: 42 passed.
  • python3 -m pytest tests/test_inflight_stream_reuse.py tests/test_live_activity_timeline.py tests/test_streaming_race_fix.py -q — local clean head 886e7548: 46 passed.
  • node --check static/messages.js static/ui.js static/sessions.js — local clean head 886e7548: passed.
  • pytest -q --ignore=tests/test_passkey_auth.py — local machine before the final cursor/empty-burst/final-metadata follow-up patches: 6624 passed, 62 skipped, 3 xpassed; 2 unrelated tests/test_workspace_git.py failures caused by local git default branch not being master.
  • GitHub Actions test (3.11), test (3.12), and test (3.13) pass on commit e15f8270.
  • GitHub Actions for 886e7548 are pending after the latest follow-up push.

Risks / Follow-ups

  • Still Draft. This needs manual 8787 dogfood before it should be marked ready for review.
  • This PR cannot make a model emit visible progress text; if the model runs many tool batches without ordinary assistant text, the UI can only anchor Activity to the latest available visible segment. #3015 ("Restore visible WebUI progress contract") is the prompt-side companion.
  • Manual validation should stress rapid session switching during long, tool-heavy runs before publishing.
  • UI/UX publication evidence is still pending: before/after screenshot or short video after 8787 dogfood.

Model Used

OpenAI GPT-5 Codex. AI assistance was used to inspect runtime evidence, split PR scope, implement the focused fixes, run contract verification, and verify the clean branch.

franksong2702 force-pushed the fix/sse-reattach-on-session-switch branch from 13ec8ef to e15f827 on May 28, 2026 13:40
franksong2702 (Contributor, Author)

Closing this draft in favor of the cleaner follow-up path.

#2924 was closed by #3038, and the remaining live-to-final / reattach / replay / Activity lifecycle work has been repackaged into #3401 with a smaller current-master diff, updated tests, manual screenshot evidence, and green CI.

This PR is now stale/conflicting and too broad to remain the active review vehicle.
