Skip to content

fix(streaming): preserve restored live tool cards on reconnect (#3707)#3724

Closed
rodboev wants to merge 7 commits into
nesquena:masterfrom
rodboev:pr/live-reconnect-tool-restore
Closed

fix(streaming): preserve restored live tool cards on reconnect (#3707)#3724
rodboev wants to merge 7 commits into
nesquena:masterfrom
rodboev:pr/live-reconnect-tool-restore

Conversation

@rodboev

@rodboev rodboev commented Jun 6, 2026

Copy link
Copy Markdown
Contributor

Thinking Path

  • The v0.51.291 snapshot fix restored live thinking/text when switching back to an active session, but the follow-up report shows live tool cards are still stripped until another stream update arrives.
  • loadSession() can successfully restore the saved live turn and then immediately reattach the stream; reconnect cleanup still treats those restored tool cards as stale live chrome.
  • The fix keeps reconnect cleanup reconciled with the restored snapshot, replaying known tool calls only when needed and avoiding a visible empty-card gap.

What Changed

  • static/sessions.js: pass restored-live-turn state into reconnect setup.
  • static/messages.js: preserve or immediately replay restored live tool cards/text during reconnect.
  • static/ui.js: add a small helper if needed so live tool-card replay stays centralized.
  • tests/test_live_reconnect_restore.py: add regression coverage for restored snapshot plus reconnect cleanup.

Why It Matters

Users switching between long-running sessions should see the same live progress they already saw, not a partially reset transcript. This closes the remaining residual from the live-turn restore bug without waiting for broad live-to-final redesign work.

Verification

python -m pytest tests/test_live_reconnect_restore.py -q --timeout=60
node --check static/messages.js
node --check static/sessions.js
node --check static/ui.js
git diff --check origin/master...HEAD

Risks / Follow-ups

Model Used

GPT-5.5 via Codex CLI; implementation and adversarial review assisted by Codex/Claude agents.

@greptile-apps

greptile-apps Bot commented Jun 6, 2026

Copy link
Copy Markdown

Greptile Summary

This PR preserves restored live tool cards and assistant text when switching back to an active session and then experiencing an SSE reconnect, closing the remaining gap from the v0.51.291 snapshot fix.

  • sessions.js passes the restoredLiveTurn flag into attachLiveStream, which uses it to skip the "clear and restart from scratch" reconnect path and instead keep the snapshot DOM visible until newly arriving tokens actually diverge from it.
  • messages.js adds _replayRestoredLiveToolCardsIfMissing (called at initial attach and at each reconnect error recovery site) and _renderRestoredReconnectDisplay (guards the smd/fade render paths so the existing snapshot HTML is not blanked on the first post-reconnect frame).
  • ui.js adds the idempotent replayLiveToolCardsFromState helper with per-tid dedup, and tests/test_live_reconnect_restore.py adds structural regression coverage for each invariant.

Confidence Score: 3/5

The reconnect restore path has a dedup gap for tid-less tool calls that produces duplicate visible cards whenever the stream reconnects after initial attach.

The core reconnect-restore logic is sound and the markdown comparison fix correctly handles formatted responses. However, replayLiveToolCardsFromState skips the dedup query when tc.tid is falsy, and _replayRestoredLiveToolCardsIfMissing is invoked at both the initial wire-up and each subsequent error-recovery reconnect, producing a duplicate tool card per reconnect for tid-less calls.

static/ui.js (replayLiveToolCardsFromState dedup for tid-less calls) and static/messages.js (the matching fallback loop in _replayRestoredLiveToolCardsIfMissing)

Important Files Changed

Filename Overview
static/messages.js Adds restoredLiveTurn flag, _renderRestoredReconnectDisplay helper, and _replayRestoredLiveToolCardsIfMissing; the markdown render inside _normalizedRestoredMarkdownText is called on every hot-path frame without memoisation, and the replay helper fallback loop lacks dedup for tid-less tool calls.
static/ui.js Adds replayLiveToolCardsFromState with per-tid dedup, but the tid&&inner short-circuit means tid-less tool calls are never deduplicated, causing duplicate cards on any subsequent _replayRestoredLiveToolCardsIfMissing call.
static/sessions.js One-line change passes restoredLiveTurn from loadSession into attachLiveStream options; straightforward and correct.
tests/test_live_reconnect_restore.py Source-level structural assertions verify the key invariants; consistent with the project testing approach for browser-side JS.

Reviews (6): Last reviewed commit: "fix(streaming): compare restored reconne..." | Re-trigger Greptile

Comment thread static/messages.js
Comment thread static/messages.js
@nesquena-hermes

Copy link
Copy Markdown
Collaborator

I pulled the branch and read the static/messages.js diff against origin/master. The tool-card replay half (_replayRestoredLiveToolCardsIfMissing, replayLiveToolCardsFromState in ui.js) looks correct and idempotent. But I can independently confirm the duplication bug greptile flagged in _renderRestoredReconnectDisplay, and I think it's slightly broader than the review states.

The mechanism

On a restored reconnect, _smdReconnect is forced false (messages.js:871):

let _smdReconnect=reconnecting&&!restoredLiveTurn;

When the first new token makes the normalized text mismatch, _renderRestoredReconnectDisplay (messages.js:1116-1126) deactivates and writes a full markdown re-render straight into assistantBody, then returns false:

_restoredReconnectDisplayActive=false;
assistantBody.classList.remove('stream-fade-active');
assistantBody.innerHTML=renderMd ? renderMd(target) : esc(target);
_sanitizeSmdLinks(assistantBody);
return false;

Control then falls into the non-fade smd-init branch (messages.js:1808-1816):

} else if(!_smdParser&&window.smd){
  if(_smdReconnect){assistantBody.innerHTML='';_smdReconnect=false;}  // _smdReconnect is false → no clear
  _smdNewParser(assistantBody);
}
if(_smdParser){ _smdWrite(displayText); }

Because _smdReconnect is false, the innerHTML='' clear is skipped, and _smdWritewindow.smd.parser_write appends the smd-rendered output on top of the renderMd(target) HTML the helper just wrote. Result: the restored prefix is rendered twice until the done event calls renderMessages() and rebuilds the DOM. _smdWrite's own self-heal (messages.js:1384) only clears when displayText doesn't start with _smdWrittenText, and here _smdWrittenText is still '', so it doesn't trigger.

The fade path shares it

greptile scoped this to the non-fade branch, but _renderStreamingFadeMarkdown (messages.js:1672-1689) has the same shape — it also gates the clear on if(_smdReconnect) before _smdNewParser(assistantBody,true) and then _smdWrite(next.text,true). So a restored session rendered through the fade path looks equally exposed. Worth confirming rather than assuming only the non-fade path regresses.

Minimal fix

Both branches already have the right clear logic — they're just disarmed. Re-arm _smdReconnect at the moment the restored display deactivates, in _renderRestoredReconnectDisplay:

_restoredReconnectDisplayActive=false;
assistantBody.classList.remove('stream-fade-active');
_smdReconnect=true;   // re-arm: next smd init must clear the restored renderMd DOM
assistantBody.innerHTML=renderMd ? renderMd(target) : esc(target);

_smdReconnect is a mutable let in the same closure (messages.js:871), so this is in scope. With it set, the very next _smdNewParser call (fade or non-fade) runs assistantBody.innerHTML='' first, and smd starts from a clean container. The renderMd(target) write then becomes redundant within the same frame (no paint between), so you could also just drop it and rely on the clear+_smdWrite, but setting the flag is the smaller, lower-risk change.

On the tests

tests/test_live_reconnect_restore.py is all read_text() + .index() string assertions — test_restored_reconnect_does_not_arm_first_streaming_markdown_clear literally asserts the !restoredLiveTurn shape that produces this bug. None of these can observe a runtime DOM doubling. They'll stay green through the regression. A jsdom-style test that mounts assistantBody, restores a markdown prefix, then feeds one SSE delta and asserts assistantBody contains the text exactly once would actually guard this path.

@rodboev

rodboev commented Jun 6, 2026

Copy link
Copy Markdown
Contributor Author

CI failures are all the same pre-existing test: test_load_session_reattach_path_uses_attach_live_stream_for_running_sessions in test_inflight_stream_reuse.py. The assertion error actually shows the expected string is present in the slice, so the test contradicts its own output. Fails identically across Python 3.11, 3.12, and 3.13 on shard 2. Unrelated to this PR's changes.

Comment thread static/messages.js
@rodboev

rodboev commented Jun 6, 2026

Copy link
Copy Markdown
Contributor Author

Pushed the narrow follow-up in rodboev/hermes-webui@67424f17. _renderRestoredReconnectDisplay now only advances _streamFadeVisibleText while the restored DOM still matches the stream text; on divergence it clears the fade marker, re-arms _smdReconnect, and clears assistantBody so both fade and non-fade smd init paths restart from a clean container. I also extended the existing source guard to pin that matching/divergent split.

Comment thread static/messages.js
@nesquena-hermes

Copy link
Copy Markdown
Collaborator

Thanks for chasing this one @rodboev. Closing as superseded by #3401 (shipped in v0.51.294, "live-to-final assistant reply redesign") — but with a note to re-verify the underlying #3707 first.

Why superseded

This PR was built on v0.51.293 (pre-#3401) and targets the old reconnect architecture: it threads a restoredLiveTurn option into attachLiveStream and adds _replayRestoredLiveToolCardsIfMissing() to re-append tool cards on the reconnect path (and references replayLiveToolCardsFromState, which doesn't exist on current master). #3401 then rewrote that entire surface, so the PR no longer applies — all three static files (messages.js, sessions.js, ui.js) conflict.

More importantly, #3401 addresses the same #3707 root cause with a more integrated mechanism:

  • snapshotLiveTurnHtmlForSession() (ui.js:5082) snapshots the entire live-turn outerHTML — tool cards included — on switch-away.
  • loadSession() (sessions.js:977-995) restores it via restoreLiveTurnHtmlForSession() and, when the restore brings back worklog/tool-card content (it explicitly checks for .tool-card-row in the live worklog/tool-call-group, sessions.js:983-990), sets restoredLiveTurn=true and skips the clearLiveToolCards() strip (sessions.js:996) — which is exactly the strip bug(streaming): restored live tool cards re-stripped on reconnect after session switch (#3668 residual) #3707 reported.
  • _mergeRestoredLiveAssistantSegment() (ui.js:5099) preserves .tool-call-group/.tool-card-row nodes during the merge.

So the "restore tool cards, then immediately re-strip them on reconnect" path you fixed here is structurally guarded in the new code, by restoring the whole turn (cards and all) rather than replaying tool calls after the fact.

Action for #3707

I'm leaving #3707 open for now rather than auto-closing it — the original report called the behavior "erratic," and #3401 rewrote the surface enough that it deserves a fresh look on v0.51.298. @b3nw, if you can re-test the switch-away/switch-back-to-an-active-stream flow on the latest build and the tool cards now persist, we'll close #3707. If any residual remains, it'll need a fresh fix against the new snapshot/restore code (the entry point is loadSession()restoreLiveTurnHtmlForSession() in the post-#3401 architecture), not this PR's approach.

Credit for the diagnosis here carries over — thank you.

nesquena-hermes added a commit that referenced this pull request Jun 7, 2026
…ds on reconnect, fixes #3707) (#3766)

* fix(streaming): replay restored live tool cards on reconnect (#3763, fixes #3707)

Post-#3401 (#3400 live-to-final epic) recovery residual. When a running session
is restored from its in-memory live-turn snapshot and then reattached to the SSE
stream, the restore-success path skipped replaying persisted live tool calls,
leaving restored live text/thinking but an EMPTY Worklog until a later SSE event
or the final render rebuilt the turn.

- Extract the persisted-tool-card replay into replayPersistedLiveToolCards()
  (reads S.toolCalls or INFLIGHT[sid].toolCalls); run it on restoredLiveTurn &&
  didReconnect, not only the !restoredLiveTurn fallback.
- Dedup safety: restore-success replay passes {skipUnkeyedRestoredDuplicates:true}
  — when the restored snapshot already has .tool-card-row rows, an UNKEYED
  persisted tool is skipped to avoid a duplicate; keyed cards still replay and
  appendLiveToolCard's tid-dedup replaces the correct restored row.
- appendLiveToolCard() and the new liveToolReplayId() both key on
  tid||id||tool_call_id||tool_use_id||call_id (consistent 5-alias set), so the
  dedup covers all known id shapes.
- Both replay sites pass {sessionId, streamId} so the ownership guard applies.
- Regression coverage: restore-success+reconnect replays tools; unkeyed-restored
  duplicates skipped; all-id-alias dedup; prior ordering invariants preserved.

Correct post-#3401 fix for #3707 (supersedes the closed #3724).

Co-authored-by: franksong2702 <[email protected]>

* docs(changelog): stamp v0.51.309 — Release JY (stage-a5b #3763)

---------

Co-authored-by: nesquena-hermes <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants