Skip to content

fix(streaming): replay restored live tool cards on reconnect#3763

Closed
franksong2702 wants to merge 1 commit into
nesquena:masterfrom
franksong2702:franksong2702/fix-3707-reconnect-tool-cards
Closed

fix(streaming): replay restored live tool cards on reconnect#3763
franksong2702 wants to merge 1 commit into
nesquena:masterfrom
franksong2702:franksong2702/fix-3707-reconnect-tool-cards

Conversation

@franksong2702

@franksong2702 franksong2702 commented Jun 7, 2026

Copy link
Copy Markdown
Contributor

Thinking Path

#3707 is a post-#3401 live-to-final recovery residual. When a running session is restored from the in-memory live turn snapshot and then reattached to the SSE stream, the restore-success path could skip replaying persisted live tool calls. That left users with restored live text/thinking but missing tool cards until a later SSE event or final render rebuilt the turn.

The smallest safe fix is to keep the existing fallback replay behavior, but also replay persisted live tool cards after a successful restore when the session is reconnecting.

What Changed

  • Added a replayPersistedLiveToolCards() helper inside the loadSession() INFLIGHT path.
  • Replayed persisted live tool calls after restoredLiveTurn && didReconnect, not only on the fallback path.
  • Kept the existing stream/session ownership guard by passing sessionId and streamId into appendLiveToolCard().
  • Addressed the restore-success duplicate edge case by skipping unkeyed persisted tool replay when the restored snapshot already contains visible tool rows.
  • Let live replay replace restored cards for common id aliases (id, tool_call_id, tool_use_id, call_id), not only tid.
  • Updated targeted regression coverage for restore-success + reconnect, unkeyed restored duplicate prevention, id-alias replay, and the existing active-stream-before-replay invariant.
  • Added a changelog entry for the user-visible reconnect recovery fix.

Why It Matters

Users can switch away from a long-running live session and come back during reconnect. The page should rebuild the same live Worklog they had before switching away. Without this, the turn can look like it lost tool history even though the in-flight state still knows about those tool calls.

Fixes #3707.

Verification

  • node --check static/messages.js static/ui.js static/sessions.js
  • git diff --check
  • uv run --python 3.12 --with pytest --with pytest-timeout --with pytest-asyncio --with pyyaml --with cryptography python -m pytest tests/test_inflight_stream_reuse.py tests/test_ui_tool_call_cleanup.py tests/test_issue3668_prompt_card_reshow_on_switch.py tests/test_regressions.py -q
    • Result: 126 passed, 1 skipped
  • python3 scripts/scope_undef_gate.py
    • Local result: skipped because eslint is not available on PATH.

Risks / Follow-ups

  • This intentionally stays front-end focused and does not pull in Preserve live stream output across session switches #3427's broader server-side session snapshot/replay scope.
  • Unkeyed legacy tool calls restored from old snapshots are preserved rather than re-appended when visible tool rows already exist; current live tool calls still receive generated tids from upsertLiveToolCall().
  • The local scope_undef_gate.py run could not execute without eslint; CI should run the actual gate.

Contract Routing

Model Used

AI assisted: OpenAI GPT-5 Codex in coordinator/worker mode, with local shell verification and GitHub CLI checks.

@greptile-apps

greptile-apps Bot commented Jun 7, 2026

Copy link
Copy Markdown

Greptile Summary

This PR fixes a reconnect bug (#3707) where switching back to a live session after reconnection would show restored text/thinking but missing tool cards. The restore-success path previously skipped replaying persisted live tool calls, leaving them absent until a later SSE event or final render rebuilt the turn.

  • Adds replayPersistedLiveToolCards() helper inside the loadSession INFLIGHT path, called for both the new restoredLiveTurn&&didReconnect path and the existing fallback path, with a skipUnkeyedRestoredDuplicates guard to prevent duplicate appends when the restored DOM already contains tool rows.
  • Expands appendLiveToolCard's tid resolution to include all common ID aliases (id, tool_call_id, tool_use_id, call_id) so in-place card replacement works for Anthropic-style and OpenAI-style tool events, not just events that carry tc.tid.
  • Adds targeted regression coverage for the reconnect replay, the unkeyed-duplicate skip, and the id-alias deduplication paths.

Confidence Score: 4/5

Safe to merge; the reconnect replay path is narrowly scoped and the duplicate-prevention guard correctly handles both keyed and unkeyed tool calls.

The reconnect replay path is narrowly scoped and correct. S.activeStreamId is restored before any replay calls, appendLiveToolCard ownership guards prevent cross-session writes, and skipUnkeyedRestoredDuplicates prevents visible duplicates on the restore-success path. The only gap found is an unreachable liveToolCalls fallback branch that does not affect runtime behavior.

static/sessions.js — the unreachable liveToolCalls fallback branch inside replayPersistedLiveToolCards is worth cleaning up before this pattern is copied elsewhere.

Important Files Changed

Filename Overview
static/sessions.js Core fix: adds replayPersistedLiveToolCards() helper and the restoredLiveTurn&&didReconnect replay path; liveToolCalls fallback branch inside the helper is unreachable dead code.
static/ui.js Expands tid resolution in appendLiveToolCard to include all common id aliases (tool_call_id, tool_use_id, call_id); one-line change, correct and well-tested.
tests/test_inflight_stream_reuse.py Adds three new targeted tests for the reconnect replay path; the re.sub whitespace-stripped block check adequately verifies syntactic nesting of the replay call.
tests/test_regressions.py Updates the S.activeStreamId-before-replay invariant test to anchor on the helper function definition rather than a call site; invariant still holds but proxy is weaker.
tests/test_ui_tool_call_cleanup.py Adds assertion verifying the full id-alias chain in appendLiveToolCard; straightforward test addition.
CHANGELOG.md Adds user-visible changelog entry for the reconnect tool card fix; accurate and appropriately scoped.

Sequence Diagram

sequenceDiagram
    participant U as User (returns to session)
    participant LS as loadSession()
    participant DOM as DOM / liveAssistantTurn
    participant ALS as attachLiveStream()
    participant ATC as appendLiveToolCard()

    U->>LS: loadSession(sid)
    LS->>LS: "S.activeStreamId = activeStreamId"
    LS->>LS: define replayPersistedLiveToolCards()
    alt "INFLIGHT[sid].reattach && activeStreamId"
        LS->>LS: "didReconnect = true"
        LS->>ALS: "attachLiveStream(sid, activeStreamId, {reconnecting:true})"
    end
    LS->>LS: renderMessages()
    alt hasStructuredLiveState
        alt hasCurrentWorklogContent
            LS->>LS: "restoredLiveTurn = true"
        else
            LS->>DOM: restoreLiveTurnHtmlForSession(sid)
            DOM-->>LS: restoredLiveTurn (element or falsy)
        end
    else
        LS->>DOM: restoreLiveTurnHtmlForSession(sid)
        DOM-->>LS: restoredLiveTurn (element or falsy)
    end
    alt "restoredLiveTurn && didReconnect"
        Note over LS,ATC: NEW PATH
        LS->>DOM: getElementById('liveAssistantTurn')
        DOM-->>LS: hasRestoredLiveToolRows?
        loop each tc in S.toolCalls
            alt "skipUnkeyed && hasRows && !liveToolReplayId(tc)"
                LS->>LS: skip duplicate
            else
                LS->>ATC: "appendLiveToolCard(tc, {sessionId, streamId})"
                ATC->>DOM: replaceWith or appendChild
            end
        end
    else not restoredLiveTurn
        Note over LS,ATC: EXISTING FALLBACK PATH
        LS->>DOM: clearLiveToolCards()
        loop each tc in S.toolCalls
            LS->>ATC: "appendLiveToolCard(tc, {sessionId, streamId})"
        end
    end
Loading

Reviews (4): Last reviewed commit: "fix(streaming): replay restored live too..." | Re-trigger Greptile

Comment thread static/sessions.js
Comment thread tests/test_inflight_stream_reuse.py Outdated
@franksong2702 franksong2702 force-pushed the franksong2702/fix-3707-reconnect-tool-cards branch 2 times, most recently from c9aa748 to 1892e39 Compare June 7, 2026 02:53
@franksong2702

Copy link
Copy Markdown
Contributor Author

Follow-up on the Greptile duplicate-risk note:

  • Added a restore-success guard so reconnect replay does not append unkeyed legacy tool calls over an already-restored snapshot that still has visible tool rows.
  • Expanded live tool replacement to use the common id aliases (id, tool_call_id, tool_use_id, call_id) in addition to tid.
  • Added targeted regression coverage for both cases.

Updated local verification:

  • node --check static/messages.js static/ui.js static/sessions.js
  • git diff --check
  • uv run --python 3.12 --with pytest --with pytest-timeout --with pytest-asyncio --with pyyaml --with cryptography python -m pytest tests/test_inflight_stream_reuse.py tests/test_ui_tool_call_cleanup.py tests/test_issue3668_prompt_card_reshow_on_switch.py tests/test_regressions.py -q126 passed, 1 skipped
  • python3 scripts/scope_undef_gate.py → skipped locally because eslint is not on PATH; CI lint covers it.

Comment thread static/sessions.js Outdated
@franksong2702 franksong2702 force-pushed the franksong2702/fix-3707-reconnect-tool-cards branch from 1892e39 to f031c0c Compare June 7, 2026 02:56
@nesquena-hermes

Copy link
Copy Markdown
Collaborator

Review — looks good; holding briefly while you iterate

Thanks @franksong2702 — this is the correct post-#3401 fix for #3707 (and the right successor to the closed #3724, whose pre-refactor replayLiveToolCardsFromState mechanism no longer exists). I took it through the full release gate on top of current master and it comes back clean. I'm going to hold it short-term rather than slot it into a batch right now — you've pushed twice today and it looks like you may still be refining, so I'd rather pick it up once it's settled than ship a version that's immediately superseded. Ping me (or just let it sit a bit) and I'll re-gate the final state and release it.

What I verified (on your latest commit, rebased onto master)

  • Root cause is right. loadSession()'s INFLIGHT path only replayed persisted tool cards on the !restoredLiveTurn fallback; the restore-success + reconnect path left the Worklog empty until a later SSE event. Replaying on restoredLiveTurn && didReconnect closes that.
  • No double-render. Two independent guards: keyed cards are deduped by data-live-tid in appendLiveToolCard (replace, not append), and skipUnkeyedRestoredDuplicates skips an unkeyed persisted tool when the restored snapshot already has .tool-card-row rows. The id-alias widening (tid||id||tool_call_id||tool_use_id||call_id) means restored rows match across the known aliases.
  • Ownership guard. Both replay sites pass {sessionId, streamId}, so a switched-away session can't repaint stale tools — a tightening vs. the old bare call, not a regression.
  • Fallback path unchanged (clear → host → shell → replay); only the inner loop became the shared helper.

Gate results (combined diff, current master)

  • Full suite: 8161 passed, 0 failed
  • Codex (regression): SAFE TO SHIP — traced the dedup chain end-to-end incl. messages.js populating tc.tid upstream so restored snapshots carry the replace key
  • Opus (correctness): SAFE TO SHIP
  • ESLint runtime + scope-undef + ruff + browser-smoke: CLEAN

One optional, non-blocking nit (worth folding in if you push again)

liveToolReplayId (static/sessions.js) lists 4 id aliases (tid||id||tool_call_id||call_id) but appendLiveToolCard (static/ui.js) lists 5 (adds tool_use_id). It's inert today because tc.tid is always populated upstream — but if anything ever puts a tool_use_id-only entry into S.toolCalls, the skipUnkeyedRestoredDuplicates guard could misclassify it as unkeyed and skip it. Aligning liveToolReplayId to the same 5 aliases would make the two fully consistent. Not required for correctness as the code stands.

Solid work — this is in good shape. Holding only to avoid releasing mid-iteration. 🙏

nesquena-hermes added a commit that referenced this pull request Jun 7, 2026
…ds on reconnect, fixes #3707) (#3766)

* fix(streaming): replay restored live tool cards on reconnect (#3763, fixes #3707)

Post-#3401 (#3400 live-to-final epic) recovery residual. When a running session
is restored from its in-memory live-turn snapshot and then reattached to the SSE
stream, the restore-success path skipped replaying persisted live tool calls,
leaving restored live text/thinking but an EMPTY Worklog until a later SSE event
or the final render rebuilt the turn.

- Extract the persisted-tool-card replay into replayPersistedLiveToolCards()
  (reads S.toolCalls or INFLIGHT[sid].toolCalls); run it on restoredLiveTurn &&
  didReconnect, not only the !restoredLiveTurn fallback.
- Dedup safety: restore-success replay passes {skipUnkeyedRestoredDuplicates:true}
  — when the restored snapshot already has .tool-card-row rows, an UNKEYED
  persisted tool is skipped to avoid a duplicate; keyed cards still replay and
  appendLiveToolCard's tid-dedup replaces the correct restored row.
- appendLiveToolCard() and the new liveToolReplayId() both key on
  tid||id||tool_call_id||tool_use_id||call_id (consistent 5-alias set), so the
  dedup covers all known id shapes.
- Both replay sites pass {sessionId, streamId} so the ownership guard applies.
- Regression coverage: restore-success+reconnect replays tools; unkeyed-restored
  duplicates skipped; all-id-alias dedup; prior ordering invariants preserved.

Correct post-#3401 fix for #3707 (supersedes the closed #3724).

Co-authored-by: franksong2702 <[email protected]>

* docs(changelog): stamp v0.51.309 — Release JY (stage-a5b #3763)

---------

Co-authored-by: nesquena-hermes <[email protected]>
pull Bot pushed a commit to soitun/hermes-webui that referenced this pull request Jun 10, 2026
Adds a server-side run-journal live snapshot (_run_journal_live_snapshot) returned
in GET /api/session as runtime_journal_snapshot, so a FRESH client (another device,
or a tab with no in-memory snapshot) opening an in-progress session immediately sees
the already-streamed assistant text + tool cards rebuilt from the server. Composes
with the existing _replay_run_journal cursor path (seeds lastRunJournalSeq so replay
resumes from the snapshot cutoff, not duplicating it) and keys tool cards by the same
5 id aliases (tid/id/tool_call_id/tool_use_id/call_id) as nesquena#3763 so SSE replay replaces
rather than duplicates snapshot cards. Payload values truncated; redaction test added.

Co-authored-by: t3chn0pr13st <technopriest@live.ru>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

bug(streaming): restored live tool cards re-stripped on reconnect after session switch (#3668 residual)

2 participants