fix(ui+session): never render a blank transcript for empty-content recovered turns (#3875)#3898
fix(ui+session): never render a blank transcript for empty-content recovered turns (#3875)#3898nesquena-hermes wants to merge 1 commit into
Conversation
|
| Filename | Overview |
|---|---|
| api/models.py | Adds a backward scan in ensure_assistant_anchor to reuse an existing empty recovered anchor for the same stream rather than appending a fresh one on every recovery retry; fixes unbounded accumulation of empty-content assistant rows. |
| static/ui.js | Adds a legacy-mode fallback in renderMessages that reads _assistantReasoningPayloadText(m) as the Thinking-card source for empty-content turns with no inline thinking, so those turns are never silently collapsed to a hidden anchor. |
| tests/test_issue3875_blank_transcript_failsafe.py | Extends the existing #3875 test file with four new assertions: reasoning field surfaces for empty-content turns, fallback is scoped correctly, reasoning stays out of the #2565-guarded inline extraction block, and _assistantReasoningPayloadText reads the right fields. |
| tests/test_issue3875_recovery_anchor_dedup.py | New test file covering the data-side dedup: single anchor on first recovery, 20 repeated recoveries yield one anchor, distinct streams keep distinct anchors, token-bearing recovery still appends real content. |
| CHANGELOG.md | Adds a detailed changelog entry for #3875 under [Unreleased]. |
Sequence Diagram
sequenceDiagram
participant GS as get_session()
participant JAP as _append_journaled_partial_output
participant EAA as ensure_assistant_anchor
participant SM as session.messages
Note over GS,SM: Bug: each get_session() retry appended a new empty anchor
GS->>JAP: "call (dedupe_existing=True)"
JAP->>EAA: flush_assistant() returns None (tool-first, no text)
EAA->>SM: backward scan for existing empty anchor (stream_id match)
alt Empty anchor for stream already exists
SM-->>EAA: found at index N
EAA-->>JAP: return N (reuse, no append)
else No existing anchor
EAA->>SM: append new empty anchor
EAA-->>JAP: return new index
end
JAP-->>GS: return appended_any
Note over GS,SM: Render fix (legacy mode only)
participant RM as renderMessages()
participant ARP as _assistantReasoningPayloadText(m)
RM->>RM: inline thinkingText extraction (content tags)
RM->>RM: check all six guards pass
alt All guards pass (empty-content recovered turn)
RM->>ARP: call
ARP-->>RM: m.reasoning_content or m.reasoning
RM->>RM: "thinkingText = payload, insert Thinking card"
else Guard fails (normal answer-bearing message)
RM->>RM: render content as-is (unchanged)
end
Reviews (2): Last reviewed commit: "fix(ui+session): never render a blank tr..." | Re-trigger Greptile
nesquena
left a comment
There was a problem hiding this comment.
Review — end-to-end ✅ (clean approval, no fixes needed)
Independent review of the two-part root-cause fix for #3875 — the reporter's session rendered as a bare stack of SUNDAY/SATURDAY date separators with zero message content. This is the layered follow-up to the #3889 fail-safe (shipped in Release LF): that DOM-sweep couldn't rescue the reporter because there was genuinely no visible content to reveal — every row was content=str/0 (empty) + reasoning + _recovered_from_run_journal. This PR fixes the actual root cause on both the render and data sides.
What this ships
static/ui.js (renderMessages, +18), api/models.py (ensure_assistant_anchor, +24), CHANGELOG.md, plus an extended tests/test_issue3875_blank_transcript_failsafe.py and a new tests/test_issue3875_recovery_anchor_dedup.py. Closes #3875. Authored by the agent (nesquena-hermes).
Traced against upstream hermes-agent
Pulled a fresh nousresearch/hermes-agent tarball at review time. _recovered_from_run_journal / _recovered_stream_id are WebUI-internal session conventions — the agent never sees them. The recovered anchors feed conversation_history only through _sanitize_messages_for_api, which excludes reasoning-only assistant rows (streaming.py:3113-3129, _is_reasoning_only_assistant_message): empty content + reasoning/reasoning_content → skipped, so strict providers never get a blank assistant turn from these rows. The fix adds no new message-schema fields and reduces the number of empty rows, so it can only help the agent side. No cross-tool change needed. ✓
Root cause — verified against the reporter's shape dump
The issue thread is decisive: the reporter's python3 shape dump showed thousands of consecutive role=assistant content=str/0 (empty) keys=[…_recovered_from_run_journal, _recovered_stream_id, reasoning] rows, and the reporter confirmed pruning them restored the transcript. Two compounding bugs:
- Render: the per-segment inline Thinking extraction only mines
<think>/channel/turn tags out ofcontentand (per #2565) must not read the separatereasoningfield. So an empty-content turn carrying onlyreasoningextracted nothinkingText, rendered no Thinking card, and collapsed to a hiddenassistant-segment-anchor(display:none). A whole session of such rows painted as nothing but date dividers. - Data: run-journal recovery appends an empty assistant "anchor" to host a tool-first stream's recovered tool cards. Recovery re-runs across
get_session()retries and across each distinct interrupted stream, and a tool-first stream has no text to dedup on, so anchors accumulated without bound.
End-to-end trace — render side
The fallback at ui.js:8257-8260 sits after the #2565-guarded inline-extraction block (let thinkingText='' at 8110 → const isUser=… at 8141) and after the per-segment setup:
if(!thinkingText&&!String(content||'').trim()&&!filesHtml&&!statusHtml){
const _reasoningPayload=_assistantReasoningPayloadText(m);
if(_reasoningPayload) thinkingText=_reasoningPayload;
}It then feeds the same worklog-vs-inline routing at 8261-8263. I traced both modes:
- Legacy mode (
!isSimplifiedToolCalling()):seg.insertAdjacentHTML('beforeend', _thinkingCardHtml(thinkingText))→ Thinking card inline; the anchor-class branch at ui.js:8270 is then skipped (its!(thinkingText && _showThinking && !simplified)is false), so the segment is visible, not a hidden anchor. - Simplified mode:
assistantThinking.set(rawIdx, thinkingText)→ reasoning routed to the Worklog card (low-priority detail), exactly as #2565 requires.
_assistantReasoningPayloadText (ui.js:6304-6333) reads m.reasoning_content||m.reasoning||m.thinking||m._reasoning (and thinking/reasoning content parts) — pure text extraction. thinkingText's only later use is the anchor-class decision at 8270, so no reasoning-sourced text leaks anywhere unexpected.
Behavioural harness (extracted helper + guard, real row shapes):
PASS | reporter empty+reasoning => thinkingText="Let me think about X"
PASS | empty + reasoning_content => thinkingText="planning"
PASS | answer-bearing (must be untouched) => thinkingText=""
PASS | empty, no reasoning (stays blank) => thinkingText=""
PASS | array content thinking part => thinkingText=""
PASS | user row (not assistant) => thinkingText=""
6/6 cases as expected
Confirms the reporter's exact shape surfaces a card, and answer-bearing messages are untouched.
End-to-end trace — data side
The dedup guard at models.py:1436-1447 scans backward for an existing empty _recovered_from_run_journal anchor with the same _recovered_stream_id + role=='assistant' + empty content, and reuses its index instead of appending. Verified the surrounding mechanics:
ensure_assistant_anchorcallsflush_assistant()first (models.py:1418) — content-bearing recovery still appends real rows; only content-less anchors reach the scan.- The anchor stays empty-content for its whole life (recovered tool cards live in
session.tool_callswithassistant_msg_idx, not in the anchor'scontent), so it remains matchable across retries — the dedup converges to one anchor per stream. - Reached from every recovery path: the lazy retry (models.py:1751,
dedupe_existing=True) and the cold-load / cache-miss repair (models.py:1871/1898/1950). The guard runs unconditionally (not gated ondedupe_existing), which is correct — one-anchor-per-stream is always the goal. - Per-stream scoping keeps genuinely distinct interrupted streams' anchors separate.
The historical reasoning-bearing empty rows predate this code (current recovery explicitly does not restore reasoning — models.py:1363); the render fix is what makes an already-poisoned session render, while the dedup stops new accumulation. The reporter's existing bloat was separately cleared via the prune snippet on the issue (confirmed working).
Other audit — things that are correct already
- Security / XSS: the reasoning payload is rendered via
_thinkingCardHtmlas<pre>${esc(clean)}</pre>(ui.js:6413-6417) — escaped. Same routing existing reasoning/thinking already used; zero new surface. - #2565 preserved: the fallback lives at the segment-emission point, not in the inline-extraction block;
test_reasoning_fallback_stays_out_of_inline_extraction_block_2565locks the block (slice betweenlet thinkingText='';andconst isUser=…) free ofm.reasoning/_assistantReasoningPayloadText. - Interaction with the #3889 fail-safe: complementary. Empty-content+reasoning turns now carry a Thinking card → the end-of-render blank-turn sweep sees visible content and skips them. No conflict, no double-reveal.
- Thread-safety: recovery mutates
session.messagesunder the sameget_session()flow as existing recovery; no new shared-state access. The dedup scan is read-then-reuse on the same list being built — no new race.
Edge-case matrix
| Scenario | Behavior |
|---|---|
Empty-content turn + reasoning (reporter's case), legacy mode |
Thinking card rendered inline; turn not blank ✅ |
| Same, simplified/Compact mode | Reasoning routed to Worklog card ✅ (reporter had Compact OFF; author verified both) |
| Answer-bearing assistant message | Guard !content.trim() false → unchanged ✅ |
| Empty content, no reasoning anywhere | Stays blank (genuinely nothing to show; #3889 sweep applies) ✅ |
| Tool-first stream, first recovery | One empty anchor created to host tool cards ✅ |
| Same stream, 20 repeated recoveries | Reuses the one anchor (was: 20 rows) ✅ |
| Three distinct interrupted streams | One anchor each, distinct _recovered_stream_id ✅ |
| Token-bearing recovery | Still appends real content row (dedup doesn't suppress) ✅ |
| Recovered anchor → conversation_history | Excluded as reasoning-only; agent never sees a blank turn ✅ |
Tests
tests/test_issue3875_recovery_anchor_dedup.py— 4/4 pass (single anchor on first recovery; 20 repeated recoveries → 1 anchor, the unbounded-accumulation bug; distinct streams stay distinct; token-bearing recovery still appends content).tests/test_issue3875_blank_transcript_failsafe.py— 6/6 pass (3 prior #3889 sweep tests + 3 new: reasoning surfaces for empty-content turns, fallback scoped + out of the #2565 block,_assistantReasoningPayloadTextreads the reasoning fields).- Full suite: 8323 passed / 125 skipped / 1 environmental flake (deselected the pre-existing darwin CRLF flake
test_workspace_git.py::test_git_status_ignores_crlf_only_worktree_noise; ignoredtest_passkey_auth.py, which fails to collect on this box from a missing localcryptographydep, unrelated to a render/recovery change and green in CI). The one flake —test_issue2863_session_index_prime.py::test_missing_index_starts_background_rebuild_while_preserving_first_scan— is a background-thread timing test in the session-index-prime subsystem (#2863), which this PR does not touch (diff is only CHANGELOG +api/models.py/ensure_assistant_anchor+static/ui.js/renderMessages+ the two #3875 test files). It passes 6/6 in isolation here and is green across all 9 CI shards — a load-sensitive flake under the 160s parallel run, not a regression. - ESLint runtime guard: clean (exit 0);
node --checkclean. - Behavioural harness confirms the render guard fires only for empty-content turns and surfaces the reporter's exact shape.
- CI on the PR head: all green — browser-smoke + lint + 9 test shards.
Minor observations (non-blocking)
- The render guard keys on
String(content||'').trim(), so an assistant turn whose content is an array of thinking-only parts (String([obj])→"[object Object]", non-empty) won't trip the fallback. That isn't the reporter's shape (theirs is an empty string), and such turns are handled by the normal display-content path; pre-existing, out of scope. - The dedup's backward scan is O(messages) worst-case per anchor creation, but returns on the first (most recent) match — effectively O(1) in practice since a stream's anchor is near the tail. Anchor creation is rare (recovery only). Fine.
- An empty anchor with no reasoning (a fresh tool-first anchor) isn't caught by
_is_reasoning_only_assistant_message, so it could in principle reach the API as a blank assistant turn — but that is pre-existing behavior and this PR strictly reduces such rows. Worth a separate look if blank-anchor 400s ever surface, but not this PR's concern.
Recommendation
✅ Approved clean. Parked at approval — ready for the release agent's merge/tag pipeline.
A textbook layered root-cause fix: the prior #3889 fail-safe addressed the symptom for the recoverable case; this PR fixes the two underlying bugs (render-side reasoning surfacing + data-side anchor dedup) that left the reporter's all-empty session unrescuable. Both fixes are tightly scoped (empty-content turns / one-anchor-per-stream), preserve #2565, are XSS-safe, add no schema or cross-tool surface, and are covered by behavioural Python tests plus structural JS guards. Ship.
bcd9e33 to
1473206
Compare
…covered turns (#3875) A user's session rendered as a bare stack of date separators (SUNDAY/SATURDAY) with zero message content. Root-caused with the reporter to two compounding bugs. RENDER (static/ui.js renderMessages): the per-segment Thinking-trace extraction only mines inline <think>/channel/turn tags out of `content`; it must not read the separate `reasoning` field (#2565 keeps reasoning metadata as low-priority Worklog detail, never inline-content extraction). So an assistant turn with empty visible content but a populated `reasoning` field (e.g. a run-journal-recovered anchor: empty content + reasoning + _recovered_from_run_journal) extracted no thinkingText, rendered no Thinking card, and collapsed to an empty hidden anchor. A session of all such rows painted blank. Fix: AFTER the #2565-guarded inline extraction block, fall back to _assistantReasoningPayloadText(m) as the Thinking card source ONLY when there's no inline thinkingText AND no visible content/files/ status. Feeds the same worklog-vs-inline routing, so reasoning stays Worklog detail in simplified mode and a Thinking card in legacy mode; answer-bearing messages are unchanged. DATA (api/models.py ensure_assistant_anchor): run-journal recovery created an empty assistant anchor to host recovered tool cards for a tool-first stream, but the lazy read-side retry path re-ran recovery on every get_session() and a tool-first stream has no text to dedup on, so each retry + each distinct interrupted stream appended ANOTHER empty anchor -> unbounded accumulation (the reporter's session had thousands). Fix: reuse an existing empty recovered anchor for the same stream instead of appending a fresh one (one anchor/stream). Verified RED->GREEN in a live browser (Compact-tool-activity ON and OFF); normal answer-bearing messages unchanged (guard). Full suite 8506 passed; ESLint runtime gate + ruff clean; respects #2565 (5 prior structural tests still green). Closes #3875
1473206 to
497abb5
Compare
Gate results — ready for independent reviewBoth gates run against this PR; one MUST-FIX from Codex applied + verified. Codex (regression/breakage) — SHIP ONLY WITH FIXES → fixed. Caught a SILENT bug: the reasoning fallback originally ran in simplified/Worklog mode too, where the existing Opus (architecture/correctness) — SAFE to ship, no must-fix. Independently verified all five review areas against the code:
One behavior note Opus flagged (intended, not a bug): the dedup's reuse branch deliberately does not set Verification: RED→GREEN in a live browser (Compact-tool-activity ON and OFF) + normal-message guard + echo guard; full suite 8506 passed; ESLint runtime gate + Holding for independent review — not self-merging. |
…#3892 #3898 #3885 #3882 #3868) (#3902) * stage v0.51.347: render/stream cluster (#3892 #3898 #3885 #3882 #3868) + 2 Opus SHOULD-FIX * stage v0.51.347: trim #3885 error-guard comment to fit diagnostic-test window * Stamp v0.51.347 — Release LK (streaming & render reliability cluster) * Remove stray uv.lock accidentally staged (not part of any cluster PR) --------- Co-authored-by: nesquena-hermes <[email protected]>
|
Shipped in v0.51.347 (Release LK — streaming & render reliability cluster) 🎉 — live now. Merged via the combined release PR #3902 alongside #3892, #3898, #3885, #3882, and #3868 (all in the live-to-final streaming/render family). Each was applied onto fresh master, gated through Codex (SAFE TO SHIP) + Opus (ship, no MUST-FIX) + the full suite (8532 passing), with two Opus SHOULD-FIX folded in before merge. Authorship preserved via co-authored commit + CHANGELOG credit. Thanks for the fix! 🙏 |
Summary
Closes #3875.
A user's chat transcript rendered as a bare stack of
SUNDAY/SATURDAYdate separators with zero message content. Root-caused interactively with the reporter (local install, Firefox private window, reproduced across Chrome/Firefox/mobile, survived service-worker + cache clears). Their session's shape dump was decisive: thousands of assistant messages allcontent="" (empty)+reasoning+_recovered_from_run_journal. Two compounding bugs.1. Render side (
static/ui.jsrenderMessages)The per-segment Thinking-trace extraction only mines inline
<think>/channel/turn tags out ofcontent. It deliberately must not read the separatereasoningfield — that constraint is enforced by #2565 (reasoning metadata is low-priority Worklog detail, never inline-content extraction). So an assistant turn with empty visible content but a populatedreasoningfield (a run-journal-recovered anchor) extracted nothinkingText, rendered no Thinking card, and collapsed to an empty hiddenassistant-segment-anchor(display:none). A session made entirely of such rows painted as nothing but date dividers.Fix: after the #2565-guarded inline extraction block (so #2565 is preserved), fall back to
_assistantReasoningPayloadText(m)as the Thinking-card source only when there is no inlinethinkingTextAND no visible content/files/status. It feeds the same worklog-vs-inline routing, so reasoning stays low-priority Worklog detail in simplified mode and a Thinking card in legacy mode — never promoted into visible answer/Worklog prose. Scoped to empty-content turns, so answer-bearing messages are completely unchanged.2. Data side (
api/models.pyensure_assistant_anchor)Run-journal recovery creates an empty assistant "anchor" to host recovered tool cards for a tool-first stream (tools before any text). The lazy read-side retry path (
_retry_journal_recovery_in_place) re-runs recovery on everyget_session()while a pending marker is armed, and a tool-first stream has no text to dedup on (flush_assistantreturns early on empty) — so each retry, and each distinct interrupted stream over a session's life, appended another empty anchor. A heavily-interrupted session accumulated thousands of empty rows.Fix: before appending a fresh empty anchor, reuse an existing empty
_recovered_from_run_journalanchor with the same_recovered_stream_id. One anchor per stream; per-stream scoping keeps distinct streams' anchors separate, and token-bearing recovery still appends real content.Verification
test_reasoning_fallback_stays_out_of_inline_extraction_block_2565) locks this in.node --checkclean,ruffE9/F/B clean on changed lines (the lonecollectionsfinding is pre-existing onorigin/master).Tests
tests/test_issue3875_blank_transcript_failsafe.py(extended): reasoning surfaces for empty-content turns; fallback scoped correctly; stays out of the Thinking/reasoning display bugs: accumulation, wrong field preference, live think-tag gap #2565-guarded block;_assistantReasoningPayloadTextreads the reasoning fields.tests/test_issue3875_recovery_anchor_dedup.py(new): single anchor created on first recovery; 20 repeated recoveries reuse one anchor (the unbounded-accumulation bug); distinct streams keep distinct anchors; token-bearing recovery still appends real content.The reporter's already-bloated session is separately recoverable via a prune snippet posted on the issue; this PR stops new accumulation and makes the transcript render even for an already-poisoned session.