
fix(ui+session): never render a blank transcript for empty-content recovered turns (#3875) #3898

Closed
nesquena-hermes wants to merge 1 commit into master from fix-3875-blank-transcript

Conversation

@nesquena-hermes

Collaborator

Summary

Closes #3875.

A user's chat transcript rendered as a bare stack of SUNDAY/SATURDAY date separators with zero message content. Root-caused interactively with the reporter (local install, Firefox private window, reproduced across Chrome/Firefox/mobile, survived service-worker + cache clears). Their session's shape dump was decisive: thousands of assistant messages all content="" (empty) + reasoning + _recovered_from_run_journal. Two compounding bugs.

1. Render side (static/ui.js renderMessages)

The per-segment Thinking-trace extraction only mines inline <think>/channel/turn tags out of content. It deliberately must not read the separate reasoning field — that constraint is enforced by #2565 (reasoning metadata is low-priority Worklog detail, never inline-content extraction). So an assistant turn with empty visible content but a populated reasoning field (a run-journal-recovered anchor) extracted no thinkingText, rendered no Thinking card, and collapsed to an empty hidden assistant-segment-anchor (display:none). A session made entirely of such rows painted as nothing but date dividers.

Fix: after the #2565-guarded inline extraction block (so #2565 is preserved), fall back to _assistantReasoningPayloadText(m) as the Thinking-card source only when there is no inline thinkingText AND no visible content/files/status. It feeds the same worklog-vs-inline routing, so reasoning stays low-priority Worklog detail in simplified mode and a Thinking card in legacy mode — never promoted into visible answer/Worklog prose. Scoped to empty-content turns, so answer-bearing messages are completely unchanged.
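A minimal JavaScript sketch of the fallback guard, for illustration only. `thinkingTextFor` and `payloadFor` are hypothetical names (the latter stands in for _assistantReasoningPayloadText). The `legacyMode` flag models the !isSimplifiedToolCalling() gate that, per the gate results later in this thread, was added before merge, and the `m.role === 'assistant'` check is an assumption about how the real code keeps non-assistant rows out.

```javascript
// Hypothetical sketch of the empty-content fallback; not the ui.js code.
// payloadFor stands in for _assistantReasoningPayloadText(m).
function thinkingTextFor(m, seg, payloadFor) {
  // Inline extraction (the #2565-guarded block) runs first and always wins.
  let thinkingText = seg.inlineThinking || '';
  // Fall back only when the turn would otherwise render nothing at all.
  const nothingVisible =
    !thinkingText &&
    !String(seg.content || '').trim() &&
    !seg.filesHtml &&
    !seg.statusHtml;
  // Assumption: only assistant rows in legacy mode take the fallback.
  if (m.role === 'assistant' && nothingVisible && seg.legacyMode) {
    const payload = payloadFor(m);
    if (payload) thinkingText = payload;
  }
  return thinkingText;
}

// Minimal stand-in payload reader, for the example only.
const payloadFor = (m) => m.reasoning_content || m.reasoning || '';
const reporterRow = {
  role: 'assistant',
  content: '',
  reasoning: 'Let me think about X',
  _recovered_from_run_journal: true,
};
thinkingTextFor(reporterRow, { content: '', legacyMode: true }, payloadFor);
// → 'Let me think about X'
```

An answer-bearing turn fails the `nothingVisible` check, so its thinkingText stays whatever inline extraction produced.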

2. Data side (api/models.py ensure_assistant_anchor)

Run-journal recovery creates an empty assistant "anchor" to host recovered tool cards for a tool-first stream (tools before any text). The lazy read-side retry path (_retry_journal_recovery_in_place) re-runs recovery on every get_session() while a pending marker is armed, and a tool-first stream has no text to dedup on (flush_assistant returns early on empty) — so each retry, and each distinct interrupted stream over a session's life, appended another empty anchor. A heavily-interrupted session accumulated thousands of empty rows.

Fix: before appending a fresh empty anchor, reuse an existing empty _recovered_from_run_journal anchor with the same _recovered_stream_id. One anchor per stream; per-stream scoping keeps distinct streams' anchors separate, and token-bearing recovery still appends real content.
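The reuse rule can be modeled in a few lines. This is a hypothetical JavaScript sketch, not the Python in api/models.py: `ensureAssistantAnchor` here is an illustrative function that assumes flush_assistant() has already run (so only content-less anchors reach it), and it reports whether a row was appended so a caller can tell a reuse from a fresh anchor.

```javascript
// Hypothetical model of the one-anchor-per-stream rule; field names follow
// the PR text, the function itself is illustrative.
function ensureAssistantAnchor(messages, streamId) {
  // Scan backward: a stream's anchor normally sits near the tail, so the
  // first match is found quickly and the scan returns immediately.
  for (let i = messages.length - 1; i >= 0; i--) {
    const m = messages[i];
    if (
      m.role === 'assistant' &&
      m._recovered_from_run_journal &&
      m._recovered_stream_id === streamId &&
      !String(m.content ?? '').trim()
    ) {
      return { index: i, appended: false }; // reuse, no new row
    }
  }
  messages.push({
    role: 'assistant',
    content: '',
    _recovered_from_run_journal: true,
    _recovered_stream_id: streamId,
  });
  return { index: messages.length - 1, appended: true };
}

// Usage: 20 recoveries of one stream, then one recovery of another stream.
const messages = [];
for (let i = 0; i < 20; i++) ensureAssistantAnchor(messages, 'stream-a');
ensureAssistantAnchor(messages, 'stream-b');
// messages.length is 2: one anchor per stream
```

Repeated calls for one stream converge to a single row, while a distinct stream id still gets its own anchor.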

Verification

Tests

  • tests/test_issue3875_blank_transcript_failsafe.py (extended): reasoning surfaces for empty-content turns; fallback scoped correctly; stays out of the #2565-guarded block; _assistantReasoningPayloadText reads the reasoning fields.
  • tests/test_issue3875_recovery_anchor_dedup.py (new): single anchor created on first recovery; 20 repeated recoveries reuse one anchor (the unbounded-accumulation bug); distinct streams keep distinct anchors; token-bearing recovery still appends real content.

The reporter's already-bloated session is separately recoverable via a prune snippet posted on the issue; this PR stops new accumulation and makes the transcript render even for an already-poisoned session.

@greptile-apps

greptile-apps Bot commented Jun 9, 2026


Greptile Summary

This PR fixes a blank chat transcript bug (#3875) caused by two compounding issues: run-journal-recovered assistant turns with empty content but populated reasoning fields were silently hidden in legacy render mode, and the lazy read-side recovery path accumulated duplicate empty anchors per stream on every get_session() call.

  • Render fix (static/ui.js): After the #2565-guarded inline thinking-extraction block, a new fallback reads _assistantReasoningPayloadText(m) to populate thinkingText only when a turn has no inline thinking, no visible content, no files, and no status. It is scoped to legacy mode only, to preserve the simplified/Worklog echo-strip path.
  • Data fix (api/models.py): ensure_assistant_anchor now backward-scans existing messages before appending a new empty anchor, reusing any existing empty recovered anchor for the same stream_id to prevent unbounded accumulation across repeated get_session() calls or distinct interrupted streams.
  • Tests: Two test files cover both the render fallback invariants (including the #2565 non-regression gate) and the anchor dedup contract (single anchor per stream, distinct anchors for distinct streams, token-bearing recovery still appends real content).

Confidence Score: 5/5

Safe to merge. Both fixes are tightly scoped: the data-side change only affects the empty-anchor creation path during journal recovery, and the render-side change only fires for assistant turns with no visible content, no inline thinking, and only in legacy (non-simplified) mode.

The backward scan in ensure_assistant_anchor is guarded by four specific conditions (_recovered_from_run_journal, matching _recovered_stream_id, assistant role, and empty content), making false-positive reuse essentially impossible for normal message types. The render fallback is gated by six independent conditions, including !isSimplifiedToolCalling(), so the simplified/Worklog path's echo-strip is fully preserved. Tests cover repeated recovery (20 iterations), distinct-stream isolation, token-bearing recovery, and the #2565 structural non-regression. No production callsite behavior changes for answer-bearing messages.

No files require special attention.

Important Files Changed

| Filename | Overview |
|---|---|
| api/models.py | Adds a backward scan in ensure_assistant_anchor to reuse an existing empty recovered anchor for the same stream rather than appending a fresh one on every recovery retry; fixes unbounded accumulation of empty-content assistant rows. |
| static/ui.js | Adds a legacy-mode fallback in renderMessages that reads _assistantReasoningPayloadText(m) as the Thinking-card source for empty-content turns with no inline thinking, so those turns are never silently collapsed to a hidden anchor. |
| tests/test_issue3875_blank_transcript_failsafe.py | Extends the existing #3875 test file with four new assertions: reasoning field surfaces for empty-content turns, fallback is scoped correctly, reasoning stays out of the #2565-guarded inline extraction block, and _assistantReasoningPayloadText reads the right fields. |
| tests/test_issue3875_recovery_anchor_dedup.py | New test file covering the data-side dedup: single anchor on first recovery, 20 repeated recoveries yield one anchor, distinct streams keep distinct anchors, token-bearing recovery still appends real content. |
| CHANGELOG.md | Adds a detailed changelog entry for #3875 under [Unreleased]. |

Sequence Diagram

sequenceDiagram
    participant GS as get_session()
    participant JAP as _append_journaled_partial_output
    participant EAA as ensure_assistant_anchor
    participant SM as session.messages

    Note over GS,SM: Bug: each get_session() retry appended a new empty anchor

    GS->>JAP: "call (dedupe_existing=True)"
    JAP->>EAA: flush_assistant() returns None (tool-first, no text)
    EAA->>SM: backward scan for existing empty anchor (stream_id match)
    alt Empty anchor for stream already exists
        SM-->>EAA: found at index N
        EAA-->>JAP: return N (reuse, no append)
    else No existing anchor
        EAA->>SM: append new empty anchor
        EAA-->>JAP: return new index
    end
    JAP-->>GS: return appended_any

    Note over GS,SM: Render fix (legacy mode only)

    participant RM as renderMessages()
    participant ARP as _assistantReasoningPayloadText(m)

    RM->>RM: inline thinkingText extraction (content tags)
    RM->>RM: check all six guards pass
    alt All guards pass (empty-content recovered turn)
        RM->>ARP: call
        ARP-->>RM: m.reasoning_content or m.reasoning
        RM->>RM: "thinkingText = payload, insert Thinking card"
    else Guard fails (normal answer-bearing message)
        RM->>RM: render content as-is (unchanged)
    end

Reviews (2): Last reviewed commit: "fix(ui+session): never render a blank tr..."

Comment thread api/models.py

@nesquena nesquena left a comment

Owner


Review — end-to-end ✅ (clean approval, no fixes needed)

Independent review of the two-part root-cause fix for #3875 — the reporter's session rendered as a bare stack of SUNDAY/SATURDAY date separators with zero message content. This is the layered follow-up to the #3889 fail-safe (shipped in Release LF): that DOM-sweep couldn't rescue the reporter because there was genuinely no visible content to reveal — every row was content=str/0 (empty) + reasoning + _recovered_from_run_journal. This PR fixes the actual root cause on both the render and data sides.

What this ships

static/ui.js (renderMessages, +18), api/models.py (ensure_assistant_anchor, +24), CHANGELOG.md, plus an extended tests/test_issue3875_blank_transcript_failsafe.py and a new tests/test_issue3875_recovery_anchor_dedup.py. Closes #3875. Authored by the agent (nesquena-hermes).

Traced against upstream hermes-agent

Pulled a fresh nousresearch/hermes-agent tarball at review time. _recovered_from_run_journal / _recovered_stream_id are WebUI-internal session conventions — the agent never sees them. The recovered anchors feed conversation_history only through _sanitize_messages_for_api, which excludes reasoning-only assistant rows (streaming.py:3113-3129, _is_reasoning_only_assistant_message): empty content + reasoning/reasoning_content → skipped, so strict providers never get a blank assistant turn from these rows. The fix adds no new message-schema fields and reduces the number of empty rows, so it can only help the agent side. No cross-tool change needed. ✓
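To make the exclusion concrete, here is a hypothetical JavaScript analogue of the filter the review attributes to _is_reasoning_only_assistant_message. The real helper is Python in streaming.py and may differ; `isReasoningOnlyAssistant`, the tool_calls check, and the treatment of array content are assumptions for illustration.

```javascript
// Hypothetical analogue of the reasoning-only filter; not streaming.py.
function isReasoningOnlyAssistant(m) {
  if (!m || m.role !== 'assistant') return false;
  // Assumption: rows that carry tool calls are never reasoning-only.
  if (Array.isArray(m.tool_calls) && m.tool_calls.length > 0) return false;
  // Assumption: any non-empty content array counts as content.
  const hasContent = Array.isArray(m.content)
    ? m.content.length > 0
    : Boolean(String(m.content ?? '').trim());
  if (hasContent) return false;
  const reasoning = m.reasoning_content || m.reasoning || '';
  return Boolean(String(reasoning).trim());
}

// Drop reasoning-only assistant rows before building provider history.
function sanitizeForApi(messages) {
  return messages.filter((m) => !isReasoningOnlyAssistant(m));
}
```

Note that a bare empty anchor with no reasoning is kept by this filter, which is the pre-existing gap the review flags as a non-blocking observation.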

Root cause — verified against the reporter's shape dump

The issue thread is decisive: the reporter's python3 shape dump showed thousands of consecutive role=assistant content=str/0 (empty) keys=[…_recovered_from_run_journal, _recovered_stream_id, reasoning] rows, and the reporter confirmed pruning them restored the transcript. Two compounding bugs:

  1. Render: the per-segment inline Thinking extraction only mines <think>/channel/turn tags out of content and (per #2565) must not read the separate reasoning field. So an empty-content turn carrying only reasoning extracted no thinkingText, rendered no Thinking card, and collapsed to a hidden assistant-segment-anchor (display:none). A whole session of such rows painted as nothing but date dividers.
  2. Data: run-journal recovery appends an empty assistant "anchor" to host a tool-first stream's recovered tool cards. Recovery re-runs across get_session() retries and across each distinct interrupted stream, and a tool-first stream has no text to dedup on, so anchors accumulated without bound.

End-to-end trace — render side

The fallback at ui.js:8257-8260 sits after the #2565-guarded inline-extraction block (let thinkingText='' at 8110 → const isUser=… at 8141) and after the per-segment setup:

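```javascript
// Fallback only for turns that would otherwise render nothing: no inline
// thinking, no visible text, no attached files, no status line.
if(!thinkingText&&!String(content||'').trim()&&!filesHtml&&!statusHtml){
  const _reasoningPayload=_assistantReasoningPayloadText(m);
  if(_reasoningPayload) thinkingText=_reasoningPayload;
}
```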

It then feeds the same worklog-vs-inline routing at 8261-8263. I traced both modes:

  • Legacy mode (!isSimplifiedToolCalling()): seg.insertAdjacentHTML('beforeend', _thinkingCardHtml(thinkingText)) → Thinking card inline; the anchor-class branch at ui.js:8270 is then skipped (its !(thinkingText && _showThinking && !simplified) is false), so the segment is visible, not a hidden anchor.
  • Simplified mode: assistantThinking.set(rawIdx, thinkingText) → reasoning routed to the Worklog card (low-priority detail), exactly as #2565 requires.
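The two branches can be summarized as a small routing function. This is a hypothetical model: `routeSegment`, the `worklog` Map (standing in for assistantThinking), and the assumption that the inline card is also gated on `showThinking` are illustrative; the hidden-anchor expression mirrors the one quoted in the legacy-mode bullet.

```javascript
// Hypothetical model of the worklog-vs-inline routing; not the ui.js code.
function routeSegment(seg, worklog) {
  const { thinkingText, simplified, showThinking, hasVisibleBody, rawIdx } = seg;
  let thinkingCard = null;
  if (thinkingText) {
    if (simplified) {
      worklog.set(rawIdx, thinkingText); // low-priority Worklog detail
    } else if (showThinking) {
      thinkingCard = thinkingText; // inline Thinking card (assumed gate)
    }
  }
  // With no visible body, the segment is hidden unless it shows an inline card.
  const hiddenAnchor =
    !hasVisibleBody && !(thinkingText && showThinking && !simplified);
  return { thinkingCard, hiddenAnchor };
}
```

The two destinations are mutually exclusive by construction: a given thinkingText lands either in the Worklog map or in an inline card, never both.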

_assistantReasoningPayloadText (ui.js:6304-6333) reads m.reasoning_content||m.reasoning||m.thinking||m._reasoning (and thinking/reasoning content parts) — pure text extraction. thinkingText's only later use is the anchor-class decision at 8270, so no reasoning-sourced text leaks anywhere unexpected.
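A sketch of the field precedence described above, in JavaScript. `reasoningPayloadText` is a hypothetical stand-in for _assistantReasoningPayloadText: the top-level field order follows the review, but the content-part shapes (`type: 'thinking'` or `type: 'reasoning'`, carrying `thinking` or `text`) and the trimming are assumptions.

```javascript
// Hypothetical stand-in for _assistantReasoningPayloadText: text extraction only.
function reasoningPayloadText(m) {
  // Top-level fields, in the precedence order the review lists.
  const direct = m.reasoning_content || m.reasoning || m.thinking || m._reasoning;
  if (typeof direct === 'string' && direct.trim()) return direct.trim();
  // Assumed content-part shapes for thinking/reasoning parts.
  if (Array.isArray(m.content)) {
    return m.content
      .filter((p) => p && (p.type === 'thinking' || p.type === 'reasoning'))
      .map((p) => String(p.thinking || p.text || '').trim())
      .filter((t) => t)
      .join('\n');
  }
  return '';
}
```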

Behavioural harness (extracted helper + guard, real row shapes):

PASS | reporter empty+reasoning            => thinkingText="Let me think about X"
PASS | empty + reasoning_content           => thinkingText="planning"
PASS | answer-bearing (must be untouched)  => thinkingText=""
PASS | empty, no reasoning (stays blank)   => thinkingText=""
PASS | array content thinking part         => thinkingText=""
PASS | user row (not assistant)            => thinkingText=""
6/6 cases as expected

Confirms the reporter's exact shape surfaces a card, and answer-bearing messages are untouched.

End-to-end trace — data side

The dedup guard at models.py:1436-1447 scans backward for an existing empty _recovered_from_run_journal anchor with the same _recovered_stream_id + role=='assistant' + empty content, and reuses its index instead of appending. Verified the surrounding mechanics:

  • ensure_assistant_anchor calls flush_assistant() first (models.py:1418) — content-bearing recovery still appends real rows; only content-less anchors reach the scan.
  • The anchor stays empty-content for its whole life (recovered tool cards live in session.tool_calls with assistant_msg_idx, not in the anchor's content), so it remains matchable across retries — the dedup converges to one anchor per stream.
  • Reached from every recovery path: the lazy retry (models.py:1751, dedupe_existing=True) and the cold-load / cache-miss repair (models.py:1871/1898/1950). The guard runs unconditionally (not gated on dedupe_existing), which is correct — one-anchor-per-stream is always the goal.
  • Per-stream scoping keeps genuinely distinct interrupted streams' anchors separate.

The historical reasoning-bearing empty rows predate this code (current recovery explicitly does not restore reasoning — models.py:1363); the render fix is what makes an already-poisoned session render, while the dedup stops new accumulation. The reporter's existing bloat was separately cleared via the prune snippet on the issue (confirmed working).

Other audit — things that are correct already

  • Security / XSS: the reasoning payload is rendered via _thinkingCardHtml as <pre>${esc(clean)}</pre> (ui.js:6413-6417) — escaped. Same routing existing reasoning/thinking already used; zero new surface.
  • #2565 preserved: the fallback lives at the segment-emission point, not in the inline-extraction block; test_reasoning_fallback_stays_out_of_inline_extraction_block_2565 locks the block (slice between let thinkingText=''; and const isUser=…) free of m.reasoning / _assistantReasoningPayloadText.
  • Interaction with the #3889 fail-safe: complementary. Empty-content+reasoning turns now carry a Thinking card → the end-of-render blank-turn sweep sees visible content and skips them. No conflict, no double-reveal.
  • Thread-safety: recovery mutates session.messages under the same get_session() flow as existing recovery; no new shared-state access. The dedup scan is read-then-reuse on the same list being built — no new race.
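The structural lock described in the #2565 bullet can be approximated as a string-slice check. The real test is Python; this JavaScript version (`inlineBlockIsReasoningFree` is a hypothetical name) only illustrates slicing between the two marker strings the review quotes and asserting that the forbidden identifiers are absent.

```javascript
// Hypothetical analogue of the #2565 structural test: slice the source
// between two markers and check that reasoning fields are never read there.
function inlineBlockIsReasoningFree(source) {
  const start = source.indexOf("let thinkingText='';");
  const end = start < 0 ? -1 : source.indexOf('const isUser=', start);
  if (start < 0 || end < 0) throw new Error('markers not found');
  const block = source.slice(start, end);
  const forbidden = ['m.reasoning', '_assistantReasoningPayloadText'];
  return !forbidden.some((s) => block.includes(s));
}
```

Failing loudly when a marker disappears keeps a refactor from silently turning the guard into a no-op.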

Edge-case matrix

| Scenario | Behavior |
|---|---|
| Empty-content turn + reasoning (reporter's case), legacy mode | Thinking card rendered inline; turn not blank ✅ |
| Same, simplified/Compact mode | Reasoning routed to Worklog card ✅ (reporter had Compact OFF; author verified both) |
| Answer-bearing assistant message | Guard !content.trim() false → unchanged ✅ |
| Empty content, no reasoning anywhere | Stays blank (genuinely nothing to show; #3889 sweep applies) ✅ |
| Tool-first stream, first recovery | One empty anchor created to host tool cards ✅ |
| Same stream, 20 repeated recoveries | Reuses the one anchor (was: 20 rows) ✅ |
| Three distinct interrupted streams | One anchor each, distinct _recovered_stream_id |
| Token-bearing recovery | Still appends real content row (dedup doesn't suppress) ✅ |
| Recovered anchor → conversation_history | Excluded as reasoning-only; agent never sees a blank turn ✅ |

Tests

  • tests/test_issue3875_recovery_anchor_dedup.py: 4/4 pass (single anchor on first recovery; 20 repeated recoveries → 1 anchor, the unbounded-accumulation bug; distinct streams stay distinct; token-bearing recovery still appends content).
  • tests/test_issue3875_blank_transcript_failsafe.py: 6/6 pass (3 prior #3889 sweep tests + 3 new: reasoning surfaces for empty-content turns, fallback scoped + out of the #2565 block, _assistantReasoningPayloadText reads the reasoning fields).
  • Full suite: 8323 passed / 125 skipped / 1 environmental flake (deselected the pre-existing darwin CRLF flake test_workspace_git.py::test_git_status_ignores_crlf_only_worktree_noise; ignored test_passkey_auth.py, which fails to collect on this box from a missing local cryptography dep, unrelated to a render/recovery change and green in CI). The one flake — test_issue2863_session_index_prime.py::test_missing_index_starts_background_rebuild_while_preserving_first_scan — is a background-thread timing test in the session-index-prime subsystem (#2863), which this PR does not touch (diff is only CHANGELOG + api/models.py/ensure_assistant_anchor + static/ui.js/renderMessages + the two #3875 test files). It passes 6/6 in isolation here and is green across all 9 CI shards — a load-sensitive flake under the 160s parallel run, not a regression.
  • ESLint runtime guard: clean (exit 0); node --check clean.
  • Behavioural harness confirms the render guard fires only for empty-content turns and surfaces the reporter's exact shape.
  • CI on the PR head: all green — browser-smoke + lint + 9 test shards.

Minor observations (non-blocking)

  • The render guard keys on String(content||'').trim(), so an assistant turn whose content is an array of thinking-only parts (String([obj]) → "[object Object]", non-empty) won't trip the fallback. That isn't the reporter's shape (theirs is an empty string), and such turns are handled by the normal display-content path; pre-existing, out of scope.
  • The dedup's backward scan is O(messages) worst-case per anchor creation, but returns on the first (most recent) match — effectively O(1) in practice since a stream's anchor is near the tail. Anchor creation is rare (recovery only). Fine.
  • An empty anchor with no reasoning (a fresh tool-first anchor) isn't caught by _is_reasoning_only_assistant_message, so it could in principle reach the API as a blank assistant turn — but that is pre-existing behavior and this PR strictly reduces such rows. Worth a separate look if blank-anchor 400s ever surface, but not this PR's concern.
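The first observation is easy to confirm in any JavaScript runtime. The snippet below shows why a thinking-only part array passes the existing guard as "content", plus a hypothetical part-aware check; `hasVisibleContent` is illustrative only, assumes text parts look like `{type: 'text', text}`, and is not part of this PR.

```javascript
// String() on an array joins its elements' string forms, and a plain object
// stringifies to '[object Object]', so the existing guard never sees ''.
const thinkingOnlyParts = [{ type: 'thinking', thinking: 'plan' }];
const guardInput = String(thinkingOnlyParts || '').trim(); // '[object Object]'

// Hypothetical part-aware check: only non-empty text parts count as visible.
function hasVisibleContent(content) {
  if (Array.isArray(content)) {
    return content.some(
      (p) => p && p.type === 'text' && String(p.text || '').trim() !== ''
    );
  }
  return String(content || '').trim() !== '';
}
```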

Recommendation

Approved clean. Parked at approval — ready for the release agent's merge/tag pipeline.

A textbook layered root-cause fix: the prior #3889 fail-safe addressed the symptom for the recoverable case; this PR fixes the two underlying bugs (render-side reasoning surfacing + data-side anchor dedup) that left the reporter's all-empty session unrescuable. Both fixes are tightly scoped (empty-content turns / one-anchor-per-stream), preserve #2565, are XSS-safe, add no schema or cross-tool surface, and are covered by behavioural Python tests plus structural JS guards. Ship.

@nesquena-hermes nesquena-hermes force-pushed the fix-3875-blank-transcript branch from bcd9e33 to 1473206 on June 9, 2026 23:10
…covered turns (#3875)

A user's session rendered as a bare stack of date separators (SUNDAY/SATURDAY)
with zero message content. Root-caused with the reporter to two compounding bugs.

RENDER (static/ui.js renderMessages): the per-segment Thinking-trace extraction
only mines inline <think>/channel/turn tags out of `content`; it must not read
the separate `reasoning` field (#2565 keeps reasoning metadata as low-priority
Worklog detail, never inline-content extraction). So an assistant turn with empty
visible content but a populated `reasoning` field (e.g. a run-journal-recovered
anchor: empty content + reasoning + _recovered_from_run_journal) extracted no
thinkingText, rendered no Thinking card, and collapsed to an empty hidden anchor.
A session of all such rows painted blank. Fix: AFTER the #2565-guarded inline
extraction block, fall back to _assistantReasoningPayloadText(m) as the Thinking
card source ONLY when there's no inline thinkingText AND no visible content/files/
status. Feeds the same worklog-vs-inline routing, so reasoning stays Worklog
detail in simplified mode and a Thinking card in legacy mode; answer-bearing
messages are unchanged.

DATA (api/models.py ensure_assistant_anchor): run-journal recovery created an
empty assistant anchor to host recovered tool cards for a tool-first stream, but
the lazy read-side retry path re-ran recovery on every get_session() and a
tool-first stream has no text to dedup on, so each retry + each distinct
interrupted stream appended ANOTHER empty anchor -> unbounded accumulation
(the reporter's session had thousands). Fix: reuse an existing empty recovered
anchor for the same stream instead of appending a fresh one (one anchor/stream).

Verified RED->GREEN in a live browser (Compact-tool-activity ON and OFF); normal
answer-bearing messages unchanged (guard). Full suite 8506 passed; ESLint runtime
gate + ruff clean; respects #2565 (5 prior structural tests still green).

Closes #3875
@nesquena-hermes nesquena-hermes force-pushed the fix-3875-blank-transcript branch from 1473206 to 497abb5 on June 9, 2026 23:12
@nesquena-hermes

Collaborator Author

Gate results — ready for independent review

Both gates run against this PR; one MUST-FIX from Codex applied + verified.

Codex (regression/breakage) — SHIP ONLY WITH FIXES → fixed. Caught a SILENT bug: the reasoning fallback originally ran in simplified/Worklog mode too, where the existing _worklogReasoningTextFromMessage path (ui.js:8149) already echo-strips reasoning that duplicates a sibling's visible answer. The raw fallback would have re-introduced that stripped text as a duplicate Worklog Thinking card. Fix: gated the fallback to legacy mode only (!isSimplifiedToolCalling()) — which is exactly where the gap was (the reporter had Compact-tool-activity OFF; simplified mode already surfaces reasoning via 8149). Re-verified live: an empty-content+reasoning session renders the reasoning in both modes, and a message whose reasoning equals its visible answer renders that text exactly once (no double-render).

Opus (architecture/correctness) — SAFE to ship, no must-fix. Independently verified all five review areas against the code:

  • Render guard correct — content is fully post-processed at the fallback point; non-assistant rows can't pick up a card; !filesHtml&&!statusHtml mirrors the hasVisibleBody check.
  • No double-render — the worklog-vs-inline routing is genuinely mutually exclusive.
  • Anchor-dedup is not O(n²) — one backward scan per recovery, recovery itself capped at _JOURNAL_RETRY_MAX_ATTEMPTS = 12; worst case on an already-bloated session is ~12 scans, microseconds.
  • No regression to single-anchor recovery; the reorder-above-marker logic is safer with reuse, not riskier.
  • The created-then-reused approach is correctly sized vs. reworking the tool-card addressing model.

One behavior note Opus flagged (intended, not a bug): the dedup's reuse branch deliberately does not set appended_any, so a retry whose only output would be another junk anchor now returns False — the interrupted marker is not falsely promoted to "recovered output," consumes attempt budget, and converges (instead of looping) on a poisoned session. This is more honest than the pre-fix behavior.
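A hypothetical JavaScript model of that convergence. Only the budget value (_JOURNAL_RETRY_MAX_ATTEMPTS = 12) comes from the gate results; the marker shape (`pending`, `attempts`, `recovered`) and `retryRecoveryOnRead` are illustrative assumptions, not the code in api/models.py.

```javascript
// Hypothetical model of the lazy read-side retry budget.
const JOURNAL_RETRY_MAX_ATTEMPTS = 12; // value quoted in the gate results

function retryRecoveryOnRead(marker, recoverOnce) {
  if (!marker.pending) return false; // nothing armed: no work on this read
  marker.attempts += 1;
  // recoverOnce() returns false when the only "output" was a reused anchor.
  const appendedAny = recoverOnce();
  if (appendedAny) {
    marker.pending = false;
    marker.recovered = true; // real recovered output
  } else if (marker.attempts >= JOURNAL_RETRY_MAX_ATTEMPTS) {
    marker.pending = false; // budget exhausted: stop retrying
  }
  return appendedAny;
}

// Usage: a poisoned session whose every retry only reuses an anchor.
const poisoned = { pending: true, attempts: 0, recovered: false };
for (let i = 0; i < 20; i++) retryRecoveryOnRead(poisoned, () => false);
// poisoned.attempts is 12 and poisoned.pending is false
```

Under these assumptions the marker stops after the budget with `recovered` still false, rather than being promoted or retried on every read forever.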

Verification: RED→GREEN in a live browser (Compact-tool-activity ON and OFF) + normal-message guard + echo guard; full suite 8506 passed; ESLint runtime gate + ruff E9/F/B clean; #2565 constraint preserved and locked by a new structural test. Rebased onto v0.51.346 (CHANGELOG-only conflict; code diff byte-identical), now MERGEABLE.

Holding for independent review — not self-merging.

nesquena-hermes added a commit that referenced this pull request Jun 10, 2026
…#3892 #3898 #3885 #3882 #3868) (#3902)

* stage v0.51.347: render/stream cluster (#3892 #3898 #3885 #3882 #3868) + 2 Opus SHOULD-FIX

* stage v0.51.347: trim #3885 error-guard comment to fit diagnostic-test window

* Stamp v0.51.347 — Release LK (streaming & render reliability cluster)

* Remove stray uv.lock accidentally staged (not part of any cluster PR)

---------

Co-authored-by: nesquena-hermes
@nesquena-hermes

Collaborator Author

Shipped in v0.51.347 (Release LK — streaming & render reliability cluster) 🎉 — live now.

Merged via the combined release PR #3902 alongside #3892, #3898, #3885, #3882, and #3868 (all in the live-to-final streaming/render family). Each was applied onto fresh master, gated through Codex (SAFE TO SHIP) + Opus (ship, no MUST-FIX) + the full suite (8532 passing), with two Opus SHOULD-FIX folded in before merge. Authorship preserved via co-authored commit + CHANGELOG credit.

Thanks for the fix! 🙏


Development

Successfully merging this pull request may close these issues.

BUG: Messages stopped displaying correctly after update to .337. Update to .340 does not fix it

2 participants