
Redesign live-to-final assistant replies #3401

Open
franksong2702 wants to merge 2 commits into nesquena:master from franksong2702:franksong2702/live-to-final-assistant-replies

franksong2702 commented Jun 2, 2026

Thinking Path

  • Hermes WebUI's most important interaction surface is the running agent session: users need to understand live progress, tool activity, replay/recovery state, and where the final answer begins.
  • Prior fixes covered individual symptoms such as interim progress, tool cards, compression, replay, stale streams, and session switching, but the product model still needed one coherent live-to-final assistant reply lifecycle.
  • This PR implements the first slice of #3400 (Redesign live-to-final assistant replies for running agent sessions): the visible-progress prompt contract is strengthened, live-only compression state is shown while it is useful, settled/final content no longer retains compression status text, and stream ownership/reconnect paths avoid losing the active live reply.
  • During validation, duplicate same-session stream ownership and stale reconnect/replay behavior made the UX unreliable, so fixes for both are included as supporting changes.

Refs #3400.
Refs #3014 and supersedes #3015.

What Changed

  • Strengthened the WebUI visible-progress prompt contract, absorbing the narrow direction of #3015 (Restore visible WebUI progress contract) into this PR:
    • long tool-running WebUI turns should not appear silent
    • visible progress must be normal assistant content, not only hidden reasoning/tool output
    • models are told not to run many independent tool batches back-to-back without visible assistant text
    • regression coverage rejects the old optional "you may provide" wording
  • Adjusted Automatic Compression UX:
    • while compression is running, the live view shows a centered non-interactive divider: "Compressing context"
    • on completion, the divider reads "Context auto-compressed" while the run continues
    • settled/final Activity removes automatic-compression status text
    • the divider typography is muted and non-bold so it reads as lifecycle chrome, not assistant content
  • Hardened live reattach and replay:
    • active run-journal replay honors bounded cursor windows
    • stale cursor-only INFLIGHT state is discarded before reattach
    • explicit reconnect reopens stale CONNECTING EventSource instances
  • Fixed supporting stream ownership cases:
    • chat start rechecks same-session stream ownership under the per-session lock
    • duplicate starts for the same session reuse the current stream instead of creating a hidden ghost stream
  • Added regression coverage for visible progress prompt semantics, compression display, stale stream cleanup, and same-session inflight stream reuse.
  • Updated UI/UX docs, the run-state consistency RFC, DESIGN, and CHANGELOG for the live-only compression semantics.
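
The same-session stream ownership fix above can be illustrated with a minimal sketch. This is not the project's implementation: the registry, `start_chat`, and `finish_stream` are hypothetical names, and the sketch assumes an in-memory, thread-based server. It shows only the pattern described in the bullets: recheck ownership while holding the per-session lock, and hand back the current stream instead of creating a second one.

```python
import threading
import uuid
from collections import defaultdict

# Hypothetical in-memory registry; all names here are illustrative assumptions.
_session_locks = defaultdict(threading.Lock)  # session_id -> per-session lock
_session_locks_guard = threading.Lock()       # protects _session_locks itself
_active_streams = {}                          # session_id -> active stream_id


def _lock_for(session_id):
    # Serialize lock creation so two threads never get different locks
    # for the same session.
    with _session_locks_guard:
        return _session_locks[session_id]


def start_chat(session_id):
    """Return (stream_id, reused). A duplicate start reuses the live stream."""
    with _lock_for(session_id):
        # Recheck ownership while holding the lock: another request may have
        # started a stream between the caller's first check and this point.
        existing = _active_streams.get(session_id)
        if existing is not None:
            return existing, True
        stream_id = uuid.uuid4().hex
        _active_streams[session_id] = stream_id
        return stream_id, False


def finish_stream(session_id, stream_id):
    """Release ownership, but only if this stream still owns the session."""
    with _lock_for(session_id):
        if _active_streams.get(session_id) == stream_id:
            del _active_streams[session_id]
```

Doing the check inside the lock is what prevents the "ghost stream" case: two near-simultaneous starts cannot both observe an empty slot.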

Why It Matters

Running agent sessions are where users build trust in Hermes WebUI. The UI should make active work legible without confusing internal lifecycle state for final assistant content. This PR moves the experience closer to mature agent clients such as Codex and Claude Code: progress remains visible while work is happening, lifecycle detail is available when useful, and the final answer remains readable and distinct.

Contract Routing

  • Contract family: visible progress prompt contract, streaming/replay/run-state consistency, UI/UX assistant reply lifecycle, Automatic Compression display semantics.
  • Evidence used: docs/rfcs/webui-run-state-consistency-contract.md, docs/UIUX-GUIDE.md, DESIGN.md, focused frontend/static tests, run-journal replay tests, and manual 8788 live-session validation.
  • Contract change: visible interim progress for long tool-running WebUI turns is now firm prompt-contract language rather than optional guidance. Live-only Automatic Compression status is treated as transient running-session UI, not persistent settled transcript content. Final settled Activity keeps the Worklog, but removes automatic-compression status dividers.
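
As a rough illustration of the live-only compression semantics in the contract change above, the sketch below filters Activity items by run state. The item kinds and the `visible_activity` helper are assumptions made for this example; the actual rendering lives in the WebUI frontend and may be structured differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ActivityItem:
    kind: str  # hypothetical labels: "tool", "compression_status", ...
    text: str


# Assumption: compression status is the only live-only kind in this sketch.
LIVE_ONLY_KINDS = frozenset({"compression_status"})


def visible_activity(items, settled):
    """Return the Activity items to render for a live or settled run."""
    if not settled:
        # Live runs show everything, including transient lifecycle dividers.
        return list(items)
    # Settled runs keep the Worklog but drop live-only status dividers.
    return [item for item in items if item.kind not in LIVE_ONLY_KINDS]
```

For example, a run with one tool item and one "Compressing context" divider renders both while live and only the tool item once settled.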

Verification

  • node --check static/ui.js static/messages.js static/sessions.js static/workspace.js static/panels.js static/i18n.js
  • git diff --check origin/master
  • python3 scripts/ruff_lint.py --diff origin/master
    • Result: no changed Python files vs origin/master
  • python -m pytest -q tests/test_sprint42.py tests/test_auto_compression_card.py tests/test_stale_stream_cleanup.py tests/test_inflight_stream_reuse.py tests/test_run_journal_routes.py tests/test_run_journal_frontend_static.py
    • Result: 108 passed, 1 warning
  • python -m pytest tests/ -q --timeout=60 --shard-id=0 --num-shards=3
    • Result: 2391 passed, 6 skipped, 2 xpassed, 1 warning; one local failure in tests/test_profile_skills_stats.py::test_get_profile_skills_stats, caused by a macOS platform fixture assumption and unrelated to this PR's diff
  • Manual 8788 validation:
    • Spark and MiniMax-M3 were available in the isolated dev runtime
    • live sessions triggered Auto Compression
    • Auto Compression showed "Compressing context" and "Context auto-compressed" as centered live dividers
    • automatic-compression dividers did not remain as final answer content
    • tool/lifecycle chrome was visually quieter than assistant prose in dark and light skins
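
One plausible reading of "honors bounded cursor windows", which the run-journal replay tests listed above cover, is sketched here. The event shape (dicts with a `seq` key, sorted ascending) and the `replay_window` function are assumptions for illustration only and are not taken from the PR's code.

```python
def replay_window(events, after_seq, limit):
    """Return (batch, next_cursor) for a bounded replay window.

    Assumes `events` is sorted by ascending "seq". The batch contains at most
    `limit` events with seq greater than `after_seq`.
    """
    if limit <= 0:
        raise ValueError("limit must be positive")
    batch = [event for event in events if event["seq"] > after_seq][:limit]
    # If nothing new is available, the cursor stays where it was.
    next_cursor = batch[-1]["seq"] if batch else after_seq
    return batch, next_cursor
```

A reattaching client would call this repeatedly, passing the returned cursor back in, until an empty batch indicates it has caught up with the journal.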

Screenshots

Live running state with prose, muted tool rows, and the centered compression divider:

[Screenshot: Live light theme compressing context]

Compression completion while the run continues:

[Screenshot: Live light theme context auto-compressed]

Dark theme live state with prose, quiet tool row, token/timer footer:

[Screenshot: Dark live prose tool footer]

Expanded quiet tool rows remain visually subordinate to assistant prose:

[Screenshot: Dark expanded tool rows]

Final settled state keeps the folded L1 Worklog above assistant content:

[Screenshot: Dark final L1 worklog]

Risks / Follow-ups

  • This PR absorbs the narrow prompt-contract slice from #3015 (Restore visible WebUI progress contract) because the live-to-final assistant reply design depends on models reliably emitting visible progress prose.
  • This PR intentionally keeps the implementation slice narrower than the whole design space of #3400 (Redesign live-to-final assistant replies for running agent sessions).
  • Follow-up areas intentionally left out of this PR:
    • queue composer behavior during compression
    • explicit degraded/rebuild status during slow reattach
    • native SSE Last-Event-ID support
    • max tool-call iteration / compression-exhausted terminal taxonomy refinements
    • broader sidebar/session awareness improvements

Model Used

AI-assisted.

  • Provider: OpenAI / Codex
  • Model: GPT-5 Codex for implementation, debugging, merge preparation, and PR drafting
  • Additional validation model: GPT 5.3 Codex Spark was used in the local 8788 runtime to trigger running-session and Auto Compression scenarios

@franksong2702 franksong2702 marked this pull request as draft June 2, 2026 11:24
@franksong2702 franksong2702 force-pushed the franksong2702/live-to-final-assistant-replies branch from d33dbce to a2bf57a Compare June 2, 2026 11:39
@franksong2702 franksong2702 marked this pull request as ready for review June 2, 2026 11:47
@franksong2702 franksong2702 marked this pull request as draft June 2, 2026 12:17
@franksong2702 franksong2702 marked this pull request as ready for review June 2, 2026 12:49
@franksong2702 franksong2702 marked this pull request as draft June 2, 2026 15:00
@franksong2702 franksong2702 force-pushed the franksong2702/live-to-final-assistant-replies branch 2 times, most recently from 0cc3ff9 to c96e741 Compare June 2, 2026 20:29
@franksong2702 franksong2702 force-pushed the franksong2702/live-to-final-assistant-replies branch from c96e741 to a000b20 Compare June 2, 2026 20:36