diff --git a/docs/rfcs/live-to-final-assistant-replies.md b/docs/rfcs/live-to-final-assistant-replies.md
index 1fa138a6f4..68e8093dda 100644
--- a/docs/rfcs/live-to-final-assistant-replies.md
+++ b/docs/rfcs/live-to-final-assistant-replies.md
@@ -1,82 +1,173 @@
 # Live-to-Final Assistant Replies for Long-Running Agent Sessions

-- **Status:** Proposed
+- **Status:** Accepted (parent contract; implementation tracked in [#3400](https://github.com/nesquena/hermes-webui/issues/3400))
 - **Author:** @franksong2702
 - **Created:** 2026-06-03
 - **Tracking issue:** [#3400](https://github.com/nesquena/hermes-webui/issues/3400)

-## Background
+## Background: Long-Running Sessions Are The Anchor

-This RFC is anchored on long-running agent sessions.
+This RFC defines the product model for assistant replies in long-running agent
+sessions.

-Short conversations are useful sanity checks, but they do not exercise the
-hardest browser-agent states. A long-running session can spend minutes waiting,
-make many tool calls, produce a long final answer, cross context-pressure
-boundaries, hit tool-call or retry limits, lose network continuity, and still
-need to recover into a readable final transcript.
+Short conversations are still useful sanity checks, but they do not exercise
+the hardest browser-agent states. A long-running session can:

-The product model should therefore be defined against the long-running case. A
+- keep the user waiting for minutes,
+- make many tool calls,
+- produce a long final answer,
+- create or update workspace artifacts,
+- cross Auto Compression boundaries,
+- hit tool-call, retry, or iteration limits,
+- lose browser, network, or SSE continuity,
+- receive a user cancel or interruption request while startup is still racing,
+- switch sessions or reload before the turn settles.
+
+The design should therefore be judged against the long-running case first. A
 short conversation should be the same lifecycle with fewer events, not a
 separate UI model.
-## Problem +The goal is not to add a Worklog widget, and it is not to make Auto Compression +or duplicate stream ownership the headline. Those are supporting slices and +edge cases. The headline is one coherent assistant reply lifecycle: live work, +supporting activity, terminal outcome, and final answer. + +## Product Problem Hermes WebUI currently uses one chat surface to represent several different -things: +meanings: - the assistant's live process text while work is still running, - tool activity and lifecycle status that support that work, - recovery or replay state after refresh, reconnect, or session switching, +- terminal outcomes such as cancel, interruption, no response, or tool limit, - the final answer after the turn settles. -Those meanings have repeatedly competed for the same visual space. The result is -that some long-running sessions feel noisy, some look silent while the agent is -working, some recover into a different shape after reconnect, and some terminal -edge cases can appear completed even when no final answer was produced. +Those meanings have repeatedly competed for the same visual space. Some +long-running sessions feel noisy, some look silent while the agent is working, +some recover into a different shape after reconnect, and some terminal edge +cases can appear completed even when no final answer was produced. -This RFC defines the product semantics for that lifecycle. +This RFC defines the product semantics that implementation PRs and follow-up +RFCs should preserve. -## Public Issue Signals +## Scope -The public issue history already shows the same problem recurring from several -directions. The table below uses representative examples; the broader inventory -lives in [#3400](https://github.com/nesquena/hermes-webui/issues/3400). +### This RFC owns -| Signal | Examples | Product implication | +- The visible lifecycle of one assistant reply from live work to final or + terminal outcome. 
+- The boundary between process prose, tool activity, lifecycle status, and the + final answer. +- Long-running edge-case semantics for Auto Compression, no-final answers, + tool/iteration limits, cancel/interruption, replay/reconnect/session switch, + produced artifacts/output handoff, and sidebar/session ownership. +- The classification of work into implemented slices, active PRs, confirmed + follow-ups, and child RFCs. + +### This RFC does not own + +- Pixel-level styling. +- Provider/model selection. +- A backend tool-event schema change such as a shared display-title field. +- A new runtime adapter, runner process, storage format, or SSE protocol. +- Rich artifact rendering, executable HTML, visualization plugins, or Canvas + editing surfaces. This RFC only owns how produced artifacts remain findable + from the reply lifecycle. +- The full command semantics for Queue, Steer, Stop-and-send, and Interrupt. + Those belong to the pending-intent control-surface contract tracked by + [#3058](https://github.com/nesquena/hermes-webui/issues/3058) and + [#3061](https://github.com/nesquena/hermes-webui/pull/3061). + +## Public Inventory + +This inventory groups representative public issues and PRs by the +long-running-session concern they expose. It is not a claim that every linked +item is solved by this RFC. The classification column records durable scope, +not current open/merged/superseded state: for live status, the tracking issue +[#3400](https://github.com/nesquena/hermes-webui/issues/3400) is authoritative. + +| Concern | Representative signals | Current classification | | --- | --- | --- | -| Working output and final answer can blur together | [#536](https://github.com/nesquena/hermes-webui/issues/536) | Running process and final answer need separate semantics. 
| -| Compression state is hard to represent cleanly | [#469](https://github.com/nesquena/hermes-webui/issues/469), [#2973](https://github.com/nesquena/hermes-webui/issues/2973), [#3079](https://github.com/nesquena/hermes-webui/issues/3079) | Context compression should be visible while useful, but not become final transcript content. | -| Replay, reconnect, and session switching can lose active context | [#2283](https://github.com/nesquena/hermes-webui/issues/2283), [#2924](https://github.com/nesquena/hermes-webui/issues/2924), [#3391](https://github.com/nesquena/hermes-webui/issues/3391) | A recovered session should rebuild the same reply lifecycle as the live render. | -| Tool, activity, thinking, and progress rendering can become noisy or silent | [#1298](https://github.com/nesquena/hermes-webui/issues/1298), [#3014](https://github.com/nesquena/hermes-webui/issues/3014), [#3015](https://github.com/nesquena/hermes-webui/issues/3015) | Process text should stay primary; tool activity should remain supporting detail. | -| Terminal turns can end without a real final answer | [#3315](https://github.com/nesquena/hermes-webui/issues/3315), [#3316](https://github.com/nesquena/hermes-webui/issues/3316) | No-final, compression-exhausted, and tool-limit outcomes need explicit terminal states. | -| Stream ownership and cancellation affect what the user sees | [#3344](https://github.com/nesquena/hermes-webui/issues/3344), [#3345](https://github.com/nesquena/hermes-webui/issues/3345) | One visible turn must own its live, terminal, and final events. | -| Session awareness affects live work visibility | [#856](https://github.com/nesquena/hermes-webui/issues/856), [#1370](https://github.com/nesquena/hermes-webui/issues/1370), [#1436](https://github.com/nesquena/hermes-webui/issues/1436) | Sidebar/session state must not contradict the visible active session. 
| -| Busy input changes live-session control | [#720](https://github.com/nesquena/hermes-webui/issues/720), [#965](https://github.com/nesquena/hermes-webui/pull/965), [#1062](https://github.com/nesquena/hermes-webui/pull/1062) | Queue, steer, and interrupt are adjacent controls for long-running sessions, but command-level behavior belongs to a separate control-surface contract. | - -## Goals - -- Define the product model for assistant replies in long-running sessions. -- Make live process text, tool activity, lifecycle status, and the final answer - share one coherent turn lifecycle. -- Preserve the same lifecycle through replay, reconnect, refresh, and session - switching. -- Name terminal outcomes honestly when the run does not produce a normal final - answer. -- Define which long-running edge cases belong to the first slice and which - should be handled by later slices. - -## Non-goals - -This RFC does not define: - -- pixel-level styling, -- provider/model selection, -- command-level queue/steer/interrupt behavior, -- a new runtime adapter, storage format, or SSE protocol, -- a backend tool-event schema change such as a shared display-title field. +| Live work vs final answer boundary | [#536](https://github.com/nesquena/hermes-webui/issues/536), [#3400](https://github.com/nesquena/hermes-webui/issues/3400), [#3464](https://github.com/nesquena/hermes-webui/pull/3464) | Main product scope. #3464 landed the first RFC; this document is the parent contract for follow-up slices. | +| First live-to-final reply implementation | [#3401](https://github.com/nesquena/hermes-webui/pull/3401), [#3014](https://github.com/nesquena/hermes-webui/issues/3014), [#3015](https://github.com/nesquena/hermes-webui/pull/3015) | First implementation slice. It should keep using `Refs #3400`; it does not close the umbrella. 
|
+| Auto Compression visibility and context pressure | [#469](https://github.com/nesquena/hermes-webui/issues/469), [#2973](https://github.com/nesquena/hermes-webui/issues/2973), [#3079](https://github.com/nesquena/hermes-webui/issues/3079), [#3315](https://github.com/nesquena/hermes-webui/issues/3315), [#3316](https://github.com/nesquena/hermes-webui/pull/3316) | Supporting edge case. Running compression is live lifecycle status; compression-exhausted/no-final finalization is a terminal-state follow-up. |
+| Replay, reconnect, session switch, and reattach | [#2283](https://github.com/nesquena/hermes-webui/pull/2283), [#2924](https://github.com/nesquena/hermes-webui/issues/2924), [#3391](https://github.com/nesquena/hermes-webui/pull/3391) | Supporting recovery infrastructure. The product requirement is the same lifecycle after replay, or an explicit degraded/restoring state. |
+| Tool, activity, thinking, and visible progress | [#1298](https://github.com/nesquena/hermes-webui/issues/1298), [#3014](https://github.com/nesquena/hermes-webui/issues/3014), [#3015](https://github.com/nesquena/hermes-webui/pull/3015) | Main reply-rendering concern. Process prose stays primary; tool/reasoning/debug detail stays supporting. |
+| No-final and terminal failure outcomes | [#3315](https://github.com/nesquena/hermes-webui/issues/3315), [#3316](https://github.com/nesquena/hermes-webui/pull/3316) | Confirmed follow-up / active PR scope. A tool-tail or compression-exhausted run must not settle as normal completion without a real final answer. |
+| Cancellation and stream ownership | [#3344](https://github.com/nesquena/hermes-webui/issues/3344), [#3345](https://github.com/nesquena/hermes-webui/pull/3345), [#3475](https://github.com/nesquena/hermes-webui/issues/3475), [#3476](https://github.com/nesquena/hermes-webui/pull/3476) | Supporting cancel/recovery scope.
Early-cancel worker reconciliation is addressed by [#3476](https://github.com/nesquena/hermes-webui/pull/3476); frontend cancel owner-guard hardening is the remaining follow-up. | +| Produced artifacts and output handoff | [#2655](https://github.com/nesquena/hermes-webui/issues/2655), [#2673](https://github.com/nesquena/hermes-webui/pull/2673), [#2881](https://github.com/nesquena/hermes-webui/issues/2881), [#2938](https://github.com/nesquena/hermes-webui/pull/2938), [#3329](https://github.com/nesquena/hermes-webui/pull/3329), [#3348](https://github.com/nesquena/hermes-webui/pull/3348), [#3528](https://github.com/nesquena/hermes-webui/issues/3528) | Supporting session-output concern. Existing Artifacts and `workspace://` surfaces make produced files findable; long-running replay/cancel/terminal paths must not lose the tool metadata needed to recover that handoff. | +| Sidebar/session ownership and active-session awareness | [#856](https://github.com/nesquena/hermes-webui/issues/856), [#1370](https://github.com/nesquena/hermes-webui/pull/1370), [#1436](https://github.com/nesquena/hermes-webui/issues/1436) | Confirmed follow-up scope when sidebar/session metadata contradicts the visible active turn. | +| User intervention during live work | [#720](https://github.com/nesquena/hermes-webui/issues/720), [#965](https://github.com/nesquena/hermes-webui/pull/965), [#1062](https://github.com/nesquena/hermes-webui/pull/1062), [#3058](https://github.com/nesquena/hermes-webui/issues/3058), [#3061](https://github.com/nesquena/hermes-webui/pull/3061) | Child RFC scope. This parent RFC only requires that controls preserve ownership, replay, and terminal honesty. | ## Product Model +### Lifecycle flow + +The lifecycle below is a product-state model, not a backend schema or +wire-event contract. At settle time, the visible reply state should be derived +from durable transcript truth, available terminal evidence, and reply +ownership. 
A turn should not be marked `completed` only because live activity +or partial assistant prose existed earlier. + +```mermaid +%%{init: {"theme": "neutral"}}%% +flowchart TD + A([User sends message]) --> B["Turn created
reply ownership established"] + B --> C["Live phase
process prose + quiet tool activity"] + C --> D{Lifecycle event} + + D -- stream continues --> C + D -- reload / reconnect / session switch --> E["Recovery and replay
rebuild the same lifecycle from durable state"] + E --> F{Same turn recovered?} + F -- yes --> C + F -- not yet --> G["Restoring or degraded state
do not mark completed from missing live data"] + G --> D + + D -- user cancels --> H["Cancel requested
settle only the owned reply"] + H --> I["Settle decision
durable transcript truth + terminal evidence + reply ownership"] + D -- run ended / terminal evidence --> I + + I --> J{Event belongs to
the current visible reply?} + J -- no --> K["Ignore stale event
do not mutate the current visible reply"] + J -- yes --> L{Final assistant answer present
and terminal evidence is normal?} + + L -- yes --> M["completed
activity summary above final answer"] + L -- no --> N{Specific terminal outcome} + N -- cancelled --> O["cancelled
user stopped the turn"] + N -- interrupted --> P["interrupted
continuity lost before final answer"] + N -- compression_exhausted --> Q["compression_exhausted
compression could not continue safely"] + N -- tool_limit_reached --> R["tool_limit_reached
tool / retry / iteration ceiling hit"] + N -- no_response --> S["no_response
no usable assistant final content"] + N -- other failure --> T["error
fallback for other terminal failures"] + + M --> U["Settled reply visible
supporting activity collapsed;
artifacts and workspace outputs findable"] + O --> U + P --> U + Q --> U + R --> U + S --> U + T --> U +``` + +### Reply ownership + +One visible assistant reply belongs to one user turn and one active run/stream +identity while that run is active. + +Requirements: + +- A live event should attach to the assistant reply that owns the run. +- A later turn in the same session must not inherit stale live events from an + older stream. +- A background session can continue running, but its live stream should not + mutate the visible pane for another session. +- A terminal event should settle the same turn it belongs to, or route through + a background/error path if the user is no longer viewing that session. +- Sidebar state should not contradict the visible owner. If the sidebar says a + session is running, opening it should show live work, a restoring/degraded + state, or an honest terminal state. + ### Live phase While a turn is running, the assistant reply should read as a live process @@ -89,6 +180,8 @@ Requirements: - Tool rows and tool groups are collapsed by default. - Full commands, arguments, raw output, and large payloads stay behind deeper disclosure. +- Thinking/reasoning that is not user-facing progress should not be the only + visible signal that work is happening. - The run timer/status belongs with the active live turn, not as a top transcript artifact. - Running-only lifecycle markers are transient. @@ -104,17 +197,17 @@ Requirements: - A compact activity summary appears above the final answer. - The activity summary is collapsed by default. - Expanding it reveals readable process history and tool history. -- Raw command/output detail remains behind a deeper disclosure. +- Raw command/output detail remains behind deeper disclosure. - The final answer remains ordinary assistant prose below the summary. -- Running-only markers disappear from the settled transcript unless they explain - a visible error or recovery outcome. 
+- Running-only markers disappear from the settled transcript unless they + explain a visible error or recovery outcome. - Very long final answers remain complete and readable. They should not be hidden inside the activity summary or replaced by a progress/status artifact. ### Recovery and replay -Refresh, reconnect, session switching, and replay should preserve the same reply -model. +Refresh, reconnect, session switching, and replay should preserve the same +reply model. Requirements: @@ -124,43 +217,46 @@ Requirements: - If the exact live scene cannot be reconstructed immediately, the UI should show an explicit restoring or degraded state instead of an empty running shell. +- Replay must be idempotent. It should not duplicate tokens, progress prose, + reasoning, tool rows, compression rows, or terminal cards. - Old in-progress browser state must not override durable session truth. - Recovery/control events stay internal unless they describe a user-visible terminal outcome. ### Terminal outcomes -Every turn needs a terminal outcome. A turn without a final answer must not look -like a normal completed answer. +Every turn needs a terminal outcome. A turn without a final answer must not +look like a normal completed answer. Required product states: -- **completed**: the assistant produced a final answer and the turn settled - normally. -- **cancelled**: the user stopped the turn. -- **interrupted**: browser, stream, worker, or network continuity was lost - before a final answer was produced. -- **compression exhausted**: context compression could not create enough room to - continue safely. -- **tool limit reached**: the run hit a tool-call, retry, or iteration ceiling - before a final answer was produced. -- **no response**: the provider or runtime returned no usable assistant content. -- **error**: fallback for failures that do not fit the above states. 
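
The terminal states in this section, together with the rule that a specific condition wins over the generic `error` fallback, can be sketched as one settle-time classifier. This is a minimal sketch under stated assumptions, not Hermes code: `SettleEvidence`, its field names, and `classify_terminal_state` are hypothetical, and the relative order among the specific states is an assumption. The RFC only fixes that specific outcomes beat the fallback and that earlier live activity never proves completion.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class TerminalState(str, Enum):
    # Product-state names from this RFC; not a wire enum or persisted schema.
    COMPLETED = "completed"
    CANCELLED = "cancelled"
    INTERRUPTED = "interrupted"
    COMPRESSION_EXHAUSTED = "compression_exhausted"
    TOOL_LIMIT_REACHED = "tool_limit_reached"
    NO_RESPONSE = "no_response"
    ERROR = "error"


@dataclass
class SettleEvidence:
    """Hypothetical settle-time inputs; every field name is illustrative."""
    final_answer: Optional[str] = None
    user_cancelled: bool = False
    continuity_lost: bool = False
    compression_exhausted: bool = False
    tool_limit_reached: bool = False
    other_failure: bool = False
    had_live_activity: bool = False  # recorded, but never used as proof of completion


def classify_terminal_state(ev: SettleEvidence) -> TerminalState:
    # Specific conditions are checked first (assumed order) so they are
    # never flattened into the generic fallback.
    if ev.user_cancelled:
        return TerminalState.CANCELLED
    if ev.compression_exhausted:
        return TerminalState.COMPRESSION_EXHAUSTED
    if ev.tool_limit_reached:
        return TerminalState.TOOL_LIMIT_REACHED
    if ev.continuity_lost:
        return TerminalState.INTERRUPTED
    if ev.other_failure:
        return TerminalState.ERROR
    # Only a real, non-empty final answer settles as completed.
    if ev.final_answer and ev.final_answer.strip():
        return TerminalState.COMPLETED
    return TerminalState.NO_RESPONSE
```

Note that `had_live_activity` is intentionally ignored by the classifier: a turn that streamed prose or tool rows but never produced a final answer settles as `no_response`, not `completed`.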
- -Copy can evolve, but these semantic distinctions should stay stable in live -rendering, settled rendering, and replay. +| State | Meaning | +| --- | --- | +| `completed` | The assistant produced a final answer and the turn settled normally. | +| `cancelled` | The user stopped the turn. | +| `interrupted` | Browser, stream, worker, runtime, or network continuity was lost before a final answer was produced. | +| `compression_exhausted` | Context compression could not create enough room to continue safely. | +| `tool_limit_reached` | The run hit a tool-call, retry, or iteration ceiling before a final answer was produced. | +| `no_response` | The provider or runtime returned no usable assistant final content. | +| `error` | Fallback for failures that do not fit the above states. | + +These identifiers name product states, not a wire/enum or persisted schema +contract; consistent with Scope, this RFC does not mandate a backend field or +event shape for them. Copy can evolve, but these semantic distinctions should +stay stable in live rendering, settled rendering, and replay. When more than one terminal condition applies, the more specific condition -should win over the generic fallback. For example, cancelled, compression -exhausted, tool limit reached, and no response should not be flattened into a -plain error only because the turn also failed to produce a final answer. +should win over the generic fallback. For example, `cancelled`, +`compression_exhausted`, `tool_limit_reached`, and `no_response` should not be +flattened into a plain `error` only because the turn also failed to produce a +final answer. ## Long-Running Edge Cases ### Auto Compression -Auto Compression is a context lifecycle transition, not a tool call and not final -answer content. +Auto Compression is a context lifecycle transition, not a tool call and not +final answer content. 
Expected behavior: @@ -171,26 +267,24 @@ Expected behavior: remain understandable without turning compression into the main transcript. - Do not keep compression status text in the settled transcript unless it explains an error or recovery state. +- If compression fails to create enough room, surface `compression_exhausted` + or another specific terminal outcome instead of normal completion. - Compression success in the UI does not by itself prove model-facing context - was pruned; that remains a separate runtime/context invariant. - -### Very long final answers - -Long-running sessions can end with a final answer that is itself lengthy. + was pruned; that remains a runtime/context invariant covered by the run-state + consistency contract. -Expected behavior: +Confirmed follow-up scope: -- The final answer remains the primary settled assistant content. -- Supporting activity stays above it and collapsed by default. -- Streaming and settle transitions should not jump the user away from the final - answer or make the answer look like tool output. -- Any additional collapse, preview, or navigation affordance for very long final - answers should preserve the full answer as ordinary assistant prose. +- Add or standardize an explicit per-pass compression completion event if the + UI otherwise has to infer completion from later stream events. +- Keep compression-exhausted/no-final handling aligned with + [#3315](https://github.com/nesquena/hermes-webui/issues/3315) and + [#3316](https://github.com/nesquena/hermes-webui/pull/3316). -### Tool-call and retry ceilings +### Tool-call, retry, and iteration ceilings -Long-running sessions can exhaust tool-call limits, retry budgets, or iteration -ceilings before a final answer is available. +Long-running sessions can exhaust tool-call limits, retry budgets, or +iteration ceilings before a final answer is available. 
Expected behavior: @@ -198,10 +292,12 @@ Expected behavior: - Preserve the readable work history that led to the limit. - Keep the final area honest: show that the run stopped because a limit was reached rather than inventing a final answer. -- Do not persist internal control prompts as ordinary user-visible transcript - content. +- Internal continuation or control prompts used by the runtime must not persist + as ordinary user-authored transcript content. +- The product state should not depend on whether the limit came from provider + policy, Hermes Agent iteration budget, or WebUI adapter/runtime policy. -### No-final and provider failure +### No-final answer and provider failure Tool-heavy turns can end with tool output, provider failure, or no usable final assistant message. @@ -209,10 +305,38 @@ assistant message. Expected behavior: - Detect the absence of a final assistant answer at settle time. -- Surface a terminal state such as no response, interrupted, compression - exhausted, tool limit reached, or error. +- Surface a terminal state such as `no_response`, `interrupted`, + `compression_exhausted`, `tool_limit_reached`, or `error`. - Do not mark the turn completed only because some assistant/tool activity occurred earlier. +- Do not treat internal context-compaction reference material as a final + assistant answer. + +### Cancel and interruption + +Cancel is a user-visible terminal action, not just browser cleanup. + +Expected behavior: + +- If the user cancels before the run fully starts, the backend still reconciles + against the live worker state where possible. +- If the user cancels after live text, reasoning, or tools have appeared, + already-visible work should not be silently lost. +- The frontend cancel path should close the SSE source it owns and only clear + busy state for the stream it actually cancelled. +- A cancelled turn should settle as `cancelled`, not as provider `no_response`. 
+- A network or worker interruption should settle as `interrupted` or restoring, + not as normal completion. + +Classification: + +- The early startup cancel race tracked by + [#3475](https://github.com/nesquena/hermes-webui/issues/3475) is addressed by + [#3476](https://github.com/nesquena/hermes-webui/pull/3476). +- The owner-aware browser cancel cleanup tracked by + [#3344](https://github.com/nesquena/hermes-webui/issues/3344) and + [#3345](https://github.com/nesquena/hermes-webui/pull/3345) remains a + focused follow-up. ### Reconnect and session switch @@ -225,103 +349,151 @@ Expected behavior: - Slow rebuild should be visibly restoring or degraded, not blank. - Sidebar/session metadata should not point the user at a stale or wrong active session. +- Replay should use the same visible lifecycle as live rendering rather than a + flattened alternate presentation. -### User intervention +Confirmed follow-up scope: + +- A clearer restoring/degraded state during slow reattach. +- Native `Last-Event-ID` or equivalent reconnect cursor support when it is + ready to replace or complement the current replay cursor path. +- Additional tests that prove live and replay use the same lifecycle for + process prose, tool rows, compression status, and terminal states. + +### Tool-only or low-prose runs -During long-running work, the user may need to queue follow-up input, steer the -current direction, or interrupt the run. +Some valid long-running turns may produce little or no visible process prose +before the final answer, especially when the model runs a dense sequence of +tools. Expected behavior: -- These controls should not corrupt the live-to-final reply lifecycle. -- Queue/steer/interrupt command semantics should be defined in a separate - control-surface contract. -- This RFC only requires that live-session controls preserve clear ownership, - terminal outcomes, and replayable state. +- The UI should not fabricate assistant prose. 
+- Tool activity should remain readable enough that the turn does not look + empty or broken. +- Empty placeholders should be filtered rather than rendered as blank prose. +- If no final answer arrives, the terminal state should explain that outcome + instead of leaving only a tool list. -## Delivery Plan +### Very long final answers -### Slice 1: live-to-final reply lifecycle +Long-running sessions can end with a final answer that is itself lengthy. -The first implementation slice is represented by #3401. It should demonstrate -the core reply model: +Expected behavior: -- live process text is primary, -- tool activity is quiet and progressively disclosed, -- running-only compression status is transient, -- the settled activity summary appears above the final answer, -- settle-time rendering does not falsely present a no-final turn as completed, -- replay and reattach rebuild the same visible structure, -- stream ownership fixes are limited to preserving the visible turn's ownership, - terminal events, and replay. +- The final answer remains the primary settled assistant content. +- Supporting activity stays above it and collapsed by default. +- Streaming and settle transitions should not jump the user away from the final + answer or make the answer look like tool output. +- Any additional collapse, preview, outline, or navigation affordance for very + long final answers must preserve the full answer as ordinary assistant prose. -This slice should use `Refs #3400`; it should not close the umbrella issue. +### Produced artifacts and output handoff -### Slice 2: terminal and recovery stabilization +Long-running sessions often create or update files in the workspace, such as +plans, reports, patches, data files, generated markdown, or other artifacts. +Those artifacts are part of what the user needs from the completed work, even +when they are not the final answer text itself. 
-The next slice should close the edge cases that make long-running sessions look -misleading after they stop or recover: +Expected behavior: -- cancelled and interrupted final rendering, -- compression-exhausted terminal rendering, -- tool-limit / max-retry terminal rendering, -- no-final-answer provider failure classification, -- explicit restoring/degraded state during slow reattach, -- empty process placeholders that make tool-only runs look broken, -- live-vs-settled label clarity for tool activity. +- Existing artifact surfaces, such as the session Artifacts tab and + `workspace://` links, remain supporting navigation surfaces rather than + replacing the final answer. +- If a turn creates or edits workspace artifacts, the settled reply should not + hide the fact that those artifacts exist or make them impossible to find. +- Reconnect, replay, session switching, cancel, interruption, and no-final + terminal paths should preserve enough tool/artifact metadata to rebuild the + same artifact handoff. +- A terminal failure should still distinguish between "no final answer" and + "some artifacts were produced before the run stopped". +- Large generated files or rich artifact types should route through the + workspace/artifact preview model instead of being expanded into the main chat + transcript by default. + +Confirmed follow-up scope: + +- Keep artifact recoverability aligned with the session-scoped Artifacts tab + work in [#2655](https://github.com/nesquena/hermes-webui/issues/2655) and + [#2673](https://github.com/nesquena/hermes-webui/pull/2673). +- Keep final-answer artifact links aligned with the `workspace://` preview + path from [#2881](https://github.com/nesquena/hermes-webui/issues/2881) and + [#2938](https://github.com/nesquena/hermes-webui/pull/2938). +- Treat interrupted/cancelled tool-history loss, such as + [#3528](https://github.com/nesquena/hermes-webui/issues/3528), as a + live-to-final recoverability bug when it prevents artifact reconstruction. 
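
The artifact handoff expectations in this section can be illustrated with a small sketch that rebuilds the list of produced files from persisted tool metadata, independent of how the turn ended. Everything here is hypothetical: `ToolRecord`, `written_paths`, `SettledReply`, and `settle_reply` are illustrative names, not an existing Hermes WebUI schema, and the sketch assumes tool history survives replay, cancel, and interruption.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ToolRecord:
    """Hypothetical persisted tool-call metadata."""
    name: str
    written_paths: List[str] = field(default_factory=list)


@dataclass
class SettledReply:
    terminal_state: str
    final_answer: Optional[str]
    artifacts: List[str]


def build_artifact_handoff(tool_history: List[ToolRecord]) -> List[str]:
    # Deduplicate while keeping first-seen order, so rebuilding from the same
    # history after replay yields the same handoff.
    seen = set()
    ordered = []
    for record in tool_history:
        for path in record.written_paths:
            if path not in seen:
                seen.add(path)
                ordered.append(path)
    return ordered


def settle_reply(terminal_state: str,
                 final_answer: Optional[str],
                 tool_history: List[ToolRecord]) -> SettledReply:
    # Artifacts are derived from tool history, not from the final answer, so
    # "no final answer" and "artifacts were produced" stay separate facts.
    return SettledReply(terminal_state, final_answer,
                        build_artifact_handoff(tool_history))
```

With this shape, an interrupted turn that wrote two files settles with `final_answer=None` and a two-item `artifacts` list, which keeps the two facts distinguishable in the settled reply.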
+ +### Sidebar and session ownership + +Long-running sessions are not only a chat-pane concern. The sidebar and session +metadata help users find active work and later terminal outcomes. -### Slice 3: live-session control surface +Expected behavior: -The next adjacent product area is user intervention during live work: +- A session row's running indicator should reflect a real active run or a + clearly restorable state, not stale persisted metadata alone. +- Background completion, cancellation, or failure should be represented without + stealing the visible pane from the user. +- Session switching should not erase pending live context, in-flight snapshots, + tool history, or terminal outcome state. +- Maintenance writes, stale cleanup, and background repair should not make old + sessions look newly active unless meaningful user/assistant activity happened. -- queue follow-up input while a turn is running, -- steer a live turn without losing ownership of the current reply, -- interrupt a live turn and preserve the user's corrective intent, -- define busy-input defaults and prompt visibility, -- ensure these controls replay and settle into the same terminal model. +### User intervention -This slice should reference the existing busy-input / CLI-parity history, but it -should be designed as a control-surface contract rather than as a reply-content -change. +During long-running work, the user may queue follow-up input, steer the current +direction, or stop the run and send a replacement. -### Slice 4: session and protocol integration +Expected behavior: -Broader integration work should stay separate from the reply-content model: +- These controls should not corrupt the live-to-final reply lifecycle. +- Queue/Steer/Stop-and-send/Interrupt command semantics should be defined in a + separate control-surface contract. +- This RFC only requires that live-session controls preserve clear ownership, + terminal outcomes, and replayable state. 
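
The ownership and replayability requirement that controls must preserve can be sketched as a guard on the visible reply: an event only mutates the reply that owns its stream, and re-applying an already-seen event is a no-op. This is a minimal sketch under assumptions: `StreamEvent`, `VisibleReply`, `stream_id`, and `seq` are hypothetical names, and per-stream sequence numbers are assumed to exist; the RFC does not mandate an event shape.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass(frozen=True)
class StreamEvent:
    stream_id: str  # hypothetical run/stream identity
    seq: int        # hypothetical per-stream sequence number
    text: str


@dataclass
class VisibleReply:
    owner_stream_id: str
    chunks: List[str] = field(default_factory=list)
    applied: Set[int] = field(default_factory=set)

    def apply(self, event: StreamEvent) -> bool:
        """Apply an event; return False when it is stale or a duplicate."""
        if event.stream_id != self.owner_stream_id:
            # Stale event from an older or background stream: ignore it.
            return False
        if event.seq in self.applied:
            # Replayed event: applying it again must not duplicate content.
            return False
        self.applied.add(event.seq)
        self.chunks.append(event.text)
        return True
```

A queue, steer, or stop action that starts a new stream would create a new `VisibleReply` with its own owner rather than reusing the old one, so late events from the previous stream cannot leak into the new turn.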
-- native `Last-Event-ID` or equivalent reconnect cursor support, -- sidebar/session awareness for active long-running work, -- session-list disappearance or stale-session repair, -- shared tool display-title normalization across legacy live stream, persisted - tool calls, replay, gateway paths, and future adapter/runner paths. +The current child contract is tracked by +[#3058](https://github.com/nesquena/hermes-webui/issues/3058) and +[#3061](https://github.com/nesquena/hermes-webui/pull/3061). That child RFC +should own questions such as: -These are important follow-ups, but they should not be mixed into the first -reply-lifecycle implementation slice. +- whether Queue is browser-backed or server-backed in each slice, +- when Queue can upgrade to Steer, +- what Stop-and-send means, +- how delivered vs applied Steer is represented, +- what happens to leftover Steer after the run ends. -## Review Checklist +## Delivery And Follow-Up Map -Use this checklist when reviewing PRs against this RFC: +Use this map to keep implementation PRs and child RFCs scoped. The "vehicle" +column names a durable track, not live merge state; the tracking issue +[#3400](https://github.com/nesquena/hermes-webui/issues/3400) is authoritative +for current open/merged/superseded status. -- Does the change preserve long-running session readability? -- Does live process text stay primary over tool metadata? -- Are tool details available without becoming the main transcript? -- Does the final answer remain separate from supporting activity? -- Are compression, no-final, tool-limit, cancel, and interrupt outcomes - classified honestly? -- Does reconnect/session switch rebuild the same reply lifecycle? -- Do internal recovery or control messages stay out of ordinary chat content? -- Is the PR's slice clear: lifecycle, terminal/recovery, live controls, or - session/protocol integration? 
+| Track | Scope | Current vehicle | +| --- | --- | --- | +| Parent product RFC | Define the long-running live-to-final assistant reply lifecycle and review checklist. | This RFC; tracking issue [#3400](https://github.com/nesquena/hermes-webui/issues/3400). | +| First reply lifecycle implementation | Live process prose, quiet tool activity, settled activity summary above final answer, replay/reattach consistency, live-only compression status, supporting stream ownership fixes. | [#3401](https://github.com/nesquena/hermes-webui/pull/3401). | +| Terminal/no-final stabilization | Compression exhausted, tool-tail/no-final transcript shape, context-compaction marker suppression, terminal error routing. | [#3315](https://github.com/nesquena/hermes-webui/issues/3315), [#3316](https://github.com/nesquena/hermes-webui/pull/3316). | +| Cancel ownership hardening | Frontend cancel should close its own SSE source and clear only its own busy state. | [#3344](https://github.com/nesquena/hermes-webui/issues/3344), [#3345](https://github.com/nesquena/hermes-webui/pull/3345). | +| Early-cancel startup race | Backend cancel should still interrupt the worker when the SSE registry detached before startup fully settled. | [#3475](https://github.com/nesquena/hermes-webui/issues/3475), [#3476](https://github.com/nesquena/hermes-webui/pull/3476). | +| Pending-intent control surface | Queue, Steer, Stop-and-send, Interrupt, delivered/applied/leftover semantics. | [#3058](https://github.com/nesquena/hermes-webui/issues/3058), [#3061](https://github.com/nesquena/hermes-webui/pull/3061). | +| Reattach and replay polish | Slow rebuild degraded state, replay/body timing, native cursor support, same lifecycle through replay. | Follow-up issue/PR or child RFC if protocol semantics expand. | +| Tool-limit and max-iteration terminal state | Limit reached state, control prompt visibility, no fake final answer. | Follow-up issue/PR; may involve Hermes Agent if the runtime owns the limit signal. 
| +| Artifact handoff and recoverability | Preserve the link between final/terminal replies and workspace artifacts created or edited during the turn. | Existing Artifacts and `workspace://` surfaces; follow-up issue/PR when replay, cancel, or terminal paths lose artifact metadata. | +| Sidebar/session ownership | Active/terminal state in session rows, stale spinner repair, session-list disappearance, background terminal feedback. | Follow-up issue/PR under session/runtime contracts. | +| Very long final answer ergonomics | Optional navigation/outline/preview affordances that preserve the final answer as normal prose. | Open product discussion; no implementation vehicle yet. | ## Relationship To Existing Contracts -This RFC sits above the current run-state and adapter contracts: +This RFC sits above the current runtime, recovery, and adapter contracts: - [`webui-run-state-consistency-contract.md`](webui-run-state-consistency-contract.md) defines how transcript, context, stream, replay, compression, and session metadata stay coherent. - [`canonical-session-resolution.md`](canonical-session-resolution.md) defines - how URL, local browser state, sidebar rows, and compression lineage resolve to - one visible session target. + how URL, local browser state, sidebar rows, and compression lineage resolve + to one visible session target. - [`turn-journal.md`](turn-journal.md) defines crash-safe submitted-turn and interrupted-turn recovery semantics. - [`hermes-run-adapter-contract.md`](hermes-run-adapter-contract.md) defines @@ -330,11 +502,48 @@ This RFC sits above the current run-state and adapter contracts: This RFC defines the product meaning those lower-level contracts need to preserve for long-running assistant replies. 
+The pending-intent control-surface RFC tracked by +[#3058](https://github.com/nesquena/hermes-webui/issues/3058) and +[#3061](https://github.com/nesquena/hermes-webui/pull/3061) should be treated +as a child contract: it can define user intervention semantics without +redefining the live-to-final reply lifecycle. + +## Review Checklist + +Use this checklist when reviewing PRs against this RFC: + +- Does the change preserve long-running session readability? +- Does live process text stay primary over tool metadata? +- Are tool details available without becoming the main transcript? +- Does the final answer remain separate from supporting activity? +- Are compression, no-final, tool-limit, cancel, and interrupt outcomes + classified honestly? +- Does reconnect/session switch rebuild the same reply lifecycle or degrade + explicitly? +- If the turn produced workspace artifacts, can the user still find them after + settle, replay, reconnect, cancel, or terminal failure? +- Do internal recovery or control messages stay out of ordinary chat content? +- Does sidebar/session state agree with the visible active or terminal turn? +- Is the PR's slice clear: lifecycle, terminal/recovery, cancel ownership, + live controls, sidebar/session ownership, or protocol integration? +- If the change belongs to Queue/Steer/Stop-and-send/Interrupt, is it routed to + the child control-surface RFC instead of being hidden inside this parent RFC? + ## Open Questions -- Should very long final answers need additional navigation or preview - affordances beyond the standard chat transcript behavior? -- Should repeated compression passes in one turn be shown as separate transient - statuses or summarized into one compression lifecycle marker? -- Should queue, steer, and interrupt receive a dedicated public control-surface - RFC, or should that contract live inside the existing adapter/control RFC? 
+Open questions are limited to product choices that are not already decided by +this RFC, an active implementation PR, or a child RFC. + +- Should very long final answers gain additional navigation, outline, or + preview affordances beyond standard chat transcript behavior? If yes, what + threshold triggers them and how do they preserve the answer as ordinary + assistant prose? +- When a turn produces multiple workspace artifacts, should the final answer + include an automatic artifact summary or navigation affordance, or should the + product rely on the existing Artifacts tab and explicit `workspace://` links? +- What is the minimum sidebar signal for background long-running sessions that + have completed, failed, cancelled, or need attention while the user was + viewing another session? +- Which terminal outcomes should offer inline recovery actions, such as retry, + continue, inspect details, or reopen from checkpoint, and which should remain + informational only?