fix(todos): stream live state updates with inflight recovery by v2psv · Pull Request #3454 · nesquena/hermes-webui

v2psv · 2026-06-03T00:59:59Z

Bug Description

Follow-up to the cold-load todo state work from #3373.

The Todos panel can now hydrate correctly after a page refresh, but it still does not consistently track the current todo tool state while an agent run is active.

Before this PR:

todo tool calls updated the agent-side TodoStore, but the side panel usually stayed stale until the run settled.
A browser reload / SSE reattach during an active stream could temporarily lose or roll back the visible todo snapshot.
Cold-load session state and persisted INFLIGHT state did not share one explicit reconciliation path, so stale local recovery data could win over the settled session snapshot in some edge cases.

Root Cause

The panel historically derived todo state by reverse-scanning settled tool messages in S.messages.

That works after a completed turn, but not during a live run:

live tool completions are tracked in stream / INFLIGHT structures before settled tool messages appear in S.messages;
tool_complete.preview is truncated and not a durable structured state contract;
cold-load session hydration and INFLIGHT recovery had no single source of truth or timestamp reconciliation rule.

Fix

This PR adds an explicit todo-state contract across server, SSE, and frontend state:

Emits a dedicated todo_state SSE event when the todo tool completes.
Sends full, redacted, idempotent todo snapshots instead of relying on truncated tool_complete.preview.
Reuses the same api.todo_state parser / normalizer for live emission and cold-load state.
Updates the frontend to treat S.todos + S.todoStateMeta as the Todos panel source of truth.
Persists live todo snapshots into INFLIGHT state so reload / reattach can restore the panel before the next event arrives.
Reconciles cold-load vs INFLIGHT snapshots by timestamp, including the coldTs === 0 edge case for compressed sessions.
Clears stale session.todo_state sidecars when a fresh messages=1 response does not include one.
Keeps the legacy reverse-scan fallback for old servers / upgrade windows.
Adds a CHANGELOG entry for the user-visible behavior fix.

How to Verify

Manual verification path:

Start an agent run that calls the todo tool multiple times.
Keep the Todos panel open while the stream is still active.
Confirm the panel updates after each todo snapshot instead of waiting for the final assistant response.
Reload / reattach during the active stream.
Confirm the panel restores the current todo snapshot from INFLIGHT and does not flicker back to an older list.
Open the same session cold after the run settles.
Confirm the panel hydrates from the server-provided session.todo_state.

Test Plan

Ran locally after rebasing onto origin/master:

/root/hermes-webui/.venv/bin/python -m pytest \
  tests/test_todo_state.py \
  tests/test_todo_state_emission.py \
  tests/test_streaming_todo_state_static.py \
  tests/test_session_todo_state_route.py \
  tests/test_todo_live_frontend_static.py \
  tests/test_todo_panel_cold_load_static.py \
  tests/test_security_redaction.py \
  -q

Result:

83 passed, 1 skipped

Also ran:

/root/hermes-webui/.venv/bin/python scripts/ruff_lint.py --diff origin/master

Result:

ruff_lint: no new violations on added/modified lines. OK.

Additional local hygiene checks:

git diff --check origin/master..HEAD

No whitespace errors or conflict markers found.

Risk Assessment

Low / Medium.

The change is scoped to Todos panel state propagation and recovery. The new SSE event is additive, and the frontend keeps the legacy message-scan fallback for compatibility with older servers.

Risk is mainly in state reconciliation:

out-of-order SSE replay is guarded by timestamp checks;
cross-session events are filtered by session_id;
malformed or non-todo payloads are ignored;
live SSE snapshots are redacted before emission;
cold-load snapshots continue to pass through session response redaction.
The PR intentionally keeps the detector symmetric with the existing agent TodoStore hydration logic so the WebUI panel does not disagree with the agent-side state.

nesquena-hermes · 2026-06-03T02:58:41Z

Reading the full diff at 888db2b5 — api/todo_state.py, api/streaming.py, and the four frontend files — this is a well-structured change. The single-source-of-truth model (S.todos + S.todoStateMeta sentinel in static/ui.js, fed by the todo_state SSE listener / INFLIGHT restore / cold-load), the latest-by-position vs. recency reconciliation in _hydrateTodosFromSession, and the fail-closed redaction on the live path all line up with the agent's _hydrate_todo_store contract. I verified that against the agent side: run_agent.py:2670-2699 walks history in reverse, breaks on the first role='tool' message with a todos list, and crucially does if last_todo_response: — so an empty [] write leaves the store empty. Your _normalize_snapshot docstring and the _legacyTodosFromMessages change (dropping the d.todos.length guard) keep the panel symmetric with that. Good.

One thing I'd flag before merge — a hot-path perf regression in the redaction helper.

Code reference

api/todo_state.py:213-236 (PR HEAD):

def _redact_snapshot(snapshot: dict) -> dict:
    from typing import cast
    from api.helpers import _redact_value
    return cast(dict, _redact_value(snapshot))

_redact_value is called without the _enabled kwarg, so it defaults to None. Tracing into api/helpers.py:354-366 → _redact_text at helpers.py:334-351:

def _redact_text(text, *, _enabled=None):
    if _enabled is None:
        from api.config import load_settings
        _enabled = bool(load_settings().get("api_redact_enabled", True))

So every string in the snapshot triggers its own load_settings(). And load_settings() (api/config.py:5039) is not memoized — each call does SETTINGS_FILE.read_text() plus a get_config() resolve.

Why it matters here

emit_todo_state fires on the live streaming path, in both tool-callback shapes (api/streaming.py:4777 and :4874). A todo snapshot has ~3 strings per item (id, content, status), so a 20-item list = ~60 load_settings() disk reads per todo write. A planning-heavy run with 10-30 todo calls multiplies that into hundreds-to-thousands of settings.json reads across one turn — on the SSE emit path, which is exactly where the existing "Opus pre-release perf fix" added the _enabled threading to redact_session_data (helpers.py:369-391) to avoid.

Recommendation

Thread the setting once, mirroring redact_session_data:

def _redact_snapshot(snapshot: dict) -> dict:
    from typing import cast
    from api.helpers import _redact_value
    from api.config import load_settings
    _enabled = bool(load_settings().get("api_redact_enabled", True))
    return cast(dict, _redact_value(snapshot, _enabled=_enabled))

That's one read per emission instead of one per string, and it keeps the fail-closed behavior intact (a raising load_settings() still propagates out of emit_todo_state's try and drops the event).

Test note

tests/test_todo_state_emission.py::test_emit_todo_state_redacts_before_sse monkeypatches helpers._redact_value directly, so it won't catch the _enabled threading either way — the fix above keeps that test green. If you want a guard against the regression returning, an ast-based assertion (like tests/test_streaming_todo_state_static.py already does for the emit calls) that _redact_snapshot passes _enabled= would lock it in.

Everything else looks solid — CI is green across the 3.11/3.12/3.13 matrix and the cross-session session_id/activeSid double-filter on the SSE listener matches the other live listeners.

nesquena-hermes · 2026-06-03T07:24:25Z

Re-read api/todo_state.py:213-240 and tests/test_todo_state_emission.py at the new HEAD — the perf regression I flagged is resolved correctly.

_redact_snapshot now reads the setting once and threads it through the recursive helper:

from api.config import load_settings
from api.helpers import _redact_value
_enabled = bool(load_settings().get("api_redact_enabled", True))
return cast(dict, _redact_value(snapshot, _enabled=_enabled))

That's one settings.json read per SSE emission instead of one per string, and the fail-closed behavior is preserved — a raising load_settings() still propagates out of emit_todo_state's try and drops the event.

The new test_emit_todo_state_threads_redaction_enabled_flag_once locks it in well: it asserts settings_calls == ["load_settings"] (exactly one read for a multi-string snapshot, which would have been 3+ on the old per-string path) and redaction_enabled_args == [False] (the flag is actually threaded, not re-derived). The existing test_emit_todo_state_redacts_before_sse was also updated to the *, _enabled=None signature so it stays green.

No further concerns from me on the redaction path. LGTM.

v2psv force-pushed the feat/todos-live-updates branch 2 times, most recently from 236e527 to 888db2b Compare June 3, 2026 01:55

fix(todos): stream live state updates with inflight recovery

6de7667

v2psv force-pushed the feat/todos-live-updates branch from 888db2b to 6de7667 Compare June 3, 2026 07:03

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix(todos): stream live state updates with inflight recovery#3454

fix(todos): stream live state updates with inflight recovery#3454
v2psv wants to merge 1 commit into
nesquena:masterfrom
v2psv:feat/todos-live-updates

v2psv commented Jun 3, 2026

Uh oh!

nesquena-hermes commented Jun 3, 2026

Uh oh!

nesquena-hermes commented Jun 3, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

v2psv commented Jun 3, 2026

Bug Description

Root Cause

Fix

How to Verify

Test Plan

Risk Assessment

Uh oh!

nesquena-hermes commented Jun 3, 2026

Code reference

Why it matters here

Recommendation

Test note

Uh oh!

nesquena-hermes commented Jun 3, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants