
Fix compression-exhausted stream finalization #3316

Open
franksong2702 wants to merge 2 commits into nesquena:master from franksong2702:franksong2702/fix-auto-compression-tool-heavy-streams

Conversation

@franksong2702 (Contributor) commented Jun 1, 2026

Thinking Path

  • Long single-turn, tool-heavy sessions can exhaust context after Hermes Agent fails to compress effectively.
  • WebUI must distinguish streamed interim/progress text from a real final assistant answer.
  • A persisted transcript ending in a tool result, assistant tool-call turn, or internal context-compaction reference marker is not a completed answer.
  • The UI should surface compression exhaustion as an error and keep internal reference-only summaries out of the settled transcript.
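
The third bullet's invariant can be sketched as a check on the last persisted message. This is a minimal illustration, not the code in this PR: the function name `lacks_final_assistant_answer`, the OpenAI-style message dicts (`role`, `content`, `tool_calls`), and matching the marker by its opening prefix are all assumptions made for the example.

```python
# Hypothetical sketch; message shape and names are assumptions, not WebUI code.
# The real marker is longer; only its opening prefix is matched here.
COMPACTION_PREFIX = "[CONTEXT COMPACTION"


def lacks_final_assistant_answer(messages):
    """Return True when the transcript tail is not a real final answer."""
    if not messages:
        return True
    tail = messages[-1]
    if tail.get("role") != "assistant":
        # A tool result (or any non-assistant message) at the tail means
        # the turn never settled.
        return True
    if tail.get("tool_calls"):
        # The assistant requested a tool and never came back with text.
        return True
    content = tail.get("content")
    text = content.strip() if isinstance(content, str) else ""
    if not text:
        # An empty assistant message is not an answer either.
        return True
    # Internal compaction summaries are reference material, not an answer.
    return text.startswith(COMPACTION_PREFIX)
```

Under these assumptions, a tool call followed by final assistant text counts as completed, while a transcript that stops at the tool result does not.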

What Changed

  • Classifies compression exhaustion errors from agent/provider result text.
  • Treats failed, partial, compression_exhausted, tool-tail transcripts, assistant tool-call tails, and context-compaction marker tails as terminal failures instead of completed turns.
  • Removes the _assistant_added short-circuit so final-answer validation always checks the persisted transcript.
  • Keeps [CONTEXT COMPACTION — REFERENCE ONLY] content out of settled transcript rendering while preserving transient running compression status.
  • Adds regression coverage for terminal failure detection, context-compaction marker filtering, and final-answer semantics.
  • Updates the changelog for the user-visible behavior change.
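
The marker-filtering bullet can be illustrated with a small filter over settled messages. This is a sketch under stated assumptions, not the PR's code: the actual filtering may well live in the frontend (`static/ui.js` or `static/messages.js`), and the function name `settled_transcript` and the dict-shaped messages are hypothetical.

```python
# Hypothetical sketch; the real filtering may live in the frontend.
# Only the opening prefix of the marker is matched here.
COMPACTION_PREFIX = "[CONTEXT COMPACTION"


def settled_transcript(messages):
    """Drop internal compaction reference messages before settled rendering."""
    visible = []
    for msg in messages:
        content = msg.get("content")
        if isinstance(content, str) and content.lstrip().startswith(COMPACTION_PREFIX):
            # Reference-only summary: keep it out of the user-facing transcript.
            continue
        visible.append(msg)
    return visible
```

The filter applies only to the settled view; a transient "compressing" status shown during an active run would be handled separately and is not modeled here.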

Why It Matters

This prevents long tool-heavy sessions from appearing completed when Hermes Agent stopped before writing a final assistant answer. It also prevents internal context-compaction reference text from being rendered as user-facing final content.

Related to #3315, NousResearch/hermes-agent#36624, and NousResearch/hermes-agent#36626.

Verification

  • python -m pytest tests/test_auto_compression_terminal_failure.py tests/test_auto_compression_card.py tests/test_issues_373_374_375.py tests/test_issue765_streaming_persistence.py::TestIssue765FollowupHardening::test_silent_failure_path_does_not_reacquire_agent_lock -q
  • node --check static/ui.js
  • node --check static/messages.js
  • git diff --check

Risks / Follow-ups

  • This is the WebUI companion to the Hermes Agent compression fix; it does not by itself reduce agent-side context size.
  • The change intentionally avoids rendering internal compaction reference text in settled transcripts, while still allowing transient compression status during active runs.

Contract Routing

  • Contract family: runtime streaming finalization and session transcript visibility.
  • Evidence: focused regression tests for terminal failure detection, final-answer semantics, and settled transcript rendering.
  • Contract change: none intended; this restores the invariant that completed UI state requires a real final assistant answer.

Model Used

AI-assisted implementation with OpenAI GPT-5 Codex in a local coding workflow. The assistant inspected repository code, wrote targeted tests, implemented the fix, and ran the verification commands above.

@nesquena-hermes (Collaborator) commented

Triage: hold + changes-requested — thanks @franksong2702. The goal is right (surfacing compression-exhaustion as a real error instead of a falsely-"completed" turn) and the classifier itself is sound — I verified _session_lacks_final_assistant_answer() empirically: it correctly returns HAS-final-answer for a normal completed turn and a tool→final-text turn, and terminal-failure only for tool-result/tool-call/empty-assistant/marker tails. I picked this up to advance toward release and ran it through the full gate (test suite + Opus + the Codex regression gate). Opus cleared it, but the Codex gate caught a state-consistency ordering bug on the compression path that I then confirmed against the code, so I'm holding it rather than shipping.

(CORE) Terminal-failure handling can run BEFORE the compression session-id migration, leaving frontend/backend session state inconsistent when compression exhaustion fires after the agent rotated session_id.

The new terminal-failure check is at api/streaming.py:5439-5443:

_terminal_failure = (
    _agent_result_terminal_failure(result)
    or _session_lacks_final_assistant_answer(_all_result_messages)
)
if _terminal_failure:
    _assistant_added = False
if _terminal_failure or (not _assistant_added and not _token_sent):
    ... # emits the apperror + returns

But the compression session-id migration + snapshot preservation block runs much later, at api/streaming.py:5680+:

_preserve_pre_compression_snapshot(s, old_sid)   # ~5680
... # migrate locks/cache, register continuation, emit `compressed`, save against new sid

Hermes Agent rotates agent.session_id during compression (agent/conversation_compression.py:505-520); WebUI only mirrors that rotation in the 5680+ block. So on a compression-exhausted result that arrives after the rotation, the terminal-failure path at 5443 emits the error + returns before 5680 ever runs — which means:

  • _preserve_pre_compression_snapshot() is skipped → the pre-compression history may not be archived
  • continuation/session migration is skipped → the error transcript is saved against the old WebUI session id
  • the frontend apperror path appends a synthetic error to the old activeSid, not the migrated continuation session
    → frontend/backend session state diverges (this is the same compression-rotation subsystem where we recently held #3300, "fix: keep gateway context visible in chat transcripts", for a transcript-loss regression, so we're being extra careful here).

Suggested fix (needs your design call — I didn't want to hot-patch compression-rotation ordering under release pressure):

  1. Factor the compression-rotation side-effect block (_preserve_pre_compression_snapshot + lock/cache migration + continuation registration) into a helper and run it BEFORE any return from the terminal-failure apperror path — OR move the terminal-failure check below the compression migration.
  2. Persist the terminal error on the migrated continuation session, and in the frontend (static/messages.js apperror handler) adopt the settled/migrated session like the done path does, instead of only pushing a local synthetic error into the old active session.
  3. Add a regression test for: compression exhausted AFTER session-id rotation → assert the snapshot is preserved, the continuation session is registered, and the error lands on the migrated session.
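
To make the ordering in items 1 and 3 concrete, here is a toy model of "run the rotation side effects before any early return". Everything in it is hypothetical (`Store`, `migrate_if_rotated`, `finalize`, the event dicts); it does not reproduce `api/streaming.py` and only illustrates the shape of the helper extraction.

```python
# Toy model of the ordering fix; every name here is hypothetical.
from dataclasses import dataclass, field


@dataclass
class Store:
    snapshots: list = field(default_factory=list)      # archived pre-compression sids
    continuations: dict = field(default_factory=dict)  # old sid -> new sid
    errors: dict = field(default_factory=dict)         # sid -> terminal error label


def migrate_if_rotated(store, old_sid, agent_sid):
    """Mirror the agent's session-id rotation; return the sid to persist against."""
    if agent_sid == old_sid:
        return old_sid
    store.snapshots.append(old_sid)           # preserve pre-compression history
    store.continuations[old_sid] = agent_sid  # register the continuation session
    return agent_sid


def finalize(store, old_sid, agent_sid, terminal_failure):
    # Rotation side effects run first, so no return path can skip them.
    sid = migrate_if_rotated(store, old_sid, agent_sid)
    if terminal_failure:
        store.errors[sid] = "compression_exhausted"
        return {"type": "apperror", "session_id": sid}
    return {"type": "done", "session_id": sid}
```

In this model, `finalize(Store(), "sid-old", "sid-new", terminal_failure=True)` archives `sid-old`, registers `sid-old -> sid-new`, and reports the error against `sid-new`, which matches the three assertions item 3 asks for.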

Everything else is good — the compression_exhausted classification, the label cascade, and the frontend clear-compression-UI handling are all correct. One small non-blocking note from Opus: _classify_provider_error only inspects the error string, so if a future agent path sets result['compression_exhausted']=True with an empty/non-matching error message it falls back to the generic "No response from provider" label (your included test sets both fields, so the current path is covered).
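
One way to close the gap in the non-blocking note is to check the structured flag before matching the error string. The sketch below is an assumption-heavy illustration, not `_classify_provider_error` itself: the function name, the `error` key, the hint phrases, and the exhausted label are hypothetical; only the `compression_exhausted` key and the generic "No response from provider" label come from the note above.

```python
# Hypothetical sketch; the exhausted label and matching phrases are assumptions.
GENERIC_LABEL = "No response from provider"
EXHAUSTED_LABEL = "Context compression exhausted"
EXHAUSTED_HINTS = ("compression exhausted", "compression_exhausted")


def classify_result_error(result):
    """Prefer the structured flag, then fall back to matching the error text."""
    if result.get("compression_exhausted") is True:
        # The flag alone is enough, even with an empty or unrelated message.
        return EXHAUSTED_LABEL
    error_text = str(result.get("error") or "").lower()
    if any(hint in error_text for hint in EXHAUSTED_HINTS):
        return EXHAUSTED_LABEL
    return GENERIC_LABEL
```

With the flag checked first, a result that sets `compression_exhausted` but carries an empty error message no longer falls through to the generic label.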

The rest of this session's transcript/streaming fixes (#3102 edit-replay, #3321 recovery-control filter) already shipped, so this isn't a wholesale rejection — just this one ordering interaction to sort out. Happy to pair on the helper-extraction if useful. No rush.

@nesquena-hermes added the hold and changes-requested labels Jun 1, 2026 (changes-requested: maintainer left detailed feedback requesting changes; PR is waiting on author to address).