Restore visible WebUI progress contract #3015

Closed
franksong2702 wants to merge 1 commit into nesquena:master from franksong2702:franksong2702/restore-visible-progress-contract

Conversation

@franksong2702
Contributor

@franksong2702 franksong2702 commented May 27, 2026

Thinking Path

What Changed

  • Strengthened _WEBUI_PROGRESS_PROMPT so that long tool-running WebUI turns do not appear silent.
  • Requires visible progress updates as normal assistant content, not only hidden reasoning.
  • Adds explicit guidance to speak before the first tool batch and after meaningful tool batches before continuing.
  • Adds an explicit guard against many independent tool batches running back-to-back without visible assistant text.
  • Keeps existing safeguards against hidden reasoning, chain-of-thought, secrets, raw logs, and long tool output.
  • Updates the Sprint 42 regression test to lock the stronger wording and reject the old optional "you may provide" phrasing.

Why It Matters

Frontend replay can preserve visible progress only after the model emits it. Without this prompt contract, some models/providers keep intermediate work in reasoning or tool channels, making the WebUI look silent or making Activity groups appear disconnected from the text.

Contract Routing

Runtime / streaming behavior. This updates the WebUI ephemeral progress instruction only. It does not change SSE reconnect, run-journal replay, persisted session schema, Activity placement, or reasoning echo suppression; those belong to #3005.

Contract Change

The prompt contract for long tool-running WebUI turns is strengthened from optional guidance to explicit visible-progress guidance. Direct answers and very short tasks remain exempt, and hidden reasoning / raw tool logs remain prohibited.

Verification

  • node --check static/ui.js static/messages.js static/sessions.js
  • pytest -q tests/test_sprint42.py::TestRuntimeRouteInjection::test_runtime_provider_forwards_interim_assistant_callback — local clean head: 1 passed.
  • GitHub Actions test (3.11), test (3.12), and test (3.13) pass on commit fc86e278.

Risks / Follow-ups

Model Used

OpenAI GPT-5 Codex. AI assistance was used to inspect the failure mode, split PR scope, implement the prompt/test change, and verify the clean branch.

@nesquena-hermes
Collaborator

Summary

Reading api/streaming.py:193-201 on origin/master plus the PR diff at api/streaming.py:194-200, this is a small, focused prompt change that strengthens the _WEBUI_PROGRESS_PROMPT from optional ("you may provide") to firm guidance, and explicitly calls out that reasoning/thinking/tool-result channels are not a substitute. The history note in #3014 (2dfe3ffb introducing the contract, then 540292a7 (PR #2547) softening it as part of the messaging-alignment work) checks out: git log --since="14 days ago" on api/streaming.py shows both.

Code reference

The new prompt body in api/streaming.py:193-201 looks right:

WebUI progress guidance:
- Match the normal Hermes messaging style, but do not let long tool-running WebUI turns appear silent.
- For long multi-step work that uses tools, provide brief user-visible progress updates as normal assistant content before the next tool batch or after a meaningful finding.
- Do not keep all progress only in reasoning, thinking, or tool-result channels; those are not a substitute for visible interim updates.

The carve-out for short/direct answers is preserved on the last line of the prompt, so this shouldn't introduce browser-only chatter for simple turns. The chain-of-thought / raw-log prohibition is still there too.

The Sprint 42 regression test in tests/test_sprint42.py:396-414 correctly asserts the new phrasing, and its assertNotIn("you may provide brief user-visible progress updates", ...) guards against accidental rollback to the optional wording. Good belt-and-suspenders.
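
For readers without the diff open, the shape of that assertion pair can be sketched roughly as below. This is an illustration only, not the actual test in tests/test_sprint42.py: the constant is a local stand-in that reproduces just the three prompt lines quoted above, and the class and method names are hypothetical.

```python
import unittest

# Local stand-in (assumption): only the three prompt lines quoted in this
# review, not the full _WEBUI_PROGRESS_PROMPT from api/streaming.py.
PROMPT_EXCERPT = (
    "WebUI progress guidance:\n"
    "- Match the normal Hermes messaging style, but do not let long "
    "tool-running WebUI turns appear silent.\n"
    "- For long multi-step work that uses tools, provide brief user-visible "
    "progress updates as normal assistant content before the next tool batch "
    "or after a meaningful finding.\n"
    "- Do not keep all progress only in reasoning, thinking, or tool-result "
    "channels; those are not a substitute for visible interim updates.\n"
)


class ProgressPromptWordingSketch(unittest.TestCase):
    """Hypothetical test class mirroring the positive/negative assertion pair."""

    def test_requires_visible_progress(self):
        # Positive lock: the stronger wording must be present.
        self.assertIn(
            "provide brief user-visible progress updates as normal assistant content",
            PROMPT_EXCERPT,
        )
        self.assertIn(
            "not a substitute for visible interim updates", PROMPT_EXCERPT
        )

    def test_rejects_old_optional_wording(self):
        # Negative lock: the softened phrasing must not come back.
        self.assertNotIn(
            "you may provide brief user-visible progress updates", PROMPT_EXCERPT
        )
```

The negative assertion is the part that matters for rollback protection, since the positive substring alone would also match a prompt that reintroduced "you may" in front of it.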

Diagnosis

The change is well-scoped: one prompt block, one test update, one changelog entry. CI is green across 3.11/3.12/3.13. The interaction with 5b9484b8 ("suppress visible progress echoes") is fine: that commit only dedupes reasoning text that mirrors already-visible assistant output; it doesn't suppress assistant progress lines themselves. So a more chatty model under the new prompt won't get its visible progress eaten as an echo.
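
To make the echo-suppression distinction concrete, here is a minimal sketch of the kind of dedupe described: reasoning chunks that merely repeat visible text are dropped, while the visible text is never touched. It is not the implementation in 5b9484b8; the function name, the chunk representation, and the whitespace/case normalization are all assumptions made for illustration.

```python
def drop_echoed_reasoning(reasoning_chunks, visible_text):
    """Return the reasoning chunks that do not simply repeat visible text.

    Hypothetical helper: the visible assistant text is only read here,
    never modified, which is the property the review relies on.
    """
    # Assumed normalization: collapse whitespace and ignore case.
    normalized_visible = " ".join(visible_text.split()).lower()
    kept = []
    for chunk in reasoning_chunks:
        normalized = " ".join(chunk.split()).lower()
        if normalized and normalized in normalized_visible:
            continue  # echo of output the user can already see
        kept.append(chunk)
    return kept


visible = "Searching the project for config files."
reasoning = [
    "searching the project   for config files.",
    "Next I should check the lock file.",
]
remaining = drop_echoed_reasoning(reasoning, visible)
```

Under these assumptions only the second reasoning chunk survives, and `visible` is unchanged, so a chattier model loses nothing from its visible progress lines.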

One minor caveat worth being explicit about, and #3015's body already notes this: prompt instructions are advisory. Providers that ignore the channel-of-output instruction will still surface progress only in reasoning. The frontend-side mitigation (surfacing reasoning when no visible text exists) is a separate axis not addressed here, and probably correct to keep separate.

Test plan

Beyond the existing static test, a useful manual dogfood once merged:

  1. Start a WebUI turn that forces 3+ tool calls back to back (e.g. "search project, then run X tool, then summarize").
  2. Confirm the transcript shows at least one short visible assistant line between tool batches — not just spinner + reasoning card.
  3. Verify a short direct-answer turn ("what time is it?") still answers without an extra "I'm about to..." line.
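
If the dogfood run is ever captured as a transcript, steps 1 and 2 could be checked mechanically along these lines. The event model here is an assumption for illustration (a list of `(kind, text)` tuples with kinds `"assistant"`, `"reasoning"`, and `"tool_batch"`), not the WebUI's actual transcript or SSE format, and the function name is hypothetical.

```python
def longest_silent_tool_run(events):
    """Count the longest run of tool batches with no visible assistant text between them.

    `events` is an assumed, simplified transcript: (kind, text) tuples where
    kind is "assistant", "reasoning", or "tool_batch".
    """
    longest = 0
    current = 0
    for kind, text in events:
        if kind == "tool_batch":
            current += 1
            longest = max(longest, current)
        elif kind == "assistant" and text.strip():
            current = 0  # visible text breaks the silent run
        # Reasoning events deliberately do not reset the run: they are
        # not a substitute for visible interim updates.
    return longest


chatty_turn = [
    ("assistant", "Searching the project first."),
    ("tool_batch", ""),
    ("assistant", "Found the module; running the tool next."),
    ("tool_batch", ""),
    ("assistant", "Summary: done."),
]
silent_turn = [
    ("tool_batch", ""),
    ("reasoning", "thinking about the next call"),
    ("tool_batch", ""),
    ("tool_batch", ""),
    ("assistant", "Summary: done."),
]
```

With this sketch, a turn that satisfies step 2 keeps the run length at 1, while a turn that only surfaces reasoning between three tool batches reports 3.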

LGTM from a code-reading pass.

@franksong2702
Contributor Author

Closing this draft because the core visible-progress prompt contract has been absorbed into #3401.

#3401 now includes the prompt-side change from this PR: long tool-running WebUI turns should not appear silent, visible progress should be normal assistant content rather than only reasoning/tool output, and regression coverage rejects the old optional "you may provide" wording.

This keeps the running-session assistant reply redesign in one review unit while avoiding a duplicate prompt-only PR.
