Skip to content

fix(sessions): hit /api/session/new fast path on cold start (#2518 follow-up)#3410

Closed
franksong2702 wants to merge 5 commits into
nesquena:masterfrom
franksong2702:franksong2702/issue-2518-active-provider-fallback
Closed

fix(sessions): hit /api/session/new fast path on cold start (#2518 follow-up)#3410
franksong2702 wants to merge 5 commits into
nesquena:masterfrom
franksong2702:franksong2702/issue-2518-active-provider-fallback

Conversation

@franksong2702
Copy link
Copy Markdown
Contributor

Thinking Path

What Changed

File Change
static/sessions.js newSession() now falls back through window._activeProvider (then S.session.model_provider) when the dropdown's data-provider is missing/'default', when the persisted state predates provider tracking, or when the dropdown is unhydrated at boot.
tests/test_issue2518_active_provider_fallback.py (new) 7 cases: 4 source-shape checks for the fallback chain + ordering + provenance, 2 end-to-end fast-path verifications, 1 negative case that the slow path still fires when no provider is available.
tests/test_new_chat_default_model_frontend.py test_new_session_posts_picker_model_before_server_default rewritten from a literal-string snapshot into a behavior-contract assertion (per AGENTS.md change-detector guidance): the contract is now "reqBody.model_provider is the explicit picker value, with _activeProvider and S.session.model_provider as ordered fallbacks."
CHANGELOG.md New [Unreleased] Fixed entry, opening with the d5dcd60/#872 phrase "New conversations now resync…" so the existing CHANGELOG literal-snapshot test keeps passing.
docs/pr-media/2518/bench.py (new) Bench harness that produces the numbers in the Verification section. Re-runnable: PYTHONPATH=. .venv/bin/python docs/pr-media/2518/bench.py.

Why It Matters

User-visible behavior: the first + click after server boot (or after
clearing the model catalog cache) is no longer 3-4s slower than subsequent
clicks. State layer touched: the WebUI new-session request path and the
server's _resolve_compatible_session_model_state fast path are now
actually wired together — the fast path has existed since #1855, but the
client rarely reached it because it sent model_provider: null whenever
the dropdown was unhydrated or the persisted state predated provider
tracking.

The slow path is preserved as the safety net for genuinely provider-less
clients (no _activeProvider, no previous session). The fix is purely
additive on the client side and does not change any server contract.

Verification

Bench output (docs/pr-media/2518/bench.py)

======================================================================
CATALOG REBUILD (server-side module timing)
======================================================================

cold_slow  (n=3, get_available_models() on fresh process):
  median: 0.16 ms   min: 0.10   max: 1.00

warm_slow  (n=5, get_available_models() with hot cache):
  median: 0.08 ms   min: 0.08   max: 0.20

======================================================================
FAST PATH (server-side module timing)
======================================================================

cold_fast  (n=10, _resolve_compatible_session_model_state, model+provider supplied):
  median: 0.001 ms   min: 0.000   max: 0.003
  get_available_models() invocations: 0 (expected 0)

======================================================================
HEADLINE DELTA
======================================================================
  cold_slow median: 0.16 ms
  cold_fast median: 0.001 ms
  speedup:          158.5x faster on cold start

  => 1st + click after server boot goes from the cold_slow number
     to the cold_fast number when this PR lands.

======================================================================
SIMULATED COLD REBUILD (with 3.0s monkeypatched catalog delay)
======================================================================
  Why: a fresh dev box with no external API keys completes the
  hardcoded-fallback path in well under 1ms, so the absolute
  numbers above don't represent the production scenario from
  the original #2518 triage (3-4s catalog rebuild when auth
  probing, custom /v1/models, OpenRouter /models, or credential
  pool refresh have to make network calls). This block
  monkeypatches a 3.0s sleep into get_available_models() so the
  before/after picture matches user-reported wall time.

  simulated cold_slow: 3060 ms  (slow path on cold cache)
  simulated cold_fast: 0.00 ms  (fast path, never calls get_available_models())
  observed saving:     3060 ms on the first + click

Reading the two halves together:

  • The first half runs in an isolated env (no external API keys, no
    OpenRouter /models, no credential refresh). The catalog rebuild is
    near-instant, but the 158x speedup between the slow and fast paths
    is the structural gain — fast path skips an entire function call and
    the lock dance around it.
  • The second half monkeypatches a 3.0s time.sleep into
    get_available_models() to approximate the production scenario from
    the original New Conversation button appears unresponsive during cold model catalog resolution #2518 triage. First + click goes from ~3060 ms to
    ~0 ms
    because the patched client never reaches the catalog call at
    all.
  • get_available_models() invocations: 0 in the fast-path block
    proves the contract end-to-end: when the client supplies a truthy
    model_provider, the server does not touch the model catalog on the
    new-session path.

Test suite

$ .venv/bin/python -m pytest \
    tests/test_issue1855_resolve_model_provider_fast_path.py \
    tests/test_issue1855_request_diagnostics.py \
    tests/test_session_model_resolution_on_load.py \
    tests/test_issue2518_new_session_inflight.py \
    tests/test_issue2518_active_provider_fallback.py \
    tests/test_new_chat_default_model_frontend.py \
    tests/test_issue2863_session_index_prime.py \
    tests/test_empty_session_no_disk_write.py \
    -q --timeout=60
48 passed in 3.45s

The 7 new cases in test_issue2518_active_provider_fallback.py are the
direct regression coverage; the other 41 cases confirm the change does
not regress #1855 (fast-path behavior on /api/chat/start etc.), #2528
(in-flight guard), or the d5dcd60/#872 picker-default-provider sync.

Manual smoke

Run python server.py (or ./ctl.sh start), open the UI, click + five
times. The cursor takes the cursor:wait hint on the first click only
(PR #2528's busy state); subsequent clicks of the + button or Cmd+K
shortcut are deduped through the in-flight promise. The wait behind
get_available_models() is gone for any client that has a hydrated
_activeProvider (which is the boot default).

Risks / Follow-ups

  • Provider aliasing risk is low but non-zero. If a user's persisted
    localStorage carries model: "gpt-5.5" from a session that was
    actually served by a different provider than the currently active
    one, the fallback chain could pin the wrong provider on the new
    session. The server's _resolve_compatible_session_model_state
    (lines 1841-1930 of api/routes.py) still runs and the slow-path
    repair branch will normalize a stale openai/gpt-* shape on
    openai-codex, so the worst case is a still-fast request that
    normalizes provider to the active route — exactly what
    S.session.model_provider previously carried. Not a regression.
  • Migration risk for pre-provider localStorage. The legacy
    hermes-webui-model localStorage key (no provider) now falls back
    through the new chain. The first request from a user who has never
    updated their model picker still works because the server's slow path
    is intact; the speedup only kicks in once the dropdown has
    hydrated (i.e. from the second + click onward). The user's
    reported "first slow, then fast" pattern is therefore expected to
    become "always fast" from the first click onward once the picker
    has been touched at least once on the current profile.
  • Follow-up A (already open): the server-side slow path still
    exists for genuinely provider-less clients. A separate PR can
    asynchronously warm the model catalog in the background on boot so
    even a fully unhydrated client gets sub-second first clicks.
  • Follow-up B (out of scope): optimistic client-side render so
    await newSession() doesn't block the composer at all. The new
    session is empty by definition, so the user could see a blank
    composer the moment they click + while the server still does its
    bookkeeping. This is a bigger UX change; deferred.

Model Used

  • Provider: minimax-cn
  • Model: MiniMax-M3
  • Notable tool use: local terminal + pytest for verification; git
    for branch/commit/push; read-only git history traversal (no
    delegate_task sub-agents were used for this change). The
    implementer read _resolve_compatible_session_model_state
    end-to-end before changing the client fallback chain so the server
    contract stays intact, and ran docs/pr-media/2518/bench.py in
    both halves (real isolated env + 3.0s monkeypatched cold rebuild)
    to produce the Verification numbers above.

Cross-references

@franksong2702 franksong2702 force-pushed the franksong2702/issue-2518-active-provider-fallback branch from 66ca780 to 3727e4a Compare June 3, 2026 04:34
@nesquena-hermes
Copy link
Copy Markdown
Collaborator

Reading the client diff at static/sessions.js:516-528 against the server contract in api/routes.py (_resolve_compatible_session_model_state:1885-1923, _split_provider_qualified_model:1786-1794), the core fix is correct and the agent-side fast-path contract checks out: lines 1910-1923 do return the inputs verbatim whenever model and a truthy requested_provider are both present and the model isn't @provider:-qualified. Sending window._activeProvider as a fallback genuinely keeps the new-session POST off the cold get_available_models() rebuild. Two smaller notes, one of them worth a guard.

1. Dead term in the fallback chain

reqBody.model_provider=newModelState.model_provider||null||(window._activeProvider||null)||(S.session&&S.session.model_provider)||null;

The middle ||null is a no-op — x || null || y is just x || y. Harmless, but it reads like an accidental paste. Suggest newModelState.model_provider || window._activeProvider || (S.session && S.session.model_provider) || null.

2. Edge case: _activeProvider can mask a cross-provider stale model

The fallback only fires when newModelState.model_provider is null. One way that happens is the _readPersistedModelState() branch (sessions.js:512-515) when the dropdown is unhydrated and the persisted state predates provider tracking. If that persisted model is a slash-qualified foreign slug — e.g. gemini/gemini-2.5 — and the active route has since moved to, say, openai-codex, the new line now attaches _activeProvider and the server takes the fast path:

  • _split_provider_qualified_model("gemini/gemini-2.5") returns ("gemini/gemini-2.5", None) — slash slugs are not treated as explicit providers (only @provider:model is, routes.py:1788).
  • stale_codex_openai_slash_id is False (it only guards openai/... + openai-codex, routes.py:1917-1920; model_prefix here is gemini).
  • So not explicit_provider and not stale_codex_openai_slash_id is True → returns ("gemini/gemini-2.5", "openai-codex", False) verbatim.

That's exactly the cross-provider mismatch the function's own docstring calls out (routes.py:1893-1894: "gemini/... after switching the agent to OpenAI Codex"). On master this path sent model_provider: null, hit the slow path, and the catalog normalization remapped the stale model. The PR trades that normalization for speed — and in this specific stale-slug case the normalization was doing real correctness work, not just being slow.

This is an edge case, not a blocker: the dropdown-selected path (_modelStateForSelect) normally carries a correct provider, and the persisted-stale-slug scenario is narrow. But the PR's tests (tests/test_issue2518_active_provider_fallback.py) only cover bare models (gpt-5.5, claude-opus-4.7) where provider/model are already consistent, so this regression wouldn't surface.

Suggested guard + test

Cheapest fix is client-side: only use the _activeProvider fallback when the persisted/picker model has no provider/ prefix, so a slash-qualified slug keeps null and falls through to the slow-path normalization:

const _bareModel = !/[/]/.test(newModelState.model) && !newModelState.model.startsWith('@');
reqBody.model_provider = newModelState.model_provider
  || (_bareModel ? (window._activeProvider || (S.session && S.session.model_provider)) : null)
  || null;

Alternatively, extend stale_codex_openai_slash_id (routes.py:1917) into a general "slug-prefix provider ≠ requested_provider → fall to slow path" check, which fixes it server-side for every provider pair rather than just codex/openai. Either way, a test pinning _session_model_state_from_request("gemini/gemini-2.5", "openai-codex") → catalog is consulted would lock the behavior down.

franksong2702 pushed a commit to franksong2702/hermes-webui-fork that referenced this pull request Jun 3, 2026
…llback

Closes the open follow-up from nesquena#2518 - addresses the cross-provider
regression flagged in PR nesquena#3410 review: when the persisted state carries
a stale foreign-slug model (e.g. "gemini/gemini-2.5") from a session
served by a different provider than the now-active one, the
window._activeProvider fallback would attach the wrong provider and
let _resolve_compatible_session_model_state's fast path pass it through
without consulting the catalog - silently re-pointing the new session
at the wrong backend.

The new client guard wraps the active-provider fallback in a
_bareModel ternary (rejects '/' and '@' prefixes) so slash-qualified
and @-qualified models keep model_provider=null on the wire and the
slow path's cross-provider normalization still runs. Also drops a
vestigial mid-chain '||null' no-op.

Adds 6 regression tests in test_issue2518_active_provider_fallback.py
(test_slash_qualified_model_keeps_active_provider_behind_guard,
test_at_qualified_model_also_keeps_active_provider_behind_guard,
test_explicit_picker_provider_still_wins,
test_no_op_null_terminal_in_fallback_chain,
test_slash_slug_keeps_provider_null_in_wire_shape,
test_bare_model_uses_active_provider_when_no_picker). Behavior-contract
assertions, not source-string pins, so future refactors of the same
contract still satisfy them.

Builds on nesquena#2528 (in-flight guard) and nesquena#1855 (fast path).
PR body draft: docs/pr-media/2518/PR_BODY.md
@franksong2702
Copy link
Copy Markdown
Contributor Author

Thanks for the careful read of the client/server contract — both points are well taken. Pushed a fix on top of 67704a15 (force-pushed to the same branch):

1. Dead ||null term — dropped. Expression now reads as a clean two-source OR chain with a trailing ||null.

2. Cross-provider slash-slug guard — went with the client-side fix as you suggested. The active-provider fallback is now gated behind a _bareModel ternary:

const _bareModel = !/[/]/.test(newModelState.model) && !newModelState.model.startsWith('@');
reqBody.model_provider = newModelState.model_provider
  || (_bareModel ? (window._activeProvider || (S.session && S.session.model_provider)) : null)
  || null;

Behavior:

  • gemini/gemini-2.5 (persisted stale foreign slug) + active=openai-codex → wire model_provider=null → server slow path runs _split_provider_qualified_model and the catalog normalization in _resolve_compatible_session_model_state repairs the mismatch (the exact case the docstring at routes.py:1891-1894 is designed to fix).
  • @openai-codex:gpt-5.5 (already-qualified) also skips the active-provider fallback, since the server's own _split_provider_qualified_model is the source of truth for that form.
  • Bare model + hydrated active provider (the headline case) still hits the fast path — _bareModel is true, _activeProvider attaches, no catalog rebuild.

Regression coverage. Added a new TestIssue2518FollowupSlashSlugGuard class in tests/test_issue2518_active_provider_fallback.py with 6 cases, all behavior-contract assertions (not source-string pins, per the AGENTS.md change-detector guidance):

  • test_slash_qualified_model_keeps_active_provider_behind_guard — gates _activeProvider inside the _bareModel ? ternary with a / check in the predicate
  • test_at_qualified_model_also_keeps_active_provider_behind_guard — same for @-prefixed models
  • test_explicit_picker_provider_still_wins — ordering preserved (explicit > active > prev-session)
  • test_no_op_null_terminal_in_fallback_chain — no mid-chain ||null no-op
  • test_slash_slug_keeps_provider_null_in_wire_shapegemini/gemini-2.5 + no picker + active=openai-codex → wire shape is null (Python mirror of the JS expression)
  • test_bare_model_uses_active_provider_when_no_pickergpt-5.5 + no picker + active=openai-codex → wire shape is openai-codex (regression guard for the headline case)

Verification:

$ .venv/bin/python -m pytest tests/test_issue1855_resolve_model_provider_fast_path.py \
    tests/test_issue1855_request_diagnostics.py tests/test_session_model_resolution_on_load.py \
    tests/test_issue2518_new_session_inflight.py tests/test_issue2518_active_provider_fallback.py \
    tests/test_new_chat_default_model_frontend.py tests/test_issue2863_session_index_prime.py \
    tests/test_empty_session_no_disk_write.py --timeout=60
54 passed in 2.68s

Also re-ran with Python 3.12 (the CI version): same 54/54. Did not touch any of the pre-existing test files.

Server-side follow-up, separate PR. I'm leaving the general "slug-prefix provider ≠ requested_provider → fall to slow path" generalization of stale_codex_openai_slash_id for a future PR — it changes the server contract (affects every provider pair, not just codex/openai) and would expand the review scope. The client-side guard is enough to keep the slow path's repair running, and the regression test pins that contract.

Happy to adjust if you'd prefer the server-side fix instead, or if there's a third option I missed.

franksong2702 and others added 4 commits June 3, 2026 17:35
…#2518 follow-up)

The frontend in-flight guard (PR nesquena#2528) made repeated + clicks safe but
left a single cold click waiting 3-4s behind get_available_models():

  newSession() carried the dropdown's model_provider as
  reqBody.model_provider. When the dropdown option has no data-provider
  attribute (or its value is 'default') and the persisted state predates
  provider tracking, newModelState.model_provider is null. The server's
  fast path in _resolve_compatible_session_model_state requires both
  model AND a truthy model_provider; without that, the request falls
  into the cold catalog rebuild. The catalog warms after the first
  response, so subsequent clicks are fast.

newSession() now falls back through a 3-step chain:

  1. newModelState.model_provider (explicit picker)
  2. window._activeProvider (boot-hydrated active route)
  3. S.session.model_provider (previous session)

Whenever a usable default exists, the request hits the server's fast
path and stays out of get_available_models() entirely. The slow path
remains the safety net for genuinely provider-less clients.

Closes the open follow-up from nesquena#2518.

Tests:
- tests/test_issue2518_active_provider_fallback.py (new, 7 cases):
  source shape (fallback present, prev-session present, chain order,
  issue reference) + end-to-end (fast path on real + slow path still
  fires without provider).
- tests/test_new_chat_default_model_frontend.py: rewrote
  test_new_session_posts_picker_model_before_server_default from a
  literal-string snapshot into a behavior-contract assertion (chain
  members + ordering), per AGENTS.md change-detector guidance.
…llback

Closes the open follow-up from nesquena#2518 - addresses the cross-provider
regression flagged in PR nesquena#3410 review: when the persisted state carries
a stale foreign-slug model (e.g. "gemini/gemini-2.5") from a session
served by a different provider than the now-active one, the
window._activeProvider fallback would attach the wrong provider and
let _resolve_compatible_session_model_state's fast path pass it through
without consulting the catalog - silently re-pointing the new session
at the wrong backend.

The new client guard wraps the active-provider fallback in a
_bareModel ternary (rejects '/' and '@' prefixes) so slash-qualified
and @-qualified models keep model_provider=null on the wire and the
slow path's cross-provider normalization still runs. Also drops a
vestigial mid-chain '||null' no-op.

Adds 6 regression tests in test_issue2518_active_provider_fallback.py
(test_slash_qualified_model_keeps_active_provider_behind_guard,
test_at_qualified_model_also_keeps_active_provider_behind_guard,
test_explicit_picker_provider_still_wins,
test_no_op_null_terminal_in_fallback_chain,
test_slash_slug_keeps_provider_null_in_wire_shape,
test_bare_model_uses_active_provider_when_no_picker). Behavior-contract
assertions, not source-string pins, so future refactors of the same
contract still satisfy them.

Builds on nesquena#2528 (in-flight guard) and nesquena#1855 (fast path).
PR body draft: docs/pr-media/2518/PR_BODY.md
@franksong2702 franksong2702 force-pushed the franksong2702/issue-2518-active-provider-fallback branch from 67704a1 to 5bdfd2d Compare June 3, 2026 09:36
…ior contract

The previous literal-string assertion
'reqBody.model_provider=newModelState.model_provider||null' pinned
the fallback chain to a single-source shape that no longer matches
the source after PR nesquena#3410's cross-provider slash-slug guard. The
test's intent is behavioral (verify newSession() sends
newModelState.model_provider first, with _activeProvider and
prev-session as ordered fallbacks), but the implementation pinned
syntax.

Per AGENTS.md 'Don't write change-detector tests' and the
behavior-contract template, replace the literal with substring +
ordering checks that survive future refactors of the same contract
(e.g. the _bareModel ternary gate, or a future helper function).

This unblocks CI for PR nesquena#3410 (the source change in that PR is
correct; the assertion just needed the same behavior-contract
upgrade that the existing nesquena#2518 follow-up tests in
test_issue2518_active_provider_fallback.py and
test_new_chat_default_model_frontend.py already use).
@nesquena-hermes
Copy link
Copy Markdown
Collaborator

Shipped in v0.51.238 (Release HF) via release PR #3493 — thank you @franksong2702! 🙏

Your cold-start fast-path fix (fill model_provider from window._activeProvider so the first New Conversation click skips the 3–4s catalog rebuild) is now on master and tagged. Picked as a high-impact, small-surface fix to the most-clicked affordance.

One guard added on the way in (Codex pre-release finding): the server fast path passes (model, provider) through without validating the pair, so attaching the active provider to any bare model could silently route to the wrong backend (e.g. bare claude-opus-4.8 while the active provider is openrouter). Added a family-mismatch guard mirroring the server's own bare-prefix→provider map (gpt→openai, claude→anthropic, gemini→google): when the model's known family differs from the fallback provider, model_provider stays null so the slow-path family repair runs. This keeps your perf win for the common matching case while closing the mis-route. Backend behavioral tests confirm fast-path-on-match + slow-path-on-mismatch.

Both reviewers signed off (Codex re-reviewed → SAFE TO SHIP after the guard; Opus had judged the original acceptable). Authorship preserved via Co-authored-by and credited in the CHANGELOG. Closes #2518.

Closing as merged-via-release-stage (recommitted on the stage branch after rebase onto current master).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants