
feat: always show reasoning effort selector, default off for unrecognized models (#3377) #3431

Open

b3nw wants to merge 4 commits into nesquena:master from b3nw:feat/3377-thinking-level-missing


@b3nw b3nw commented Jun 2, 2026

Contributor

feat: always show reasoning effort selector, default off for unrecognized models (#3377)

An improvement on #3379, which fixed #3377.

Thinking Path

  • Hermes WebUI allows configuring reasoning-effort levels for models that support thinking.
  • Previously, the thinking/reasoning chip was shown or hidden based on a heuristic: recognized reasoning models got the chip, unrecognized models had it completely hidden.
  • This caused false negatives where custom providers, aggregator-rewritten model IDs (e.g., claude-sonnet-4-6:free), and new model releases would silently hide the selector even though the user might want to enable reasoning.
  • Rather than continuing to chase an ever-growing heuristic checklist, this PR inverts the logic: always show the chip, but set the default based on whether the model is positively identified as reasoning-capable.
  • Recognized models (GPT-5+, Claude 4/3.7+, Qwen-3+, DeepSeek, Kimi, etc.) default to "Default" (active reasoning). Unrecognized models default to "None" (off), letting users opt-in on any model.
  • The benefit is that no model is ever locked out of the reasoning feature — the user always has the choice.

What Changed

1. Backend: Always return full effort list with reasoning_default_on flag

In api/config.py (get_reasoning_status):

  • When resolve_model_reasoning_efforts returns an empty list (unrecognized model), fall back to the full VALID_REASONING_EFFORTS list instead of [].
  • supports_reasoning_effort is now always True.
  • Added reasoning_default_on: True when the model was positively identified as reasoning-capable, False otherwise.
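The backend inversion described above can be sketched as follows. This is a minimal illustration, not the PR's code: `get_reasoning_status_sketch` and `fake_resolver` are hypothetical, and the contents of `VALID_REASONING_EFFORTS` here are assumed (the real tuple may differ). It only shows the rule "an empty resolver result means unrecognized, so offer the full scale and default off".

```python
from typing import Callable, Sequence

# Assumed effort scale; the real VALID_REASONING_EFFORTS may contain other values.
VALID_REASONING_EFFORTS = ("none", "minimal", "low", "medium", "high", "xhigh")


def get_reasoning_status_sketch(
    model: str,
    resolve_efforts: Callable[[str], Sequence[str]],
) -> dict:
    """Always offer an effort list; record whether the model was recognized."""
    supported = list(resolve_efforts(model))
    model_recognized = bool(supported)
    if not supported:
        # Unrecognized model: fall back to the full scale instead of [].
        supported = list(VALID_REASONING_EFFORTS)
    return {
        "supported_efforts": supported,
        "supports_reasoning_effort": True,  # the chip is never hidden
        "reasoning_default_on": model_recognized,
    }


def fake_resolver(model: str) -> list:
    # Hypothetical resolver that recognizes exactly one model.
    return ["low", "medium", "high"] if model == "known-model" else []


known = get_reasoning_status_sketch("known-model", fake_resolver)
unknown = get_reasoning_status_sketch("claude-sonnet-4-6:free", fake_resolver)
```

An aggregator-rewritten ID such as `claude-sonnet-4-6:free` that the resolver misses now gets the full list with `reasoning_default_on=False` instead of an empty list.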

2. Frontend: Always show chip, default to "None" for unrecognized models

In static/ui.js:

  • _applyReasoningChip() no longer hides the chip when supported_efforts is empty. The chip is always displayed.
  • When reasoning_default_on is False and no effort has been explicitly set by the user, the chip defaults to "None" (inactive state).
  • _applyReasoningOptions() now shows all effort levels in the dropdown when the supported set is empty (previously hid them all).
  • fetchReasoningChip() error handler defaults to reasoning_default_on: false so the chip remains functional even on API errors.
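For readers who prefer the rule spelled out, here is a Python mirror of the defaulting behavior attributed to `_applyReasoningChip()` and the option filtering in `_applyReasoningOptions()`. It is a hedged translation of the behavior described above, not the actual `static/ui.js` code; `initial_chip_effort`, `visible_options`, and the effort labels are hypothetical.

```python
from typing import Iterable, List, Optional

# Assumed dropdown labels; the real chip may use different ones.
ALL_EFFORTS = ("none", "minimal", "low", "medium", "high", "xhigh")


def initial_chip_effort(meta: Optional[dict], persisted_effort: Optional[str]) -> str:
    """Return the effort the chip starts on ('' means the provider default)."""
    # Like the JS ternary: a missing flag (or missing meta) counts as default-on.
    if meta and "reasoning_default_on" in meta:
        default_on = meta["reasoning_default_on"]
    else:
        default_on = True
    effort = persisted_effort or ""
    if not default_on and not effort:
        return "none"  # unrecognized model with nothing persisted: opt-in only
    return effort  # a persisted choice always wins


def visible_options(supported_efforts: Optional[Iterable[str]]) -> List[str]:
    """Dropdown entries; an empty or null supported set shows every level."""
    supported = set(supported_efforts or ())
    if not supported:
        return list(ALL_EFFORTS)
    return [e for e in ALL_EFFORTS if e in supported]
```

With the error-path payload `{"supported_efforts": None, "reasoning_default_on": False}`, the chip starts on "none" and still lists every level, which is what keeps it usable when the API call fails.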

3. Expanded Test Coverage

In tests/test_reasoning_effort_model_capabilities.py:

  • Updated test_get_reasoning_status_includes_supported_efforts to assert reasoning_default_on is True.
  • Added test_get_reasoning_status_unrecognized_model_still_offers_efforts: verifies that unrecognized models get the full effort list with reasoning_default_on=False.
  • Added test_get_reasoning_status_recognized_model_default_on: verifies that recognized models get reasoning_default_on=True.

4. Changelog Documentation

  • Documented the change in CHANGELOG.md under ## [Unreleased].

Why It Matters

Previously, the heuristic-based hide/show logic was a constant source of false negatives: every new model release, custom provider configuration, or aggregator-rewritten model ID risked hiding the reasoning selector. This PR eliminates that class of bugs entirely by never hiding the chip. Users can always opt into reasoning, and the default is informed but not restrictive.

Verification

Automated tests

Ran the pytest suite targeting reasoning effort model capabilities:

```bash
uv run --with pytest --with pyyaml --with cryptography pytest \
  tests/test_reasoning_effort_model_capabilities.py \
  tests/test_custom_provider_bare_model_reasoning.py -v
```

Result: 28 passed successfully.

Risks / Follow-ups

  • Current effort persistence: If a user sets an explicit effort (e.g., "high") on an unrecognized model and later switches to another unrecognized model, the persisted effort carries over. The frontend only defaults to "None" when no effort is persisted — an existing persisted value is respected. This is intentional; it matches the pre-PR behavior where the CLI's agent.reasoning_effort config is profile-scoped, not model-scoped.
  • Backward compatibility: supports_reasoning_effort is now always True, which changes the API contract. No known consumers rely on this boolean for chip visibility (the frontend uses reasoning_default_on), but third-party integrations should be aware.

AI Usage Disclosure

  • Provider: cursor / anthropic
  • Model: claude opus 4.6 via cursor
  • Tool Use: Explored codebase reasoning-effort flow end-to-end, implemented backend and frontend changes, wrote tests, hotpatched production container for validation, and drafted this PR description.

@b3nw b3nw closed this Jun 2, 2026
@b3nw b3nw reopened this Jun 2, 2026
@b3nw b3nw force-pushed the feat/3377-thinking-level-missing branch from 0789b94 to f17400c on June 2, 2026 19:58
…ized models (nesquena#3377)

Instead of hiding the thinking/reasoning effort chip entirely when a model
is not recognized as reasoning-capable, always present the selector with
the full effort scale available. For recognized models (GPT-5+, Claude
4/3.7+, Qwen-3+, DeepSeek, Kimi, etc.) the chip defaults to "Default"
(reasoning active). For unrecognized or ambiguous models the chip defaults
to "None" (reasoning off), letting users opt-in on any model.

Backend: get_reasoning_status() now always returns the full VALID_REASONING_EFFORTS
list in supported_efforts, plus a new reasoning_default_on flag indicating
whether the model was positively identified as reasoning-capable.

Frontend: _applyReasoningChip() always displays the chip; when
reasoning_default_on is false and no effort is persisted, it defaults to
"None". _applyReasoningOptions() shows all effort levels when the supported
set is empty (error fallback).
@b3nw b3nw force-pushed the feat/3377-thinking-level-missing branch from f17400c to 14b7c27 on June 2, 2026 20:03
@nesquena-hermes

Collaborator

Pulled the branch and read get_reasoning_status (api/config.py:2322), the JS chip logic (static/ui.js:1854-1912), and both test files against origin/master. The inversion is clean and the contract is internally consistent — one nuance about the empty-list semantics is worth a look before merge.

The backend change reads correctly

```python
model_recognized = bool(supported_efforts)
if not supported_efforts:
    supported_efforts = list(VALID_REASONING_EFFORTS)
return {
    ...
    "supported_efforts": supported_efforts,
    "supports_reasoning_effort": True,
    "reasoning_default_on": model_recognized,
}
```

supports_reasoning_effort is now hard-coded True, and reasoning_default_on carries the old recognition signal. I grepped all consumers of supports_reasoning_effort — nothing in static/ reads it anymore (only tests/test_models_dev_reasoning.py:154 and the capability tests assert is True), so pinning it to True is safe and no test asserts the False branch. Good.

Frontend matches

_applyReasoningChip (ui.js:1866) now unconditionally sets wrap.style.display='' and only flips the default value to 'none' when reasoning_default_on is false and no effort is set:

```javascript
var defaultOn=(meta&&meta.reasoning_default_on!==undefined)?meta.reasoning_default_on:true;
if(!defaultOn&&(!effort||effort==='')){ effort='none'; }
```

The error/catch path in fetchReasoningChip (ui.js:1898) was updated to {supported_efforts:null,reasoning_default_on:false}, which keeps the chip visible with options intact (since _applyReasoningOptions now shows all when !supported.size) rather than the old hidden state. The modified assertion in test_reasoning_chip_btw_fixes.py ("wrap.style.display='none'" not in fn) correctly locks in "never hide."

One nuance: [] has two meanings upstream

resolve_model_reasoning_efforts returns [] in two semantically distinct cases:

  1. Unrecognized model — genuinely unknown, the case this PR targets. "Show selector, default off, let the user opt in" is exactly right here.
  2. Positively known NOT to support reasoning — the ACP subprocess providers return [] deliberately at api/config.py:2285:
```python
if provider in {"cursor-acp", "copilot-acp"}:
    return []
```

(and the capability layer returns [] when supports_reasoning is False, config.py:~2197).

After this change, a cursor-acp / copilot-acp session shows a reasoning-effort selector even though that provider can't honor it. The practical harm is low: reasoning_default_on=False means it defaults to "none" and won't send anything unless the user explicitly opts in, and the downstream path is defensive — streaming.py:4943 runs the selected value through parse_reasoning_effort and only attaches reasoning_config when non-None and the agent accepts the param (streaming.py:4974). So a stray opt-in on an ACP model degrades to a no-op, not an error. But it is a control that looks actionable and isn't. If you want to preserve the "positively unsupported" signal, the cheap fix is to keep returning reasoning_default_on=False and a supports_reasoning_effort=False for the ACP set, and have the JS hide only when explicitly false — but that partly re-introduces the heuristic the PR is trying to retire, so it's a judgment call. Flagging it rather than blocking on it.
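The alternative floated above (keep a "positively unsupported" signal for the ACP set and have the frontend hide the chip only on an explicit `False`) could look roughly like this. It is a sketch under stated assumptions: `status_with_acp_exception`, `chip_visible`, and `FULL_SCALE` are hypothetical names, and the effort labels are assumed. The assertions that accompany it are the kind of ACP-pinning case suggested in the Tests section.

```python
from typing import Dict, Sequence

# Assumed full scale used when the resolver returns [].
FULL_SCALE = ["none", "low", "medium", "high"]
ACP_PROVIDERS = frozenset({"cursor-acp", "copilot-acp"})


def status_with_acp_exception(provider: str, resolved: Sequence[str]) -> Dict[str, object]:
    """Variant that keeps a 'positively unsupported' signal for ACP providers."""
    if provider in ACP_PROVIDERS:
        return {
            "supported_efforts": [],
            "supports_reasoning_effort": False,
            "reasoning_default_on": False,
        }
    recognized = bool(resolved)
    return {
        "supported_efforts": list(resolved) if recognized else list(FULL_SCALE),
        "supports_reasoning_effort": True,
        "reasoning_default_on": recognized,
    }


def chip_visible(status: Dict[str, object]) -> bool:
    # Hide only on an explicit False; a missing key keeps the chip visible.
    return status.get("supports_reasoning_effort") is not False
```

The trade-off noted above still applies: the ACP set is a small hard-coded list, so this reintroduces a heuristic, just a much narrower one.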

Tests

The two new cases in test_reasoning_effort_model_capabilities.py cover both the recognized (reasoning_default_on True) and unrecognized (> 0 efforts, reasoning_default_on False) paths by monkeypatching resolve_model_reasoning_efforts. Per cron policy I didn't execute them, but the assertions match the backend logic above. Consider adding one case pinning the ACP-provider expectation either way, so the chosen behavior for case (2) is intentional and regression-guarded.

Overall a sensible inversion — replacing an ever-growing recognition checklist with "always available, smart default" is the right direction for #3377.

@b3nw b3nw force-pushed the feat/3377-thinking-level-missing branch from b2f6201 to 2631940 on June 3, 2026 14:02
@b3nw

b3nw commented Jun 3, 2026

Contributor Author

@nesquena-hermes I updated the implementation (summary of changes below) and performed manual testing to validate it.

Changes Made

  1. Upstream Merged: Merged origin/master into feat/3377-thinking-level-missing to bring in get_config_for_profile_home and other recent framework fixes, resolving the WebUI runtime crash.
  2. Subprocess/ACP Prefix Checks: Modified get_reasoning_status in api/config.py to treat a model as explicitly unsupported when its ID is namespaced under an ACP subprocess provider (e.g. cursor-acp/* or copilot-acp/*), and to hide the controls for such models.
  3. Fallback Upstream Model Lookup:
    • Refactored _models_dev_reasoning_efforts to fall back to searching upstream providers (openai, anthropic, gemini, google, deepseek, etc.) when capabilities return None under custom proxy providers (such as llm-proxy).
    • Normalizes the lookup model name by stripping any namespace prefixes (e.g., copilot/gpt-4o -> gpt-4o), so it correctly resolves against standard capabilities (e.g. gpt-4o under openai is flagged as supports_reasoning=False).
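The prefix check and the namespace-stripping fallback above can be illustrated with a small sketch. The provider order is copied from the fallback list quoted later in review; `CATALOG`, `is_acp_model`, and `upstream_supports_reasoning` are hypothetical stand-ins for the real `get_model_capabilities` lookups, and the catalog entries are assumed for illustration only.

```python
from typing import Dict, Optional, Tuple

ACP_NAMESPACES = frozenset({"cursor-acp", "copilot-acp"})
# Order taken from the fallback list quoted in review; first match wins.
STANDARD_PROVIDERS = [
    "openai", "anthropic", "gemini", "google", "deepseek",
    "xai", "mistral", "copilot", "openrouter",
]
# Hypothetical catalog: (provider, bare model) -> supports_reasoning.
CATALOG: Dict[Tuple[str, str], bool] = {
    ("openai", "gpt-4o"): False,
    ("anthropic", "claude-sonnet-4.6"): True,
}


def is_acp_model(model_id: str, provider: str) -> bool:
    """True when the provider, or the model's namespace prefix, is an ACP subprocess provider."""
    model_prefix = model_id.split("/", 1)[0] if "/" in model_id else ""
    return provider in ACP_NAMESPACES or model_prefix in ACP_NAMESPACES


def upstream_supports_reasoning(model_id: str) -> Optional[bool]:
    """Strip namespaces (copilot/gpt-4o -> gpt-4o) and search standard catalogs in order."""
    bare_model = model_id.rsplit("/", 1)[-1]
    for provider in STANDARD_PROVIDERS:
        supports = CATALOG.get((provider, bare_model))
        if supports is not None:
            return supports
    return None  # unknown everywhere: chip stays visible, default off
```

The three return values line up with the manual matrix below: `False` hides the chip, `True` defaults it on, and `None` leaves it visible but off.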

Automated Tests

Verified the changes locally using:

```bash
uv run --with pytest --with pyyaml --with cryptography pytest tests/test_reasoning_effort_model_capabilities.py tests/test_reasoning_chip_btw_fixes.py
```
  • Result: All 28 tests passed successfully.

Manual Verification

  • Explicitly Unsupported Model (copilot/gpt-4o): Verified that the reasoning chip is completely hidden from the WebUI.
  • Unrecognized/Custom Model (google/gemini-flash-lite-latest): Verified that the reasoning chip is visible and defaults to "None" (opt-in).
  • Supported Model (copilot/claude-sonnet-4.6): Verified that the reasoning chip is visible and defaults to "Default".

@nesquena-hermes

Collaborator

Pulled the updated branch (HEAD 26319406) and read the full get_reasoning_status (api/config.py:2386-2475), the reworked _models_dev_reasoning_efforts fallback (api/config.py:2282-2326), the JS in static/ui.js:1877-1930, and the four new test cases. This cleanly addresses the ACP nuance I raised last round — the cursor-acp/copilot-acp set and the models.dev supports_reasoning=False set now both yield supports_reasoning_effort=False and a hidden chip, and the JS reads that flag directly instead of inferring from list length. The manual matrix you posted (copilot/gpt-4o hidden, google/gemini-flash-lite-latest visible+off, copilot/claude-sonnet-4.6 visible+default) lines up with the code. One real edge case before merge.

The two _models_dev_reasoning_efforts calls can disagree for copilot/lmstudio

get_reasoning_status derives supported_efforts from the primary resolver, then separately re-derives the "positively unsupported" signal by calling _models_dev_reasoning_efforts directly:

```python
supported_efforts = resolve_model_reasoning_efforts(resolve_model, ...)
...
elif resolve_model:
    hinted_model = _strip_provider_hint_for_reasoning(resolve_model)
    metadata_efforts = _models_dev_reasoning_efforts(hinted_model, provider)
    if metadata_efforts == []:
        explicitly_unsupported = True
...
if explicitly_unsupported:
    supported_efforts = []          # <-- overwrites the primary result
```

The problem: resolve_model_reasoning_efforts (api/config.py:2329-2380) does not route copilot/lmstudio through _models_dev_reasoning_efforts. For copilot/github-copilot it returns github_model_reasoning_efforts(...) and for lmstudio it probes the live endpoint — those are the authoritative sources for those providers and it returns before ever consulting models.dev. But the new elif branch calls _models_dev_reasoning_efforts unconditionally, and your new cross-provider fallback (api/config.py:2299-2316) now resolves the bare model name against standard catalogs:

```python
bare_model = model.rsplit("/", 1)[-1]
standard_providers = ["openai","anthropic","gemini","google","deepseek","xai","mistral","copilot","openrouter"]
for p in standard_providers:
    ...
    caps = get_model_capabilities(provider=p, model=lookup_model)
    if caps is not None:
        capabilities = caps; break
```

So a copilot model whose GitHub API answer is "reasoning supported" (step-1 returns a non-empty list) but whose bare name matches a supports_reasoning=False entry in some standard catalog (e.g. gpt-4o resolved under openai) gets metadata_efforts == [] from step-2, flips explicitly_unsupported=True, and the non-empty step-1 result is overwritten with [] — hiding a chip the authoritative resolver had enabled. copilot is in PROVIDER_TO_MODELS_DEV (agent/models_dev.py:160 → "github-copilot"), so this path is live, not theoretical.

Recommendation

A non-empty step-1 result is authoritative — the model demonstrably supports reasoning, so it should never be re-marked unsupported. Gate the metadata recovery on the primary resolver having come back empty:

```python
elif resolve_model and not supported_efforts:
    hinted_model = _strip_provider_hint_for_reasoning(resolve_model)
    metadata_efforts = _models_dev_reasoning_efforts(hinted_model, provider)
    if metadata_efforts == []:
        explicitly_unsupported = True
```

This keeps every passing case you tested (ACP and copilot/gpt-4o both still produce [] at step-1, so the recovery still fires) while preventing the cross-provider bare-name fallback from clobbering an authoritative copilot/lmstudio "supported" answer. The []-collapse-loses-the-distinction problem you're working around only exists when step-1 already returned [], so this guard is exactly the right scope.
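A toy reproduction makes the clobber and the fix concrete. This is an illustrative sketch, not the repository's code: `reconcile` and `bare_name_catalog` are hypothetical stand-ins for the step-2 check in `get_reasoning_status` and for a bare-name catalog hit on a `supports_reasoning=False` entry.

```python
from typing import Callable, List, Optional

MetadataLookup = Callable[[], Optional[List[str]]]


def reconcile(resolved: List[str], metadata_lookup: MetadataLookup, gated: bool) -> List[str]:
    """Apply the 'explicitly unsupported' overwrite, with or without the guard."""
    explicitly_unsupported = False
    # Ungated: always consult metadata. Gated: only when step 1 returned [].
    if not gated or not resolved:
        if metadata_lookup() == []:
            explicitly_unsupported = True
    return [] if explicitly_unsupported else list(resolved)


calls: List[str] = []


def bare_name_catalog() -> Optional[List[str]]:
    # Stand-in for a bare-name hit on a supports_reasoning=False entry.
    calls.append("metadata")
    return []


copilot_answer = ["medium", "high"]  # authoritative step-1 result
ungated_result = reconcile(copilot_answer, bare_name_catalog, gated=False)
calls.clear()
gated_result = reconcile(copilot_answer, bare_name_catalog, gated=True)
```

Ungated, the authoritative `["medium", "high"]` is overwritten with `[]`; gated, it survives and the metadata lookup is never called.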

Minor: the bare-name fallback's first-match-wins loop is order-sensitive (openai before anthropic before openrouter); for a name that exists under several catalogs the iteration order silently decides. Low risk, but a one-line case in the test file pinning a known collision would lock the chosen precedence. Also a couple of the new blank lines carry trailing whitespace (e.g. the line after supported_efforts = resolve_model_reasoning_efforts(...)).

Solid iteration overall — the contract is now explicit on both sides and the test coverage for the recognized / unrecognized / ACP / metadata-unsupported quadrants is good.

@b3nw b3nw force-pushed the feat/3377-thinking-level-missing branch from e0195e6 to 6918580 on June 3, 2026 20:25
@b3nw

b3nw commented Jun 3, 2026

Contributor Author

Changes Made - @nesquena-hermes

  1. Gated Metadata Recovery in get_reasoning_status:

    • Gated the _models_dev_reasoning_efforts metadata check in get_reasoning_status (api/config.py) to only run if supported_efforts is empty (not supported_efforts).
    • This ensures that authoritative provider-specific resolver results (such as Copilot or LMStudio) are never clobbered or overridden by fallback checks.
    • Cleaned up trailing whitespace in the modified sections of api/config.py.
  2. Added Verification Tests:

    • test_get_reasoning_status_copilot_disagreement_authoritative: Asserts that when Copilot resolves reasoning capabilities authoritatively, fallback metadata recovery is bypassed and doesn't override the result.
    • test_models_dev_reasoning_efforts_precedence_loop: Pins the deterministic search order of standard providers (openai, anthropic, gemini, etc.) during fallback lookup to prevent order-sensitivity regressions.

@greptile-apps

greptile-apps Bot commented Jun 3, 2026


Greptile Summary

This PR inverts the reasoning-effort chip visibility model: instead of hiding the chip for unrecognized models, it always shows the chip and uses a new reasoning_default_on flag to default unrecognized models to "None" while letting users opt in. A new explicitly_unsupported path preserves the hide-chip behavior for ACP providers and models with confirmed supports_reasoning=False in capability metadata.

  • Backend (api/config.py): get_reasoning_status now detects explicitly_unsupported via a direct _models_dev_reasoning_efforts call and returns reasoning_default_on alongside the existing fields; _models_dev_reasoning_efforts gains a 9-provider fallback loop for custom/proxy providers. Because resolve_model_reasoning_efforts already calls _models_dev_reasoning_efforts internally, the new direct call in get_reasoning_status introduces a redundant double lookup for every unrecognized model.
  • Frontend (static/ui.js): _applyReasoningChip reads reasoning_default_on and forces effort = 'none' when it is false and no explicit effort is stored; the error handler now passes supported_efforts: null to preserve the prior dropdown state rather than collapsing it to [].
  • Tests: New tests cover ACP exclusion, unrecognized-model fallback, and provider-loop ordering, but test_get_reasoning_status_unrecognized_model_still_offers_efforts omits a mock for _models_dev_reasoning_efforts, leaving the test sensitive to whether the live capability catalog is reachable in CI.

Confidence Score: 3/5

Safe to merge for the UI behavior change, but the double metadata lookup in get_reasoning_status could materially slow down model switches if get_model_capabilities makes network calls, and one new test has a hidden dependency on the live capability catalog.

The backend logic has a structural redundancy: _models_dev_reasoning_efforts (with its 9-provider fallback loop) is called once inside resolve_model_reasoning_efforts and again directly in get_reasoning_status. For every model switch that produces an empty supported_efforts list, this doubles the provider lookups. If get_model_capabilities is I/O-bound, this is a latency regression on a hot path. Additionally, the test that validates unrecognized-model behavior does not isolate _models_dev_reasoning_efforts, meaning it can silently flip if the test environment has the capability catalog available.

The double-lookup in api/config.py (lines 2447-2451) and the incomplete mock in tests/test_reasoning_effort_model_capabilities.py (lines 71-86) need attention before merging.

Important Files Changed

| Filename | Overview |
|----------|----------|
| api/config.py | Adds explicitly_unsupported detection and reasoning_default_on flag to get_reasoning_status; also adds a 9-provider fallback loop in _models_dev_reasoning_efforts. The double invocation creates up to 20 provider lookups per call for unrecognized models. |
| static/ui.js | Chip now always visible; adds reasoning_default_on defaulting logic and moves state mutations before the !supports early-return; error handler changed to preserve prior dropdown options on API failure. Changes look correct. |
| tests/test_reasoning_effort_model_capabilities.py | Good coverage of new paths; test_get_reasoning_status_unrecognized_model_still_offers_efforts does not mock _models_dev_reasoning_efforts, creating an implicit dependency on the live implementation returning None for an unknown model. |
| tests/test_reasoning_chip_btw_fixes.py | Minor wording update to assertion message; no functional change. |

Flowchart

```mermaid
%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A["fetchReasoningChip()"] --> B["GET /api/reasoning"]
    B -->|success| C["_applyReasoningChip(effort, st)"]
    B -->|error| D["_applyReasoningChip with reasoning_default_on=false"]
    C --> E["Update _currentReasoningEffortsSupported"]
    E --> F{"reasoning_default_on false AND no effort?"}
    F -->|yes| G["effort = none"]
    F -->|no| H["effort = normalized value"]
    G --> I{"supports_reasoning_effort?"}
    H --> I
    I -->|false| J["Hide chip"]
    I -->|true| K["Show chip with label and dropdown"]
    subgraph backend["get_reasoning_status in config.py"]
        L["resolve_model_reasoning_efforts()"] --> M{"supported_efforts empty?"}
        M -->|no| N["reasoning_default_on = True"]
        M -->|yes| O{"ACP provider?"}
        O -->|yes| P["explicitly_unsupported = True"]
        O -->|no| Q["_models_dev_reasoning_efforts second call"]
        Q -->|returns empty| P
        Q -->|returns None| R["show chip, full list, default_on=False"]
        N --> S["Return JSON"]
        P --> S
        R --> S
    end
```

Reviews (1): Last reviewed commit: "Gate reasoning status metadata lookup an..."

Comment thread api/config.py
Comment on lines +2447 to +2451
```python
elif resolve_model and not supported_efforts:
    hinted_model = _strip_provider_hint_for_reasoning(resolve_model)
    metadata_efforts = _models_dev_reasoning_efforts(hinted_model, provider)
    if metadata_efforts == []:
        explicitly_unsupported = True
```


P1 Double metadata lookup on every unrecognized model check

resolve_model_reasoning_efforts already calls _models_dev_reasoning_efforts internally (line 2378) — and that function, when the initial lookup returns None, runs a fallback loop over up to 9 standard providers. When get_reasoning_status then calls _models_dev_reasoning_efforts a second time here (to distinguish "heuristic said no" from "metadata confirmed no"), the entire fallback loop fires again. If get_model_capabilities involves I/O (e.g. HTTP calls to models.dev), every get_reasoning_status invocation on an unrecognized model pays up to 20 provider lookups instead of 10. A simple fix is to pass the metadata result out of resolve_model_reasoning_efforts or memoize _models_dev_reasoning_efforts per (model, provider) within a request.
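The memoization option can be sketched with `functools.lru_cache` keyed on `(model, provider)`. Everything here is hypothetical: `get_model_capabilities_stub` stands in for a possibly I/O-bound catalog call and counts lookups, and the capability dict shape is assumed. Note that `lru_cache` is process-wide rather than per-request, so a real version would need `cache_clear()` when the catalog refreshes, or a request-scoped dict instead.

```python
import functools
from typing import Dict, Optional, Tuple

STANDARD_PROVIDERS = (
    "openai", "anthropic", "gemini", "google", "deepseek",
    "xai", "mistral", "copilot", "openrouter",
)
lookup_count = {"n": 0}


def get_model_capabilities_stub(provider: str, model: str) -> Optional[Dict[str, object]]:
    """Hypothetical stand-in for a capability lookup that may do I/O."""
    lookup_count["n"] += 1
    return None  # the model in this sketch is unknown everywhere


@functools.lru_cache(maxsize=256)
def models_dev_efforts_cached(model: str, provider: str) -> Optional[Tuple[str, ...]]:
    """Memoized per (model, provider); returns a tuple so cached values stay immutable."""
    caps = get_model_capabilities_stub(provider, model)
    if caps is None:
        bare_model = model.rsplit("/", 1)[-1]
        for p in STANDARD_PROVIDERS:
            caps = get_model_capabilities_stub(p, bare_model)
            if caps is not None:
                break
    if caps is None:
        return None  # unknown
    if not caps.get("supports_reasoning"):
        return ()  # known unsupported
    return tuple(caps.get("efforts", ()))


# Same arguments twice, as in the resolver call plus the status check.
first = models_dev_efforts_cached("some-unknown-model", "custom:myproxy")
second = models_dev_efforts_cached("some-unknown-model", "custom:myproxy")
```

The second call is served from the cache, so the unknown model costs 10 lookups instead of 20. Because this sketch returns `()` for "known unsupported", a caller that compares against `[]` would need to compare against an empty tuple instead.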

Comment on lines +71 to +86
```python
def test_get_reasoning_status_unrecognized_model_still_offers_efforts(monkeypatch):
    """Unrecognized models get the full effort list but reasoning_default_on=False."""
    monkeypatch.setattr(
        cfg,
        "resolve_model_reasoning_efforts",
        lambda *a, **k: [],
    )
    status = cfg.get_reasoning_status(
        model_id="some-unknown-model",
        provider_id="custom:myproxy",
    )
    assert len(status["supported_efforts"]) > 0, (
        "Unrecognized models should still expose effort levels"
    )
    assert status["supports_reasoning_effort"] is True
    assert status["reasoning_default_on"] is False
```


P2 _models_dev_reasoning_efforts not mocked — test can flip if catalog is reachable

The test patches resolve_model_reasoning_efforts to return [], but get_reasoning_status then calls _models_dev_reasoning_efforts directly (the new explicitly_unsupported check). If agent.models_dev is importable in the test environment and get_model_capabilities returns a capabilities object with supports_reasoning=False for "some-unknown-model", the function returns [], explicitly_unsupported becomes True, and both supports_reasoning_effort and len(supported_efforts) assertions fail. Adding monkeypatch.setattr(cfg, "_models_dev_reasoning_efforts", lambda *a, **k: None) makes the intent explicit and removes the environmental dependency.
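A self-contained sketch shows why the extra mock matters. It uses `unittest.mock.patch.object` on a stand-in `cfg` namespace instead of pytest's `monkeypatch`, and a simplified `get_reasoning_status`; the stand-in functions and `FULL_SCALE` are hypothetical, with the "live" catalog deliberately answering `[]` to simulate a CI environment where it knows the name.

```python
import types
from unittest import mock

FULL_SCALE = ["none", "low", "medium", "high"]  # assumed effort scale


def _live_resolver(model_id, provider_id):
    return []


def _live_metadata(model_id, provider_id):
    # Simulates a reachable catalog that reports supports_reasoning=False.
    return []


cfg = types.SimpleNamespace(
    resolve_model_reasoning_efforts=_live_resolver,
    _models_dev_reasoning_efforts=_live_metadata,
)


def get_reasoning_status(model_id, provider_id):
    """Simplified stand-in for the gated logic under review."""
    supported = cfg.resolve_model_reasoning_efforts(model_id, provider_id)
    if not supported and cfg._models_dev_reasoning_efforts(model_id, provider_id) == []:
        return {"supported_efforts": [], "supports_reasoning_effort": False,
                "reasoning_default_on": False}
    recognized = bool(supported)
    return {"supported_efforts": list(supported) if recognized else list(FULL_SCALE),
            "supports_reasoning_effort": True, "reasoning_default_on": recognized}


def test_get_reasoning_status_unrecognized_model_still_offers_efforts():
    # Patching both lookups makes the test independent of catalog state.
    with mock.patch.object(cfg, "resolve_model_reasoning_efforts", lambda *a, **k: []), \
         mock.patch.object(cfg, "_models_dev_reasoning_efforts", lambda *a, **k: None):
        status = get_reasoning_status(model_id="some-unknown-model",
                                      provider_id="custom:myproxy")
    assert len(status["supported_efforts"]) > 0
    assert status["supports_reasoning_effort"] is True
    assert status["reasoning_default_on"] is False
```

Without the second patch, the simulated catalog answer of `[]` flips the result to a hidden chip, which is exactly the environmental dependency described above.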

@nesquena-hermes

Collaborator

Pulled the updated branch (HEAD 69185804) and read the reworked get_reasoning_status (api/config.py:2416-2472) plus the two new test cases. The gate I suggested landed exactly as intended, and the disagreement case is now regression-guarded — this resolves the copilot/lmstudio clobber I flagged last round.

The gate is correct

```python
if (provider in {"cursor-acp", "copilot-acp"} or model_prefix in {"cursor-acp", "copilot-acp"}):
    explicitly_unsupported = True
elif resolve_model and not supported_efforts:        # <-- the new guard
    hinted_model = _strip_provider_hint_for_reasoning(resolve_model)
    metadata_efforts = _models_dev_reasoning_efforts(hinted_model, provider)
    if metadata_efforts == []:
        explicitly_unsupported = True
```

The not supported_efforts condition means the metadata recovery only fires when the authoritative resolver already came back empty, so a non-empty copilot/lmstudio result can no longer be overwritten by the bare-name catalog fallback. That's the exact scope — the []-collapse-loses-the-distinction workaround only ever mattered when step-1 returned [].

The new test actually proves it

test_get_reasoning_status_copilot_disagreement_authoritative (tests/test_reasoning_effort_model_capabilities.py:131) is the right shape — it asserts both the result and that the second lookup never runs:

```python
assert status["supported_efforts"] == ["medium", "high"]
assert not called_metadata_check, "Should not query models.dev metadata since resolver returned success"
```

The not called_metadata_check assertion is the key one: it locks in that the gate short-circuits before the redundant call, which also addresses the double-lookup latency concern greptile raised — for any model the resolver recognizes, there's now exactly one lookup, not two. test_models_dev_reasoning_efforts_precedence_loop (line 159) pins the 9-provider iteration order, so the order-sensitivity nit is covered too.

One small test-isolation note

test_get_reasoning_status_unrecognized_model_still_offers_efforts (line 71) monkeypatches resolve_model_reasoning_efforts → [] but leaves _models_dev_reasoning_efforts un-mocked. With the gate, supported_efforts is [], so the elif branch now does fire and calls the real _models_dev_reasoning_efforts("some-unknown-model", ...). The test passes only because an unknown model returns None (not []) from that path, so explicitly_unsupported stays false. That's correct today, but it's an implicit dependency on the live catalog answering None for an unknown name. A one-line monkeypatch.setattr(cfg, "_models_dev_reasoning_efforts", lambda *a, **k: None) would make it hermetic and immune to CI catalog state — worth adding since the other three cases already mock it.

Contract is now explicit on both sides and the quadrant coverage (recognized / unrecognized / ACP / metadata-unsupported / authoritative-disagreement) is solid. Reads merge-ready to me modulo that one test mock.

nesquena-hermes added a commit that referenced this pull request Jun 4, 2026
## Release v0.51.247 — Release HO (stage-q19)

Backend correctness fix.

### Fixed
| Issue | Author | Fix |
|-------|--------|-----|
| #3505 | @franksong2702 | **Reasoning effort is coerced to a level the active model/provider actually supports** before each request, instead of being sent verbatim and rejected. `openai-codex` `gpt-5` no longer gets `max` (→ `xhigh`); `o1`/`o3`/`o4` clamp to `low`/`medium`/`high`. Coercion only steps *down* (never escalates); `none`/unset preserved. The capability filter is applied across heuristic / models.dev / Copilot / LM Studio paths. |

This is the narrow, correct fix for the detection gap that #3431 tried to address by removing the chip-visibility gate (which we shelved). The chip-visibility gate is **untouched** (Codex confirmed) — `get_reasoning_status`/`_applyReasoningChip` still hide the chip for unconfirmed models.

### Review fix absorbed (Codex + self-flagged)
The first cut **dropped** a configured effort for *unrecognized* models, because capability detection returns `[]` for both "known-unsupported" and "simply-unknown" (custom providers, aggregator-rewritten ids, new releases) — that's a behavior change vs master (which sent it verbatim) and would silently disable reasoning. Fixed: an **empty** capability set now **preserves** the configured effort (provider stays the final authority; worst case = the same rejected request master already produces, i.e. no regression). Known-bad clamps return *non-empty* filtered sets, so they still degrade correctly. Nathan chose this "preserve-for-unknown" behavior. + regression test.
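The coercion rules listed in this release (step down only, preserve `none`/unset, preserve the configured effort when the capability set is empty) can be sketched as below. This is not the shipped implementation: `coerce_effort` and the `EFFORT_ORDER` ranking are assumptions, and the notes do not say what happens when no supported level sits at or below the configured one, so returning `None` there is a guess labeled in the code.

```python
from typing import Iterable, Optional

# Assumed ranking, lowest to highest; the real ordering may differ.
EFFORT_ORDER = ["minimal", "low", "medium", "high", "xhigh", "max"]


def coerce_effort(configured: Optional[str], supported: Iterable[str]) -> Optional[str]:
    """Step the configured effort down to the nearest supported level, never up."""
    if not configured or configured == "none":
        return configured  # none/unset preserved
    supported_set = set(supported)
    if not supported_set:
        return configured  # unknown model: preserve, provider stays the final authority
    if configured in supported_set:
        return configured
    if configured not in EFFORT_ORDER:
        return configured  # assumption: labels outside the assumed scale pass through
    idx = EFFORT_ORDER.index(configured)
    for candidate in reversed(EFFORT_ORDER[:idx]):
        if candidate in supported_set:
            return candidate  # highest supported level below the configured one
    return None  # assumption: nothing at or below, so send no effort rather than escalate
```

With these assumptions, `max` against a set topped by `xhigh` becomes `xhigh`, and `max` or `xhigh` against a `low`/`medium`/`high` set becomes `high`, matching the examples in the Gate section.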

### Gate
- Full pytest suite: **7548 passed, 0 failed**
- ruff: CLEAN · 48 reasoning tests pass (incl. preserve-for-unknown + codex-clamp + never-escalate)
- Codex (regression): SHIP-ONLY-WITH-FIXES (unknown-model drop) → fixed → **SAFE TO SHIP**
- Verified empirically: gpt-5/codex max→xhigh, o3 max/xhigh→high, unknown high→high (preserved), none/unset preserved

Co-authored-by: franksong2702 <franksong2702@users.noreply.github.com>

Development

Successfully merging this pull request may close these issues.

Bug: Reasoning Effort Selector Hidden on Custom and Suffixed Models

2 participants