feat: always show reasoning effort selector, default off for unrecognized models (#3377) by b3nw · Pull Request #3431 · nesquena/hermes-webui

b3nw · 2026-06-02T19:44:16Z

feat: always show reasoning effort selector, default off for unrecognized models (#3377)

Improvement on #3379, which fixed #3377.

Thinking Path

Hermes WebUI allows configuring reasoning-effort levels for models that support thinking.
Previously, the thinking/reasoning chip was shown or hidden based on a heuristic: recognized reasoning models got the chip, unrecognized models had it completely hidden.
This caused false negatives where custom providers, aggregator-rewritten model IDs (e.g., claude-sonnet-4-6:free), and new model releases would silently hide the selector even though the user might want to enable reasoning.
Rather than continuing to chase an ever-growing heuristic checklist, this PR inverts the model: always show the chip, but set the default based on whether the model is positively identified as reasoning-capable.
Recognized models (GPT-5+, Claude 4/3.7+, Qwen-3+, DeepSeek, Kimi, etc.) default to "Default" (active reasoning). Unrecognized models default to "None" (off), letting users opt-in on any model.
The benefit is that no model is ever locked out of the reasoning feature — the user always has the choice.

What Changed

1. Backend: Always return full effort list with `reasoning_default_on` flag

In api/config.py (get_reasoning_status):

When resolve_model_reasoning_efforts returns an empty list (unrecognized model), fall back to the full VALID_REASONING_EFFORTS list instead of [].
supports_reasoning_effort is now always True.
Added reasoning_default_on: True when the model was positively identified as reasoning-capable, False otherwise.

2. Frontend: Always show chip, default to "None" for unrecognized models

In static/ui.js:

_applyReasoningChip() no longer hides the chip when supported_efforts is empty. The chip is always displayed.
When reasoning_default_on is False and no effort has been explicitly set by the user, the chip defaults to "None" (inactive state).
_applyReasoningOptions() now shows all effort levels in the dropdown when the supported set is empty (previously hid them all).
fetchReasoningChip() error handler defaults to reasoning_default_on: false so the chip remains functional even on API errors.
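
The defaulting rule in the list above can be modeled in a few lines. This is a minimal Python sketch of the behavior described for `_applyReasoningChip()`, not the JavaScript that ships in static/ui.js; the function name `resolve_chip_state` and the `"default"` label value are assumptions made for illustration.

```python
from typing import Optional


def resolve_chip_state(effort: Optional[str], meta: Optional[dict]) -> dict:
    """Hypothetical model of the chip defaulting rule (not the real ui.js code)."""
    meta = meta or {}
    # A missing flag is treated as "default on", so older API responses
    # that lack reasoning_default_on keep their previous behavior.
    default_on = meta.get("reasoning_default_on", True)
    if not default_on and not effort:
        # Unrecognized model and nothing persisted: start in the off state.
        effort = "none"
    # The chip is always visible in this version of the PR.
    # "default" is an assumed placeholder for the "Default" label.
    return {"visible": True, "effort": effort or "default"}


print(resolve_chip_state("", {"reasoning_default_on": False}))
print(resolve_chip_state("high", {"reasoning_default_on": False}))
```

A persisted value such as `"high"` wins over the flag, which matches the persistence behavior noted under Risks / Follow-ups.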

3. Expanded Test Coverage

In tests/test_reasoning_effort_model_capabilities.py:

Updated test_get_reasoning_status_includes_supported_efforts to assert reasoning_default_on is True.
Added test_get_reasoning_status_unrecognized_model_still_offers_efforts: verifies that unrecognized models get the full effort list with reasoning_default_on=False.
Added test_get_reasoning_status_recognized_model_default_on: verifies that recognized models get reasoning_default_on=True.

4. Changelog Documentation

Documented the change in CHANGELOG.md under ## [Unreleased].

Why It Matters

Previously, the heuristic-based hide/show logic was a constant source of false negatives: every new model release, custom provider configuration, or aggregator-rewritten model ID risked hiding the reasoning selector. This PR eliminates that class of bugs entirely by never hiding the chip. Users can always opt into reasoning, and the default is informed but not restrictive.

Verification

Automated tests

Ran the pytest suite targeting reasoning effort model capabilities:

uv run --with pytest --with pyyaml --with cryptography pytest \
  tests/test_reasoning_effort_model_capabilities.py \
  tests/test_custom_provider_bare_model_reasoning.py -v

Result: 28 passed successfully.

Risks / Follow-ups

Current effort persistence: If a user sets an explicit effort (e.g., "high") on an unrecognized model and later switches to another unrecognized model, the persisted effort carries over. The frontend only defaults to "None" when no effort is persisted — an existing persisted value is respected. This is intentional; it matches the pre-PR behavior where the CLI's agent.reasoning_effort config is profile-scoped, not model-scoped.
Backward compatibility: supports_reasoning_effort is now always True, which changes the API contract. No known consumers rely on this boolean for chip visibility (the frontend uses reasoning_default_on), but third-party integrations should be aware.

AI Usage Disclosure

Provider: cursor / anthropic
Model: claude opus 4.6 via cursor
Tool Use: Explored codebase reasoning-effort flow end-to-end, implemented backend and frontend changes, wrote tests, hotpatched production container for validation, and drafted this PR description.

…ized models (nesquena#3377) Instead of hiding the thinking/reasoning effort chip entirely when a model is not recognized as reasoning-capable, always present the selector with the full effort scale available. For recognized models (GPT-5+, Claude 4/3.7+, Qwen-3+, DeepSeek, Kimi, etc.) the chip defaults to "Default" (reasoning active). For unrecognized or ambiguous models the chip defaults to "None" (reasoning off), letting users opt-in on any model. Backend: get_reasoning_status() now always returns the full VALID_REASONING_EFFORTS list in supported_efforts, plus a new reasoning_default_on flag indicating whether the model was positively identified as reasoning-capable. Frontend: _applyReasoningChip() always displays the chip; when reasoning_default_on is false and no effort is persisted, it defaults to "None". _applyReasoningOptions() shows all effort levels when the supported set is empty (error fallback).

nesquena-hermes · 2026-06-02T22:44:40Z

Pulled the branch and read get_reasoning_status (api/config.py:2322), the JS chip logic (static/ui.js:1854-1912), and both test files against origin/master. The inversion is clean and the contract is internally consistent — one nuance about the empty-list semantics is worth a look before merge.

The backend change reads correctly

model_recognized = bool(supported_efforts)
if not supported_efforts:
    supported_efforts = list(VALID_REASONING_EFFORTS)
return {
    ...
    "supported_efforts": supported_efforts,
    "supports_reasoning_effort": True,
    "reasoning_default_on": model_recognized,
}

supports_reasoning_effort is now hard-coded True, and reasoning_default_on carries the old recognition signal. I grepped all consumers of supports_reasoning_effort: nothing in static/ reads it anymore (only tests/test_models_dev_reasoning.py:154 and the capability tests assert it is True), so pinning it to True is safe and no test asserts the False branch. Good.

Frontend matches

_applyReasoningChip (ui.js:1866) now unconditionally sets wrap.style.display='' and only flips the default value to 'none' when reasoning_default_on is false and no effort is set:

var defaultOn=(meta&&meta.reasoning_default_on!==undefined)?meta.reasoning_default_on:true;
if(!defaultOn&&(!effort||effort==='')){ effort='none'; }

The error/catch path in fetchReasoningChip (ui.js:1898) was updated to {supported_efforts:null,reasoning_default_on:false}, which keeps the chip visible with options intact (since _applyReasoningOptions now shows all when !supported.size) rather than the old hidden state. The modified assertion in test_reasoning_chip_btw_fixes.py ("wrap.style.display='none'" not in fn) correctly locks in "never hide."

One nuance: `[]` has two meanings upstream

resolve_model_reasoning_efforts returns [] in two semantically distinct cases:

Unrecognized model — genuinely unknown, the case this PR targets. "Show selector, default off, let the user opt in" is exactly right here.
Positively known NOT to support reasoning — the ACP subprocess providers return [] deliberately at api/config.py:2285:

if provider in {"cursor-acp", "copilot-acp"}:
    return []

(and the capability layer returns [] when supports_reasoning is False, config.py:~2197).

After this change, a cursor-acp / copilot-acp session shows a reasoning-effort selector even though that provider can't honor it. The practical harm is low: reasoning_default_on=False means it defaults to "none" and won't send anything unless the user explicitly opts in, and the downstream path is defensive — streaming.py:4943 runs the selected value through parse_reasoning_effort and only attaches reasoning_config when non-None and the agent accepts the param (streaming.py:4974). So a stray opt-in on an ACP model degrades to a no-op, not an error. But it is a control that looks actionable and isn't. If you want to preserve the "positively unsupported" signal, the cheap fix is to keep returning reasoning_default_on=False and a supports_reasoning_effort=False for the ACP set, and have the JS hide only when explicitly false — but that partly re-introduces the heuristic the PR is trying to retire, so it's a judgment call. Flagging it rather than blocking on it.
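
One way to picture the distinction is a three-state result instead of a single empty list. The sketch below is hypothetical: it assumes a resolver that returns `None` for "unknown" and `[]` for "known not to support reasoning", which is not how resolve_model_reasoning_efforts behaves at this point in the thread. The names `Support`, `classify`, `status_for`, and the `ALL_EFFORTS` values are illustrative only.

```python
from enum import Enum
from typing import List, Optional


class Support(Enum):
    SUPPORTED = "supported"
    UNKNOWN = "unknown"
    UNSUPPORTED = "unsupported"


# Assumed effort scale; the real VALID_REASONING_EFFORTS may differ.
ALL_EFFORTS = ["low", "medium", "high"]


def classify(efforts: Optional[List[str]]) -> Support:
    """Map a resolver result to one of three states."""
    if efforts is None:
        return Support.UNKNOWN       # nothing is known about the model
    if not efforts:
        return Support.UNSUPPORTED   # positively known: no reasoning
    return Support.SUPPORTED


def status_for(efforts: Optional[List[str]]) -> dict:
    """Build a status payload that keeps the 'positively unsupported' signal."""
    kind = classify(efforts)
    if kind is Support.UNSUPPORTED:
        return {"supported_efforts": [], "supports_reasoning_effort": False,
                "reasoning_default_on": False}
    if kind is Support.UNKNOWN:
        return {"supported_efforts": list(ALL_EFFORTS),
                "supports_reasoning_effort": True, "reasoning_default_on": False}
    return {"supported_efforts": list(efforts),
            "supports_reasoning_effort": True, "reasoning_default_on": True}
```

With this shape the frontend would hide the chip only when `supports_reasoning_effort` is explicitly false, and unknown models would still get the opt-in selector.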

Tests

The two new cases in test_reasoning_effort_model_capabilities.py cover both the recognized (reasoning_default_on True) and unrecognized (> 0 efforts, reasoning_default_on False) paths by monkeypatching resolve_model_reasoning_efforts. Per cron policy I didn't execute them, but the assertions match the backend logic above. Consider adding one case pinning the ACP-provider expectation either way, so the chosen behavior for case (2) is intentional and regression-guarded.

Overall a sensible inversion — replacing an ever-growing recognition checklist with "always available, smart default" is the right direction for #3377.


b3nw · 2026-06-03T14:10:27Z

Updated the implementation; summary of changes below. Performed manual testing to validate. @nesquena-hermes

Changes Made

Upstream Merged: Merged origin/master into feat/3377-thinking-level-missing to bring in get_config_for_profile_home and other recent framework fixes, resolving the WebUI runtime crash.
Subprocess/ACP Prefix Checks: Modified get_reasoning_status in api/config.py to identify explicitly unsupported models whose ID contains a slash with an ACP subprocess provider namespace (e.g. cursor-acp/* or copilot-acp/*), hiding the controls.
Fallback Upstream Model Lookup:
- Refactored _models_dev_reasoning_efforts to fall back and search upstream providers (openai, anthropic, gemini, google, deepseek, etc.) when capabilities return None under custom proxy providers (such as llm-proxy).
- Normalizes the lookup model name by stripping any namespace prefixes (e.g., copilot/gpt-4o -> gpt-4o), so it correctly resolves against standard capabilities (e.g. gpt-4o under openai is flagged as supports_reasoning=False).

Automated Tests

Verified the changes locally using:

uv run --with pytest --with pyyaml --with cryptography pytest tests/test_reasoning_effort_model_capabilities.py tests/test_reasoning_chip_btw_fixes.py

Result: All 28 tests passed successfully.

Manual Verification

Explicitly Unsupported Model (copilot/gpt-4o): Verified that the reasoning chip is completely hidden from the WebUI.
Unrecognized/Custom Model (google/gemini-flash-lite-latest): Verified that the reasoning chip is visible and defaults to "None" (opt-in).
Supported Model (copilot/claude-sonnet-4.6): Verified that the reasoning chip is visible and defaults to "Default".

nesquena-hermes · 2026-06-03T19:57:36Z

Pulled the updated branch (HEAD 26319406) and read the full get_reasoning_status (api/config.py:2386-2475), the reworked _models_dev_reasoning_efforts fallback (api/config.py:2282-2326), the JS in static/ui.js:1877-1930, and the four new test cases. This cleanly addresses the ACP nuance I raised last round — the cursor-acp/copilot-acp set and the models.dev supports_reasoning=False set now both yield supports_reasoning_effort=False and a hidden chip, and the JS reads that flag directly instead of inferring from list length. The manual matrix you posted (copilot/gpt-4o hidden, google/gemini-flash-lite-latest visible+off, copilot/claude-sonnet-4.6 visible+default) lines up with the code. One real edge case before merge.

The two `_models_dev_reasoning_efforts` calls can disagree for copilot/lmstudio

get_reasoning_status derives supported_efforts from the primary resolver, then separately re-derives the "positively unsupported" signal by calling _models_dev_reasoning_efforts directly:

supported_efforts = resolve_model_reasoning_efforts(resolve_model, ...)
...
elif resolve_model:
    hinted_model = _strip_provider_hint_for_reasoning(resolve_model)
    metadata_efforts = _models_dev_reasoning_efforts(hinted_model, provider)
    if metadata_efforts == []:
        explicitly_unsupported = True
...
if explicitly_unsupported:
    supported_efforts = []          # <-- overwrites the primary result

The problem: resolve_model_reasoning_efforts (api/config.py:2329-2380) does not route copilot/lmstudio through _models_dev_reasoning_efforts. For copilot/github-copilot it returns github_model_reasoning_efforts(...) and for lmstudio it probes the live endpoint — those are the authoritative sources for those providers and it returns before ever consulting models.dev. But the new elif branch calls _models_dev_reasoning_efforts unconditionally, and your new cross-provider fallback (api/config.py:2299-2316) now resolves the bare model name against standard catalogs:

bare_model = model.rsplit("/", 1)[-1]
standard_providers = ["openai","anthropic","gemini","google","deepseek","xai","mistral","copilot","openrouter"]
for p in standard_providers:
    ...
    caps = get_model_capabilities(provider=p, model=lookup_model)
    if caps is not None:
        capabilities = caps; break

So a copilot model whose GitHub API answer is "reasoning supported" (step-1 returns a non-empty list) but whose bare name matches a supports_reasoning=False entry in some standard catalog (e.g. gpt-4o resolved under openai) gets metadata_efforts == [] from step-2, flips explicitly_unsupported=True, and the non-empty step-1 result is overwritten with [] — hiding a chip the authoritative resolver had enabled. copilot is in PROVIDER_TO_MODELS_DEV (agent/models_dev.py:160 → "github-copilot"), so this path is live, not theoretical.

Recommendation

A non-empty step-1 result is authoritative — the model demonstrably supports reasoning, so it should never be re-marked unsupported. Gate the metadata recovery on the primary resolver having come back empty:

elif resolve_model and not supported_efforts:
    hinted_model = _strip_provider_hint_for_reasoning(resolve_model)
    metadata_efforts = _models_dev_reasoning_efforts(hinted_model, provider)
    if metadata_efforts == []:
        explicitly_unsupported = True

This keeps every passing case you tested (ACP and copilot/gpt-4o both still produce [] at step-1, so the recovery still fires) while preventing the cross-provider bare-name fallback from clobbering an authoritative copilot/lmstudio "supported" answer. The []-collapse-loses-the-distinction problem you're working around only exists when step-1 already returned [], so this guard is exactly the right scope.
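
The effect of the guard can be checked in isolation. The following is a small sketch with a stubbed metadata lookup; `is_explicitly_unsupported` and the `gated` switch are hypothetical names used only to compare the two behaviors, not code from api/config.py.

```python
from typing import Callable, List, Optional


def is_explicitly_unsupported(
    primary_efforts: List[str],
    metadata_lookup: Callable[[], Optional[List[str]]],
    gated: bool = True,
) -> bool:
    """Return True when metadata positively says 'no reasoning'.

    With gated=True the metadata lookup is skipped whenever the primary
    resolver already returned a non-empty list.
    """
    if gated and primary_efforts:
        return False
    return metadata_lookup() == []


calls = []


def stub_lookup() -> Optional[List[str]]:
    # Stand-in for a catalog that (wrongly, for this model) says "unsupported".
    calls.append("lookup")
    return []


# Gated: the authoritative non-empty answer survives and no lookup runs.
print(is_explicitly_unsupported(["medium", "high"], stub_lookup, gated=True))
# Ungated: the same model is re-marked unsupported by the catalog.
print(is_explicitly_unsupported(["medium", "high"], stub_lookup, gated=False))
```

Only the ungated call reaches `stub_lookup`, so the gate also removes the second lookup for every model the primary resolver recognizes.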

Minor: the bare-name fallback's first-match-wins loop is order-sensitive (openai before anthropic before openrouter); for a name that exists under several catalogs the iteration order silently decides. Low risk, but a one-line case in the test file pinning a known collision would lock the chosen precedence. Also a couple of the new blank lines carry trailing whitespace (e.g. the line after supported_efforts = resolve_model_reasoning_efforts(...)).
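
To make the order sensitivity concrete, here is a sketch against a made-up in-memory catalog. `CATALOG`, `shared-model`, and both helper functions are hypothetical; the real lookup goes through get_model_capabilities, whose data is not reproduced here.

```python
from typing import Dict, List, Optional

# Hypothetical catalog: the same bare name exists under two providers
# with conflicting supports_reasoning values.
CATALOG: Dict[str, Dict[str, bool]] = {
    "openai": {"shared-model": False},
    "openrouter": {"shared-model": True},
}


def first_match(model: str, providers: List[str]) -> Optional[str]:
    """Return the first provider whose catalog contains the bare model name."""
    bare = model.rsplit("/", 1)[-1]  # strip any namespace prefix
    for provider in providers:
        if bare in CATALOG.get(provider, {}):
            return provider
    return None


def supports_reasoning(model: str, providers: List[str]) -> Optional[bool]:
    """None means no catalog knows the model; otherwise the first match wins."""
    provider = first_match(model, providers)
    if provider is None:
        return None
    return CATALOG[provider][model.rsplit("/", 1)[-1]]


# Same model, different iteration order, different answer.
print(supports_reasoning("proxy/shared-model", ["openai", "openrouter"]))
print(supports_reasoning("proxy/shared-model", ["openrouter", "openai"]))
```

A test that pins one such collision makes the chosen precedence an explicit decision instead of a side effect of list order.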

Solid iteration overall — the contract is now explicit on both sides and the test coverage for the recognized / unrecognized / ACP / metadata-unsupported quadrants is good.

b3nw · 2026-06-03T20:27:39Z

Changes Made - @nesquena-hermes

Gated Metadata Recovery in get_reasoning_status:
- Gated the _models_dev_reasoning_efforts metadata check in get_reasoning_status (api/config.py) to only run if supported_efforts is empty (not supported_efforts).
- This ensures that authoritative provider-specific resolver results (such as Copilot or LMStudio) are never clobbered or overridden by fallback checks.
- Cleaned up trailing whitespace in the modified sections of api/config.py.
Added Verification Tests:
- test_get_reasoning_status_copilot_disagreement_authoritative: Asserts that when Copilot resolves reasoning capabilities authoritatively, fallback metadata recovery is bypassed and doesn't override the result.
- test_models_dev_reasoning_efforts_precedence_loop: Pins the deterministic search order of standard providers (openai, anthropic, gemini, etc.) during fallback lookup to prevent order-sensitivity regressions.

greptile-apps · 2026-06-03T20:38:48Z

Greptile Summary

This PR inverts the reasoning-effort chip visibility model: instead of hiding the chip for unrecognized models, it always shows the chip and uses a new reasoning_default_on flag to default unrecognized models to "None" while letting users opt in. A new explicitly_unsupported path preserves the hide-chip behavior for ACP providers and models with confirmed supports_reasoning=False in capability metadata.

Backend (api/config.py): get_reasoning_status now detects explicitly_unsupported via a direct _models_dev_reasoning_efforts call and returns reasoning_default_on alongside the existing fields; _models_dev_reasoning_efforts gains a 9-provider fallback loop for custom/proxy providers. Because resolve_model_reasoning_efforts already calls _models_dev_reasoning_efforts internally, the new direct call in get_reasoning_status introduces a redundant double lookup for every unrecognized model.
Frontend (static/ui.js): _applyReasoningChip reads reasoning_default_on and forces effort = 'none' when it is false and no explicit effort is stored; the error handler now passes supported_efforts: null to preserve the prior dropdown state rather than collapsing it to [].
Tests: New tests cover ACP exclusion, unrecognized-model fallback, and provider-loop ordering, but test_get_reasoning_status_unrecognized_model_still_offers_efforts omits a mock for _models_dev_reasoning_efforts, leaving the test sensitive to whether the live capability catalog is reachable in CI.

Confidence Score: 3/5

Safe to merge for the UI behavior change, but the double metadata lookup in get_reasoning_status could materially slow down model switches if get_model_capabilities makes network calls, and one new test has a hidden dependency on the live capability catalog.

The backend logic has a structural redundancy: _models_dev_reasoning_efforts (with its 9-provider fallback loop) is called once inside resolve_model_reasoning_efforts and again directly in get_reasoning_status. For every model switch that produces an empty supported_efforts list, this doubles the provider lookups. If get_model_capabilities is I/O-bound, this is a latency regression on a hot path. Additionally, the test that validates unrecognized-model behavior does not isolate _models_dev_reasoning_efforts, meaning it can silently flip if the test environment has the capability catalog available.

The double-lookup in api/config.py (lines 2447-2451) and the incomplete mock in tests/test_reasoning_effort_model_capabilities.py (lines 71-86) need attention before merging.

Important Files Changed

| Filename | Overview |
|----------|----------|
| api/config.py | Adds explicitly_unsupported detection and reasoning_default_on flag to get_reasoning_status; also adds a 9-provider fallback loop in _models_dev_reasoning_efforts. The double invocation creates up to 20 provider lookups per call for unrecognized models. |
| static/ui.js | Chip now always visible; adds reasoning_default_on defaulting logic and moves state mutations before the !supports early-return; error handler changed to preserve prior dropdown options on API failure. Changes look correct. |
| tests/test_reasoning_effort_model_capabilities.py | Good coverage of new paths; test_get_reasoning_status_unrecognized_model_still_offers_efforts does not mock _models_dev_reasoning_efforts, creating an implicit dependency on the live implementation returning None for an unknown model. |
| tests/test_reasoning_chip_btw_fixes.py | Minor wording update to assertion message; no functional change. |

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A["fetchReasoningChip()"] --> B["GET /api/reasoning"]
    B -->|success| C["_applyReasoningChip(effort, st)"]
    B -->|error| D["_applyReasoningChip with reasoning_default_on=false"]
    C --> E["Update _currentReasoningEffortsSupported"]
    E --> F{"reasoning_default_on false AND no effort?"}
    F -->|yes| G["effort = none"]
    F -->|no| H["effort = normalized value"]
    G --> I{"supports_reasoning_effort?"}
    H --> I
    I -->|false| J["Hide chip"]
    I -->|true| K["Show chip with label and dropdown"]
    subgraph backend["get_reasoning_status in config.py"]
        L["resolve_model_reasoning_efforts()"] --> M{"supported_efforts empty?"}
        M -->|no| N["reasoning_default_on = True"]
        M -->|yes| O{"ACP provider?"}
        O -->|yes| P["explicitly_unsupported = True"]
        O -->|no| Q["_models_dev_reasoning_efforts second call"]
        Q -->|returns empty| P
        Q -->|returns None| R["show chip, full list, default_on=False"]
        N --> S["Return JSON"]
        P --> S
        R --> S
    end

Reviews (1): Last reviewed commit: "Gate reasoning status metadata lookup an..."

greptile-apps · 2026-06-03T20:38:52Z

+    elif resolve_model and not supported_efforts:
+        hinted_model = _strip_provider_hint_for_reasoning(resolve_model)
+        metadata_efforts = _models_dev_reasoning_efforts(hinted_model, provider)
+        if metadata_efforts == []:
+            explicitly_unsupported = True


Double metadata lookup on every unrecognized model check

resolve_model_reasoning_efforts already calls _models_dev_reasoning_efforts internally (line 2378) — and that function, when the initial lookup returns None, runs a fallback loop over up to 9 standard providers. When get_reasoning_status then calls _models_dev_reasoning_efforts a second time here (to distinguish "heuristic said no" from "metadata confirmed no"), the entire fallback loop fires again. If get_model_capabilities involves I/O (e.g. HTTP calls to models.dev), every get_reasoning_status invocation on an unrecognized model pays up to 20 provider lookups instead of 10. A simple fix is to pass the metadata result out of resolve_model_reasoning_efforts or memoize _models_dev_reasoning_efforts per (model, provider) within a request.
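
The memoization option could look roughly like the sketch below. It uses `functools.lru_cache` around a stand-in lookup; `slow_metadata_lookup`, `cached_metadata_lookup`, and the `LOOKUPS` counter are hypothetical names, and the stand-in always returns `None` instead of consulting any catalog.

```python
from functools import lru_cache
from typing import Optional, Tuple

LOOKUPS = {"count": 0}


def slow_metadata_lookup(model: str, provider: str) -> Optional[Tuple[str, ...]]:
    """Stand-in for the expensive lookup; counts how often it really runs."""
    LOOKUPS["count"] += 1
    return None  # "unknown" in this sketch


@lru_cache(maxsize=256)
def cached_metadata_lookup(model: str, provider: str) -> Optional[Tuple[str, ...]]:
    # Results are tuples (or None) so cached values cannot be mutated by callers.
    return slow_metadata_lookup(model, provider)


# Two calls with the same (model, provider) pair hit the slow path once.
cached_metadata_lookup("some-unknown-model", "custom:myproxy")
cached_metadata_lookup("some-unknown-model", "custom:myproxy")
print(LOOKUPS["count"])  # prints 1
```

Note that `lru_cache` is process-wide, not per-request. To scope it to a request as suggested, one option is to call `cached_metadata_lookup.cache_clear()` at the request boundary; whether stale catalog answers are acceptable between requests is a design choice for the maintainers.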

greptile-apps · 2026-06-03T20:38:53Z

+def test_get_reasoning_status_unrecognized_model_still_offers_efforts(monkeypatch):
+    """Unrecognized models get the full effort list but reasoning_default_on=False."""
+    monkeypatch.setattr(
+        cfg,
+        "resolve_model_reasoning_efforts",
+        lambda *a, **k: [],
+    )
+    status = cfg.get_reasoning_status(
+        model_id="some-unknown-model",
+        provider_id="custom:myproxy",
+    )
+    assert len(status["supported_efforts"]) > 0, (
+        "Unrecognized models should still expose effort levels"
+    )
+    assert status["supports_reasoning_effort"] is True
+    assert status["reasoning_default_on"] is False


_models_dev_reasoning_efforts not mocked — test can flip if catalog is reachable

The test patches resolve_model_reasoning_efforts to return [], but get_reasoning_status then calls _models_dev_reasoning_efforts directly (the new explicitly_unsupported check). If agent.models_dev is importable in the test environment and get_model_capabilities returns a capabilities object with supports_reasoning=False for "some-unknown-model", the function returns [], explicitly_unsupported becomes True, and both supports_reasoning_effort and len(supported_efforts) assertions fail. Adding monkeypatch.setattr(cfg, "_models_dev_reasoning_efforts", lambda *a, **k: None) makes the intent explicit and removes the environmental dependency.

nesquena-hermes · 2026-06-04T00:14:14Z

Pulled the updated branch (HEAD 69185804) and read the reworked get_reasoning_status (api/config.py:2416-2472) plus the two new test cases. The gate I suggested landed exactly as intended, and the disagreement case is now regression-guarded — this resolves the copilot/lmstudio clobber I flagged last round.

The gate is correct

if (provider in {"cursor-acp", "copilot-acp"} or model_prefix in {"cursor-acp", "copilot-acp"}):
    explicitly_unsupported = True
elif resolve_model and not supported_efforts:        # <-- the new guard
    hinted_model = _strip_provider_hint_for_reasoning(resolve_model)
    metadata_efforts = _models_dev_reasoning_efforts(hinted_model, provider)
    if metadata_efforts == []:
        explicitly_unsupported = True

The not supported_efforts condition means the metadata recovery only fires when the authoritative resolver already came back empty, so a non-empty copilot/lmstudio result can no longer be overwritten by the bare-name catalog fallback. That's the exact scope — the []-collapse-loses-the-distinction workaround only ever mattered when step-1 returned [].

The new test actually proves it

test_get_reasoning_status_copilot_disagreement_authoritative (tests/test_reasoning_effort_model_capabilities.py:131) is the right shape — it asserts both the result and that the second lookup never runs:

assert status["supported_efforts"] == ["medium", "high"]
assert not called_metadata_check, "Should not query models.dev metadata since resolver returned success"

The not called_metadata_check assertion is the key one: it locks in that the gate short-circuits before the redundant call, which also addresses the double-lookup latency concern greptile raised — for any model the resolver recognizes, there's now exactly one lookup, not two. test_models_dev_reasoning_efforts_precedence_loop (line 159) pins the 9-provider iteration order, so the order-sensitivity nit is covered too.

One small test-isolation note

test_get_reasoning_status_unrecognized_model_still_offers_efforts (line 71) monkeypatches resolve_model_reasoning_efforts → [] but leaves _models_dev_reasoning_efforts un-mocked. With the gate, supported_efforts is [], so the elif branch now does fire and calls the real _models_dev_reasoning_efforts("some-unknown-model", ...). The test passes only because an unknown model returns None (not []) from that path, so explicitly_unsupported stays false. That's correct today, but it's an implicit dependency on the live catalog answering None for an unknown name. A one-line monkeypatch.setattr(cfg, "_models_dev_reasoning_efforts", lambda *a, **k: None) would make it hermetic and immune to CI catalog state — worth adding since the other three cases already mock it.

Contract is now explicit on both sides and the quadrant coverage (recognized / unrecognized / ACP / metadata-unsupported / authoritative-disagreement) is solid. Reads merge-ready to me modulo that one test mock.

@franksong2702

## Release v0.51.247 - Release HO (stage-q19)

Backend correctness fix.

### Fixed

| Issue | Author | Fix |
|-------|--------|-----|
| #3505 | @franksong2702 | **Reasoning effort is coerced to a level the active model/provider actually supports** before each request, instead of being sent verbatim and rejected. `openai-codex` `gpt-5` no longer gets `max` (→ `xhigh`); `o1`/`o3`/`o4` clamp to `low`/`medium`/`high`. Coercion only steps *down* (never escalates); `none`/unset preserved. The capability filter is applied across heuristic / models.dev / Copilot / LM Studio paths. |

This is the narrow, correct fix for the detection gap that #3431 tried to address by removing the chip-visibility gate (which we shelved). The chip-visibility gate is **untouched** (Codex confirmed): `get_reasoning_status`/`_applyReasoningChip` still hide the chip for unconfirmed models.

### Review fix absorbed (Codex + self-flagged)

The first cut **dropped** a configured effort for *unrecognized* models, because capability detection returns `[]` for both "known-unsupported" and "simply-unknown" (custom providers, aggregator-rewritten ids, new releases). That is a behavior change vs master (which sent it verbatim) and would silently disable reasoning. Fixed: an **empty** capability set now **preserves** the configured effort (provider stays the final authority; worst case = the same rejected request master already produces, i.e. no regression). Known-bad clamps return *non-empty* filtered sets, so they still degrade correctly. Nathan chose this "preserve-for-unknown" behavior. + regression test.

### Gate

- Full pytest suite: **7548 passed, 0 failed**
- ruff: CLEAN · 48 reasoning tests pass (incl. preserve-for-unknown + codex-clamp + never-escalate)
- Codex (regression): SHIP-ONLY-WITH-FIXES (unknown-model drop) → fixed → **SAFE TO SHIP**
- Verified empirically: gpt-5/codex max→xhigh, o3 max/xhigh→high, unknown high→high (preserved), none/unset preserved

Co-authored-by: franksong2702 <franksong2702@users.noreply.github.com>
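
The coercion rules in the release note above (step down only, preserve `none`/unset, preserve the configured effort when the capability set is empty) can be sketched as follows. This is an illustration under stated assumptions, not the code from #3505: the function name `coerce_effort` and the `EFFORT_ORDER` ranking are hypothetical, and the release note does not say what happens when no lower level is supported, so this sketch leaves the effort unchanged in that case.

```python
from typing import Iterable, Optional

# Assumed ranking from weakest to strongest; the real scale may differ.
EFFORT_ORDER = ["low", "medium", "high", "xhigh", "max"]


def coerce_effort(effort: Optional[str], supported: Iterable[str]) -> Optional[str]:
    """Clamp an effort to a supported level, stepping down only."""
    if effort is None or effort == "none":
        return effort  # none/unset is preserved
    supported_set = set(supported)
    if not supported_set or effort in supported_set:
        # Empty set means "unknown model": keep the configured effort.
        return effort
    if effort not in EFFORT_ORDER:
        return effort  # unrecognized label: leave it to the provider
    index = EFFORT_ORDER.index(effort)
    # Walk down from the level just below the requested one.
    for level in reversed(EFFORT_ORDER[:index]):
        if level in supported_set:
            return level
    # Assumption: with nothing lower available, do not escalate; keep as-is.
    return effort


print(coerce_effort("max", ["low", "medium", "high", "xhigh"]))
print(coerce_effort("xhigh", ["low", "medium", "high"]))
print(coerce_effort("high", []))
```

The supported sets passed in the usage lines are assumptions chosen to mirror the empirically verified cases listed under Gate.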

b3nw closed this Jun 2, 2026

b3nw reopened this Jun 2, 2026

b3nw force-pushed the feat/3377-thinking-level-missing branch from 0789b94 to f17400c Compare June 2, 2026 19:58

b3nw force-pushed the feat/3377-thinking-level-missing branch from f17400c to 14b7c27 Compare June 2, 2026 20:03

b3nw added 2 commits June 3, 2026 13:38

Merge origin/master

743eff8

feat: refine reasoning status controls for custom and unsupported pro…

2631940

…viders

b3nw force-pushed the feat/3377-thinking-level-missing branch from b2f6201 to 2631940 Compare June 3, 2026 14:02

Gate reasoning status metadata lookup and add capability tests

6918580

b3nw force-pushed the feat/3377-thinking-level-missing branch from e0195e6 to 6918580 Compare June 3, 2026 20:25

greptile-apps Bot reviewed Jun 3, 2026


nesquena-hermes mentioned this pull request Jun 4, 2026

Release v0.51.247 — Release HO (stage-q19): coerce reasoning effort to model-supported levels #3521

Merged

b3nw wants to merge 4 commits into nesquena:master from b3nw:feat/3377-thinking-level-missing
