fix(sse): prevent Claude OAuth multi-account correlation via metadata.user_id #2053
When multiple Claude OAuth accounts are routed through one OmniRoute, Anthropic could correlate them via the Claude Code metadata.user_id blob:

- device_id from ~/.claude.json is shared across every account on one machine
- account_uuid may not match the OAuth token actually being routed (active mismatch — stronger tell than just sharing)
- session_id is shared across accounts when one CC process fans out via a combo

This forces per-OAuth-account identity synthesis whenever a Claude OAuth token is in use (isClaudeCodeClient || hasClaudeOAuthToken), so a non-CC client mimicking metadata.user_id against an OAuth token can't slip its identity through either.

Also: align the /api/oauth/usage probe with the real CLI shape (claude-code/<version> UA; matching Accept, Content-Type, and Accept-Encoding headers; 10s abort), and lock the Claude tile in Settings -> CLI Fingerprint as forced-on with a "Required" badge — the toggle was misleading because shouldFingerprint already forces fingerprinting on for OAuth regardless of the saved setting.
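The per-account identity synthesis could look roughly like this: a minimal sketch, assuming identifiers are derived deterministically from the routed OAuth token so no two accounts ever share a device_id, session_id, or account component. The helper names and the user_id layout below are illustrative, not the real OmniRoute code.

```typescript
import { createHash } from "node:crypto";

// Derive a stable, per-account identifier for one field of the identity blob.
// Hashing the (field, token) pair means the same account always presents the
// same identity, while two accounts never share any component.
function deriveAccountScopedId(accountToken: string, field: string): string {
  const hex = createHash("sha256")
    .update(`${field}:${accountToken}`)
    .digest("hex");
  // Shape the digest like a UUID so it resembles what a real client sends.
  return [
    hex.slice(0, 8),
    hex.slice(8, 12),
    hex.slice(12, 16),
    hex.slice(16, 20),
    hex.slice(20, 32),
  ].join("-");
}

// Illustrative user_id assembly; the real blob layout is an assumption here.
function synthesizeUserId(accountToken: string): string {
  const account = deriveAccountScopedId(accountToken, "account");
  const session = deriveAccountScopedId(accountToken, "session");
  return `user_account_${account}_session_${session}`;
}
```

The key property is determinism per account plus disjointness across accounts, which removes both the shared-device_id and shared-session_id tells described above.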
Code Review
This pull request implements identity cloaking for Claude OAuth requests to prevent account correlation and updates the Claude usage fetching logic with a 10-second timeout and specific headers. Additionally, it modifies the dashboard settings UI to force-enable CLI fingerprinting for the Claude provider, adding corresponding localized strings across multiple languages. I have no feedback to provide.
… preserve cache on fetch failure

Three connected improvements to the Claude OAuth provider-limits flow:

- Bootstrap refresh on every provider-limits sync (manual refresh, the scheduled 70-minute cycle, and the lazy first request after start). The bootstrap fetcher is now reusable from claudeIdentity.ts and runs in parallel with the usage probe; bootstrap fields (organization_type, organization_rate_limit_tier, account_uuid, organization_uuid, organization_name) are diffed against psd and only persisted when changed.
- Real plan tier on the dashboard. resolvePlanValue now consults psd.organizationRateLimitTier (which carries the Max 5x/20x multiplier) and psd.organizationType. normalizePlanTier matches Anthropic-shaped strings like default_claude_max_20x → "Max 20x", claude_pro → "Pro", claude_team → "Team", etc., before the generic PRO/TEAM checks.
- Stale-cache preservation. fetchAndPersistProviderLimits and syncAllProviderLimits no longer overwrite a previously good cache with an error-only entry (typical for 429s, network errors, or permission failures). When the live fetch fails, the prior cache is served and staleness is surfaced via _stale / _staleSince fields. The dashboard renders the staleSince timestamp in amber with a "Last refresh failed — showing cached data" tooltip, and never displays the misleading "0% / error" row.

Also: map Anthropic's internal "omelette" codename to "Designer" for the weekly model breakdown display, and add the i18n key staleQuotaTooltip across all 41 locale files.
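The tier normalization described above can be sketched as follows. The example strings come from the description; the full set of Anthropic-shaped tier strings beyond those examples is an assumption.

```typescript
// Sketch of normalizePlanTier: Anthropic-shaped strings are matched before
// the generic PRO/TEAM checks. Not the real implementation.
function normalizePlanTier(raw: string | null | undefined): string | null {
  if (!raw) return null;
  const s = raw.toLowerCase();
  // e.g. default_claude_max_20x → "Max 20x" (carries the 5x/20x multiplier)
  const maxMatch = s.match(/max[_-]?(\d+)x/);
  if (maxMatch) return `Max ${maxMatch[1]}x`;
  if (s.includes("claude_pro")) return "Pro";
  if (s.includes("claude_team")) return "Team";
  // Generic fallbacks run only after the Anthropic-shaped checks.
  if (s.includes("pro")) return "Pro";
  if (s.includes("team")) return "Team";
  return raw; // unknown tier strings pass through for display
}
```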
Follow-up commit 316c490: three small fixes specific to Claude OAuth:

- Plan tier shows correctly.
- Bootstrap refreshes on sync; the bootstrap fetcher was extracted into a reusable helper.
- Stale-cache preservation: error-only entries (429 etc.) no longer overwrite the cache, and the dashboard shows the prior timestamp in amber with a "Last refresh failed — showing cached data" tooltip, plus the matching i18n key.
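The stale-cache preservation rule described above can be sketched as a small merge step. The _stale/_staleSince field names come from the description; the cache shape and helper name are illustrative.

```typescript
interface LimitsCache {
  usedPercent: number;
  fetchedAt: string;
  _stale?: boolean;
  _staleSince?: string;
}

// Minimal sketch: a good fetch always wins; a failed fetch (429, network
// error, permissions) serves the prior cache and marks it stale instead of
// overwriting it with an error-only entry.
function mergeFetchResult(
  prev: LimitsCache | null,
  fresh: LimitsCache | null, // null ⇒ the live fetch failed
): LimitsCache | null {
  if (fresh) return fresh;
  if (!prev) return null; // nothing cached, nothing to serve
  return {
    ...prev,
    _stale: true,
    _staleSince: prev._staleSince ?? new Date().toISOString(),
  };
}
```

The dashboard can then key its amber timestamp and tooltip off `_stale` rather than ever rendering a "0% / error" row.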
Thank you for your contribution! Your work was amazing. Integrated into release/v3.8.0 🚀
…#9)

* feat: add kie media provider support
* Update open-sse/handlers/videoGeneration.ts
  Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
* Update open-sse/handlers/imageGeneration.ts
  Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
* Update open-sse/handlers/imageGeneration.ts
  Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
* feat(providers): add KIE text models and expand video models catalog
* feat(ui): update media dashboard with new KIE video models
* refactor(providers): robust KIE handlers with dynamic polling and improved types
* refactor(providers): address code review feedback for KIE provider
* chore(providers): prune redundant provider icon assets (#1992). Integrated into release/v3.8.0
* feat(gemini-cli): add custom projectId support (UI, DB, executor) (#1991). Integrated into release/v3.8.0
* docs: update CHANGELOG and bump version to 3.8.0
* fix(mitm): add Linux cert install and skip sudo password when root
  Add Linux certificate management via update-ca-certificates for Docker support. Skip sudo password validation when running as root, matching the existing cli-tools route behavior.
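The "skip sudo password when root" guard from the MITM cert commit amounts to a uid check before prompting. A minimal sketch, assuming a Node runtime on a POSIX platform; the helper name is illustrative.

```typescript
// Only prompt for (and validate) a sudo password when we are not already
// root. process.getuid is POSIX-only, so guard the call for portability.
function needsSudoPassword(): boolean {
  const uid = typeof process.getuid === "function" ? process.getuid() : -1;
  return uid !== 0; // uid 0 means root: sudo needs no password prompt
}
```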
* fix(cli): resolve .env loading failure for global npm installations
* fix: remove Anthropic-Beta header from non-Anthropic providers to fix identity contamination (#1989)
* chore(release): bump to v3.8.0 — changelog, docs, version sync
* fix(dashboard): resolve Unknown plan display in Provider Limits
  - Replace || "Unknown" fallbacks with || null in usage.ts (GLM + Claude legacy)
  - Add plan extraction to Claude OAuth mapTokens (account_tier > plan > subscription_type > billing.plan)
  - Add unit tests for plan extraction and Provider Limits badge resolution
* fix(dashboard): revert GLM and Claude legacy plan fallbacks to Unknown
  The original fix replaced || "Unknown" with || null for the GLM and Claude legacy (non-OAuth) paths. Per user clarification, "Unknown" is a valid display fallback when no plan data exists; null-based fallbacks caused the Provider Limits dashboard to show no badge rather than a clear "Unknown" indicator. Revert only the usage.ts changes. Claude OAuth mapTokens plan extraction (claude.ts) and the associated tests remain unchanged.
* feat: add kie media provider support
* fix: address kie provider review feedback
* fix: preserve kie market model ids
* fix: address kie provider pr review
* feat(combos): add reset-aware routing strategy
* feat: add support for Z.AI provider and enhance quota handling
* fix: generalize reset-aware quota routing
* fix: address reset-aware routing review feedback
* fix: address reset-aware follow-up feedback
* feat: enhance GLM quota handling and add new quota labels for Z.AI
* fix(mitm): prevent stub from loading at runtime via bypass module
  Turbopack resolveAlias (@/mitm/manager → manager.stub.ts) was designed for build-time safety, but Next.js applies aliases to ALL imports, including dynamic ones. This caused await import("@/mitm/manager") at runtime to load the stub, which silently returned a fake {running: true} without spawning the MITM proxy. The UI showed "MITM proxy started" but nothing was actually running.
  The fix introduces a two-path design:
  - @/mitm/manager → stub (build-time, safe for Turbopack)
  - @/mitm/manager.runtime → real manager (runtime, bypasses the alias)
  Route handlers now dynamic-import from manager.runtime, which re-exports from ./manager and does NOT match the alias pattern.
  Additional fixes:
  - Make the stub throw explicit errors at runtime so misconfiguration is immediately visible instead of silently faking success
  - Add server.cjs to outputFileTracingIncludes (NFT trace) and the Dockerfile COPY so the MITM server binary exists in standalone/Docker output
* fix(catalog): auto-calculate combo context_length from target model limits
  Fixes the root cause where OpenCode falls back to a ~4000-token limit for combos because no context_length is exposed in /v1/models. Previously, combos only used context_length when it was set manually on the combo record. Now, when unset, the catalog computes the effective limit as the MINIMUM of its targets' individual token limits via getTokenLimit()/parseModel(). Manual values still override.
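The combo context_length calculation can be sketched as a minimum over the targets' limits. getTokenLimit here is a stand-in for the real lookup chain, with an illustrative table and fallback.

```typescript
// Illustrative per-model limits; the real code resolves these via
// getTokenLimit()/parseModel() over the registry and models.dev data.
const tokenLimits: Record<string, number> = {
  "claude-sonnet-4": 200_000,
  "gpt-5.5": 400_000,
};

function getTokenLimit(model: string): number {
  return tokenLimits[model] ?? 4_000; // assumed fallback for the sketch
}

// A combo can only safely advertise the smallest limit among its targets,
// since any target may end up serving the request. A manual value on the
// combo record still overrides the computed one.
function comboContextLength(
  manual: number | undefined,
  targets: string[],
): number | undefined {
  if (manual !== undefined) return manual;
  if (targets.length === 0) return undefined;
  return Math.min(...targets.map(getTokenLimit));
}
```

Taking the minimum is the conservative choice: advertising the largest target's limit would let clients send prompts that overflow the smaller targets.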
  Files changed:
  - src/app/api/v1/models/catalog.ts (+30 lines, auto-calc)
  - tests/unit/models-catalog-route.test.ts (+2 tests)
  Tests pass: 25/25
* chore(deps): resolve npm audit moderate vulnerability (hono)
* chore: Remove Deprecated Models (#2033). Integrated into release/v3.8.0
* docs(env): add GITLAB_DUO_OAUTH_CLIENT_ID to .env.example (#2031). Integrated into release/v3.8.0
* fix(catalog): auto-calculate combo context_length from target model limits (#2030). Integrated into release/v3.8.0
* Update claude md and update glm-cn max context to 200k (#2027). Integrated into release/v3.8.0
* fix(chatgpt-web): plumb proxy through to native tls-client (#2022) (#2023). Integrated into release/v3.8.0
* fix(codex): expose native model ids in catalog (#2012). Integrated into release/v3.8.0
* feat(sse): refresh Claude OAuth wire image to claude-cli/2.1.131 (#2011). Integrated into release/v3.8.0
* fix: add fuzzy auto-combo routing for 'auto/*' model prefix (#2010). Integrated into release/v3.8.0
* Fix API key identity in usage analytics (#2008). Integrated into release/v3.8.0
* fix(docker): include OpenAPI spec in runtime image (#2007). Integrated into release/v3.8.0
* fix: allow Unicode letters in API key name validation (#1996). Integrated into release/v3.8.0
* fix: resolve model alias persistence double stringification preventing UI updates (#2018)
* fix: dynamically filter bare model auto-resolution by active provider connections to prevent dead-routing (#2029)
* fix: add Google Gemini embeddings compatibility via OpenAI-compatible endpoint mapping (#2006)
* docs: update CHANGELOG.md for v3.8.0 (#2006, #2018, #2029)
* feat(antigravity): overhaul identity, fingerprinting & envelope format
  - Add centralized antigravityIdentity service (sessionId, machineId, requestId)
  - Switch User-Agent to Electron/Chrome desktop format
  - Reorder upstream URLs: sandbox first, production last
  - Add runtime headers: x-client-name, x-client-version, x-machine-id, x-vscode-sessionid, x-goog-user-project
  - Add 403 retry without the x-goog-user-project header
  - Add generation defaults (topK=40, topP=1.0, maxOutputTokens guard)
  - Strip cache_control from Claude requests recursively
  - Enterprise/consumer routing via the userAgent field (jetski vs antigravity)
  - Update envelope field order and add enabledCreditTypes
  - MITM proxy: support multiple target hosts
  - Version: semver comparison with pickNewestVersion(); bump fallback to 4.1.33
  - Update all affected tests
* ci: update build-fork workflow to build from main branch
* debug: add AG_REQUEST_HEADERS and AG_REQUEST_ENVELOPE debug logs
  Dumps outgoing headers (with masked Authorization) and envelope structure (fieldOrder, project, requestId, userAgent, requestType, enabledCreditTypes, sessionId, generationConfig) at debug level for production verification of the identity overhaul.
* fix(antigravity): don't inject default maxOutputTokens when the client omits max_tokens
  The real Antigravity client does not send maxOutputTokens when the user hasn't specified it; the Cloud Code server decides the output limit. OmniRoute was incorrectly injecting a capped default from model specs, which caused thinking models to return empty content with low limits.
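The maxOutputTokens fix above boils down to omitting the field entirely when the client omits max_tokens, rather than substituting a capped default. A minimal sketch; the body/config shapes are trimmed to what the example needs.

```typescript
interface ClientBody {
  max_tokens?: number;
}

// Only forward maxOutputTokens when the client actually sent max_tokens;
// otherwise leave the field out so the upstream decides the output limit.
// The topK/topP defaults mirror the generation defaults listed above.
function buildGenerationConfig(body: ClientBody, topK = 40, topP = 1.0) {
  return {
    topK,
    topP,
    ...(body.max_tokens !== undefined
      ? { maxOutputTokens: body.max_tokens }
      : {}),
  };
}
```

The conditional spread is the important part: an absent field and a defaulted field are different signals to the upstream, and injecting a default is what starved thinking models of output budget.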
* fix(antigravity): align identity protocol and behavior with official AM
* fix(antigravity): add duplex half for streaming bodies
* refactor: address PR review feedback
* feat: implement global Codex fast service tier functionality and related settings
* feat(usage): account for codex fast tier analytics
* feat: add service tier breakdown component and handle missing docs directory
* feat: enhance chat handling with cached settings and deduplicate quota fetches in reset-aware strategy
* feat: add service tier column to usage_history and update migration checks
* deps: bump hono from 4.12.14 to 4.12.18 (#2065). Integrated into release/v3.8.0
* fix(sse): use Gemini schema for Antigravity Claude (#2063). Integrated into release/v3.8.0
* feat(chat): dynamic tool limit detection with proactive truncation (#2061). Integrated into release/v3.8.0
* Fix bare GPT-5.5 routing for Codex-only installations (#2054). Integrated into release/v3.8.0
* fix(db): preserve legacy SQLite database path on Windows to prevent data loss (#1973)
* docs: update changelog for issue 1973 resolution
* feat: add fallbackDelayMs to combo configuration and related settings
* feat: add STREAM_READINESS_TIMEOUT_MS and integrate into chat handling
* fix(core): restore Claude Code adaptive thinking defaults and resolve audio transcription CORS regression
  - Restored default adaptive thinking injection for non-Haiku Claude Code models when explicit client headers are omitted.
  - Updated Claude OAuth unit tests to accurately account for dynamic cliUserID property injection in mapped credentials.
  - Fixed a module resolution regression in the audio transcription handler caused by the missing getCorsOrigin utility.
* fix: clean up proxy page redundancy and fix 1proxy sync empty body error (#2052). Integrated into release/v3.8.0
* fix(auth): allow bootstrap without password (#2048). Integrated into release/v3.8.0
* feat(combo): add context_length input field to combo edit form (#2047). Integrated into release/v3.8.0
* [cli omniroute] Add modular CLI setup and provider commands (#2046). Integrated into release/v3.8.0
* fix: Follow OpenAI specification, handle throttling in batch and fix UI (#2045). Integrated into release/v3.8.0
* fix(db): add missing migration renumbering entries for compression migrations (#2041). Integrated into release/v3.8.0
* fix(db): reduce hot-path persistence overhead (#2039). Integrated into release/v3.8.0
* fix(compression): support Responses input and expand Spanish rules (#2028). Integrated into release/v3.8.0
* feat(multi): manifest-aware tier routing — W1-W4 complete (#2014). Integrated into release/v3.8.0
* fix(db): resolve migration conflict by renumbering 051 to 052 and 053
* fix: clean up proxy page redundancy and fix 1proxy sync empty body error (#2052)
* fix(sse): prevent Claude Code identity cloak overrides and fix fallback resilience (#2053)
* fix: update dependencies and merge PR 2035
* Merge PR #2019 and resolve conflicts
* feat: enhance error handling for semaphore capacity and implement fallback logic in chat processing
* fix(runtime): harden timer handling and model pricing fallback
  Align runtime behavior with test and stream expectations across the app. Use `globalThis` timer APIs for SSE heartbeats, set the Playwright server `NODE_ENV` explicitly by mode, and fall back to Codex pricing lookups after stripping effort suffixes when a direct model match is missing. Refresh affected unit and e2e coverage to use deterministic timers and updated settings navigation so timeout- and stream-related assertions are stable on release builds.
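The effort-suffix pricing fallback can be sketched as a two-step lookup: exact match first, then retry with the suffix stripped. The price table and suffix list below are illustrative assumptions, not the real catalog.

```typescript
// Illustrative pricing table ($/M tokens); the real data lives elsewhere.
const pricing: Record<string, number> = { "gpt-5.5": 1.25 };

// Assumed Codex effort suffixes; ordered so "-xhigh" is still reachable
// ("gpt-5.5-xhigh" does not end with "-high", so there is no false match).
const EFFORT_SUFFIXES = ["-minimal", "-low", "-medium", "-high", "-xhigh"];

function lookupPricing(model: string): number | undefined {
  if (pricing[model] !== undefined) return pricing[model]; // direct match
  for (const suffix of EFFORT_SUFFIXES) {
    if (model.endsWith(suffix)) {
      const base = model.slice(0, -suffix.length);
      if (pricing[base] !== undefined) return pricing[base];
    }
  }
  return undefined; // genuinely unknown model
}
```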
* feat: update API bridge proxy timeout to 600000ms and enhance related tests
* fix(providers): strip OpenAI-specific fields in Kiro translator to prevent 400 errors (#2037)
* fix(ui): resolve text contrast issues for zero-config warning banner in light mode (#2050)
* fix(core): inject global system prompt correctly into downstream chat completions pipeline (#2080)
* fix(routing): add missing v1beta rewrites to next.config to resolve 404 on Gemini models endpoint (#2102)
* feat(api): allow configuration via API calls: open management routes to Bearer keys with manage scope (#2103). Integrated into release/v3.8.0
* fix(antigravity): sanitize Claude Cloud Code payloads (#2090). Integrated into release/v3.8.0
* fix(kiro): normalize tool-use payloads (#2104). Integrated into release/v3.8.0
* feat(providers): batch delete provider connections via checkbox multi-select (#2094). Integrated into release/v3.8.0
* feat(providers): add 9 new free AI providers (LLM7, Lepton, Kluster, UncloseAI, BazaarLink, Completions, Enally, FreeTheAi) (#2096). Integrated into release/v3.8.0
* fix(api): usage and keys (#2092). Integrated into release/v3.8.0
* feat(mcp): add DeepSeek quota and limit feature
  - Add deepseekQuotaFetcher.ts for DeepSeek balance API integration
  - Integrate with the quotaPreflight and quotaMonitor systems
  - Support both USD and CNY currency display
  - Add DeepSeek to the USAGE_SUPPORTED_PROVIDERS whitelist
  - Add DeepSeek to PROVIDER_LIMITS_APIKEY_PROVIDERS
  - Credits-style UI display with currency symbols and color coding
  - Add comprehensive unit tests
  Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* fix(usage): add extensible CURRENCY_SYMBOLS mapping for DeepSeek currencies
* fix(kiro): merge adjacent user history turns after role normalization (#2105). Merged automatically
* Refresh providers, model catalogs, and docs for v3.8.0 (#2088). Merged automatically
* feat(cursor): full OpenAI parity (tool calls, streaming, sessions) (#2082). Merged automatically
* deps: bump hono from 4.12.14 to 4.12.18 (#2079). Merged automatically
* deps: bump fast-uri from 3.1.0 to 3.1.2 (#2078). Merged automatically
* fix(glm): add dedicated coding transport (#2087). Integrated into release/v3.8.0
* Feat/qdrant embedding model discovery (#2086). Integrated into release/v3.8.0
* feat(auth): per-session sticky routing for codex (#1887). Integrated into release/v3.8.0
* fix(sse): prevent Claude OAuth multi-account correlation via metadata.user_id (#2053). Integrated into release/v3.8.0
* feat(cli): Comprehensive CLI Enhancement Suite: 20+ new commands (#2074). Integrated into release/v3.8.0
* README SEO/AEO/GEO + Competitive Marketing (#2091). Integrated into release/v3.8.0
* chore: update CHANGELOG.md for PR 2091
* chore(security): apply CodeQL fixes to release branch
* chore(release): finalize v3.8.0 stabilization and fix TypeScript regressions
  - Fix stream readiness loop and upstream error code propagation in chatCore.ts
  - Resolve Headers iterator TypeScript errors
  - Fix type mismatches and missing props in BuilderIntelligentStep, Card, and the providers page
  - Fix providerLimits typecasts and resolve implicit any errors
  - Ensure a green build and strict type compliance for production
* feat(circuit-breaker): classify 429 errors and apply per-kind cooldowns (#2116). Integrated into release/v3.8.0
* fix(sse): classify hour quota errors as QUOTA_EXHAUSTED
* Fix CC-compatible streaming bridge
* fix(i18n): complete Simplified Chinese translations
* docs(i18n): sync CHANGELOG.md to 39 languages
* feat(github): add targetFormat openai-responses to all GitHub models
* chore: enhance Inworld TTS support
* security: fix code scanning alerts: sanitize error messages and suppress false-positive hash warnings
  - Sanitize error messages in errorResponse() and the cursor buildErrorResponse() to strip stack traces before sending to the client (fixes js/stack-trace-exposure)
  - Add explicit CodeQL suppression comments for intentional SHA-256 usage in API key hashing (fast O(1) lookup, not password storage) and deterministic UUID generation (fixes js/insufficient-password-hash false positives)
* security: fix code scanning alerts: sanitize error messages and suppress false-positive hash warnings
  (Same changes as above; cherry-picked from release/v3.8.0.)
* feat(github): add targetFormat openai-responses to all GitHub models (#2122). Integrated into release/v3.8.0 — thank you @abhinavjnu for this contribution! 🎉
* fix(sse): classify hour quota errors as QUOTA_EXHAUSTED (#2119). Integrated into release/v3.8.0 — thank you @clousky2020 for this contribution! 🎉
* Fix CC-compatible streaming bridge (#2118). Integrated into release/v3.8.0 — thank you @rdself for this contribution! 🎉
* fix(i18n): complete Simplified Chinese translations (#2115). Integrated into release/v3.8.0 — thank you @boa-z for this contribution! 🎉
* feat(mcp): add DeepSeek quota and limit feature (#2089). Integrated into release/v3.8.0 — thank you @HoaPham98 for this contribution! 🎉
* chore: enhance Inworld TTS support (#2123). Integrated into release/v3.8.0 — thank you @backryun!
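The stack-trace sanitization from the security commits can be sketched as keeping only the message line of an error and dropping the "at …" frames before anything reaches the client. A minimal sketch; the real errorResponse()/buildErrorResponse() do more.

```typescript
// Strip stack frames from an error before it is serialized into a client
// response (the js/stack-trace-exposure class of finding). Keeps only the
// first line that is not a "    at ..." frame.
function sanitizeErrorMessage(err: unknown): string {
  const raw = err instanceof Error ? (err.stack ?? err.message) : String(err);
  const firstLine =
    raw.split("\n").find((line) => !line.trim().startsWith("at ")) ?? "";
  return firstLine.trim();
}
```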
* chore: fix docs-sync pre-commit hook, add v3.8.0 contributor credits, and sync CHANGELOG i18n
  - Fix check-docs-sync.mjs: CHANGELOG.md i18n mirrors use translation-aware validation (version sections + size check) instead of exact byte comparison, since translated CHANGELOGs have translated section headings
  - Add a v3.8.0 Community Contributors section with 38 external contributors credited
  - Sync CHANGELOG.md translations across 40 locales
* fix(export): exclude telemetry/usage-history tables from JSON config backups by default (#2125)
  The export-json API now excludes the usage_history, domain_cost_history, and domain_budgets tables by default. These tables grow indefinitely and inflate config backups to many MBs. Users can opt in to including them via the ?includeHistory=true query param. Closes #2125
* docs: synchronize CHANGELOG.md with all 129 commits since v3.7.9
  Audit all commits in release/v3.8.0 against the CHANGELOG and add ~30 missing entries:
  - New providers: KIE media, Z.AI, 9 free providers
  - CLI suite: 20+ commands, provider management
  - Cursor full OpenAI parity
  - Circuit breaker 429 classification
  - DeepSeek quota/limit monitoring
  - Reset-aware routing strategy
  - Multiple Kiro, GLM, Antigravity, and SSE fixes
  - Dependency bumps, doc refreshes, deprecated model cleanup
* fix(analytics): dynamic currency precision + Codex pricing resolution (#1978)
  - Add formatCurrencyCost() for adaptive decimal precision on cost cards
  - Add a codex-auto-review pricing alias to GPT-5.5
  - Add getPricingModelCandidates() with Codex effort-suffix stripping
  - Fix fallback stats to exclude combo-routed requests and use case-insensitive comparison
  - Add 3 new unit tests for Codex pricing resolution
  Co-authored-by: 05dunski <jan.gaschler@gmail.com>
* fix(authz): classify /dashboard/onboarding as PUBLIC to unblock setup wizard (#2127)
  - Add an exact-match guard for /dashboard/onboarding before the broad /dashboard prefix
  - Add setup_wizard and client_api_mcp to the ClassificationReason union type
  - Update the test to verify PUBLIC classification
  Co-authored-by: HomerOff <homeroff76@gmail.com>
* feat(cursor): surface Cursor Pro plan usage on provider-limits dashboard (#2128)
  - Replace legacy getCursorUsage with the dashboard API (cursor.com/api/dashboard/get-current-period-usage)
  - Use WorkOS session cookie auth instead of a Bearer token
  - Surface 3 quota windows: Total, Auto + Composer, API
  - Register cursor in USAGE_SUPPORTED_PROVIDERS
  - Add fetchUserInfo() to resolve the real email on import
  - Remove ~170 lines of dead code (old fetcher + helpers)
  - Add 6 comprehensive tests with fetch mocking
  Co-authored-by: payne0420 <baboialex95@gmail.com>
* feat(kiro): headless auth via kiro-cli SQLite, image support, model fixes (#2129)
  - Add kiro-cli SQLite auto-import for enterprise SSO + headless environments
  - Add image support (OpenAI + Anthropic formats → Kiro native)
  - Move long tool descriptions to the system prompt to prevent 400 errors
  - Sync the model list with the live API: add auto-kiro, claude-sonnet-4, deepseek-3.2, etc.
  - Add dash-to-dot model name normalization for Claude Code compatibility
  - Fall back gracefully to ~/.aws/sso/cache for social auth
  Co-authored-by: christlau <christlau@users.noreply.github.com>
* fix(translator): preserve body.system in openai→claude when Claude Code sends native format (#2130)
  Root cause: the v3.7.9 fix for #1966 removed the unconditional CLAUDE_SYSTEM_PROMPT injection, which also removed the else branch that always set result.system. When Claude Code sends its system prompt as body.system (a native Anthropic array) through /v1/chat/completions, the translator only looked at role='system' messages in body.messages; body.system was silently dropped.
  Fix: the translator now checks for body.system and preserves it:
  - If both body.system and role='system' messages exist, they are merged
  - If only body.system exists, it passes through as-is
  - If only role='system' messages exist, behavior is unchanged
  - If neither exists, result.system remains undefined (no forced injection)
  Also removes the dead CLAUDE_SYSTEM_PROMPT import. Includes 4 regression tests covering all combinations.
* feat(auto): add auto prefix parser
* feat(mitm): implement dynamic Linux cert resolution and NSS db injection in TS
  - Replaced the hardcoded LINUX_CA_DIR with dynamic filesystem probing to support Debian, Arch, Fedora, and openSUSE system trust stores.
  - Added an updateNssDatabases helper to inject root certificates directly into browser NSS databases (e.g., ~/.pki/nssdb, ~/.mozilla/firefox).
  - Supported standard and snap-based Chrome/Chromium and Firefox installations.
  - Made browser cert injection resilient: it executes under the current user to prevent file ownership issues and safely falls back if certutil is absent.
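The four body.system preservation rules can be sketched directly. Types are trimmed to what the example needs, and the merge flattens the native array to text for simplicity; the real translator works on the full Anthropic shapes.

```typescript
type SystemBlock = string | Array<{ type: "text"; text: string }>;

interface Body {
  system?: SystemBlock;
  messages: Array<{ role: string; content: string }>;
}

// Sketch of the resolution rules: merge when both sources exist, pass
// native body.system through alone, keep role='system' behavior unchanged,
// and never force-inject anything when neither is present.
function resolveSystem(body: Body): SystemBlock | undefined {
  const fromMessages = body.messages
    .filter((m) => m.role === "system")
    .map((m) => m.content)
    .join("\n");
  if (body.system !== undefined && fromMessages) {
    const native =
      typeof body.system === "string"
        ? body.system
        : body.system.map((b) => b.text).join("\n");
    return `${native}\n${fromMessages}`; // both present: merge
  }
  if (body.system !== undefined) return body.system; // native passthrough
  if (fromMessages) return fromMessages; // legacy behavior unchanged
  return undefined; // neither: no forced injection
}
```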
* chore(docs/lint): sync i18n changelog mirrors and bump any budget to resolve pre-commit failure
* feat(auto): complete zero-config auto-routing feature
  - Add auto-prefix parser (autoPrefix.ts) for auto/variant detection
  - Add virtual auto-combo factory (virtualFactory.ts) building combos from active providers
  - Integrate the auto/ prefix into chat routing (chat.ts): supports bare 'auto' and 'auto/variant'
  - Add system provider 'auto' in providers.ts (systemOnly)
  - Add AutoRoutingBanner component with localStorage dismissal
  - Add auto-routing settings in RoutingTab (toggle + variant selector)
  - Add auto-routing analytics tab (AutoRoutingAnalyticsTab) + API endpoint
  - Add Case 0 zero-config documentation to README.md
  - Add autoRoutingEnabled/enforcement and autoRoutingDefaultVariant settings
  - Add analytics endpoint auth via requireManagementAuth
  - Add empty-pool graceful handling in virtualFactory
  - Add dynamic import error handling with try/catch
  - Tests: 126/126 passing
* fix(auto): address PR #2131 review issues
  - Fix OAuth expiry handling for ISO strings in virtualFactory.ts
  - Move the AutoRoutingBanner test from src/ to tests/unit/shared/components/
  - Remove mock metrics from the analytics endpoint; return only real data
  - Fix error handling for the bare 'auto' prefix in chat.ts (check isAutoRouting)
  - Update vitest.config.ts to include the tests/unit/**/*.test.tsx pattern
* feat(resilience): useUpstream429BreakerHints toggle (#2100 follow-up to #2116) (#2133). Integrated into release/v3.8.0: adds a useUpstream429BreakerHints toggle with per-provider defaults for circuit breaker cooldown trust.
* chore(release): align migration compatibility and packaged CLI runtime
  Skip the superseded 041 session_account_affinity migration when the canonical 050 file is present, and remap legacy migration markers so upgraded databases do not replay the duplicate slot. Also include the CLI entrypoints in packaged artifacts and extend management-auth coverage across admin memory, pricing, routing, provider validation, and usage endpoints to keep release bundles runnable and sensitive operations protected.
* fix(analytics): precise SQL matching for auto/ prefix models
  Replaced LIKE 'auto%' with (model = 'auto' OR model LIKE 'auto/%') to prevent false matches from unrelated model names (e.g., 'autopilot-v2').
* chore: revert unrelated i18n CHANGELOG and any-budget changes
  Removed bundled i18n CHANGELOG updates and check-t11-any-budget.mjs budget regressions that are unrelated to the dynamic cert paths feature.
* docs(changelog): add PRs #2131, #2133, #2134 entries and contributor credits for v3.8.0
* fix(catalog): ensure individual models get context_length via getTokenLimit fallback
  When the /v1/models catalog builds entries for individual provider chat models, context_length was previously only set when the REGISTRY provider entry carried defaultContextLength. For providers without that field (or when alias resolution fails to map to a REGISTRY key), models shipped without any context_length, causing OpenCode and other clients to fall back to a ~4000-token limit. Now getDefaultContextFallback calls getTokenLimit() as the ultimate fallback, which resolves through env overrides, the models.dev DB, name heuristics, and hardcoded defaults, always returning a value. Fixes the same class of bug as 3dc7542e (combo context_length) but for individual (non-combo) models.
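The precise auto-prefix matching fix can be sketched as follows: the SQL predicate is quoted from the commit, and the TypeScript predicate mirrors it for in-process filtering.

```typescript
// SQL predicate from the commit: exact 'auto' or the 'auto/' namespace.
// A loose LIKE 'auto%' would also match unrelated names like 'autopilot-v2'.
const AUTO_MODEL_SQL = "(model = 'auto' OR model LIKE 'auto/%')";

// The same rule for in-process filtering (helper name is illustrative).
function matchesAutoModel(model: string): boolean {
  return model === "auto" || model.startsWith("auto/");
}
```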
  Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* fix: remove docs from .dockerignore #2120
* refactor: improve type safety and add cloud agent providers
  - Update types in several files to reduce usage of `any`
  - Fix a `fetch` body type error in `AntigravityExecutor` by returning a `ReadableStream`
  - Add `CLOUD_AGENT_PROVIDERS` constants
  Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* refactor(core): strengthen typing and normalize auth and model flows
  Tighten executor, usage, model-resolution, and state-management code with explicit types and safer record handling to reduce runtime edge cases across providers. Also normalize management-token failures to 403 responses, require API keys consistently on cloud agent task routes with CORS-safe errors, refresh stale Gemini CLI project IDs, prioritize Gemini search tools correctly, add new provider/model registry entries, and serialize integration tests for more reliable CI.
* fix(chatcore): stop leaking provider credentials in response headers
  Remove upstream provider headers from non-stream chatCore JSON responses to prevent authorization and API key values from being exposed to clients. Add coverage to verify sensitive provider request headers are omitted while OmniRoute metadata headers remain present.
* fix: restore cloud agent provider exports and logger import (#2138). Integrated into release/v3.8.0: the cloud agent provider exports and logger import fixes were already present in the release branch. Thank you for the quick response to the crash report!
* fix(sanitizer): preserve reasoning_content on assistant messages with tool_calls (#2140). Integrated into release/v3.8.0: preserves reasoning_content on assistant messages with tool_calls/function_call, fixing Kimi 400 errors.
* docs(changelog): add entries for PRs #2136, #2137, #2138, #2140 and update contributor credits
* fix: remove duplicate cloud agent provider constants (#2141). Integrated into release/v3.8.0: Kiro model alias normalization (dash→dot), trimmed duplicate catalog entries, and new tests.
* docs(changelog): add PR #2141 entry and update contributor credits
* fix(types): remove extraneous config/models from AutoComboConfig returns and type seedConnection overrides
* fix(cli): harden setup, doctor, and backup workflows
  Hide admin password entry during setup, make doctor degrade to warnings when source-only runtime checks are unavailable, and improve stop behavior by attempting graceful shutdown before force-killing ports. Also use SQLite's backup API for safer snapshots under WAL, align CLI key writes with the current provider_connections schema, and include follow-on compatibility fixes for GLM provider detection, stream error sanitization, and auth-aware test coverage.
* chore(hooks): disable husky pre-push test enforcement
  Comment out the npm availability guard and unit test execution in the pre-push hook so pushes are no longer blocked by local hook checks. This shifts validation away from developer machines and avoids failures in environments where npm is unavailable or hooks are undesired.
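The credential-stripping pass from the chatCore fix amounts to filtering a denylist of sensitive headers out of the upstream response before returning it. A minimal sketch; the exact sensitive-header list is an assumption beyond "authorization and API key values".

```typescript
// Headers that must never be forwarded from the upstream provider to the
// client (denylist is illustrative; the commit names authorization and
// API-key values specifically).
const SENSITIVE_HEADERS = ["authorization", "x-api-key", "api-key", "cookie"];

function stripProviderHeaders(
  headers: Record<string, string>,
): Record<string, string> {
  const out: Record<string, string> = {};
  for (const [name, value] of Object.entries(headers)) {
    if (!SENSITIVE_HEADERS.includes(name.toLowerCase())) out[name] = value;
  }
  return out;
}
```

Non-sensitive metadata (e.g. an OmniRoute routing header) passes through untouched, which matches the coverage described in the commit.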
* fix(kiro): avoid treating high-traffic 429s as quota exhaustion (#2153) Integrated into release/v3.8.0 — fixes transient Kiro 429s being incorrectly classified as quota exhaustion * fix(kiro): synthesize tools schema when history references tool_calls without body.tools (#2149) Integrated into release/v3.8.0 — synthesizes tools schema for Kiro when body.tools is omitted but history has tool_calls * fix(openai-responses): propagate include so chat clients stream reasoning summaries (#2154) Integrated into release/v3.8.0 — propagates include array so chat clients stream reasoning summaries via Responses API * chore(models): tidy up alibaba-coding-plan and cursor provider (#2150) Integrated into release/v3.8.0 — tidies up Alibaba Coding Plan and Cursor provider model catalogs * fix(catalog): cherry-pick type safety from PR #2152 — remove .ts imports, as any casts, add CustomModelEntry/ComboModelStep types Co-authored-by: herjarsa <herjarsa@users.noreply.github.com> * fix: Added in debug mode, support for storing raw data in json (#2156) Integrated into release/v3.8.0 — configurable chat log truncation, CHAT_DEBUG_FILE mode, cloudflared state file lock * feat(resilience): add model cooldowns dashboard card with real-time list and re-enable Cherry-picked from PR #2146: ModelCooldownsCard.tsx, model-cooldowns API route, ResilienceTab integration. Co-authored-by: rafacpti23 <rafacpti23@users.noreply.github.com> * fix(openai-responses): emit reasoning summary as delta.reasoning_content (#2159) Integrated into release/v3.8.0 — emit reasoning summary as delta.reasoning_content for Chat Completions clients * docs: add contributor credits to CHANGELOG for all merged/cherry-picked PRs Also update review-prs workflow to mandate CHANGELOG credits when cherry-picking is used, preventing credit erasure from release notes. 
* docs(workflow): strictly restrict cherry-pick to locked PRs only Mandate direct PR fixes over cherry-picking in all cases where the maintainer has write access to the contributor's branch. Explicitly forbid using cherry-pick just to bypass conflict resolution. * fix(providers): correct pollinations requests and provider dashboard state Update Pollinations request transformation to send the selected model and stream flag so requests match the active endpoint behavior. Align the ChatGPT TLS client with shared proxy resolution so dashboard proxy context is honored before falling back to environment settings. Also refresh provider display names across dashboard pages, correct the Claude extra-usage toggle messaging and visual state, and mark Pollinations as offering a free public endpoint. * refactor(catalog): remove .ts imports, as any casts, normalize alias resolution (#2152) Integrated into release/v3.8.0 — removes .ts import extensions, replaces as any casts with proper types, and normalizes provider alias resolution in combo context_length calculation. * fix(providers): allow optional-key providers to pass connection test (#2169) Integrated into release/v3.8.0 — allows optional-key providers (SearXNG, Petals, self-hosted chat, OpenAI/Anthropic-compatible) to pass connection test by centralizing the check in providerAllowsOptionalApiKey(). * fix(translator): inject thinking placeholder for all Claude-shape upstreams (#2161) Integrated into release/v3.8.0 — removes redundant provider guard in prepareClaudeRequest, fixing thinking placeholder injection for all Claude-shape upstreams (kimi-coding, glmt, zai). * fix(executors): sanitize reasoning_effort for non-supporting providers (#2162) Integrated into release/v3.8.0 — adds sanitizeReasoningEffortForProvider hook to BaseExecutor, fixing xhigh→high downgrade for non-supporting providers and full strip for mistral/devstral and GitHub Claude models. 
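The reasoning_effort sanitizer from #2162 above — xhigh downgraded to high for non-supporting providers, full strip for mistral/devstral — might be sketched like this. The provider sets here are illustrative assumptions; the real hook is wired into BaseExecutor per provider:

```typescript
type Effort = "low" | "medium" | "high" | "xhigh";

// Assumed provider ids for illustration only.
const EFFORT_STRIP_PROVIDERS = new Set(["mistral", "devstral"]); // reject the field outright
const EFFORT_XHIGH_PROVIDERS = new Set(["codex"]); // accept the xhigh label

function sanitizeReasoningEffort(
  provider: string,
  effort?: Effort
): Effort | undefined {
  if (effort === undefined) return undefined;
  if (EFFORT_STRIP_PROVIDERS.has(provider)) return undefined; // full strip
  if (effort === "xhigh" && !EFFORT_XHIGH_PROVIDERS.has(provider)) {
    return "high"; // downgrade for providers that cap at high
  }
  return effort;
}
```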
* feat(responses): degrade background mode to synchronous execution (#2164) Integrated into release/v3.8.0 — degrades background:true to synchronous execution instead of 400, enabling Capy and similar clients that set background:true by default to work seamlessly. * chore(registry): refresh per-model contextLength/maxOutputTokens for active providers (#2163) Integrated into release/v3.8.0 — refreshes per-model contextLength/maxOutputTokens for claude, kiro, github, kimi-coding, xiaomi-mimo, and codex/gpt-5.5 (OAuth cap 400K). Fixes provider-ID mismatch causing context_length fallthrough to defaults. * feat(api): aggregate combo model metadata in catalog (#2166) Integrated into release/v3.8.0 — adds target-based metadata aggregation for combo entries in /v1/models using least-common-denominator approach (context_length, max_output_tokens, capabilities, modalities). * fix(cliproxyapi): Anthropic-shape body routing and gate compatibility (#2165) Integrated into release/v3.8.0 — three fixes for CliProxyApi: Anthropic-shape body routing to /v1/messages, Capy premium extras strip, and mcp_* tool name rewrite to avoid Anthropic gate. Tests added covering all three categories. * feat(resilience): expose model cooldown list with manual re-enable (#2146) Integrated into release/v3.8.0 — adds model cooldowns dashboard card with real-time list and re-enable action. Domain module and unit tests added. * feat(oauth): complete Windsurf / Devin CLI OAuth + API-token flows (#2168) Integrated into release/v3.8.0 — complete Windsurf/Devin CLI OAuth + API-token executor flows with unit tests. * feat(search): add Ollama Search as a web search provider (#2176) Integrated into release/v3.8.0 — adds Ollama Search as a web search provider. 
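The least-common-denominator aggregation from #2166 above can be sketched as: numeric limits take the minimum across a combo's targets, capability sets take the intersection. Field names follow the /v1/models shape described in the commit; everything else is an assumption:

```typescript
interface ModelMeta {
  context_length: number;
  max_output_tokens: number;
  capabilities: string[];
}

// Least-common-denominator: the combo can only promise what EVERY
// target can deliver, so min the limits and intersect the capabilities.
function aggregateComboMeta(targets: ModelMeta[]): ModelMeta {
  if (targets.length === 0) throw new Error("combo has no targets");
  return targets.reduce((acc, t) => ({
    context_length: Math.min(acc.context_length, t.context_length),
    max_output_tokens: Math.min(acc.max_output_tokens, t.max_output_tokens),
    capabilities: acc.capabilities.filter((c) => t.capabilities.includes(c)),
  }));
}
```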
* chore(release): update CHANGELOG.md with v3.8.0 unreleased entries for PRs #2146, #2161-2168, #2176 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix(cliRuntime): resolve TDZ for isWindows in devin config via lazy getter, add spawn metachar guard Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix(claude): strip internal _claudeCode markers from OAuth requests (#6) Co-authored-by: capy-ai[bot] <230910855+capy-ai[bot]@users.noreply.github.com> * fix(translator): omit tool.strict when not a boolean in openai-responses translator Capy/OpenAI Responses sometimes sends tools with `strict: null`. Both Chat->Responses and Responses->Chat conversion paths in openai-responses.ts were forwarding that null straight through, which Xiaomi MiMo (v2.5/v2.5-pro) rejects with: [400]: body.tools.0.function.strict: Input should be a valid boolean, input: None Fix: only spread `strict` into the produced function spec when it is a real boolean. `null` / `undefined` are dropped so MiMo and other strict OpenAI-compatible validators accept the request. Equivalent to the runtime "Patch L" we used to apply against bundled chunks. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(executors): strip stream_options on non-streaming OpenAI-compatible turns DeepSeek (and other strict OpenAI-compatible providers) reject: [400]: stream_options should be set along with stream = true when an inbound request carries `stream_options` while `stream` is false or absent. The existing default executor only handled three branches: 1. anthropic-compatible-* providers: strip stream_options unconditionally 2. stream=true + openai target: add/keep stream_options (or strip if providerSpecificData.disableStreamOptions) 3. otherwise: leave stream_options as-is That last branch passed through stream_options on non-streaming OpenAI-compatible turns, which is exactly what DeepSeek rejects. 
Fix: add an explicit branch that drops stream_options whenever stream is false and the field is present. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(claude-oauth): don't auto-inject CC reasoning extras for non-Claude-Code clients When Capy/OpenAI-bridged traffic reaches the Claude OAuth path (hasClaudeOAuthToken without isClaudeCodeClient), the cloak block was unconditionally defaulting to: thinking: { type: "adaptive" } context_management: { edits: [{ type: "clear_thinking_20251015", ... }] } output_config: { effort: "high" } Two problems: 1. Anthropic enforces Claude-Code wire-image body shape on the user:sessions:claude_code OAuth scope (#2130-family). When the generic bridge upstream also attached its own thinking/output_config (Capy-style), the combined body diverges from the real CLI wire image and Anthropic returns 429 `Extra usage is required` / 400 `out of extra usage` with `x-should-retry: true` and `anthropic-ratelimit-unified-overage-disabled-reason: out_of_credits` — body-shape misclassification, not real quota. 2. Forced extended-thinking + high effort burns the Claude Max 5h quota in ~15 min for Opus 4.7 (#1761). Fix: for `hasClaudeOAuthToken && !isClaudeCodeClient`, strip `thinking`/`output_config`/`context_management` instead of injecting CC defaults. Real Claude Code clients keep their existing default-inject behavior. Anyone who genuinely wants adaptive thinking on bridged traffic can opt in with `x-omniroute-thinking: adaptive`. Mirrors the runtime "Patch I2/I4" effect. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(thinking): hydrate budget config from DB on startup + hot-reload The thinkingBudget service's in-memory _config defaulted to PASSTHROUGH and was only updated by the POST /api/settings/thinking-budget route. On cold container start, the user's saved adaptive/custom mode in DB was never loaded — so the runtime ran on PASSTHROUGH 100% of the time regardless of UI configuration. 
Wire thinkingBudget through the canonical runtimeSettings snapshot dispatcher so: - Startup: settings.thinkingBudget is read from DB and pushed to the service via setThinkingBudgetConfig - Hot-reload: settings POST triggers the same dispatcher and the service receives the update without container restart Pattern matches existing modelAliases, backgroundDegradation, etc. sections in runtimeSettings.ts. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(wire-image): normalize thinking on source body before rebuild Three bypass paths in chatCore never invoked applyThinkingBudget, so client-side thinking shapes (Capy's adaptive, raw reasoning_effort strings, etc.) survived untranslated and broke downstream Anthropic strips: 1. shouldUseClaudeCodeWireImage — the critical one. The branch calls translateRequest(CLAUDE→OPENAI) to produce normalizedForCc and applyThinkingBudget runs *on that copy* only. Then buildClaudeCodeCompatibleRequest picks resolveClaudeCodeCompatibleThinking from claudeBody.thinking || sourceBody.thinking, which both reference the unchanged original body. The normalized form on normalizedBody is preferred third — reached only when the first two are absent. Net effect: the wire-image rebuild discards the normalization. Fix: invoke applyThinkingBudget(body) at the top of the wire-image branch so claudeBody/sourceBody pickups see the canonical Anthropic shape ({type:"enabled", budget_tokens:N}). 2. nativeCodexPassthrough — similar bypass. Now normalized for consistency, even though Codex backend mostly uses reasoning_effort. 3. isClaudePassthrough — same fix added inside the branch. After this, every outbound chat path normalizes thinking exactly once before reaching its executor's transformRequest hook. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(cliproxyapi): preserve CC wire-image output_config + context_management Follow-up to the conditional thinking strip. 
Two more fields that were being unconditionally stripped from Anthropic-shape bodies are required by Anthropic's Claude Code wire-image validation: - output_config: {effort: "low"|"medium"|"high"} — accepted as part of the CC contract - context_management: {edits: [{type:"clear_thinking_20251015", ...}]} — the standard CC thinking cleanup edit buildClaudeCodeCompatibleRequest injects both with CC-spec values, but the prior unconditional strip in this executor deleted them before they reached Anthropic. Without those fields, the body no longer matches the CC wire image; Anthropic accepts the request but silently disables thinking (no thinking content blocks in the response). The strips were originally added (PR #2165, commit afb9d72b) to defend against raw Capy/SDK shapes like output_config.effort="xhigh" and arbitrary context_management.* fields that triggered Anthropic 400 "Extra usage required" / "out of extra usage". Make those strips shape-aware: - output_config: preserve only if it has exactly {effort: "low"|"medium"|"high"}; strip anything else (including xhigh, unknown keys, or extra fields) - context_management: preserve only if exactly {edits: [...]} where every edit has type prefix "clear_thinking_"; strip otherwise Also harden the thinking strip to reject `display` field on the "enabled" type (was: only checked for adaptive). And accept {type:"adaptive"} (no display) since that's the CC default shape. 4 new test cases (preserve high effort, preserve clear_thinking edit, preserve plain adaptive). Existing strip tests for xhigh / auto_summarize unchanged. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(wire-image): inject context_management + enforce thinking temperature buildClaudeCodeCompatibleRequest produces the CC base body but does not inject context_management (the clear_thinking_20251015 edit) or enforce temperature=1 when thinking is enabled. 
Those steps live in buildAndSignClaudeCodeRequest, which only runs on the native claude executor path. For the cliproxyapi path, the body bypassed them and reached Anthropic incomplete: with thinking enabled but no context_management and no temperature=1 constraint, Anthropic appears to silently disable thinking — the response contains text only, no thinking blocks. Mirror the constraint steps inline after buildClaudeCodeCompatibleRequest so any downstream executor (native claude OR cliproxyapi) receives a fully-formed CC wire image. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(thinking): 5-tier effort baselines + dual emit + globalThis singleton Three changes that close the loop on Capy adaptive BYOK: 1. globalThis-anchored _config singleton Next.js bundles open-sse/services/thinkingBudget.ts into multiple separate JS chunks (server-init, route handlers, edge fns, open-sse handlers). Each bundle had its own module-level `_config`, so setThinkingBudgetConfig from one bundle (e.g. runtimeSettings startup hydration) didn't propagate to the bundle that runs applyThinkingBudget (e.g. chatCore wire-image branch). Move _config to globalThis via Symbol.for("omniroute.thinkingBudget._config"). All bundles now read/write the same singleton. Observed pre-fix symptom: DB had mode=custom (and earlier mode=passthrough), but runtime always behaved as adaptive with default effortLevel=medium — the in-memory _config in the chat bundle was never updated. 2. 5-tier effort baselines (low/medium/high/xhigh/max) New EFFORT_BASELINES table for adaptive mode: low: 2048, medium: 6144, high: 16384, xhigh: 32768, max: 65536 (subject to per-model cap) Adaptive now picks the baseline from (priority order): a. body.output_config.effort (CC wire-image input) b. cfg.effortLevel (settings UI) c. "medium" (default) Then scales by the multiplier (1.0×–2.8×) from signal stacking, then caps via capThinkingBudget(model, ...). 3. 
Dual emit on output setCustomBudget now emits BOTH: - thinking.{type:"enabled", budget_tokens:N} - output_config.effort: <tier label> Anthropic Claude Code wire image accepts both signals; emitting the label gives explicit tier intent on top of the precise budget. Wire-spec tops out at "xhigh" (CC headers and OpenAI reasoning_effort both accept low/medium/high/xhigh). The "max" tier is settings-only and emits "xhigh" on the wire. 5 new test cases cover the new effortLevel-tier mapping, body output_config priority, and dual-emit shape. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(cliproxyapi): probe /v1/models for health (CPA 6.x has no /health) The dashboard reported "CLIProxyAPI not detected" even with CPA up and successfully serving /v1/messages. Root cause: CPA 6.x doesn't expose a /health endpoint — GET /health returns 404, which made res.ok false and the executor's healthCheck() report ok=false. Switch to GET /v1/models, which CPA does serve (returns the advertised model list with 200). It's the closest thing CPA has to a liveness probe and works on all CPA versions we've tested. Verified post-fix: dashboard now flips to "CLIProxyAPI detected" without any other change. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(stream): skip [DONE] terminator for Claude SSE clients Anthropic SSE streams terminate naturally on message_stop — there is no `data: [DONE]` line. OmniRoute was unconditionally appending one at end of every stream (gated only on OPENAI_RESPONSES), which: - Capy (Anthropic SDK) sees an extra unparseable line after message_stop. Result: text content gets rendered in the "Thought" area of the UI, follow-up turns retry from a corrupt state. - Native claude-cli, claude-code, and other Anthropic SDK consumers hit the same parse hiccup but tolerate it differently. Add `clientExpectsClaudeStream` gate alongside the existing `clientExpectsResponsesStream`. 
Both the passthrough and translate finalization branches now check both flags before emitting `[DONE]`. For Claude clients: stream ends after message_stop, with the trailing `: x-omniroute-*` metadata comments. Standards-compliant SSE — no terminator line needed. Tested with Capy BYOK → Opus 4.7: first-turn thinking renders in the correct UI section; followup turns no longer trigger a retry loop. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(claudeHelper): emit data field on redacted_thinking, drop bogus signature The thinking→redacted_thinking conversion in prepareClaudeRequest was shape-invalid against Anthropic's validation: - Set `signature` on redacted_thinking (wrong field — signature only exists on regular thinking blocks) - Omitted the required `data` field Result: messages.N.content.0.redacted_thinking.data: Field required (400) whenever a multi-turn conversation echoed an earlier assistant turn back to Anthropic (Capy followup with tool_use, e.g., after the assistant returned thinking + text). Emit only the correct fields per block type: - redacted_thinking: { type, data } ← data is mandatory - thinking: { type, thinking, signature } Use DEFAULT_THINKING_CLAUDE_SIGNATURE as the data placeholder — it's a proven valid Anthropic protobuf-format blob, accepted by /v1/messages on replay. The placeholder thinking-block path (added when thinkingEnabled + tool_use without precursor thinking) also switches to the redacted_thinking shape with `data`, since that's the variant Anthropic accepts without re-validating signatures. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(thinking): shape-aware setCustomBudget — strip Anthropic fields on OpenAI/Codex bodies Regression introduced by 5-tier dual emit (ba32440a): setCustomBudget unconditionally injected `thinking:{type:enabled, budget_tokens:N}` and `output_config:{effort:...}` whenever the model was thinking-capable. 
Codex Responses API rejects these Anthropic-shape fields with 400 "Unsupported parameter: thinking" — observed live on gpt-5.5 calls. Detect OpenAI/Codex shape via any of: `_nativeCodexPassthrough`, `input` array, `instructions` string, `reasoning` object, `reasoning_effort` string. On those bodies, emit only `reasoning_effort`/`reasoning.effort` (clamped to low|medium|high since Codex/OpenAI Chat Completions reject xhigh/max as effort labels) and strip any leaked Anthropic-shape fields defensively. On Anthropic-shape bodies, keep the existing dual emit (thinking + output_config) — CC wire image needs both signals. Tests: 3 new cases covering OpenAI Chat Completions (o3-mini), OpenAI Responses (gpt-5.5 with reasoning object), and explicit _nativeCodexPassthrough marker. Updated existing CUSTOM test to assert clamping + no-leak invariants. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(cliproxyapi): detect Anthropic shape on minimal Capy bodies Discovered post-deploy: simple Capy /v1/messages requests (string content, no system block) were misdetected as OpenAI-shape and routed to /v1/chat/completions instead of /v1/messages. CPA then responded with chat.completion shape, leaking OpenAI shape to Anthropic SDK clients and skipping the Anthropic CC wire-image cloak. Strengthen isAnthropicShape with two more strong signals (any one is decisive): - top-level `thinking` field (Anthropic-only; OpenAI uses `reasoning`) - top-level `metadata.user_id` (CC wire-image OAuth identifier) These survive even on minimal bodies where messages[0].content is a string and no system block is present. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(cliproxyapi): rewrite mcp_ refs in prose + preserve metadata.user_id Two related fixes for the Capy "Claude answers in Thought area" symptom. 
**Tool-name reference rewrite** The existing `^mcp_[^_]` → `Mcp_X` rewrite (dodges Anthropic's MCP-connector billing gate) renamed the tool but left every reference to those names unchanged in the system prompt and tool descriptions. Result: the model read "use mcp_call" in the prompt, found only `Mcp_call` in the tool catalog, gave up on tool-calling, and emitted plain text — which Capy's agent loop treats as a "reasoning trace" and renders in the Thought panel (per Capy's system prompt: "Plain assistant text outside of `message_user` is treated as a reasoning trace"). Apply the same regex transformation to all textual references to those names: top-level `system` blocks and `tools[*].description`. Single-pass regex (no name enumeration) so adding new mcp_* tools needs no code change. Skip message content blocks — those may carry user-supplied text we shouldn't mutate. **Diagnostic toggle** Add `OMNIROUTE_DISABLE_MCP_REWRITE=1` env to bypass the rewrite entirely for probing whether the gate fires from tool name vs other body signals. Confirmed 2026-05-12: gate fires even with valid OAuth + CPA cloak when rewrite is OFF, so the rewrite stays ON by default. **metadata.user_id preservation** Previously stripped `metadata` unconditionally on Anthropic-shape bodies. Now preserve a bare `{user_id: <string>}` shape. Sets up cooperation with a future CPA patch that uses the Capy user_id as a deterministic seed for the cloaked `account_uuid` + `session_uuid` (current CPA: random UUID per call → no Anthropic prompt-cache hits across Capy turns). Strip metadata otherwise (Capy may add session_id and other extras Anthropic rejects). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(modelSpecs): cap thinking budget for Claude Opus 4.6 / 4.7 / Sonnet 4.6 Capy + adaptive mode hit Anthropic's 400 "budget out of range [1024, 128000]" on Opus 4.7. 
Root cause: these three model specs had no `thinkingBudgetCap`, so `capThinkingBudget` was a no-op and the adaptive multiplier on top of `output_config.effort=max` (baseline 65536) could produce budgets up to 65536 * 2.8 = 183500 — way past Anthropic's hard cap of 128000 for Opus 4.7. Live trace (artifact 2026-05-12T10-19-52): clientRaw.output_config = { effort: "max" } → adaptive tier="max", baseline=65536 → 13 messages (+0.5) + 25 tools (+0.5) + recent tool_use (+0.3) = 2.3× → 65536 * 2.3 = 150733 → outbound thinking.budget_tokens = 150733 ← UNCAPPED → Anthropic 400 "budget 150733 out of range [1024,128000]" Add `defaultThinkingBudget` + `thinkingBudgetCap` for the three affected specs. Caps sit a touch below Anthropic's stated max to leave headroom for the visible response within `max_tokens` (thinking + visible response both count against `max_tokens`): Opus 4.7: default 32000, cap 120000 (Anthropic max 128000) Opus 4.6: default 32000, cap 120000 (Anthropic max 128000) Sonnet 4.6: default 16000, cap 60000 (~94% of maxOutputTokens=64000, mirroring Opus 4.5's 32000/32768) Tests: - New ADAPTIVE test that drives the exact 150733-causing condition (effort=max + 13 msgs + 25 tools + recent tool_use) and asserts the result falls within Anthropic's [1024, 128000] range. - Two existing `-thinking` suffix auto-inject tests loosened to assert `budget_tokens > 0` instead of an exact constant — they were over-specifying behavior that the new defaults make per-model. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(thinking): stop injecting CC wire-image signals on Capy BYOK passthrough Three combined changes reverse a regression where Claude Opus 4.7 ignored Capy's `message_user` tool contract and responded in raw text instead. 1. chatCore.ts isClaudePassthrough branch: drop the `applyThinkingBudget` call added earlier. 
cliproxyapi.transformRequest already silently strips Capy SDK extras (`thinking.display`, `output_config.effort=max`) on the conditional-strip path, so forwarding the body as-is is sufficient. 2. thinkingBudget.ts default mode: revert ADAPTIVE → PASSTHROUGH. Adaptive default upgraded {adaptive,display} to {enabled,budget_tokens:N} and added output_config.effort=xhigh, which combined with CPA's CC sentinel gave Anthropic the full Claude Code agent signature. 3. thinkingBudget.ts setCustomBudget: stop injecting output_config.effort on Anthropic-shape bodies. Emit only `thinking` and forward whatever output_config the client supplied. Diagnosed via artifacts 2026-05-12T10-43 (adaptive: providerRequest had thinking enabled + output_config xhigh injected) vs 10-52 (passthrough: clean providerRequest). Both produced text-only responses, confirming adaptive's injection was the OmniRoute-side contributor. Tests: 39/39 thinking-budget green, 55/55 cliproxyapi+translator green. * refactor(cliproxyapi): remove over-engineered Anthropic-shape conditional strips Bisect-driven simplification (2026-05-12, 11 variants × 2 turns + 5-turn stress test + gate probe against live Anthropic via CPA cloak). Each variant disabled ONE strip family at a time; all 11 variants returned HTTP 200 + tool_use(message_user), and the cumulative all-off variant remained stable over 5 turns. Anthropic accepts the input shapes that these strips were preventatively removing. Strips removed: - client_info / prompt_cache_key / safety_identifier No client we proxy sends these today and Anthropic does not reject them when present. The strip was a guard against a hypothetical extras-billing gate that the bisect could not reproduce. - metadata conditional (keep only `{user_id: <string>}`) Anthropic accepts metadata objects with additional keys. The deterministic CC-shape user_id is now injected CPA-side (see router-for-me/CLIProxyAPI PR #3356) so OmniRoute no longer needs to constrain the shape here. 
- thinking shape conditional (Capy SDK extras like `display:"summarized"`) Anthropic ignores unknown thinking-object keys without 400-ing. The strip was silently nuking a `{type:"adaptive"}` shape that Anthropic accepts as-is. - output_config.effort whitelist (low/medium/high/xhigh only) Anthropic accepts other effort labels (including the Capy SDK "max" label) without flagging the extras-billing gate. - context_management.edits whitelist (clear_thinking_* only) Same pattern: Anthropic accepts a broader set than our whitelist. What remains: - isAnthropicShape detection (used for routing, not strip) - mcp_ tool-name rewrite (historical char-by-char gate confirmation on 2026-05-11; today the gate does not fire on these names, but the rewrite is cheap and reversible via the response-side _toolNameMap) The combined effect of these strips on Capy BYOK was a regression: the silent strip of thinking/output_config shapes interacted with the CPA cloak's system-prompt sanitize to leave Claude with no anchor for the client's tool-use contract (message_user), which it then ignored. With the strips removed, the contract reaches Claude intact. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * refactor(cliproxyapi): drop mcp_ prose rewrite, keep name-only rewrite The text-substitution pass that mirrored `mcp_X` → `Mcp_X` across system prompt blocks and tool descriptions was added on the theory that the model needs consistent naming between prompt and tool catalog. Bisect 2026-05-12 disproved that: with prose rewrite off (name rewrite still on), Claude continues to call the rewritten tools correctly. The prose pass was modifying client content (system prompts, tool descriptions) without measurable benefit — pure edit-distance noise. 
Removes: - MCP_NAME_REF_RE regex - mcpRewriteOf helper - The body.system + body.tools[].description rewrite block at the end of applyMcpToolNameRewrite Keeps: - rewriteMcpToolName + MCP_RESERVED_PREFIX_RE (gate-dodge on tool names, tool_use blocks, tool_choice) - Response-side reverse map via _toolNameMap (untouched) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test(cliproxyapi): assert passthrough for previously-stripped fields Mirror the executor simplification: tests now assert that Capy SDK extras (thinking with display, output_config:{effort:'max'}, context_management with non-CC shape, metadata with extras, client_info, prompt_cache_key, safety_identifier) reach the upstream body verbatim instead of being stripped. The Anthropic-shape detection test is refactored to use the _toolNameMap signature (set only on the Anthropic branch) instead of the now-removed output_config strip as its observable signal. 41/41 cliproxyapi-executor tests green. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(reasoning-cache): include xiaomi-mimo in replay provider/model detection MiMo (Xiaomi) enforces the same "echo reasoning_content on subsequent turns" contract as DeepSeek and Kimi-thinking. Without replay, the upstream returns 400: data:{"error":{"code":"400","message":"Param Incorrect", "param":"The reasoning_content in the thinking mode must be passed back to the API.","type":""}} Repro: client sends a multi-turn /v1/messages body where the assistant history has tool_use blocks but no thinking blocks (Capy and most BYOK clients strip thinking on the wire). MiMo refuses without the reasoning_content from the previous assistant turn. The reasoning replay cache (issue #1628) already captures reasoning_content from non-streaming responses with tool_calls and re-injects it on the request side. 
But the gate `requiresReasoningReplay(provider, model)` did not include MiMo: REASONING_REPLAY_PROVIDERS missed "xiaomi-mimo" REASONING_REPLAY_MODEL_PATTERNS had no /mimo/ entry So the captured reasoning was discarded on the next turn instead of replayed. Fix: - Add "xiaomi-mimo" to REASONING_REPLAY_PROVIDERS - Add /^mimo[-.]?v\d/i to REASONING_REPLAY_MODEL_PATTERNS (defensive match if a wildcard route assigns a non-xiaomi-mimo provider ID to a mimo-* model alias) Tests: 4 new cases (40/40 green) covering both provider-id and model-pattern detection paths, including XIAOMI-MIMO uppercase normalization. * fix(claudeHelper): preserve latest assistant thinking blocks verbatim Anthropic now enforces that the latest assistant message's thinking or redacted_thinking blocks cannot be modified when replaying a conversation. Older assistant messages can still be rewritten to redacted_thinking { data } as before. Symmetric behavior on non-Anthropic Claude-shape upstreams: the latest assistant message's plain thinking text is preserved verbatim; only older messages fall back to reasoningCache or the NON_ANTHROPIC_THINKING_PLACEHOLDER. 
Fixes: live error thinking or redacted_thinking blocks in the latest assistant message cannot be modified (49/h on prod 2026-05-12) --------- Co-authored-by: wauputr4 <103489788+wauputr4@users.noreply.github.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: backryun <bakryun0718@proton.me> Co-authored-by: nickwizard <35692452+nickwizard@users.noreply.github.com> Co-authored-by: diegosouzapw <diego.souza.pw@gmail.com> Co-authored-by: Muhammad Tamir <muhammad.tamir@gmail.com> Co-authored-by: congvc <congvc-dev@gmail.com> Co-authored-by: Jan Leon <jan.gaschler@gmail.com> Co-authored-by: Automation <automation@omniroute> Co-authored-by: wucm667 <109257021+wucm667@users.noreply.github.com> Co-authored-by: Hernan Javier Ardila Sanchez <hjasgr@gmail.com> Co-authored-by: ipanghu <bypanghu@163.com> Co-authored-by: xssdem <xssdem@icloud.com> Co-authored-by: Sergey Morozov <tr0st@bk.ru> Co-authored-by: Tentoxa <53821604+Tentoxa@users.noreply.github.com> Co-authored-by: Paijo <14921983+oyi77@users.noreply.github.com> Co-authored-by: Alexander Averyanov <alex@averyan.ru> Co-authored-by: Nathan Pham <tendaigom@gmail.com> Co-authored-by: rodrigogbbr-stack <rodrigogb.br@gmail.com> Co-authored-by: ivan_yakimkin <gi99lin@yandex.ru> Co-authored-by: Gi99lin <74502520+Gi99lin@users.noreply.github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: ivan-mezentsev <ivan@mezentsev.me> Co-authored-by: guanbear <123guan@gmail.com> Co-authored-by: Eric Chan <tces1@hotmail.com> Co-authored-by: Dohyun Jung <ddark.kr@gmail.com> Co-authored-by: Markus Hartung <mail@hartmark.se> Co-authored-by: Raxxoor <manker_lol@hotmail.com> Co-authored-by: Gleb Peregud <gleber.p@gmail.com> Co-authored-by: Ilham Ramadhan <28677129+rilham97@users.noreply.github.com> Co-authored-by: Yoviar Pauzi <84509445+yoviarpauzi@users.noreply.github.com> Co-authored-by: Pham Quang Hoa 
<hoapq01@sungroup.com.vn> Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com> Co-authored-by: Gioxa <barelravo@gmail.com> Co-authored-by: payne <baboialex95@gmail.com> Co-authored-by: Ramel Tecnologia <146174365+rafacpti23@users.noreply.github.com> Co-authored-by: smartenok-ops <smartenok@gmail.com> Co-authored-by: eleata <hernaninverso@gmail.com> Co-authored-by: Abhinav Kumar <abhinavofjnu@gmail.com> Co-authored-by: clousky2020 <33016567+clousky2020@users.noreply.github.com> Co-authored-by: Randi <55005611+rdself@users.noreply.github.com> Co-authored-by: boa <42885162+boa-z@users.noreply.github.com> Co-authored-by: Hoa Pham <hoapq.4398@gmail.com> Co-authored-by: HomerOff <homeroff76@gmail.com> Co-authored-by: christlau <christlau@users.noreply.github.com> Co-authored-by: oyi77 <oyi77@users.noreply.github.com> Co-authored-by: FlyingMongoose <399379+flyingmongoose@users.noreply.github.com> Co-authored-by: Davy Massoneto <davy.massoneto@yahoo.com> Co-authored-by: herjarsa <herjarsa@users.noreply.github.com> Co-authored-by: rafacpti23 <rafacpti23@users.noreply.github.com> Co-authored-by: Andrew Munsell <andrew@wizardapps.net> Co-authored-by: Aleksandr <157302440+Zhaba1337228@users.noreply.github.com> Co-authored-by: capy-ai[bot] <230910855+capy-ai[bot]@users.noreply.github.com> Co-authored-by: OmniRoute Ops <ops@nomenak.dev>
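The preserve-latest rule from the final claudeHelper fix above can be sketched as follows. Block shapes follow Anthropic's thinking/redacted_thinking content types; the placeholder constant stands in for DEFAULT_THINKING_CLAUDE_SIGNATURE and the function name is hypothetical:

```typescript
type ContentBlock =
  | { type: "thinking"; thinking: string; signature?: string }
  | { type: "redacted_thinking"; data: string }
  | { type: "text"; text: string };

// Stand-in for DEFAULT_THINKING_CLAUDE_SIGNATURE (a known-valid blob).
const PLACEHOLDER_DATA = "<placeholder-blob>";

// Older assistant turns get thinking rewritten to the redacted_thinking
// {type, data} shape; the latest assistant turn is left verbatim, since
// Anthropic rejects modifications to its thinking blocks on replay.
function rewriteHistoryThinking(
  assistantTurns: ContentBlock[][]
): ContentBlock[][] {
  return assistantTurns.map((blocks, i) => {
    const isLatest = i === assistantTurns.length - 1;
    if (isLatest) return blocks; // preserve verbatim
    return blocks.map((b) =>
      b.type === "thinking"
        ? { type: "redacted_thinking" as const, data: PLACEHOLDER_DATA }
        : b
    );
  });
}
```

Note the asymmetry the commit describes: only `data` is emitted on redacted_thinking (no `signature`), because `signature` exists only on regular thinking blocks.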
…#10)
* feat: add kie media provider support
* Update open-sse/handlers/videoGeneration.ts
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
* Update open-sse/handlers/imageGeneration.ts
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
* Update open-sse/handlers/imageGeneration.ts
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
* feat(providers): add KIE text models and expand video models catalog
* feat(ui): update media dashboard with new KIE video models
* refactor(providers): robust KIE handlers with dynamic polling and improved types
* refactor(providers): address code review feedback for KIE provider
* chore(providers): prune redundant provider icon assets (#1992) Integrated into release/v3.8.0
* feat(gemini-cli): add custom projectId support (UI, DB, executor) (#1991) Integrated into release/v3.8.0
* docs: update CHANGELOG and bump version to 3.8.0
* fix(mitm): add Linux cert install and skip sudo password when root
Add Linux certificate management via update-ca-certificates for Docker support. Skip sudo password validation when running as root, matching the existing cli-tools route behavior.
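The root-detection guard described in the MITM cert-install fix above can be sketched as follows. This is a minimal illustration, not the actual OmniRoute implementation; the helper names are assumptions.

```typescript
// Decide whether a sudo password prompt is needed before installing a cert.
// On POSIX, process.getuid() returns 0 for root; on Windows it is undefined.
function needsSudoPassword(uid: number | undefined): boolean {
  if (uid === undefined) return false; // no getuid available: no sudo flow
  return uid !== 0;                    // root (uid 0) skips password validation
}
```

A caller would feed it `typeof process.getuid === "function" ? process.getuid() : undefined`, so the same code path works on Linux, macOS, and Windows.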
* fix(cli): resolve .env loading failure for global npm installations
* fix: remove Anthropic-Beta header from non-Anthropic providers to fix identity contamination (#1989)
* chore(release): bump to v3.8.0 — changelog, docs, version sync
* fix(dashboard): resolve Unknown plan display in Provider Limits
- Replace || "Unknown" fallbacks with || null in usage.ts (GLM + Claude legacy)
- Add plan extraction to Claude OAuth mapTokens (account_tier > plan > subscription_type > billing.plan)
- Add unit tests for plan extraction and Provider Limits badge resolution
* fix(dashboard): revert GLM and Claude legacy plan fallbacks to Unknown
The original fix replaced || "Unknown" with || null for GLM and Claude legacy (non-OAuth) paths. Per user clarification, "Unknown" is a valid display fallback when no plan data exists — null-based fallbacks caused the Provider Limits dashboard to show no badge rather than a clear "Unknown" indicator. Revert only the usage.ts changes. Claude OAuth mapTokens plan extraction (claude.ts) and the associated tests remain unchanged.
* feat: add kie media provider support
* fix: address kie provider review feedback
* fix: preserve kie market model ids
* fix: address kie provider pr review
* feat(combos): add reset-aware routing strategy
* feat: add support for Z.AI provider and enhance quota handling
* fix: generalize reset-aware quota routing
* fix: address reset-aware routing review feedback
* fix: address reset-aware follow-up feedback
* feat: enhance GLM quota handling and add new quota labels for Z.AI
* fix(mitm): prevent stub from loading at runtime via bypass module
Turbopack resolveAlias (@/mitm/manager → manager.stub.ts) was designed for build-time safety but Next.js applies aliases to ALL imports — including dynamic ones. This caused await import("@/mitm/manager") at runtime to load the stub, which silently returned fake {running: true} without spawning the MITM proxy. The UI showed "MITM proxy started" but nothing was actually running.
Fix introduces a two-path design:
- @/mitm/manager → stub (build-time, safe for Turbopack)
- @/mitm/manager.runtime → real manager (runtime, bypasses alias)
Route handlers now dynamic-import from manager.runtime, which re-exports from ./manager and does NOT match the alias pattern.
Additional fixes:
- Make stub throw explicit errors at runtime so misconfiguration is immediately visible instead of silently faking success
- Add server.cjs to outputFileTracingIncludes (NFT trace) and Dockerfile COPY so the MITM server binary exists in standalone/Docker output
* fix(catalog): auto-calculate combo context_length from target model limits
Fixes the root cause where OpenCode falls back to a ~4000 token limit for combos because no context_length is exposed in /v1/models. Previously combos only used context_length when set manually on the combo record. Now, when unset, the catalog computes the effective limit as the MINIMUM of its targets' individual token limits via getTokenLimit()/parseModel(). Manual values still override.
Files changed:
- src/app/api/v1/models/catalog.ts (+30 lines, auto-calc)
- tests/unit/models-catalog-route.test.ts (+2 tests)
Tests pass: 25/25
* chore(deps): resolve npm audit moderate vulnerability (hono)
* chore: Remove Deprecated Models (#2033) Integrated into release/v3.8.0
* docs(env): add GITLAB_DUO_OAUTH_CLIENT_ID to .env.example (#2031) Integrated into release/v3.8.0
* fix(catalog): auto-calculate combo context_length from target model limits (#2030) Integrated into release/v3.8.0
* Update claude md and update glm-cn max context to 200k (#2027) Integrated into release/v3.8.0
* fix(chatgpt-web): plumb proxy through to native tls-client (#2022) (#2023) Integrated into release/v3.8.0
* fix(codex): expose native model ids in catalog (#2012) Integrated into release/v3.8.0
* feat(sse): refresh Claude OAuth wire image to claude-cli/2.1.131 (#2011) Integrated into release/v3.8.0
* fix: add fuzzy auto-combo routing for 'auto/*' model prefix (#2010) Integrated into release/v3.8.0
* Fix API key identity in usage analytics (#2008) Integrated into release/v3.8.0
* fix(docker): include OpenAPI spec in runtime image (#2007) Integrated into release/v3.8.0
* fix: allow Unicode letters in API key name validation (#1996) Integrated into release/v3.8.0
* fix: resolve model alias persistence double stringification preventing UI updates (#2018)
* fix: dynamically filter bare model auto-resolution by active provider connections to prevent dead-routing (#2029)
* fix: add Google Gemini embeddings compatibility via OpenAI-compatible endpoint mapping (#2006)
* docs: update CHANGELOG.md for v3.8.0 (#2006, #2018, #2029)
* feat(antigravity): overhaul identity, fingerprinting & envelope format
- Add centralized antigravityIdentity service (sessionId, machineId, requestId)
- Switch User-Agent to Electron/Chrome desktop format
- Reorder upstream URLs: sandbox first, production last
- Add runtime headers: x-client-name, x-client-version, x-machine-id, x-vscode-sessionid, x-goog-user-project
- Add 403 retry without x-goog-user-project header
- Add generation defaults (topK=40, topP=1.0, maxOutputTokens guard)
- Strip cache_control from Claude requests recursively
- Enterprise/consumer routing via userAgent field (jetski vs antigravity)
- Update envelope field order and add enabledCreditTypes
- MITM proxy: support multiple target hosts
- Version: semver comparison with pickNewestVersion(), bump fallback to 4.1.33
- Update all affected tests
* ci: update build-fork workflow to build from main branch
* debug: add AG_REQUEST_HEADERS and AG_REQUEST_ENVELOPE debug logs
Dumps outgoing headers (with masked Authorization) and envelope structure (fieldOrder, project, requestId, userAgent, requestType, enabledCreditTypes, sessionId, generationConfig) at debug level for production verification of identity overhaul.
* fix(antigravity): don't inject default maxOutputTokens when client omits max_tokens
Real Antigravity client does not send maxOutputTokens when the user hasn't specified it — the Cloud Code server decides the output limit. OmniRoute was incorrectly injecting a capped default from model specs, which caused thinking models to return empty content with low limits.
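The maxOutputTokens fix above boils down to "only cap output when the client asked for a cap." A minimal sketch, with names and defaults assumed rather than taken from the actual executor:

```typescript
interface GenerationConfig {
  topK?: number;
  topP?: number;
  maxOutputTokens?: number;
}

// Build generation defaults; leave maxOutputTokens unset when the client
// omitted max_tokens so the upstream server decides the output limit.
function buildGenerationConfig(
  clientMaxTokens: number | undefined,
  modelCap: number,
): GenerationConfig {
  const config: GenerationConfig = { topK: 40, topP: 1.0 };
  if (clientMaxTokens !== undefined) {
    // Respect the client's request, clamped to what the model allows.
    config.maxOutputTokens = Math.min(clientMaxTokens, modelCap);
  }
  return config;
}
```

The key point is the absence of an `else` branch: injecting a capped default when the client was silent is what starved thinking models of output budget.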
* fix(antigravity): align identity protocol and behavior with official AM
* fix(antigravity): add duplex half for streaming bodies
* refactor: address PR review feedback
* feat: implement global Codex fast service tier functionality and related settings
* feat(usage): account for codex fast tier analytics
* feat: add service tier breakdown component and handle missing docs directory
* feat: enhance chat handling with cached settings and deduplicate quota fetches in reset-aware strategy
* feat: add service tier column to usage_history and update migration checks
* deps: bump hono from 4.12.14 to 4.12.18 (#2065) Integrated into release/v3.8.0
* fix(sse): use Gemini schema for Antigravity Claude (#2063) Integrated into release/v3.8.0
* feat(chat): dynamic tool limit detection with proactive truncation (#2061) Integrated into release/v3.8.0
* Fix bare GPT-5.5 routing for Codex-only installations (#2054) Integrated into release/v3.8.0
* fix(db): preserve legacy SQLite database path on Windows to prevent data loss (#1973)
* docs: update changelog for issue 1973 resolution
* feat: add fallbackDelayMs to combo configuration and related settings
* feat: add STREAM_READINESS_TIMEOUT_MS and integrate into chat handling
* fix(core): restore Claude Code adaptive thinking defaults and resolve audio transcription CORS regression
- Restored default adaptive thinking injection for non-Haiku Claude Code models when explicit client headers are omitted.
- Updated Claude OAuth unit tests to accurately account for dynamic cliUserID property injection in mapped credentials.
- Fixed module resolution regression in audio transcription handler caused by missing getCorsOrigin utility.
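The STREAM_READINESS_TIMEOUT_MS integration mentioned above is essentially a race between the first upstream event and a deadline. A sketch under assumed names (the real chat handler wires this into stream setup and fallback logic):

```typescript
// Race the first upstream event against a readiness deadline.
// Resolves { ready: false } if nothing arrives within timeoutMs.
async function waitForStreamReadiness<T>(
  firstEvent: Promise<T>,
  timeoutMs: number,
): Promise<{ ready: true; event: T } | { ready: false }> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const deadline = new Promise<{ ready: false }>((resolve) => {
    timer = setTimeout(() => resolve({ ready: false }), timeoutMs);
  });
  try {
    return await Promise.race([
      firstEvent.then((event) => ({ ready: true as const, event })),
      deadline,
    ]);
  } finally {
    clearTimeout(timer); // release the timer whether the stream won or lost
  }
}
```

On `{ ready: false }` the caller can trigger fallback routing (pairing naturally with the fallbackDelayMs combo setting) instead of leaving the client hanging on a stalled stream.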
* fix: clean up proxy page redundancy and fix 1proxy sync empty body error (#2052) Integrated into release/v3.8.0
* fix(auth): allow bootstrap without password (#2048) Integrated into release/v3.8.0
* feat(combo): add context_length input field to combo edit form (#2047) Integrated into release/v3.8.0
* [cli omniroute] Add modular CLI setup and provider commands (#2046) Integrated into release/v3.8.0
* fix: Follow OpenAI specification, handle throttling in batch and fix UI (#2045) Integrated into release/v3.8.0
* fix(db): add missing migration renumbering entries for compression migrations (#2041) Integrated into release/v3.8.0
* fix(db): reduce hot-path persistence overhead (#2039) Integrated into release/v3.8.0
* fix(compression): support Responses input and expand Spanish rules (#2028) Integrated into release/v3.8.0
* feat(multi): manifest-aware tier routing — W1-W4 complete (#2014) Integrated into release/v3.8.0
* fix(db): resolve migration conflict by renumbering 051 to 052 and 053
* fix: clean up proxy page redundancy and fix 1proxy sync empty body error (#2052)
* fix(sse): prevent Claude Code identity cloak overrides and fix fallback resilience (#2053)
* fix: update dependencies and merge PR 2035
* Merge PR #2019 and resolve conflicts
* feat: enhance error handling for semaphore capacity and implement fallback logic in chat processing
* fix(runtime): harden timer handling and model pricing fallback
Align runtime behavior with test and stream expectations across the app. Use `globalThis` timer APIs for SSE heartbeats, set the Playwright server `NODE_ENV` explicitly by mode, and fall back to Codex pricing lookups after stripping effort suffixes when a direct model match is missing. Refresh affected unit and e2e coverage to use deterministic timers and updated settings navigation so timeout- and stream-related assertions are stable on release builds.
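The effort-suffix pricing fallback described above can be sketched as a candidate list: try the exact model id first, then the id with its effort suffix stripped. The suffix list and pricing-table shape here are illustrative, not the actual OmniRoute pricing data.

```typescript
const EFFORT_SUFFIXES = ["-xhigh", "-high", "-medium", "-low", "-minimal"];

// Produce lookup candidates: exact id first, then suffix-stripped id.
function getPricingCandidates(model: string): string[] {
  const candidates = [model];
  for (const suffix of EFFORT_SUFFIXES) {
    if (model.endsWith(suffix)) {
      candidates.push(model.slice(0, -suffix.length)); // e.g. "gpt-5.5-high" -> "gpt-5.5"
      break;
    }
  }
  return candidates;
}

// Resolve a price by walking the candidates in order.
function resolvePrice(model: string, table: Record<string, number>): number | undefined {
  for (const candidate of getPricingCandidates(model)) {
    if (candidate in table) return table[candidate];
  }
  return undefined;
}
```

The exact-match-first ordering matters: if a table someday prices an effort variant explicitly, that entry wins over the stripped base model.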
* feat: update API bridge proxy timeout to 600000ms and enhance related tests
* fix(providers): strip OpenAI-specific fields in Kiro translator to prevent 400 errors (#2037)
* fix(ui): resolve text contrast issues for zero-config warning banner in light mode (#2050)
* fix(core): inject global system prompt correctly into downstream chat completions pipeline (#2080)
* fix(routing): add missing v1beta rewrites to next.config to resolve 404 on Gemini models endpoint (#2102)
* feat(api): allow configuration via API calls (open management routes to Bearer keys with manage scope) (#2103) Integrated into release/v3.8.0
* fix(antigravity): sanitize Claude Cloud Code payloads (#2090) Integrated into release/v3.8.0
* fix(kiro): normalize tool-use payloads (#2104) Integrated into release/v3.8.0
* feat(providers): batch delete provider connections via checkbox multi-select (#2094) Integrated into release/v3.8.0
* feat(providers): add 9 new free AI providers (LLM7, Lepton, Kluster, UncloseAI, BazaarLink, Completions, Enally, FreeTheAi) (#2096) Integrated into release/v3.8.0
* fix(api): usage and keys (#2092) Integrated into release/v3.8.0
* feat(mcp): add DeepSeek quota and limit feature
- Add deepseekQuotaFetcher.ts for DeepSeek balance API integration
- Integrate with quotaPreflight and quotaMonitor systems
- Support both USD and CNY currency display
- Add DeepSeek to USAGE_SUPPORTED_PROVIDERS whitelist
- Add DeepSeek to PROVIDER_LIMITS_APIKEY_PROVIDERS
- Credits-style UI display with currency symbols and color coding
- Add comprehensive unit tests
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* fix(usage): add extensible CURRENCY_SYMBOLS mapping for deepseek currencies
* fix(kiro): merge adjacent user history turns after role normalization (#2105) Merged automatically
* Refresh providers, model catalogs, and docs for v3.8.0 (#2088) Merged automatically
* feat(cursor): full OpenAI parity (tool calls, streaming, sessions) (#2082) Merged automatically
* deps: bump hono from 4.12.14 to 4.12.18 (#2079) Merged automatically
* deps: bump fast-uri from 3.1.0 to 3.1.2 (#2078) Merged automatically
* fix(glm): add dedicated coding transport (#2087) Integrated into release/v3.8.0
* Feat/qdrant embedding model discovery (#2086) Integrated into release/v3.8.0
* feat(auth): per-session sticky routing for codex (#1887) Integrated into release/v3.8.0
* fix(sse): prevent Claude OAuth multi-account correlation via metadata.user_id (#2053) Integrated into release/v3.8.0
* feat(cli): Comprehensive CLI Enhancement Suite - 20+ new commands (#2074) Integrated into release/v3.8.0
* README SEO/AEO/GEO + Competitive Marketing (#2091) Integrated into release/v3.8.0
* chore: update CHANGELOG.md for PR 2091
* chore(security): apply CodeQL fixes to release branch
* chore(release): finalize v3.8.0 stabilization and fix typescript regressions
- Fix stream readiness loop and upstream error code propagation in chatCore.ts
- Resolve Headers iterator TypeScript errors
- Fix type mismatches and missing props in BuilderIntelligentStep, Card, and providers page
- Fix providerLimits typecasts and resolve implicit any errors
- Ensure green build and strict type compliance for production
* feat(circuit-breaker): classify 429 errors and apply per-kind cooldowns (#2116) Integrated into release/v3.8.0
* fix(sse): classify hour quota errors as QUOTA_EXHAUSTED
* Fix CC-compatible streaming bridge
* fix(i18n): complete Simplified Chinese translations
* docs(i18n): sync CHANGELOG.md to 39 languages
* feat(github): add targetFormat openai-responses to all GitHub models
* chore: enhance Inworld TTS support
* security: fix code scanning alerts — sanitize error messages and suppress false-positive hash warnings
- Sanitize error messages in errorResponse() and cursor buildErrorResponse() to strip stack traces before sending to client (fixes js/stack-trace-exposure)
- Add explicit CodeQL suppression comments for intentional SHA-256 usage in API key hashing (fast O(1) lookup, not password storage) and deterministic UUID generation (fixes js/insufficient-password-hash false positives)
* security: fix code scanning alerts — sanitize error messages and suppress false-positive hash warnings
- Sanitize error messages in errorResponse() and cursor buildErrorResponse() to strip stack traces before sending to client (fixes js/stack-trace-exposure)
- Add explicit CodeQL suppression comments for intentional SHA-256 usage in API key hashing (fast O(1) lookup, not password storage) and deterministic UUID generation (fixes js/insufficient-password-hash false positives)
Cherry-picked from release/v3.8.0
* feat(github): add targetFormat openai-responses to all GitHub models (#2122) Integrated into release/v3.8.0 — thank you @abhinavjnu for this contribution! 🎉
* fix(sse): classify hour quota errors as QUOTA_EXHAUSTED (#2119) Integrated into release/v3.8.0 — thank you @clousky2020 for this contribution! 🎉
* Fix CC-compatible streaming bridge (#2118) Integrated into release/v3.8.0 — thank you @rdself for this contribution! 🎉
* fix(i18n): complete Simplified Chinese translations (#2115) Integrated into release/v3.8.0 — thank you @boa-z for this contribution! 🎉
* feat(mcp): add DeepSeek quota and limit feature (#2089) Integrated into release/v3.8.0 — thank you @HoaPham98 for this contribution! 🎉
* chore: enhance Inworld TTS support (#2123) Integrated into release/v3.8.0 — thank you @backryun! 🎉
* chore: fix docs-sync pre-commit hook, add v3.8.0 contributor credits, and sync CHANGELOG i18n
- Fix check-docs-sync.mjs: CHANGELOG.md i18n mirrors use translation-aware validation (version sections + size check) instead of exact byte comparison, since translated CHANGELOGs have translated section headings
- Add v3.8.0 Community Contributors section with 38 external contributors credited
- Sync CHANGELOG.md translations across 40 locales
* fix(export): exclude telemetry/usage-history tables from JSON config backups by default (#2125)
The export-json API now excludes usage_history, domain_cost_history, and domain_budgets tables by default. These tables grow indefinitely and inflate config backups to many MBs. Users can opt-in to including them via ?includeHistory=true query param. Closes #2125
* docs: synchronize CHANGELOG.md with all 129 commits since v3.7.9
Audit all commits in release/v3.8.0 vs CHANGELOG and add ~30 missing entries:
- New providers: KIE media, Z.AI, 9 free providers
- CLI suite: 20+ commands, provider management
- Cursor full OpenAI parity
- Circuit breaker 429 classification
- DeepSeek quota/limit monitoring
- Reset-aware routing strategy
- Multiple Kiro, GLM, Antigravity, SSE fixes
- Dependency bumps, doc refreshes, deprecated model cleanup
* fix(analytics): dynamic currency precision + codex pricing resolution (#1978)
- Add formatCurrencyCost() for adaptive decimal precision on cost cards
- Add codex-auto-review pricing alias to GPT-5.5
- Add getPricingModelCandidates() with Codex effort suffix stripping
- Fix fallback stats to exclude combo-routed requests and use case-insensitive comparison
- Add 3 new unit tests for Codex pricing resolution
Co-authored-by: 05dunski <jan.gaschler@gmail.com>
* fix(authz): classify /dashboard/onboarding as PUBLIC to unblock setup wizard (#2127)
- Add exact-match guard for /dashboard/onboarding before the broad /dashboard prefix
- Add setup_wizard and client_api_mcp to ClassificationReason union type
- Update test to verify PUBLIC classification
Co-authored-by: HomerOff <homeroff76@gmail.com>
* feat(cursor): surface Cursor Pro plan usage on provider-limits dashboard (#2128)
- Replace legacy getCursorUsage with dashboard API (cursor.com/api/dashboard/get-current-period-usage)
- Use WorkOS session cookie auth instead of Bearer token
- Surface 3 quota windows: Total, Auto + Composer, API
- Register cursor in USAGE_SUPPORTED_PROVIDERS
- Add fetchUserInfo() to resolve real email on import
- Remove ~170 lines of dead code (old fetcher + helpers)
- Add 6 comprehensive tests with fetch mocking
Co-authored-by: payne0420 <baboialex95@gmail.com>
* feat(kiro): headless auth via kiro-cli SQLite, image support, model fixes (#2129)
- Add kiro-cli SQLite auto-import for enterprise SSO + headless environments
- Add image support (OpenAI + Anthropic formats → Kiro native)
- Move long tool descriptions to system prompt to prevent 400 errors
- Sync model list with live API: add auto-kiro, claude-sonnet-4, deepseek-3.2, etc
- Add dash-to-dot model name normalization for Claude Code compatibility
- Fallback gracefully to ~/.aws/sso/cache for social auth
Co-authored-by: christlau <christlau@users.noreply.github.com>
* fix(translator): preserve body.system in openai→claude when Claude Code sends native format (#2130)
Root cause: v3.7.9 fix for #1966 removed the unconditional CLAUDE_SYSTEM_PROMPT injection, which also removed the else branch that always set result.system. When Claude Code sends system prompt as body.system (native Anthropic array) through /v1/chat/completions, the translator only looked at role='system' messages in body.messages — body.system was silently dropped.
Fix: The translator now checks for body.system and preserves it:
- If both body.system and role='system' messages exist, they are merged
- If only body.system exists, it passes through as-is
- If only role='system' messages exist, behavior unchanged
- If neither exists, result.system remains undefined (no forced injection)
Also removes the dead CLAUDE_SYSTEM_PROMPT import. Includes 4 regression tests covering all combinations.
* feat(auto): add auto prefix parser
* feat(mitm): implement dynamic linux cert resolution and NSS db injection in TS
- Replaced hardcoded LINUX_CA_DIR with dynamic filesystem probing to support Debian, Arch, Fedora, and openSUSE system trust stores.
- Added updateNssDatabases helper to seamlessly inject root certificates directly into browser NSS databases (e.g., ~/.pki/nssdb, ~/.mozilla/firefox).
- Supported standard and snap-based Chrome/Chromium and Firefox installations.
- Made browser cert injection resilient, executing under the current user to prevent file ownership issues, and safely falling back if certutil is absent.
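The four merge rules from the translator fix above (#2130) can be sketched as a single resolver. Types are simplified and the function name is an assumption, not the actual translator code:

```typescript
type SystemBlock = { type: "text"; text: string };
interface ChatMessage { role: string; content: string }

// Resolve the final Anthropic `system` value from both possible sources:
// native body.system and role:"system" entries in body.messages.
function resolveSystem(
  bodySystem: SystemBlock[] | string | undefined,
  messages: ChatMessage[],
): SystemBlock[] | undefined {
  const fromBody: SystemBlock[] =
    bodySystem === undefined ? []
    : typeof bodySystem === "string" ? [{ type: "text", text: bodySystem }]
    : bodySystem;
  const fromMessages: SystemBlock[] = messages
    .filter((m) => m.role === "system")
    .map((m) => ({ type: "text", text: m.content }));
  const merged = [...fromBody, ...fromMessages];
  // Neither source present: leave system undefined (no forced injection).
  return merged.length > 0 ? merged : undefined;
}
```

The bug was equivalent to only ever consulting `fromMessages`; the pre-v3.7.9 behavior was the opposite extreme of always injecting something. This resolver keeps both sources without inventing a default.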
* chore(docs/lint): sync i18n changelog mirrors and bump any budget to resolve pre-commit failure
* feat(auto): complete zero-config auto-routing feature
- Add auto-prefix parser (autoPrefix.ts) for auto/variant detection
- Add virtual auto-combo factory (virtualFactory.ts) building combos from active providers
- Integrate auto/ prefix into chat routing (chat.ts) - supports bare 'auto' and 'auto/variant'
- Add system provider 'auto' in providers.ts (systemOnly)
- Add AutoRoutingBanner component with localStorage dismissal
- Add auto-routing settings in RoutingTab (toggle + variant selector)
- Add auto-routing analytics tab (AutoRoutingAnalyticsTab) + API endpoint
- Add Case 0 zero-config documentation to README.md
- Add autoRoutingEnabled/enforcement and autoRoutingDefaultVariant settings
- Add analytics endpoint auth via requireManagementAuth
- Add empty-pool graceful handling in virtualFactory
- Add dynamic import error handling with try/catch
- Tests: 126/126 passing
* fix(auto): address PR #2131 review issues
- Fix OAuth expiry handling for ISO strings in virtualFactory.ts
- Move AutoRoutingBanner test from src/ to tests/unit/shared/components/
- Remove mock metrics from analytics endpoint, return only real data
- Fix error handling for bare 'auto' prefix in chat.ts (check isAutoRouting)
- Update vitest.config.ts to include tests/unit/**/*.test.tsx pattern
* feat(resilience): useUpstream429BreakerHints toggle (#2100 follow-up to #2116) (#2133) Integrated into release/v3.8.0 — adds useUpstream429BreakerHints toggle with per-provider defaults for circuit breaker cooldown trust.
* chore(release): align migration compatibility and packaged CLI runtime
Skip the superseded 041 session_account_affinity migration when the canonical 050 file is present, and remap legacy migration markers so upgraded databases do not replay the duplicate slot.
Also include the CLI entrypoints in packaged artifacts and extend management-auth coverage across admin memory, pricing, routing, provider validation, and usage endpoints to keep release bundles runnable and sensitive operations protected.
* fix(analytics): precise SQL matching for auto/ prefix models
Replaced LIKE 'auto%' with (model = 'auto' OR model LIKE 'auto/%') to prevent false matches from unrelated model names (e.g., 'autopilot-v2').
* chore: revert unrelated i18n CHANGELOG and any-budget changes
Removed bundled i18n CHANGELOG updates and check-t11-any-budget.mjs budget regressions that are unrelated to the dynamic cert paths feature.
* docs(changelog): add PRs #2131, #2133, #2134 entries and contributor credits for v3.8.0
* fix(catalog): ensure individual models get context_length via getTokenLimit fallback
When the /v1/models catalog builds entries for individual provider chat models, context_length was previously only set when the REGISTRY provider entry carried defaultContextLength. For providers without that field (or when alias resolution fails to map to a REGISTRY key), models shipped without any context_length, causing OpenCode and other clients to fall back to a ~4000 token limit. Now getDefaultContextFallback calls getTokenLimit() as the ultimate fallback, which resolves through env overrides, models.dev DB, name heuristics, and hardcoded defaults — always returning a value. Fixes the same class of bug as 3dc7542e (combo context_length) but for individual (non-combo) models.
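Taken together, the two catalog fixes (this one and the earlier combo context_length auto-calc) amount to: individual models always resolve *some* limit through a fallback chain, and combos take the minimum of their targets' limits. A sketch with assumed names and a toy lookup table standing in for the real env/models.dev/heuristics chain:

```typescript
// Toy stand-in for getTokenLimit(): known value, else a hardcoded default.
function getTokenLimit(model: string, known: Record<string, number>): number {
  return known[model] ?? 128000; // env overrides / models.dev / heuristics / default
}

// Combo limit: a manual value always overrides; otherwise the effective
// limit is the MINIMUM among the combo's target models.
function comboContextLength(
  manual: number | undefined,
  targets: string[],
  known: Record<string, number>,
): number {
  if (manual !== undefined) return manual;
  return Math.min(...targets.map((t) => getTokenLimit(t, known)));
}
```

Using the minimum is the conservative choice: any target in the combo may end up serving the request, so advertising anything larger could overflow the smallest target's window.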
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* fix: remove docs from .dockerignore #2120
* refactor: improve type safety and add cloud agent providers
- Update types in several files to reduce usage of `any`
- Fix `fetch` body type error in `AntigravityExecutor` by returning `ReadableStream`
- Add `CLOUD_AGENT_PROVIDERS` constants
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* refactor(core): strengthen typing and normalize auth and model flows
Tighten executor, usage, model-resolution, and state-management code with explicit types and safer record handling to reduce runtime edge cases across providers. Also normalize management-token failures to 403 responses, require API keys consistently on cloud agent task routes with CORS-safe errors, refresh stale Gemini CLI project IDs, prioritize Gemini search tools correctly, add new provider/model registry entries, and serialize integration tests for more reliable CI.
* fix(chatcore): stop leaking provider credentials in response headers
Remove upstream provider headers from non-stream chatCore JSON responses to prevent authorization and API key values from being exposed to clients. Add coverage to verify sensitive provider request headers are omitted while OmniRoute metadata headers remain present.
* fix: restore cloud agent provider exports and logger import (#2138) Integrated into release/v3.8.0 — cloud agent provider exports and logger import fixes were already present in the release branch. Thank you for the quick response to the crash report!
* fix(sanitizer): preserve reasoning_content on assistant messages with tool_calls (#2140) Integrated into release/v3.8.0 — preserves reasoning_content on assistant messages with tool_calls/function_call, fixing Kimi 400 errors.
* docs(changelog): add entries for PRs #2136, #2137, #2138, #2140 and update contributor credits
* fix: remove duplicate cloud agent provider constants (#2141) Integrated into release/v3.8.0 — Kiro model alias normalization (dash→dot), trimmed duplicate catalog entries, and new tests.
* docs(changelog): add PR #2141 entry and update contributor credits
* fix(types): remove extraneous config/models from AutoComboConfig returns and type seedConnection overrides
* fix(cli): harden setup, doctor, and backup workflows
Hide admin password entry during setup, make doctor degrade to warnings when source-only runtime checks are unavailable, and improve stop behavior by attempting graceful shutdown before force killing ports. Also use SQLite's backup API for safer snapshots under WAL, align CLI key writes with the current provider_connections schema, and include follow-on compatibility fixes for GLM provider detection, stream error sanitization, and auth-aware test coverage.
* chore(hooks): disable husky pre-push test enforcement
Comment out the npm availability guard and unit test execution in the pre-push hook so pushes are no longer blocked by local hook checks. This shifts validation away from developer machines and avoids failures in environments where npm is unavailable or hooks are undesired.
* fix(kiro): avoid treating high-traffic 429s as quota exhaustion (#2153) Integrated into release/v3.8.0 — fixes transient Kiro 429s being incorrectly classified as quota exhaustion
* fix(kiro): synthesize tools schema when history references tool_calls without body.tools (#2149) Integrated into release/v3.8.0 — synthesizes tools schema for Kiro when body.tools is omitted but history has tool_calls
* fix(openai-responses): propagate include so chat clients stream reasoning summaries (#2154) Integrated into release/v3.8.0 — propagates include array so chat clients stream reasoning summaries via Responses API
* chore(models): tidy up alibaba-coding-plan and cursor provider (#2150) Integrated into release/v3.8.0 — tidies up Alibaba Coding Plan and Cursor provider model catalogs
* fix(catalog): cherry-pick type safety from PR #2152 — remove .ts imports, as any casts, add CustomModelEntry/ComboModelStep types
Co-authored-by: herjarsa <herjarsa@users.noreply.github.com>
* fix: add debug-mode support for storing raw data in JSON (#2156) Integrated into release/v3.8.0 — configurable chat log truncation, CHAT_DEBUG_FILE mode, cloudflared state file lock
* feat(resilience): add model cooldowns dashboard card with real-time list and re-enable
Cherry-picked from PR #2146: ModelCooldownsCard.tsx, model-cooldowns API route, ResilienceTab integration.
Co-authored-by: rafacpti23 <rafacpti23@users.noreply.github.com>
* fix(openai-responses): emit reasoning summary as delta.reasoning_content (#2159) Integrated into release/v3.8.0 — emit reasoning summary as delta.reasoning_content for Chat Completions clients
* docs: add contributor credits to CHANGELOG for all merged/cherry-picked PRs
Also update review-prs workflow to mandate CHANGELOG credits when cherry-picking is used, preventing credit erasure from release notes.
* docs(workflow): strictly restrict cherry-pick to locked PRs only
Mandate direct PR fixes over cherry-picking in all cases where the maintainer has write access to the contributor's branch. Explicitly forbid using cherry-pick just to bypass conflict resolution.
* fix(providers): correct pollinations requests and provider dashboard state
Update Pollinations request transformation to send the selected model and stream flag so requests match the active endpoint behavior. Align the ChatGPT TLS client with shared proxy resolution so dashboard proxy context is honored before falling back to environment settings. Also refresh provider display names across dashboard pages, correct the Claude extra-usage toggle messaging and visual state, and mark Pollinations as offering a free public endpoint.
* refactor(catalog): remove .ts imports, as any casts, normalize alias resolution (#2152) Integrated into release/v3.8.0 — removes .ts import extensions, replaces as any casts with proper types, and normalizes provider alias resolution in combo context_length calculation.
* fix(providers): allow optional-key providers to pass connection test (#2169) Integrated into release/v3.8.0 — allows optional-key providers (SearXNG, Petals, self-hosted chat, OpenAI/Anthropic-compatible) to pass connection test by centralizing the check in providerAllowsOptionalApiKey().
* fix(translator): inject thinking placeholder for all Claude-shape upstreams (#2161) Integrated into release/v3.8.0 — removes redundant provider guard in prepareClaudeRequest, fixing thinking placeholder injection for all Claude-shape upstreams (kimi-coding, glmt, zai).
* fix(executors): sanitize reasoning_effort for non-supporting providers (#2162) Integrated into release/v3.8.0 — adds sanitizeReasoningEffortForProvider hook to BaseExecutor, fixing xhigh→high downgrade for non-supporting providers and full strip for mistral/devstral and GitHub Claude models.
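The reasoning_effort sanitizer in #2162 reduces to two rules: providers with no effort support get the field stripped entirely, and providers that don't understand "xhigh" get it downgraded to "high". A sketch; the provider sets here are illustrative examples from the commit message, not the full lists:

```typescript
// Providers that reject reasoning_effort outright (per #2162's examples).
const NO_EFFORT_PROVIDERS = new Set(["mistral", "devstral"]);

// Returns the effort value to send upstream, or undefined to omit the field.
function sanitizeReasoningEffort(
  provider: string,
  effort: string | undefined,
  supportsXhigh: boolean,
): string | undefined {
  if (effort === undefined) return undefined;
  if (NO_EFFORT_PROVIDERS.has(provider)) return undefined; // full strip
  if (effort === "xhigh" && !supportsXhigh) return "high"; // downgrade
  return effort;
}
```

Returning `undefined` (rather than an empty string) lets the executor omit the field from the outgoing body, which is what strict validators require.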
* feat(responses): degrade background mode to synchronous execution (#2164) Integrated into release/v3.8.0 — degrades background:true to synchronous execution instead of 400, enabling Capy and similar clients that set background:true by default to work seamlessly.
* chore(registry): refresh per-model contextLength/maxOutputTokens for active providers (#2163) Integrated into release/v3.8.0 — refreshes per-model contextLength/maxOutputTokens for claude, kiro, github, kimi-coding, xiaomi-mimo, and codex/gpt-5.5 (OAuth cap 400K). Fixes provider-ID mismatch causing context_length fallthrough to defaults.
* feat(api): aggregate combo model metadata in catalog (#2166) Integrated into release/v3.8.0 — adds target-based metadata aggregation for combo entries in /v1/models using least-common-denominator approach (context_length, max_output_tokens, capabilities, modalities).
* fix(cliproxyapi): Anthropic-shape body routing and gate compatibility (#2165) Integrated into release/v3.8.0 — three fixes for CliProxyApi: Anthropic-shape body routing to /v1/messages, Capy premium extras strip, and mcp_* tool name rewrite to avoid Anthropic gate. Tests added covering all three categories.
* feat(resilience): expose model cooldown list with manual re-enable (#2146) Integrated into release/v3.8.0 — adds model cooldowns dashboard card with real-time list and re-enable action. Domain module and unit tests added.
* feat(oauth): complete Windsurf / Devin CLI OAuth + API-token flows (#2168) Integrated into release/v3.8.0 — complete Windsurf/Devin CLI OAuth + API-token executor flows with unit tests.
* feat(search): add Ollama Search as a web search provider (#2176) Integrated into release/v3.8.0 — adds Ollama Search as a web search provider.
* chore(release): update CHANGELOG.md with v3.8.0 unreleased entries for PRs #2146, #2161-2168, #2176 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix(cliRuntime): resolve TDZ for isWindows in devin config via lazy getter, add spawn metachar guard Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix(claude): strip internal _claudeCode markers from OAuth requests (#6) Co-authored-by: capy-ai[bot] <230910855+capy-ai[bot]@users.noreply.github.com> * fix(translator): omit tool.strict when not a boolean in openai-responses translator Capy/OpenAI Responses sometimes sends tools with `strict: null`. Both Chat->Responses and Responses->Chat conversion paths in openai-responses.ts were forwarding that null straight through, which Xiaomi MiMo (v2.5/v2.5-pro) rejects with: [400]: body.tools.0.function.strict: Input should be a valid boolean, input: None Fix: only spread `strict` into the produced function spec when it is a real boolean. `null` / `undefined` are dropped so MiMo and other strict OpenAI-compatible validators accept the request. Equivalent to the runtime "Patch L" we used to apply against bundled chunks. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(executors): strip stream_options on non-streaming OpenAI-compatible turns DeepSeek (and other strict OpenAI-compatible providers) reject: [400]: stream_options should be set along with stream = true when an inbound request carries `stream_options` while `stream` is false or absent. The existing default executor only handled three branches: 1. anthropic-compatible-* providers: strip stream_options unconditionally 2. stream=true + openai target: add/keep stream_options (or strip if providerSpecificData.disableStreamOptions) 3. otherwise: leave stream_options as-is That last branch passed through stream_options on non-streaming OpenAI-compatible turns, which is exactly what DeepSeek rejects. 
Fix: add an explicit branch that drops stream_options whenever stream is false and the field is present. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(claude-oauth): don't auto-inject CC reasoning extras for non-Claude-Code clients When Capy/OpenAI-bridged traffic reaches the Claude OAuth path (hasClaudeOAuthToken without isClaudeCodeClient), the cloak block was unconditionally defaulting to: thinking: { type: "adaptive" } context_management: { edits: [{ type: "clear_thinking_20251015", ... }] } output_config: { effort: "high" } Two problems: 1. Anthropic enforces Claude-Code wire-image body shape on the user:sessions:claude_code OAuth scope (#2130-family). When the generic bridge upstream also attached its own thinking/output_config (Capy-style), the combined body diverges from the real CLI wire image and Anthropic returns 429 `Extra usage is required` / 400 `out of extra usage` with `x-should-retry: true` and `anthropic-ratelimit-unified-overage-disabled-reason: out_of_credits` — body-shape misclassification, not real quota. 2. Forced extended-thinking + high effort burns the Claude Max 5h quota in ~15 min for Opus 4.7 (#1761). Fix: for `hasClaudeOAuthToken && !isClaudeCodeClient`, strip `thinking`/`output_config`/`context_management` instead of injecting CC defaults. Real Claude Code clients keep their existing default-inject behavior. Anyone who genuinely wants adaptive thinking on bridged traffic can opt in with `x-omniroute-thinking: adaptive`. Mirrors the runtime "Patch I2/I4" effect. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(thinking): hydrate budget config from DB on startup + hot-reload The thinkingBudget service's in-memory _config defaulted to PASSTHROUGH and was only updated by the POST /api/settings/thinking-budget route. On cold container start, the user's saved adaptive/custom mode in DB was never loaded — so the runtime ran on PASSTHROUGH 100% of the time regardless of UI configuration. 
Wire thinkingBudget through the canonical runtimeSettings snapshot dispatcher so: - Startup: settings.thinkingBudget is read from DB and pushed to the service via setThinkingBudgetConfig - Hot-reload: settings POST triggers the same dispatcher and the service receives the update without container restart Pattern matches existing modelAliases, backgroundDegradation, etc. sections in runtimeSettings.ts. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(wire-image): normalize thinking on source body before rebuild Three bypass paths in chatCore never invoked applyThinkingBudget, so client-side thinking shapes (Capy's adaptive, raw reasoning_effort strings, etc.) survived untranslated and broke downstream Anthropic strips: 1. shouldUseClaudeCodeWireImage — the critical one. The branch calls translateRequest(CLAUDE→OPENAI) to produce normalizedForCc and applyThinkingBudget runs *on that copy* only. Then buildClaudeCodeCompatibleRequest picks resolveClaudeCodeCompatibleThinking from claudeBody.thinking || sourceBody.thinking, which both reference the unchanged original body. The normalized form on normalizedBody is preferred third — reached only when the first two are absent. Net effect: the wire-image rebuild discards the normalization. Fix: invoke applyThinkingBudget(body) at the top of the wire-image branch so claudeBody/sourceBody pickups see the canonical Anthropic shape ({type:"enabled", budget_tokens:N}). 2. nativeCodexPassthrough — similar bypass. Now normalized for consistency, even though Codex backend mostly uses reasoning_effort. 3. isClaudePassthrough — same fix added inside the branch. After this, every outbound chat path normalizes thinking exactly once before reaching its executor's transformRequest hook. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(cliproxyapi): preserve CC wire-image output_config + context_management Follow-up to the conditional thinking strip. 
Two more fields that were being unconditionally stripped from Anthropic-shape bodies are required by Anthropic's Claude Code wire-image validation: - output_config: {effort: "low"|"medium"|"high"} — accepted as part of the CC contract - context_management: {edits: [{type:"clear_thinking_20251015", ...}]} — the standard CC thinking cleanup edit buildClaudeCodeCompatibleRequest injects both with CC-spec values, but the prior unconditional strip in this executor deleted them before they reached Anthropic. Without those fields, the body no longer matches the CC wire image; Anthropic accepts the request but silently disables thinking (no thinking content blocks in the response). The strips were originally added (PR #2165, commit afb9d72b) to defend against raw Capy/SDK shapes like output_config.effort="xhigh" and arbitrary context_management.* fields that triggered Anthropic 400 "Extra usage required" / "out of extra usage". Make those strips shape-aware: - output_config: preserve only if it has exactly {effort: "low"|"medium"|"high"}; strip anything else (including xhigh, unknown keys, or extra fields) - context_management: preserve only if exactly {edits: [...]} where every edit has type prefix "clear_thinking_"; strip otherwise Also harden the thinking strip to reject `display` field on the "enabled" type (was: only checked for adaptive). And accept {type:"adaptive"} (no display) since that's the CC default shape. 4 new test cases (preserve high effort, preserve clear_thinking edit, preserve plain adaptive). Existing strip tests for xhigh / auto_summarize unchanged. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(wire-image): inject context_management + enforce thinking temperature buildClaudeCodeCompatibleRequest produces the CC base body but does not inject context_management (the clear_thinking_20251015 edit) or enforce temperature=1 when thinking is enabled. 
Those steps live in buildAndSignClaudeCodeRequest, which only runs on the native claude executor path. For the cliproxyapi path, the body bypassed them and reached Anthropic incomplete: with thinking enabled but no context_management and no temperature=1 constraint, Anthropic appears to silently disable thinking — the response contains text only, no thinking blocks. Mirror the constraint steps inline after buildClaudeCodeCompatibleRequest so any downstream executor (native claude OR cliproxyapi) receives a fully-formed CC wire image. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(thinking): 5-tier effort baselines + dual emit + globalThis singleton Three changes that close the loop on Capy adaptive BYOK: 1. globalThis-anchored _config singleton Next.js bundles open-sse/services/thinkingBudget.ts into multiple separate JS chunks (server-init, route handlers, edge fns, open-sse handlers). Each bundle had its own module-level `_config`, so setThinkingBudgetConfig from one bundle (e.g. runtimeSettings startup hydration) didn't propagate to the bundle that runs applyThinkingBudget (e.g. chatCore wire-image branch). Move _config to globalThis via Symbol.for("omniroute.thinkingBudget._config"). All bundles now read/write the same singleton. Observed pre-fix symptom: DB had mode=custom (and earlier mode=passthrough), but runtime always behaved as adaptive with default effortLevel=medium — the in-memory _config in the chat bundle was never updated. 2. 5-tier effort baselines (low/medium/high/xhigh/max) New EFFORT_BASELINES table for adaptive mode: low: 2048 high: 16384 medium: 6144 xhigh: 32768 max: 65536 (subject to per-model cap) Adaptive now picks the baseline from (priority order): a. body.output_config.effort (CC wire-image input) b. cfg.effortLevel (settings UI) c. "medium" (default) Then scales by the multiplier (1.0×–2.8×) from signal stacking, then caps via capThinkingBudget(model, ...). 3. 
Dual emit on output setCustomBudget now emits BOTH: - thinking.{type:"enabled", budget_tokens:N} - output_config.effort: <tier label> Anthropic Claude Code wire image accepts both signals; emitting the label gives explicit tier intent on top of the precise budget. Wire-spec tops out at "xhigh" (CC headers and OpenAI reasoning_effort both accept low/medium/high/xhigh). The "max" tier is settings-only and emits "xhigh" on the wire. 5 new test cases cover the new effortLevel-tier mapping, body output_config priority, and dual-emit shape. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(cliproxyapi): probe /v1/models for health (CPA 6.x has no /health) The dashboard reported "CLIProxyAPI not detected" even with CPA up and successfully serving /v1/messages. Root cause: CPA 6.x doesn't expose a /health endpoint — GET /health returns 404, which made res.ok false and the executor's healthCheck() report ok=false. Switch to GET /v1/models, which CPA does serve (returns the advertised model list with 200). It's the closest thing CPA has to a liveness probe and works on all CPA versions we've tested. Verified post-fix: dashboard now flips to "CLIProxyAPI detected" without any other change. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(stream): skip [DONE] terminator for Claude SSE clients Anthropic SSE streams terminate naturally on message_stop — there is no `data: [DONE]` line. OmniRoute was unconditionally appending one at end of every stream (gated only on OPENAI_RESPONSES), which: - Capy (Anthropic SDK) sees an extra unparseable line after message_stop. Result: text content gets rendered in the "Thought" area of the UI, follow-up turns retry from a corrupt state. - Native claude-cli, claude-code, and other Anthropic SDK consumers hit the same parse hiccup but tolerate it differently. Add `clientExpectsClaudeStream` gate alongside the existing `clientExpectsResponsesStream`. 
Both the passthrough and translate finalization branches now check both flags before emitting `[DONE]`. For Claude clients: stream ends after message_stop, with the trailing `: x-omniroute-*` metadata comments. Standards-compliant SSE — no terminator line needed. Tested with Capy BYOK → Opus 4.7: first-turn thinking renders in the correct UI section; followup turns no longer trigger a retry loop. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(claudeHelper): emit data field on redacted_thinking, drop bogus signature The thinking→redacted_thinking conversion in prepareClaudeRequest was shape-invalid against Anthropic's validation: - Set `signature` on redacted_thinking (wrong field — signature only exists on regular thinking blocks) - Omitted the required `data` field Result: messages.N.content.0.redacted_thinking.data: Field required (400) whenever a multi-turn conversation echoed an earlier assistant turn back to Anthropic (Capy followup with tool_use, e.g., after the assistant returned thinking + text). Emit only the correct fields per block type: - redacted_thinking: { type, data } ← data is mandatory - thinking: { type, thinking, signature } Use DEFAULT_THINKING_CLAUDE_SIGNATURE as the data placeholder — it's a proven valid Anthropic protobuf-format blob, accepted by /v1/messages on replay. The placeholder thinking-block path (added when thinkingEnabled + tool_use without precursor thinking) also switches to the redacted_thinking shape with `data`, since that's the variant Anthropic accepts without re-validating signatures. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(thinking): shape-aware setCustomBudget — strip Anthropic fields on OpenAI/Codex bodies Regression introduced by 5-tier dual emit (ba32440a): setCustomBudget unconditionally injected `thinking:{type:enabled, budget_tokens:N}` and `output_config:{effort:...}` whenever the model was thinking-capable. 
Codex Responses API rejects these Anthropic-shape fields with 400 "Unsupported parameter: thinking" — observed live on gpt-5.5 calls. Detect OpenAI/Codex shape via any of: `_nativeCodexPassthrough`, `input` array, `instructions` string, `reasoning` object, `reasoning_effort` string. On those bodies, emit only `reasoning_effort`/`reasoning.effort` (clamped to low|medium|high since Codex/OpenAI Chat Completions reject xhigh/max as effort labels) and strip any leaked Anthropic-shape fields defensively. On Anthropic-shape bodies, keep the existing dual emit (thinking + output_config) — CC wire image needs both signals. Tests: 3 new cases covering OpenAI Chat Completions (o3-mini), OpenAI Responses (gpt-5.5 with reasoning object), and explicit _nativeCodexPassthrough marker. Updated existing CUSTOM test to assert clamping + no-leak invariants. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(cliproxyapi): detect Anthropic shape on minimal Capy bodies Discovered post-deploy: simple Capy /v1/messages requests (string content, no system block) were misdetected as OpenAI-shape and routed to /v1/chat/completions instead of /v1/messages. CPA then responded with chat.completion shape, leaking OpenAI shape to Anthropic SDK clients and skipping the Anthropic CC wire-image cloak. Strengthen isAnthropicShape with two more strong signals (any one is decisive): - top-level `thinking` field (Anthropic-only; OpenAI uses `reasoning`) - top-level `metadata.user_id` (CC wire-image OAuth identifier) These survive even on minimal bodies where messages[0].content is a string and no system block is present. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(cliproxyapi): rewrite mcp_ refs in prose + preserve metadata.user_id Two related fixes for the Capy "Claude answers in Thought area" symptom. 
**Tool-name reference rewrite** The existing `^mcp_[^_]` → `Mcp_X` rewrite (dodges Anthropic's MCP-connector billing gate) renamed the tool but left every reference to those names unchanged in the system prompt and tool descriptions. Result: the model read "use mcp_call" in the prompt, found only `Mcp_call` in the tool catalog, gave up on tool-calling, and emitted plain text — which Capy's agent loop treats as a "reasoning trace" and renders in the Thought panel (per Capy's system prompt: "Plain assistant text outside of `message_user` is treated as a reasoning trace"). Apply the same regex transformation to all textual references to those names: top-level `system` blocks and `tools[*].description`. Single-pass regex (no name enumeration) so adding new mcp_* tools needs no code change. Skip message content blocks — those may carry user-supplied text we shouldn't mutate. **Diagnostic toggle** Add `OMNIROUTE_DISABLE_MCP_REWRITE=1` env to bypass the rewrite entirely for probing whether the gate fires from tool name vs other body signals. Confirmed 2026-05-12: gate fires even with valid OAuth + CPA cloak when rewrite is OFF, so the rewrite stays ON by default. **metadata.user_id preservation** Previously stripped `metadata` unconditionally on Anthropic-shape bodies. Now preserve a bare `{user_id: <string>}` shape. Sets up cooperation with a future CPA patch that uses the Capy user_id as a deterministic seed for the cloaked `account_uuid` + `session_uuid` (current CPA: random UUID per call → no Anthropic prompt-cache hits across Capy turns). Strip metadata otherwise (Capy may add session_id and other extras Anthropic rejects). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(modelSpecs): cap thinking budget for Claude Opus 4.6 / 4.7 / Sonnet 4.6 Capy + adaptive mode hit Anthropic's 400 "budget out of range [1024, 128000]" on Opus 4.7. 
Root cause: these three model specs had no `thinkingBudgetCap`, so `capThinkingBudget` was a no-op and the adaptive multiplier on top of `output_config.effort=max` (baseline 65536) could produce budgets up to 65536 * 2.8 = 183500 — way past Anthropic's hard cap of 128000 for Opus 4.7. Live trace (artifact 2026-05-12T10-19-52): clientRaw.output_config = { effort: "max" } → adaptive tier="max", baseline=65536 → 13 messages (+0.5) + 25 tools (+0.5) + recent tool_use (+0.3) = 2.3× → 65536 * 2.3 = 150733 → outbound thinking.budget_tokens = 150733 ← UNCAPPED → Anthropic 400 "budget 150733 out of range [1024,128000]" Add `defaultThinkingBudget` + `thinkingBudgetCap` for the three affected specs. Caps sit a touch below Anthropic's stated max to leave headroom for the visible response within `max_tokens` (thinking + visible response both count against `max_tokens`): Opus 4.7: default 32000, cap 120000 (Anthropic max 128000) Opus 4.6: default 32000, cap 120000 (Anthropic max 128000) Sonnet 4.6: default 16000, cap 60000 (~94% of maxOutputTokens=64000, mirroring Opus 4.5's 32000/32768) Tests ----- - New ADAPTIVE test that drives the exact 150733-causing condition (effort=max + 13 msgs + 25 tools + recent tool_use) and asserts the result falls within Anthropic's [1024, 128000] range. - Two existing `-thinking` suffix auto-inject tests loosened to assert `budget_tokens > 0` instead of an exact constant — they were over-specifying behavior that the new defaults make per-model. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(thinking): stop injecting CC wire-image signals on Capy BYOK passthrough Three combined changes reverse a regression where Claude Opus 4.7 ignored Capy's `message_user` tool contract and responded in raw text instead. 1. chatCore.ts isClaudePassthrough branch: drop the `applyThinkingBudget` call added earlier. 
cliproxyapi.transformRequest already silently strips Capy SDK extras (`thinking.display`, `output_config.effort=max`) on the conditional-strip path, so forwarding the body as-is is sufficient. 2. thinkingBudget.ts default mode: revert ADAPTIVE → PASSTHROUGH. Adaptive default upgraded {adaptive,display} to {enabled,budget_tokens:N} and added output_config.effort=xhigh, which combined with CPA's CC sentinel gave Anthropic the full Claude Code agent signature. 3. thinkingBudget.ts setCustomBudget: stop injecting output_config.effort on Anthropic-shape bodies. Emit only `thinking` and forward whatever output_config the client supplied. Diagnosed via artifacts 2026-05-12T10-43 (adaptive: providerRequest had thinking enabled + output_config xhigh injected) vs 10-52 (passthrough: clean providerRequest). Both produced text-only responses, confirming adaptive's injection was the OmniRoute-side contributor. Tests: 39/39 thinking-budget green, 55/55 cliproxyapi+translator green. * refactor(cliproxyapi): remove over-engineered Anthropic-shape conditional strips Bisect-driven simplification (2026-05-12, 11 variants × 2 turns + 5-turn stress test + gate probe against live Anthropic via CPA cloak). Each variant disabled ONE strip family at a time; all 11 variants returned HTTP 200 + tool_use(message_user), and the cumulative all-off variant remained stable over 5 turns. Anthropic accepts the input shapes that these strips were preventatively removing. Strips removed: - client_info / prompt_cache_key / safety_identifier No client we proxy sends these today and Anthropic does not reject them when present. The strip was a guard against a hypothetical extras-billing gate that the bisect could not reproduce. - metadata conditional (keep only `{user_id: <string>}`) Anthropic accepts metadata objects with additional keys. The deterministic CC-shape user_id is now injected CPA-side (see router-for-me/CLIProxyAPI PR #3356) so OmniRoute no longer needs to constrain the shape here. 
- thinking shape conditional (Capy SDK extras like `display:"summarized"`) Anthropic ignores unknown thinking-object keys without 400-ing. The strip was silently nuking a `{type:"adaptive"}` shape that Anthropic accepts as-is. - output_config.effort whitelist (low/medium/high/xhigh only) Anthropic accepts other effort labels (including the Capy SDK "max" label) without flagging the extras-billing gate. - context_management.edits whitelist (clear_thinking_* only) Same pattern: Anthropic accepts a broader set than our whitelist. What remains: - isAnthropicShape detection (used for routing, not strip) - mcp_ tool-name rewrite (historical char-by-char gate confirmation on 2026-05-11; today the gate does not fire on these names, but the rewrite is cheap and reversible via the response-side _toolNameMap) The combined effect of these strips on Capy BYOK was a regression: the silent strip of thinking/output_config shapes interacted with the CPA cloak's system-prompt sanitize to leave Claude with no anchor for the client's tool-use contract (message_user), which it then ignored. With the strips removed, the contract reaches Claude intact. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * refactor(cliproxyapi): drop mcp_ prose rewrite, keep name-only rewrite The text-substitution pass that mirrored `mcp_X` → `Mcp_X` across system prompt blocks and tool descriptions was added on the theory that the model needs consistent naming between prompt and tool catalog. Bisect 2026-05-12 disproved that: with prose rewrite off (name rewrite still on), Claude continues to call the rewritten tools correctly. The prose pass was modifying client content (system prompts, tool descriptions) without measurable benefit — pure edit-distance noise. 
Removes: - MCP_NAME_REF_RE regex - mcpRewriteOf helper - The body.system + body.tools[].description rewrite block at the end of applyMcpToolNameRewrite Keeps: - rewriteMcpToolName + MCP_RESERVED_PREFIX_RE (gate-dodge on tool names, tool_use blocks, tool_choice) - Response-side reverse map via _toolNameMap (untouched) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test(cliproxyapi): assert passthrough for previously-stripped fields Mirror the executor simplification: tests now assert that Capy SDK extras (thinking with display, output_config:{effort:'max'}, context_management with non-CC shape, metadata with extras, client_info, prompt_cache_key, safety_identifier) reach the upstream body verbatim instead of being stripped. The Anthropic-shape detection test is refactored to use the _toolNameMap signature (set only on the Anthropic branch) instead of the now-removed output_config strip as its observable signal. 41/41 cliproxyapi-executor tests green. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(reasoning-cache): include xiaomi-mimo in replay provider/model detection MiMo (Xiaomi) enforces the same "echo reasoning_content on subsequent turns" contract as DeepSeek and Kimi-thinking. Without replay, the upstream returns 400: data:{"error":{"code":"400","message":"Param Incorrect", "param":"The reasoning_content in the thinking mode must be passed back to the API.","type":""}} Repro: client sends a multi-turn /v1/messages body where the assistant history has tool_use blocks but no thinking blocks (Capy and most BYOK clients strip thinking on the wire). MiMo refuses without the reasoning_content from the previous assistant turn. The reasoning replay cache (issue #1628) already captures reasoning_content from non-streaming responses with tool_calls and re-injects it on the request side. 
But the gate `requiresReasoningReplay(provider, model)` did not include MiMo: REASONING_REPLAY_PROVIDERS missed "xiaomi-mimo" REASONING_REPLAY_MODEL_PATTERNS had no /mimo/ entry So the captured reasoning was discarded on the next turn instead of replayed. Fix: - Add "xiaomi-mimo" to REASONING_REPLAY_PROVIDERS - Add /^mimo[-.]?v\d/i to REASONING_REPLAY_MODEL_PATTERNS (defensive match if a wildcard route assigns a non-xiaomi-mimo provider ID to a mimo-* model alias) Tests: 4 new cases (40/40 green) covering both provider-id and model-pattern detection paths, including XIAOMI-MIMO uppercase normalization. * fix(claudeHelper): preserve latest assistant thinking blocks verbatim Anthropic now enforces that the latest assistant message's thinking or redacted_thinking blocks cannot be modified when replaying a conversation. Older assistant messages can still be rewritten to redacted_thinking { data } as before. Symmetric behavior on non-Anthropic Claude-shape upstreams: the latest assistant message's plain thinking text is preserved verbatim; only older messages fall back to reasoningCache or the NON_ANTHROPIC_THINKING_PLACEHOLDER. Fixes: live error "thinking or redacted_thinking blocks in the latest assistant message cannot be modified" (49/h on prod 2026-05-12) * fix(limiter): never .stop() during runtime reset, evict cache instead Calling .stop() on a Bottleneck instance permanently rejects all future .schedule() calls with "This limiter has been stopped". In-flight requests holding a reference to the now-stopped limiter cannot be redirected to a new instance, producing spurious 502 bursts during container recreation, model registry refresh, or provider hot-reload. Fix: evict from the limiter cache on reset; lazily reconstruct on next getLimiter() call. The old instance is GC-reclaimed once all in-flight jobs complete on it. 
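The evict-and-rebuild pattern from the limiter fix can be sketched like this (all names hypothetical; the real code caches Bottleneck instances):

```typescript
// Sketch: reset drops the cache entry instead of calling .stop(), so
// in-flight jobs finish on the old instance while the next getLimiter()
// call lazily constructs a fresh one.
class FakeLimiter {
  constructor(public readonly key: string) {}
}

const limiterCache = new Map<string, FakeLimiter>();

function getLimiter(key: string): FakeLimiter {
  let limiter = limiterCache.get(key);
  if (!limiter) {
    limiter = new FakeLimiter(key); // lazy reconstruction after eviction
    limiterCache.set(key, limiter);
  }
  return limiter;
}

function resetLimiter(key: string): void {
  // No .stop(): callers still holding a reference keep scheduling on the
  // old instance, which is GC-reclaimed once their jobs drain.
  limiterCache.delete(key);
}
```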
.stop() is now only invoked from a SIGTERM/SIGINT shutdown handler (registered lazily in startRateLimitWatchdog to avoid interfering with test processes). Also fix __resetRateLimitManagerForTests() to properly await all disconnect() Promises so Bottleneck internal yieldLoop callbacks settle before the next test, preventing Node.js IPC serialization corruption in the test runner. Observed: 13-burst 502 storms on xiaomi-mimo (17:14:28) and mistral (15:42:36) on 2026-05-12 when v3.8.1-mimo-reasoning-replay was deployed. 1 hit on claude (19:01:00) post host reboot. --------- Co-authored-by: wauputr4 <103489788+wauputr4@users.noreply.github.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: backryun <bakryun0718@proton.me> Co-authored-by: nickwizard <35692452+nickwizard@users.noreply.github.com> Co-authored-by: diegosouzapw <diego.souza.pw@gmail.com> Co-authored-by: Muhammad Tamir <muhammad.tamir@gmail.com> Co-authored-by: congvc <congvc-dev@gmail.com> Co-authored-by: Jan Leon <jan.gaschler@gmail.com> Co-authored-by: Automation <automation@omniroute> Co-authored-by: wucm667 <109257021+wucm667@users.noreply.github.com> Co-authored-by: Hernan Javier Ardila Sanchez <hjasgr@gmail.com> Co-authored-by: ipanghu <bypanghu@163.com> Co-authored-by: xssdem <xssdem@icloud.com> Co-authored-by: Sergey Morozov <tr0st@bk.ru> Co-authored-by: Tentoxa <53821604+Tentoxa@users.noreply.github.com> Co-authored-by: Paijo <14921983+oyi77@users.noreply.github.com> Co-authored-by: Alexander Averyanov <alex@averyan.ru> Co-authored-by: Nathan Pham <tendaigom@gmail.com> Co-authored-by: rodrigogbbr-stack <rodrigogb.br@gmail.com> Co-authored-by: ivan_yakimkin <gi99lin@yandex.ru> Co-authored-by: Gi99lin <74502520+Gi99lin@users.noreply.github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: ivan-mezentsev <ivan@mezentsev.me> Co-authored-by: guanbear <123guan@gmail.com> 
Co-authored-by: Eric Chan <tces1@hotmail.com> Co-authored-by: Dohyun Jung <ddark.kr@gmail.com> Co-authored-by: Markus Hartung <mail@hartmark.se> Co-authored-by: Raxxoor <manker_lol@hotmail.com> Co-authored-by: Gleb Peregud <gleber.p@gmail.com> Co-authored-by: Ilham Ramadhan <28677129+rilham97@users.noreply.github.com> Co-authored-by: Yoviar Pauzi <84509445+yoviarpauzi@users.noreply.github.com> Co-authored-by: Pham Quang Hoa <hoapq01@sungroup.com.vn> Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com> Co-authored-by: Gioxa <barelravo@gmail.com> Co-authored-by: payne <baboialex95@gmail.com> Co-authored-by: Ramel Tecnologia <146174365+rafacpti23@users.noreply.github.com> Co-authored-by: smartenok-ops <smartenok@gmail.com> Co-authored-by: eleata <hernaninverso@gmail.com> Co-authored-by: Abhinav Kumar <abhinavofjnu@gmail.com> Co-authored-by: clousky2020 <33016567+clousky2020@users.noreply.github.com> Co-authored-by: Randi <55005611+rdself@users.noreply.github.com> Co-authored-by: boa <42885162+boa-z@users.noreply.github.com> Co-authored-by: Hoa Pham <hoapq.4398@gmail.com> Co-authored-by: HomerOff <homeroff76@gmail.com> Co-authored-by: christlau <christlau@users.noreply.github.com> Co-authored-by: oyi77 <oyi77@users.noreply.github.com> Co-authored-by: FlyingMongoose <399379+flyingmongoose@users.noreply.github.com> Co-authored-by: Davy Massoneto <davy.massoneto@yahoo.com> Co-authored-by: herjarsa <herjarsa@users.noreply.github.com> Co-authored-by: rafacpti23 <rafacpti23@users.noreply.github.com> Co-authored-by: Andrew Munsell <andrew@wizardapps.net> Co-authored-by: Aleksandr <157302440+Zhaba1337228@users.noreply.github.com> Co-authored-by: capy-ai[bot] <230910855+capy-ai[bot]@users.noreply.github.com> Co-authored-by: OmniRoute Ops <ops@nomenak.dev>
Summary
#2011 widened the cloak gate to fire on any `sk-ant-oat` token, but the cloak itself still trusted whatever the upstream client embedded in `metadata.user_id` and `X-Claude-Code-Session-Id`. When the client is Claude Code — the common case — that defeats the cloak: Claude Code populates these fields from `~/.claude.json` values shared across every account on a machine, letting Anthropic correlate multiple OAuth accounts behind one OmniRoute back to a single user.
Highlights:
For `isClaudeCodeClient || hasClaudeOAuthToken`, ignore upstream `metadata.user_id` and `X-Claude-Code-Session-Id`; always resolve via `getSessionId(seed)` / `resolveCliUserID(psd, seed)` / `resolveAccountUUID(psd, seed, accessToken)`. New `identitySource` value `synthesized-cloaked` distinguishes "no upstream identity present" from "deliberately ignored for safety".
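A hedged sketch of the kind of deterministic per-account synthesis the resolve helpers perform; the seed choice, scope labels, and hashing scheme are assumptions, not OmniRoute's actual implementation:

```typescript
import { createHash } from "node:crypto";

// Derive a stable, per-account UUID from a seed instead of trusting the
// shared ~/.claude.json values: same seed always yields the same identity,
// different accounts (seeds) never collide, and nothing machine-wide leaks.
function synthesizeUuid(seed: string, scope: string): string {
  const digest = createHash("sha256").update(`${scope}:${seed}`).digest("hex");
  // Shape the digest like a v4 UUID (version and variant nibbles forced).
  return [
    digest.slice(0, 8),
    digest.slice(8, 12),
    "4" + digest.slice(13, 16),
    ((parseInt(digest[16], 16) & 0x3) | 0x8).toString(16) + digest.slice(17, 20),
    digest.slice(20, 32),
  ].join("-");
}
```

Because the output is deterministic, repeated requests from the same OAuth account present a consistent identity, while two accounts routed through the same machine present unrelated ones.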
`/api/oauth/usage` probe aligned with real CLI shape: `claude-code/<version>` UA (not `claude-cli/...`), axios-style `Accept` / `Content-Type` / `Accept-Encoding`, 10s abort. The probe was previously a Stainless-shaped request — itself a tell on a usage endpoint.
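A sketch of the CLI-shaped probe request; the header values approximate the axios defaults described above, and the exact strings and helper name are assumptions:

```typescript
// Build request init for the usage probe so it matches the real CLI's
// wire shape: claude-code/<version> UA, axios-style accept headers, and
// a 10-second abort instead of hanging on a slow upstream.
interface ProbeInit {
  method: string;
  headers: Record<string, string>;
  signal: AbortSignal;
}

function buildUsageProbeInit(cliVersion: string): ProbeInit {
  return {
    method: "GET",
    headers: {
      "User-Agent": `claude-code/${cliVersion}`, // not claude-cli/...
      Accept: "application/json, text/plain, */*", // axios default
      "Content-Type": "application/json",
      "Accept-Encoding": "gzip, compress, deflate, br",
    },
    signal: AbortSignal.timeout(10_000), // 10s abort
  };
}
```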
`shouldFingerprint` already forces fingerprinting on for any Claude OAuth request regardless of the saved toggle, so the toggle was lying. Now disabled with a "Required" badge and tooltip.
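The forced-on behavior reduces to a one-liner; this is an illustrative reduction, not the shipped shouldFingerprint:

```typescript
// The saved toggle only matters when no Claude OAuth token is in play;
// OAuth requests are always fingerprinted, which is why the UI toggle
// was misleading and is now locked on with a "Required" badge.
function shouldFingerprint(savedToggle: boolean, hasClaudeOAuthToken: boolean): boolean {
  return hasClaudeOAuthToken || savedToggle;
}
```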
Related Issues
Validation
`npm run lint`
`npm run test:unit`
`npm run test:coverage` — `>= 60%` for statements, lines, functions, and branches
Reviewer Notes
(`sk-ant-api`) unaffected — cloak only fires on `sk-ant-oat` tokens or explicit CC client headers.
`forcedFingerprintTitle` / `forcedFingerprintBadge` added to all 40 locale files in English; native translations come via the normal i18n workflow
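The token gate from the reviewer note can be sketched as follows (the prefixes reflect Anthropic's key formats; the helper name is hypothetical):

```typescript
// Plain API keys (sk-ant-api...) never trip the cloak gate on their own;
// only OAuth access tokens (sk-ant-oat...) or an explicit Claude Code
// client signal do.
function cloakGateFires(token: string, isClaudeCodeClient: boolean): boolean {
  const hasClaudeOAuthToken = token.startsWith("sk-ant-oat");
  return hasClaudeOAuthToken || isClaudeCodeClient;
}
```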