Fix: ChatEngine never actually throws 503 — every routing failure becomes 404#7
Draft
mimeding wants to merge 2 commits into
Draft
Fix: ChatEngine never actually throws 503 — every routing failure becomes 404#7mimeding wants to merge 2 commits into
mimeding wants to merge 2 commits into
Conversation
ChatEngine.EngineError already declared two cases that map to different HTTP statuses: case modelNotFound -> 404 case noServiceAvailable -> 503 …but the producer (the two switch sites in ChatEngine on ModelServiceRouter.resolve and the one in CoreModelService) only ever threw modelNotFound. The 503 case was dead code, and every routing failure — including ones where a service that handles the model existed but reported isAvailable() == false — surfaced as 404. This contradicts the audit's expectation set by PR osaurus-ai#863 / issue osaurus-ai#858: the WorkView error classifier and external API consumers use the HTTP status code to give users actionable feedback. 'install the model' vs 'service unavailable, retry' is exactly the distinction we couldn't make. Extend ModelRoute with a third case, .unavailable, that the router returns when at least one candidate service answers handles(model:) == true but every such service answers isAvailable() == false. The two ChatEngine switch sites now throw .noServiceAvailable for .unavailable, so the existing 503 path actually fires. CoreModelService collapses both to its existing modelUnavailable error since its public API doesn't currently distinguish them (callers needing the distinction go through ChatEngine). Limitation worth calling out: the router only sees the connected remote-provider list (callers pass connectedServices() at the call site). A configured-but-disconnected remote provider is therefore not visible to the router, and its absence still resolves to .none (404). Fixing that requires RemoteProviderManager to expose the configured- but-disconnected services list, which is a separate change. New focused unit tests cover all three outcomes (service / unavailable / none) plus the multi-service-some-unavailable case (router prefers the available service). Co-authored-by: Michael Meding <mimeding@users.noreply.github.com>
9 tasks
ModelManager.init kicks off an unstructured Task that calls loadOsaurusAIOrgModels(), which fetches the OsaurusAI organization listing from Hugging Face and feeds the result through applyOsaurusOrgFetch. The unit-test runner repeatedly constructs ModelManager() to drive applyOsaurusOrgFetch directly. The background launch-time fetch races with those test calls — whichever finishes last wins, and the merge result is non-deterministic. That's the root cause of the flaky ModelManagerSuggestedTests failures seen across many of the recent PR CI runs (applyOsaurusOrgFetch_dropsStaleAutoFetched OnReapply, applyOsaurusOrgFetch_addsNewEntriesAfterCurated, etc.). Gate the launch-time fetch on a small isRunningInTestEnvironment helper that checks for any of XCTestConfigurationFilePath, XCTestBundlePath, or XCTestSessionIdentifier in the process environment. Those variables are only present inside an xctest host process; production app launches still get the HF fetch exactly as before. This is a network call, so removing it under tests also has the side benefit of making the test suite work offline / on hermetic CI runners. Co-authored-by: Michael Meding <mimeding@users.noreply.github.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Why this matters (business)
The error model in
ChatEnginewas already designed with two distinct outcomes — "you asked for a model nobody has" (404) and "we know how to serve this model, but the backend isn't healthy right now" (503). Client SDKs, the WorkView error classifier, and any future status-aware UI rely on those codes to give the right next action ("install the model" vs "retry in a moment" vs "check your network").In practice, the 503 branch was unreachable code. Every routing failure — even ones where Foundation Models is genuinely available on the user's Mac but
isAvailable()is reporting false at this moment — surfaced as 404. Users see "Model 'foundation' is not installed or registered with any provider" and reach for the model picker when they should just be retrying.What's wrong (technical)
ChatEngine.EngineErrordeclares both cases:…but the
ModelServiceRouter.resolveit consumes only returns.serviceor.none. The.nonebranch inChatEnginealways throws.modelNotFound:So
.noServiceAvailablewas dead code: it existed but nothing ever produced it.Fix
Three small changes, one new test file:
ModelServiceRouter.resolvenow returns one of three routes:.service— at least one candidate handles the model and reportsisAvailable()..unavailable(requestedModel:)— at least one candidatehandles(model:)but every such service answersisAvailable() == false..none— no candidate handles the model at all.Both
ChatEngineswitch sites (streaming and non-streaming) now throwEngineError(.noServiceAvailable)for.unavailable.noServiceAvailable.httpStatusis already wired to 503, and PR Fix: extend ChatEngine HTTP-status mapping to Anthropic /messages and Open Responses /responses #6 propagates that to the wire response on the Anthropic / Open Responses endpoints;/v1/chat/completionsalready used it.CoreModelServicecollapses.unavailableand.noneto its existingCoreModelError.modelUnavailable. The publicCoreModelErrorAPI doesn't currently distinguish them, and the API-facing distinction is already covered viaChatEngine.New focused unit tests (
ModelServiceRouterTests.swift) cover:.servicereturned when the handler is available..nonereturned when no service claims the model..unavailablereturned when a handler exists but reports unavailable.Known limitation (called out in the commit message and in
resolve's docstring)This fixes the case where a local service reports
!isAvailable(). It does not yet fix the case where a remote provider is configured but disconnected, because callers (e.g.ChatEngine) passRemoteProviderManager.shared.connectedServices()— disconnected remotes aren't in the array at all, so the router never sees them. Distinguishing that case would needRemoteProviderManagerto expose a configured-but-disconnected list, which is a separate, larger change.Changes
.service/.noneconsumers behave identically)ModelServiceRouterTests)Test Plan
cd Packages/OsaurusCore && swift test --filter ModelServiceRouterTestsshould pass./v1/chat/completionswith"model":"foundation". Expected: 503 +noServiceAvailableerror body. Previously: 404 +modelNotFound."model":"zzz"). Expected: 404 +modelNotFound. Unchanged.Checklist
CONTRIBUTING.mdswiftc -frontend -parse)