Fix flaky ModelManagerSuggestedTests: skip launch-time HF fetch under xctest by mimeding · Pull Request #16 · mimeding/osaurus

mimeding · 2026-05-27T05:09:15Z

Summary

Why this matters (business)

ModelManagerSuggestedTests has been flaking in CI for a while. Across the recent CI runs on the new audit PRs (#2, #6, #7, #13, #14, #15), the same test methods fail intermittently:

applyOsaurusOrgFetch_dropsStaleAutoFetchedOnReapply
applyOsaurusOrgFetch_addsNewEntriesAfterCurated

When CI is flaky, every red PR triggers an "is this me?" investigation, retry loops fill the queue, and people lose trust in the signal. The fix is small in intent (gate one launch-time side effect on the test environment) and also removes a per-run network call from the test suite.

What's wrong (technical)

ModelManager.init() kicks off an unstructured Task that fetches the OsaurusAI organization listing from Hugging Face and merges the result into suggestedModels:

        // Pull the OsaurusAI HF org listing once on launch so newly published
        // models surface in the Recommended tab without requiring a code push.
        Task { [weak self] in await self?.loadOsaurusAIOrgModels() }

loadOsaurusAIOrgModels() eventually calls back into the same applyOsaurusOrgFetch the tests are exercising directly:

    func loadOsaurusAIOrgModels() async { ... applyOsaurusOrgFetch(autoFetched: autoFetched) }

    func applyOsaurusOrgFetch(autoFetched: [MLXModel]) {
        ...
        suggestedModels = merged
        ...
    }

Test flow:

1. The test creates the manager with let manager = await MainActor.run { ModelManager() }, and init schedules the background HF fetch.
2. The test calls applyOsaurusOrgFetch(autoFetched: [stale]), then applyOsaurusOrgFetch(autoFetched: [kept]).
3. At some point during those calls, the background loadOsaurusAIOrgModels() resolves and calls applyOsaurusOrgFetch(autoFetched: <HF results>).
4. Whichever assignment to suggestedModels lands last wins. If the HF task lands after the test's kept call, the test reads back the HF data instead of kept, and the #expect(after.contains { $0.id == kept.id }) assertion fails.
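To make the last-writer-wins ordering concrete, here is a minimal, self-contained model of the race. Model, FakeSuggestedStore, and the merge rule (curated entries first, then auto-fetched entries not already curated, recomputed from scratch on every call) are hypothetical stand-ins, not the real ModelManager code. The sketch replays the bad interleaving in a fixed order so the outcome is deterministic.

```swift
// Hypothetical stand-in for MLXModel: only the id matters here.
struct Model: Equatable {
    let id: String
}

// Hypothetical stand-in for ModelManager's suggested-models state.
// Assumed merge rule: curated entries first, then auto-fetched entries
// whose ids are not already curated. Each call recomputes from scratch,
// so a re-apply drops whatever the previous auto-fetch contributed.
final class FakeSuggestedStore {
    let curated: [Model]
    private(set) var suggestedModels: [Model]

    init(curated: [Model]) {
        self.curated = curated
        self.suggestedModels = curated
    }

    func applyOsaurusOrgFetch(autoFetched: [Model]) {
        let curatedIDs = Set(curated.map(\.id))
        let merged = curated + autoFetched.filter { !curatedIDs.contains($0.id) }
        suggestedModels = merged  // plain assignment: the last caller wins
    }
}

let stale = Model(id: "stale")
let kept = Model(id: "kept")
let hfResult = Model(id: "from-hf")

let store = FakeSuggestedStore(curated: [Model(id: "curated")])

// The test's own calls.
store.applyOsaurusOrgFetch(autoFetched: [stale])
store.applyOsaurusOrgFetch(autoFetched: [kept])

// The launch-time Task resolving late, after the test's `kept` call.
store.applyOsaurusOrgFetch(autoFetched: [hfResult])

let after = store.suggestedModels
print(after.contains { $0.id == kept.id })  // false: the late HF write replaced `kept`
```

Swapping the last two applyOsaurusOrgFetch calls makes the same check print true, which is why the test passes or fails depending only on scheduling.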

This is exactly the failure pattern the xcresult bundles show across the affected PRs.

Fix

Gate the launch-time fetch on a tiny isRunningInTestEnvironment helper that checks for any of XCTestConfigurationFilePath, XCTestBundlePath, or XCTestSessionIdentifier in the process environment. Those variables are only present inside an xctest host process. The production app launch path behaves exactly as before; only the test host skips the network fetch.
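A minimal sketch of the helper and the gate, assuming the check reads ProcessInfo.processInfo.environment. The environment parameter (with a default) is a hypothetical addition so the check can be exercised with a plain dictionary; the actual change may be written differently.

```swift
import Foundation

/// Returns true when the process looks like an xctest host.
/// The presence of any one of these variables is treated as sufficient.
func isRunningInTestEnvironment(
    _ environment: [String: String] = ProcessInfo.processInfo.environment
) -> Bool {
    let xctestKeys = [
        "XCTestConfigurationFilePath",
        "XCTestBundlePath",
        "XCTestSessionIdentifier",
    ]
    return xctestKeys.contains { environment[$0] != nil }
}

// Sketch of the gate inside ModelManager.init (other init work elided):
//
//     if !isRunningInTestEnvironment() {
//         // Pull the OsaurusAI HF org listing once on launch so newly published
//         // models surface in the Recommended tab without requiring a code push.
//         Task { [weak self] in await self?.loadOsaurusAIOrgModels() }
//     }

// Usage with explicit dictionaries instead of the real process environment:
print(isRunningInTestEnvironment([:]))                                  // false
print(isRunningInTestEnvironment(["XCTestSessionIdentifier": "ABC"]))  // true
```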

Side benefit: removing a network call from the test suite means the suite works offline and on hermetic runners, without depending on huggingface.co being reachable.

Verifying I'm reading the failure right

The xcresult bundles for the affected PR runs all show one of the two same-suite tests as the only failure, with messages like:

Expectation failed: after.contains { $0.id == kept.id }

The failures are confined to this suite, the same code both passes and fails across runs, and the failures bear no relationship to the source changes on the affected PRs (which touched the HTTP handler, ToolSearchService, Insights, plugin auth, and so on, but nothing in ModelManager).

Changes

Behavior change (xctest hosts only — production app launch is unchanged)
UI change
Refactor / chore (test hygiene)
Tests (the fix is to MAKE existing tests stop flaking; a new test for "init doesn't kick off a Task" would have to assert a negative, which is brittle)
Docs

Test Plan

Land the PR. Subsequent CI runs of ModelManagerSuggestedTests should be green every time, on every PR, regardless of network reachability.
Run a release build of the app: the Recommended tab still auto-updates with new OsaurusAI HF entries on launch (the launch-time fetch runs because none of XCTestConfigurationFilePath, XCTestBundlePath, or XCTestSessionIdentifier is set in a regular app process).

Checklist

I have read CONTRIBUTING.md
I added/updated tests where reasonable (see above)
I updated docs/README as needed (n/a — internal test hygiene)
I verified build on macOS with Xcode 16.4+ (authored in a Linux sandbox; verified the touched file via swiftc -frontend -parse; the real verification is the next batch of CI runs being green on this suite)

ModelManager.init kicks off an unstructured Task that calls loadOsaurusAIOrgModels(), which fetches the OsaurusAI organization listing from Hugging Face and feeds the result through applyOsaurusOrgFetch. The unit tests repeatedly construct ModelManager() to drive applyOsaurusOrgFetch directly. The background launch-time fetch races with those test calls: whichever finishes last wins, so the merge result is non-deterministic. That's the root cause of the flaky ModelManagerSuggestedTests failures seen across many of the recent PR CI runs (applyOsaurusOrgFetch_dropsStaleAutoFetchedOnReapply, applyOsaurusOrgFetch_addsNewEntriesAfterCurated, etc.).

Gate the launch-time fetch on a small isRunningInTestEnvironment helper that checks for any of XCTestConfigurationFilePath, XCTestBundlePath, or XCTestSessionIdentifier in the process environment. Those variables are only present inside an xctest host process; production app launches still get the HF fetch exactly as before.

This is a network call, so removing it under tests also has the side benefit of making the test suite work offline and on hermetic CI runners.

Co-authored-by: Michael Meding <mimeding@users.noreply.github.com>


mimeding wants to merge 1 commit into main from cursor/modelmanager-test-init-race-2812

mimeding commented May 27, 2026
