Perf: ToolSearch loads only candidate rows from tool_index, not the whole table by mimeding · Pull Request #14 · mimeding/osaurus

mimeding · 2026-05-27T04:38:26Z

Summary

Why this matters (business)

ToolSearchService is on the agent hot path — it runs every time an agent decides which tools are relevant to a user message. As the user installs more plugins / MCP bundles, the tool_index table grows, and the cost of selecting tools grows with it even though the ranking step (VecturaKit) already narrows the candidate set to a handful.

Today, every tool search reads and decodes every row in tool_index from disk, then discards all but a few of them in Swift. For a workspace with a large tool catalogue this is a measurable per-message cost that scales the wrong way: it gets worse as the user adds tools.

What's wrong (technical)

            let toolIds = results.compactMap { reverseIdMap[$0.id.uuidString] }
            guard !toolIds.isEmpty else { return [] }

            let enabledNames = await MainActor.run {
                Set(ToolRegistry.shared.listTools().filter { $0.enabled }.map { $0.name })
            }

            let toolIdSet = Set(toolIds)
            let entries = try ToolDatabase.shared.loadAllEntries()
                .filter { toolIdSet.contains($0.id) && enabledNames.contains($0.name) }

The candidate set in toolIdSet is already small — typically topK (default ~5) plus a fan-out factor. The SQL layer can do an IN (?, ?, ?) lookup in O(candidates × index lookup) instead of returning the entire table for Swift to discard.

Fix

Add ToolDatabase.loadEntries(ids: Set<String>) that issues:

SELECT … FROM tool_index WHERE id IN (?, ?, ?, …)

…with one placeholder per id. SQLite's default SQLITE_MAX_VARIABLE_NUMBER is 999 in versions before 3.32.0 and 32766 from 3.32.0 onward; either limit is comfortably above any plausible candidate count.
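As a rough illustration of the placeholder construction described above, here is a minimal sketch. It is not the PR's actual code: the `ByIdsQuery` type, the `makeSelectByIdsSQL` name, and the `id, name` column list are assumptions (the real column list is elided in the description).

```swift
/// Hypothetical container for a parameterized statement and its bindings.
struct ByIdsQuery {
    let sql: String
    let bindings: [String]
}

/// Builds a parameterized SELECT with one `?` per id.
/// Returns nil for empty input so the caller can skip the query
/// instead of issuing invalid SQL such as `IN ()`.
/// ASSUMPTION: the column list `id, name` is illustrative only.
func makeSelectByIdsSQL(ids: Set<String>) -> ByIdsQuery? {
    guard !ids.isEmpty else { return nil }
    // Sort so the binding order is deterministic across runs.
    let bindings = ids.sorted()
    let placeholders = Array(repeating: "?", count: bindings.count)
        .joined(separator: ", ")
    let sql = "SELECT id, name FROM tool_index WHERE id IN (\(placeholders))"
    return ByIdsQuery(sql: sql, bindings: bindings)
}

let query = makeSelectByIdsSQL(ids: ["b", "a", "c"])
let emptyQuery = makeSelectByIdsSQL(ids: [])
```

Binding the ids as parameters rather than interpolating them into the string keeps the statement safe against ids that contain quotes.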

Update ToolSearchService.search to use loadEntries(ids: toolIdSet) and keep only the enabledNames filter — which depends on MainActor ToolRegistry state and can't be expressed in SQL — applied in Swift.

Same rows in the same order downstream; same ToolSearchResult shape; same scoring. Only the cost of getting there changes.
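The equivalence claimed here can be sketched with an in-memory stand-in for the table. Everything below is hypothetical scaffolding (the `ToolEntry` struct, the `table` array, and both loader functions are illustrative, not the real ToolDatabase API); only the two filter expressions mirror the before and after shapes described in this PR.

```swift
// Hypothetical stand-in for a decoded tool_index row; the real type has more fields.
struct ToolEntry: Equatable {
    let id: String
    let name: String
}

// In-memory stand-in for the tool_index table (illustration only).
let table = [
    ToolEntry(id: "t1", name: "read_file"),
    ToolEntry(id: "t2", name: "web_search"),
    ToolEntry(id: "t3", name: "run_shell"),
]

// Old path: return every row and let the caller filter.
func loadAllEntries() -> [ToolEntry] { table }

// New path: return only the requested rows, skipping work for empty input.
func loadEntries(ids: Set<String>) -> [ToolEntry] {
    guard !ids.isEmpty else { return [] }
    return table.filter { ids.contains($0.id) }
}

let toolIdSet: Set<String> = ["t1", "t3"]
let enabledNames: Set<String> = ["read_file", "web_search"]

// Before: id and enabled-name checks both run in Swift over the whole table.
let before = loadAllEntries()
    .filter { toolIdSet.contains($0.id) && enabledNames.contains($0.name) }

// After: the id check moves into the loader; only the enabled-name check remains.
let after = loadEntries(ids: toolIdSet)
    .filter { enabledNames.contains($0.name) }
```

The stand-in preserves table order. A real SQL `IN` query makes no ordering guarantee without an `ORDER BY`, so any ordering the caller relies on presumably comes from the downstream ranking step.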

Tests

Three new cases in ToolDatabaseTests:

loadEntriesByIdsReturnsOnlyRequested — happy path with three rows, request two.
loadEntriesByIdsEmptyInputSkipsQuery — empty Set returns [] without issuing invalid SQL like IN ().
loadEntriesByIdsIgnoresUnknownIds — passing IDs that aren't in the table just drops them.
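The three behaviours listed above can be illustrated against a small fake. This is a sketch only: `FakeToolIndex` and its `queryCount` counter are hypothetical and do not reflect how ToolDatabaseTests is actually written. The counter is there to make "skips the query" observable.

```swift
// Hypothetical in-memory fake, not the real ToolDatabase.
final class FakeToolIndex {
    private let rows: [String: String]   // id -> name
    private(set) var queryCount = 0      // how many lookups actually ran

    init(rows: [String: String]) { self.rows = rows }

    func loadEntries(ids: Set<String>) -> [String: String] {
        // Empty input returns early, so no query is counted.
        guard !ids.isEmpty else { return [:] }
        queryCount += 1
        // Ids that are not in the table simply do not match.
        return rows.filter { ids.contains($0.key) }
    }
}

let index = FakeToolIndex(rows: ["a": "alpha", "b": "beta", "c": "gamma"])

let requested = index.loadEntries(ids: ["a", "c"])      // only requested rows
let empty = index.loadEntries(ids: [])                  // empty input, no query
let countAfterEmpty = index.queryCount
let withUnknown = index.loadEntries(ids: ["a", "zzz"])  // unknown id is dropped
```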

Changes

Test Plan

cd Packages/OsaurusCore && swift test --filter ToolDatabaseTests should pass.
Manually: with a large tool catalogue (e.g. multiple MCP bundles + plugins), trigger several agent tool-aware turns. Compare end-to-end latency before/after. The biggest gain is on cold-cache queries where every row decode shows up in Instruments → SQLite trace.

Checklist

I have read CONTRIBUTING.md
I added/updated tests where reasonable
I updated docs/README as needed (n/a — internal storage helper)
I verified build on macOS with Xcode 16.4+ (authored in a Linux sandbox; verified each touched file via swiftc -frontend -parse)

ToolSearchService.search ran an embedding query that already returned a small ranked set of tool IDs (topK, with a search-time fan-out of ~3*topK), then called ToolDatabase.shared.loadAllEntries() and filtered the result in Swift. On a workspace with a large tool catalogue (plugin-heavy setups, multiple MCP bundles) every RAG search paid the cost of reading and decoding every row in the tool_index table just to discard most of them, on the agent hot path.

Add ToolDatabase.loadEntries(ids:) that issues 'SELECT ... WHERE id IN (?, ?, ?, ...)' with one placeholder per id. SQLite's default SQLITE_MAX_VARIABLE_NUMBER (999 before SQLite 3.32.0, 32766 from 3.32.0 onward) bounds the safe batch size well above any plausible search candidate count.

Update ToolSearchService.search to call loadEntries(ids: toolIdSet) instead of loadAllEntries().filter, then apply only the enabled-name filter (which depends on MainActor state and can't be expressed in SQL).

Behavior change is purely performance: same rows returned in the same order downstream, just without the O(catalog_size) scan-and-filter.

Tests added:
* loadEntriesByIdsReturnsOnlyRequested - happy path
* loadEntriesByIdsEmptyInputSkipsQuery - guard avoids invalid SQL
* loadEntriesByIdsIgnoresUnknownIds - missing ids are dropped

Co-authored-by: Michael Meding <mimeding@users.noreply.github.com>

ModelManager.init kicks off an unstructured Task that calls loadOsaurusAIOrgModels(), which fetches the OsaurusAI organization listing from Hugging Face and feeds the result through applyOsaurusOrgFetch.

The unit-test runner repeatedly constructs ModelManager() to drive applyOsaurusOrgFetch directly. The background launch-time fetch races with those test calls: whichever finishes last wins, and the merge result is non-deterministic. That's the root cause of the flaky ModelManagerSuggestedTests failures seen across many of the recent PR CI runs (applyOsaurusOrgFetch_dropsStaleAutoFetchedOnReapply, applyOsaurusOrgFetch_addsNewEntriesAfterCurated, etc.).

Gate the launch-time fetch on a small isRunningInTestEnvironment helper that checks for any of XCTestConfigurationFilePath, XCTestBundlePath, or XCTestSessionIdentifier in the process environment. Those variables are only present inside an xctest host process; production app launches still get the HF fetch exactly as before.

This is a network call, so removing it under tests also has the side benefit of making the test suite work offline / on hermetic CI runners.

Co-authored-by: Michael Meding <mimeding@users.noreply.github.com>
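A helper along the lines described in this commit might look like the sketch below. The three variable names come from the commit message; the function signature and the injectable `environment` parameter are assumptions made here so the check can be exercised without an xctest host.

```swift
import Foundation

/// Returns true when any of the XCTest marker variables is present.
/// ASSUMPTION: the injectable `environment` parameter is illustrative;
/// the real helper may read ProcessInfo directly.
func isRunningInTestEnvironment(
    _ environment: [String: String] = ProcessInfo.processInfo.environment
) -> Bool {
    let markers = [
        "XCTestConfigurationFilePath",
        "XCTestBundlePath",
        "XCTestSessionIdentifier",
    ]
    // Presence of any one marker is treated as "inside a test host".
    return markers.contains { environment[$0] != nil }
}

let underTest = isRunningInTestEnvironment(["XCTestBundlePath": "/tmp/Example.xctest"])
let inProduction = isRunningInTestEnvironment(["HOME": "/Users/example"])
```

A caller would then wrap the launch-time fetch in `if !isRunningInTestEnvironment() { ... }`, leaving production launches unchanged.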

mimeding mentioned this pull request May 27, 2026

Fix flaky ModelManagerSuggestedTests: skip launch-time HF fetch under xctest #16

Draft

9 tasks

mimeding wants to merge 2 commits into main from cursor/tool-search-load-by-ids-2812


2 participants
