19 changes: 15 additions & 4 deletions packages/server/src/server/agent/provider-snapshot-manager.ts
@@ -27,7 +27,11 @@ import {
import { applyMutableProviderConfigToOverrides } from "../daemon-config-store.js";
import type { MutableDaemonConfig } from "../daemon-config-store.js";

- const DEFAULT_REFRESH_TIMEOUT_MS = 30_000;
+ // MCP-heavy providers (omp/pi with many configured MCP servers, opencode) can
+ // take well over 30s to start an RPC session and enumerate models/commands when
+ // several providers refresh concurrently. Use a generous budget so those probes
+ // resolve to available instead of erroring out under contention.
+ const DEFAULT_REFRESH_TIMEOUT_MS = 90_000;

type ProviderSnapshotChangeListener = (entries: ProviderSnapshotEntry[], cwd: string) => void;

@@ -520,9 +524,16 @@ export class ProviderSnapshotManager {
}

private async loadProviders(options: ProviderLoadOptions): Promise<void> {
- await Promise.allSettled(
-   options.providers.map((provider) => this.loadProvider({ ...options, provider })),
- );
+ // Probe providers sequentially rather than all at once. MCP-heavy providers
+ // (omp/pi with many configured MCP servers, opencode) each spawn an RPC
+ // session and connect every configured MCP server during their probe;
+ // running them concurrently starves CPU/IO on smaller hosts and makes
+ // availability probes flake with spurious timeouts/crashes. Those failures
+ // are then cached and gate on-demand model fetches. Sequential probing
+ // trades a longer total refresh for reliable per-provider results.
+ for (const provider of options.providers) {
+   await this.loadProvider({ ...options, provider }).catch(() => undefined);
+ }
Comment on lines +534 to +536

P2 Worst-case refresh time scales linearly with provider count

refreshProvider applies this.refreshTimeoutMs twice per provider: once to isAvailable() and again to fetchModels + fetchModes. If both phases hit the ceiling, a single provider consumes up to 2 × 90 s = 3 min. On a daemon restart with N providers, the sequential loop can therefore block for up to N × 3 min before any snapshot is considered fresh. Lightweight providers are fine in practice (binary-presence checks return in milliseconds), but a slow provider positioned early in options.providers delays every provider behind it. A small concurrency limit (e.g. 2–3 parallel probes) would bound the latency regression while still resolving the contention problem the PR targets; the PR description mentions this as a ready alternative.

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
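
To make the concurrency-limit suggestion concrete, here is a minimal sketch of a bounded probe runner. The helper name mapWithConcurrency and its signature are assumptions for illustration and do not appear in the PR; it mirrors Promise.allSettled in that one failing probe never rejects the whole batch. With a limit of k and every provider hitting both timeout ceilings, the worst case would drop to roughly ceil(N / k) × 3 min.

```typescript
// Hypothetical helper, not part of the PR: run `task` over `items` with at
// most `limit` tasks in flight, isolating failures like Promise.allSettled.
async function mapWithConcurrency<T, R>(
  items: readonly T[],
  limit: number,
  task: (item: T) => Promise<R>,
): Promise<PromiseSettledResult<R>[]> {
  const results: PromiseSettledResult<R>[] = new Array(items.length);
  let next = 0;

  // Each worker claims the next unclaimed index until the list is exhausted.
  async function worker(): Promise<void> {
    while (next < items.length) {
      const index = next++;
      try {
        results[index] = { status: "fulfilled", value: await task(items[index]) };
      } catch (reason) {
        results[index] = { status: "rejected", reason };
      }
    }
  }

  // Never start more workers than items, and always start at least one.
  const workerCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}
```

Inside loadProviders this could, under the same assumptions, be called as mapWithConcurrency(options.providers, 2, (provider) => this.loadProvider({ ...options, provider })). A limit of 1 reproduces the sequential behavior of the PR, so the limit could be tuned without changing the call site.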

Comment on lines +534 to +536

P2 No test coverage for sequential probe ordering

The behavioral change from Promise.allSettled to sequential probing is not exercised by the existing test suite: loadProviders, sequential ordering, and the timeout constant are all absent from the test file. A test with two fake providers, one fast and one slow, could verify that the slow provider's error is isolated and does not prevent the fast provider from resolving to ready, and that the status: "error" snapshot entry is correctly emitted for the timed-out provider. Without this, a regression back to concurrent probing (or a bug in the loop's .catch path) would go undetected.
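
As a rough illustration of the scenario described above, the sketch below models the two-provider case without touching the real class. FakeProvider, FakeEntry, withTimeout, probeSequentially, and demo are all hypothetical stand-ins; the actual ProviderSnapshotManager types, constructor, and test harness are not shown in this diff, so a real test would need to be adapted to them.

```typescript
// Hypothetical stand-ins; the real ProviderSnapshotManager types differ.
type FakeProvider = { id: string; probe: () => Promise<void> };
type FakeEntry = { provider: string; status: "ready" | "error" };

// Reject if `work` has not settled within `ms` milliseconds.
function withTimeout<T>(work: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => reject(new Error(`timed out after ${ms}ms`)), ms);
    work.then(
      (value) => {
        clearTimeout(timer);
        resolve(value);
      },
      (error) => {
        clearTimeout(timer);
        reject(error);
      },
    );
  });
}

// Sequential probing in the spirit of the PR's loop: one provider at a time,
// with each failure recorded as an "error" entry instead of being propagated.
async function probeSequentially(
  providers: FakeProvider[],
  timeoutMs: number,
): Promise<FakeEntry[]> {
  const entries: FakeEntry[] = [];
  for (const provider of providers) {
    try {
      await withTimeout(provider.probe(), timeoutMs);
      entries.push({ provider: provider.id, status: "ready" });
    } catch {
      entries.push({ provider: provider.id, status: "error" });
    }
  }
  return entries;
}

// The slow provider is listed first and exceeds the 20 ms budget; the fast
// provider behind it should still be probed and reported as ready.
async function demo(): Promise<FakeEntry[]> {
  const slow: FakeProvider = {
    id: "slow",
    probe: () => new Promise<void>((resolve) => setTimeout(resolve, 200)),
  };
  const fast: FakeProvider = { id: "fast", probe: async () => {} };
  return probeSequentially([slow, fast], 20);
}
```

In a real suite the short timeout would presumably be injected through whatever option backs refreshTimeoutMs, and fake timers could replace the real delays to keep the test fast.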

}

private loadProvider(options: ProviderLoadOptions & { provider: AgentProvider }): Promise<void> {