feat: model fallback system with automatic retry and background agent support #797

stranger2904 · 2026-01-14T20:57:30Z

Summary

Implements a comprehensive model fallback system with automatic retry capabilities, addressing issues found in PR #795. This is a clean replacement for #795 with improved conflict resolution and proper integration with the existing codebase.

Core Features

Model Fallback System

Automatic retry with fallback models when primary model fails
Configurable fallback chains per category and user config
Retry delay and max attempts configuration
User notification when fallback occurs

Integration Points

sisyphus_task tool: Automatic retry with fallback models
Background agent manager: Full model fallback support
Category system: Pre-configured fallback chains for default categories

Configuration Schema

Added three new config fields:

fallback: string[]              // List of fallback model strings
fallbackDelayMs: number         // Delay between retry attempts (default: 1000ms)
fallbackRetryCount: number      // Max retry attempts (default: modelChain.length)

Bug Fixes (from PR #795)

cubic-dev-ai P1/P2 Issues:

Fixed duplicate session.prompt() calls in background agent manager
Added validation for empty provider/model segments in parseModelString()
Removed pr_790_body.txt (belonged to different PR)
Fixed resolveCategoryConfig return type to include modelChain and retryConfig
Restored standard provider names in constants (google/, openai/ instead of custom)

auto-review bot Issues:

Fixed path traversal vulnerability in getMessageDir() (sessionID validation)
Fixed lsp_symbols array access when limit <= 0
Added fallback chains to all DEFAULT_CATEGORIES
Updated README.md beta link from v3.0.0-beta.1 to v3.0.0-beta.6
Fixed AGENTS.md LSP tool count from "11 tools" to "7 tools"

Testing

All 1048 tests passing
36 new unit tests for model fallback system covering:
- parseModelString() validation
- buildModelChain() chain construction
- withModelFallback() retry logic with delays
- Error detection via isModelError()
TypeScript compilation clean

Files Changed

 README.md                                 |   2 +-
 assets/oh-my-opencode.schema.json         | 256 +++++++++++++++
 src/config/schema.ts                      |   9 +-
 src/features/background-agent/manager.ts  |   7 +-
 src/features/background-agent/types.ts    |   1 +
 src/features/model-fallback/index.test.ts | 498 ++++++++++++++++++++++++++++++
 src/features/model-fallback/index.ts      | 118 +++++++
 src/tools/AGENTS.md                       |   2 +-
 src/tools/lsp/tools.ts                    |   6 +-
 src/tools/sisyphus-task/constants.ts      |   7 +
 src/tools/sisyphus-task/tools.ts          | 118 +++++--
 11 files changed, 992 insertions(+), 32 deletions(-)

Usage Example

Category Configuration:

{
  "categories": {
    "visual-engineering": {
      "model": "google/gemini-3-pro-preview",
      "fallback": ["anthropic/claude-sonnet-4-5", "google/gemini-3-flash-preview"],
      "fallbackDelayMs": 2000,
      "fallbackRetryCount": 2
    }
  }
}

Automatic Fallback:
When primary model fails, automatically retries with fallback models and notifies user:

Task completed in 15.2s.

Agent: Sisyphus-Junior (category: visual-engineering)
Model: anthropic/claude-sonnet-4-5
⚠️ Model fallback occurred: Primary model failed, used anthropic/claude-sonnet-4-5 instead.
   Failed models: google/gemini-3-pro-preview

Replaces

Closes feat: automatic model fallback system with background support #795

Testing Instructions

Cherry-pick commits from this branch (clean history from origin/dev)
Run bun test && bun run typecheck to verify

Test model fallback:

sisyphus_task(
  category="visual-engineering",
  prompt="Test fallback behavior"
)

Summary by cubic

Adds a model fallback system with automatic retries for transient failures, integrated into sisyphus_task and background agents to improve reliability and user feedback. Introduces simple config fields for fallback models, retry delay, and max attempts.

New Features
- Automatic retry with ordered fallback models (modelChain) for rate limits, 5xx, timeouts.
- New config fields: fallback[], fallbackDelayMs, fallbackRetryCount in categories and agent overrides.
- sisyphus_task uses the chain, shows the model used, and warns when fallback occurs.
- Background agent manager accepts modelChain and logs chain length.
- Default categories now include sensible fallback chains.
Bug Fixes
- Prevent path traversal in getMessageDir by validating sessionID.
- Guard lsp_symbols when limit <= 0 and when results can be empty.
- Restore standard provider IDs (google/, openai/) in constants.
- Update README beta link and correct LSP tool count in AGENTS.md.
- Validate retry limits (no zero/negative attempts), tighten agent-not-found detection, and clear sessions on early errors.

^{Written for commit 7460340. Summary will update on new commits.}

…a fields Add new optional fields to AgentOverrideConfigSchema and CategoryConfigSchema: - fallback: string[] - array of fallback model strings - fallbackDelayMs: number (100-30000) - delay between retry attempts - fallbackRetryCount: number (1-5) - maximum models to try These fields enable automatic model fallback when primary model fails due to rate limits, service unavailability, or other transient errors.

Implement core model fallback utilities in src/features/model-fallback/: - parseModelString(): Parse 'provider/model' format into ModelSpec - modelSpecToString(): Convert ModelSpec back to string - buildModelChain(): Build ordered list of models from primary + fallbacks - isModelError(): Detect retryable errors (rate limits, 429, 503, timeouts) - withModelFallback(): Generic retry wrapper for simple use cases - formatRetryErrors(): Format error list for user-friendly display The module detects these transient errors for automatic retry: - Rate limits (rate_limit, 429) - Service unavailable (503, 502, unavailable, overloaded, capacity) - Network errors (timeout, ECONNREFUSED, ENOTFOUND) Non-model errors (auth failures, invalid input) are NOT retried.

Update sisyphus-task to use the model fallback system: Integration changes: - Import parseModelString, isModelError, RetryConfig from model-fallback - Update resolveCategoryConfig() to return modelChain and retryConfig - Use configurable fallbackDelayMs and fallbackRetryCount from config - Build model chain from primary model + fallback array User notification on fallback: - Show which model was actually used in task completion message - Display warning when fallback occurred with list of failed models - Example: '⚠️ Model fallback occurred: Primary model failed, used anthropic/claude-haiku-4-5 instead. Failed models: deepseek/deepseek-v3-2' Special handling: - Agent configuration errors (agent.name) do NOT trigger fallback - Only transient model errors (rate limits, timeouts) retry with fallback - Inline retry logic preserved for session lifecycle management

Add 36 unit tests covering all model-fallback functions: parseModelString() tests: - Valid provider/model strings - Nested paths (provider/sub/model) - Invalid single-segment strings - Empty string handling buildModelChain() tests: - Primary model only - Primary with fallback array - Invalid models filtered out - Empty/undefined fallback handling isModelError() tests: - Rate limit detection (rate_limit, 429) - Service errors (503, 502, unavailable, overloaded, capacity) - Network errors (timeout, ECONNREFUSED, ENOTFOUND) - Non-model errors return false - Non-Error objects return false withModelFallback() tests: - Success on first model - Fallback on model error - No retry on non-model errors - Exhausts all models on persistent failures - Respects maxAttempts config - Empty model chain handling - Records usedModel on success/failure formatRetryErrors() tests: - Empty array handling - Single error formatting - Multiple errors as numbered list

- Fix P1: Remove duplicate session.prompt calls in manager.ts that caused double execution before task registration. Synced with origin/dev and properly integrated modelChain support. - Fix P2: parseModelString now rejects empty provider/model segments, preventing malformed model specs from entering the fallback chain. - Remove pr_790_body.txt which belonged to a different PR (HTTP transport).

- Restore constants.ts to use standard provider names (google/, openai/) instead of custom nextologies provider - Add modelChain and retryConfig to resolveCategoryConfig return type

Security & Correctness Fixes: - P1: Fix path traversal vulnerability in getMessageDir by validating sessionID - P2: Fix lsp_symbols array access when limit <= 0 by enforcing min limit of 1 - P2: Add fallback chains to all DEFAULT_CATEGORIES to enable automatic model fallback Documentation Fixes: - P2: Update README.md beta release link from v3.0.0-beta.1 to v3.0.0-beta.6 - P3: Fix AGENTS.md LSP tool count from 11 to 7 to match actual count All fixes verified with full test suite (1048/1048 passing).

cubic-dev-ai

5 issues found across 11 files

Confidence score: 2/5

Zero-attempt retry loop in src/tools/sisyphus-task/tools.ts leaves subagents without any prompt when no model is configured, so user requests silently fail.
maxAttempts in src/features/model-fallback/index.ts accepts 0/negative values, producing empty errors even when viable models exist, which risks regressions for fallback flows.
Multiple issues in src/tools/sisyphus-task/tools.ts (undefined model passed to build chain, over-broad agent-not-found detection, leaked subagentSessions) indicate fragile error handling that can crash or misreport before the user sees a useful message.
Pay close attention to src/tools/sisyphus-task/tools.ts, src/features/model-fallback/index.ts - unstable retry/mode selection logic and cleanup must be corrected.

Prompt for AI agents (all issues)


Check if these issues are valid — if so, understand the root cause of each and fix them.


<file name="src/features/model-fallback/index.ts">

<violation number="1" location="src/features/model-fallback/index.ts:75">
P2: maxAttempts accepts 0/negative values, causing zero attempts and empty errors despite available models</violation>
</file>

<file name="src/tools/sisyphus-task/tools.ts">

<violation number="1" location="src/tools/sisyphus-task/tools.ts:103">
P2: buildModelChain is called with possible undefined model, causing runtime crash before user-facing error when no model is configured</violation>

<violation number="2" location="src/tools/sisyphus-task/tools.ts:551">
P1: No prompt is sent for subagent requests because the retry loop stops at 0 attempts when no model is configured.</violation>

<violation number="3" location="src/tools/sisyphus-task/tools.ts:584">
P2: Over-broad agent-not-found detection: any error containing “undefined” is treated as agent missing, aborting fallbacks and misreporting errors.</violation>

<violation number="4" location="src/tools/sisyphus-task/tools.ts:588">
P2: Early prompt failure returns leak subagent session tracking (missing subagentSessions.delete)</violation>
</file>

_{Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.}

src/tools/sisyphus-task/tools.ts

src/features/model-fallback/index.ts

src/tools/sisyphus-task/tools.ts

cubic-dev-ai · 2026-01-14T21:08:52Z

src/tools/sisyphus-task/tools.ts

  }

+  const fallbackList = userConfig?.fallback ?? defaultConfig?.fallback
+  const modelChain = buildModelChain(config.model, fallbackList)


P2: buildModelChain is called with possible undefined model, causing runtime crash before user-facing error when no model is configured

Prompt for AI agents

Check if this issue is valid — if so, understand the root cause and fix it. At src/tools/sisyphus-task/tools.ts, line 103: <comment>buildModelChain is called with possible undefined model, causing runtime crash before user-facing error when no model is configured</comment> <file context> @@ -85,14 +99,22 @@ function resolveCategoryConfig( } + const fallbackList = userConfig?.fallback ?? defaultConfig?.fallback + const modelChain = buildModelChain(config.model, fallbackList) + + const retryConfig: RetryConfig = { </file context>

Suggested change

const modelChain = buildModelChain(config.model, fallbackList)

const modelChain = config.model ? buildModelChain(config.model, fallbackList) : []

src/tools/sisyphus-task/tools.ts

- Fix maxAttempts accepting 0/negative values causing zero retry attempts in both withModelFallback() and sisyphus_task retry loop - Remove over-broad "undefined" check in agent-not-found detection that could misclassify unrelated errors and abort fallbacks prematurely - Add missing subagentSessions.delete() calls on early error returns to prevent memory leaks in session tracking Fixes issues reported by cubic-dev-ai review: - P1: No prompt sent for subagent requests when maxAttempts=0 - P2: maxAttempts accepts 0/negative values - P2: Over-broad agent-not-found detection - P2: Session tracking leak on early failures Co-Authored-By: Claude Opus 4.5 <[email protected]>

cubic-dev-ai

1 issue found across 2 files (changes from recent commits).

Prompt for AI agents (all issues)


Check if these issues are valid — if so, understand the root cause of each and fix them.


<file name="src/features/model-fallback/index.ts">

<violation number="1" location="src/features/model-fallback/index.ts:75">
P2: `maxAttempts` is not validated—`NaN` input causes the loop to skip all models and returns `attempts: NaN`</violation>
</file>

_{Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.}

src/features/model-fallback/index.ts

Use Number.isFinite() to validate maxAttempts before using it. If maxAttempts is NaN, Infinity, or not a number, fall back to using the model chain length instead. Adds test case for NaN input. Co-Authored-By: Claude Opus 4.5 <[email protected]>

cubic-dev-ai

1 issue found across 2 files (changes from recent commits).

Prompt for AI agents (all issues)


Check if these issues are valid — if so, understand the root cause of each and fix them.


<file name="src/features/model-fallback/index.test.ts">

<violation number="1" location="src/features/model-fallback/index.test.ts:415">
P2: Test does not exercise or assert chain-length fallback for NaN maxAttempts; passes even if NaN coerces to 1.</violation>
</file>

_{Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.}

src/features/model-fallback/index.test.ts

The previous test succeeded on first attempt, so it couldn't distinguish maxAttempts=1 from maxAttempts=chainLength. Now uses 3 models that all fail, verifying all 3 are tried (proving maxAttempts correctly falls back to chain length, not 1). Co-Authored-By: Claude Opus 4.5 <[email protected]>

GollyJer · 2026-01-15T07:39:41Z

This would be super cool. Are you going to keep working on it @stranger2904?

stranger2904 · 2026-01-15T17:53:36Z

@GollyJer Hi, I'm still testing a few things, it did work for my original use-case, but I want to try to break it a bit more... do you want me to convert it to a draft until than ?

GollyJer · 2026-01-15T19:12:10Z

@stranger2904 Thanks for the update. I converted to Draft for now. 👍
Keep us posted.

stranger2904 added 8 commits January 14, 2026 15:51

fix: restore standard provider names and fix type annotations

4f32b7e

- Restore constants.ts to use standard provider names (google/, openai/) instead of custom nextologies provider - Add modelChain and retryConfig to resolveCategoryConfig return type

fix: add modelChain to LaunchInput type

7ef053b

stranger2904 mentioned this pull request Jan 14, 2026

feat: automatic model fallback system with background support #795

Closed

cubic-dev-ai bot reviewed Jan 14, 2026

View reviewed changes

kdcokenny assigned code-yeongyu Jan 14, 2026

cubic-dev-ai bot reviewed Jan 14, 2026

View reviewed changes

src/features/model-fallback/index.ts Outdated Show resolved Hide resolved

cubic-dev-ai bot reviewed Jan 14, 2026

View reviewed changes

src/features/model-fallback/index.test.ts Outdated Show resolved Hide resolved

GollyJer marked this pull request as draft January 15, 2026 19:11

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

feat: model fallback system with automatic retry and background agent support #797

feat: model fallback system with automatic retry and background agent support #797

stranger2904 commented Jan 14, 2026 •

edited by cubic-dev-ai bot

Loading

Uh oh!

cubic-dev-ai bot left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

cubic-dev-ai bot Jan 14, 2026

Uh oh!

Uh oh!

cubic-dev-ai bot left a comment

Uh oh!

Uh oh!

cubic-dev-ai bot left a comment

Uh oh!

Uh oh!

GollyJer commented Jan 15, 2026

Uh oh!

stranger2904 commented Jan 15, 2026

Uh oh!

GollyJer commented Jan 15, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

	const modelChain = buildModelChain(config.model, fallbackList)
	const modelChain = config.model ? buildModelChain(config.model, fallbackList) : []

feat: model fallback system with automatic retry and background agent support #797

Are you sure you want to change the base?

feat: model fallback system with automatic retry and background agent support #797

Conversation

stranger2904 commented Jan 14, 2026 • edited by cubic-dev-ai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Core Features

Configuration Schema

Bug Fixes (from PR #795)

Testing

Files Changed

Usage Example

Replaces

Testing Instructions

Related

Summary by cubic

Uh oh!

cubic-dev-ai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

cubic-dev-ai bot Jan 14, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

cubic-dev-ai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

cubic-dev-ai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

GollyJer commented Jan 15, 2026

Uh oh!

stranger2904 commented Jan 15, 2026

Uh oh!

GollyJer commented Jan 15, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

stranger2904 commented Jan 14, 2026 •

edited by cubic-dev-ai bot

Loading