Skip to content

Conversation

@stranger2904
Copy link
Contributor

@stranger2904 stranger2904 commented Jan 14, 2026

Summary

Implements a comprehensive model fallback system with automatic retry capabilities, addressing issues found in PR #795. This is a clean replacement for #795 with improved conflict resolution and proper integration with the existing codebase.

Core Features

Model Fallback System

  • Automatic retry with fallback models when primary model fails
  • Configurable fallback chains per category and user config
  • Retry delay and max attempts configuration
  • User notification when fallback occurs

Integration Points

  • sisyphus_task tool: Automatic retry with fallback models
  • Background agent manager: Full model fallback support
  • Category system: Pre-configured fallback chains for default categories

Configuration Schema

Added three new config fields:

fallback: string[]              // List of fallback model strings
fallbackDelayMs: number         // Delay between retry attempts (default: 1000ms)
fallbackRetryCount: number      // Max retry attempts (default: modelChain.length)

Bug Fixes (from PR #795)

cubic-dev-ai P1/P2 Issues:

  • Fixed duplicate session.prompt() calls in background agent manager
  • Added validation for empty provider/model segments in parseModelString()
  • Removed pr_790_body.txt (belonged to different PR)
  • Fixed resolveCategoryConfig return type to include modelChain and retryConfig
  • Restored standard provider names in constants (google/, openai/ instead of custom)

auto-review bot Issues:

  • Fixed path traversal vulnerability in getMessageDir() (sessionID validation)
  • Fixed lsp_symbols array access when limit <= 0
  • Added fallback chains to all DEFAULT_CATEGORIES
  • Updated README.md beta link from v3.0.0-beta.1 to v3.0.0-beta.6
  • Fixed AGENTS.md LSP tool count from "11 tools" to "7 tools"

Testing

  • All 1048 tests passing
  • 36 new unit tests for model fallback system covering:
    • parseModelString() validation
    • buildModelChain() chain construction
    • withModelFallback() retry logic with delays
    • Error detection via isModelError()
  • TypeScript compilation clean

Files Changed

 README.md                                 |   2 +-
 assets/oh-my-opencode.schema.json         | 256 +++++++++++++++
 src/config/schema.ts                      |   9 +-
 src/features/background-agent/manager.ts  |   7 +-
 src/features/background-agent/types.ts    |   1 +
 src/features/model-fallback/index.test.ts | 498 ++++++++++++++++++++++++++++++
 src/features/model-fallback/index.ts      | 118 +++++++
 src/tools/AGENTS.md                       |   2 +-
 src/tools/lsp/tools.ts                    |   6 +-
 src/tools/sisyphus-task/constants.ts      |   7 +
 src/tools/sisyphus-task/tools.ts          | 118 +++++--
 11 files changed, 992 insertions(+), 32 deletions(-)

Usage Example

Category Configuration:

{
  "categories": {
    "visual-engineering": {
      "model": "google/gemini-3-pro-preview",
      "fallback": ["anthropic/claude-sonnet-4-5", "google/gemini-3-flash-preview"],
      "fallbackDelayMs": 2000,
      "fallbackRetryCount": 2
    }
  }
}

Automatic Fallback:
When primary model fails, automatically retries with fallback models and notifies user:

Task completed in 15.2s.

Agent: Sisyphus-Junior (category: visual-engineering)
Model: anthropic/claude-sonnet-4-5
⚠️ Model fallback occurred: Primary model failed, used anthropic/claude-sonnet-4-5 instead.
   Failed models: google/gemini-3-pro-preview

Replaces

Testing Instructions

  1. Cherry-pick commits from this branch (clean history from origin/dev)
  2. Run bun test && bun run typecheck to verify
  3. Test model fallback:
    sisyphus_task(
      category="visual-engineering",
      prompt="Test fallback behavior"
    )
    

Related


Summary by cubic

Adds a model fallback system with automatic retries for transient failures, integrated into sisyphus_task and background agents to improve reliability and user feedback. Introduces simple config fields for fallback models, retry delay, and max attempts.

  • New Features

    • Automatic retry with ordered fallback models (modelChain) for rate limits, 5xx, timeouts.
    • New config fields: fallback[], fallbackDelayMs, fallbackRetryCount in categories and agent overrides.
    • sisyphus_task uses the chain, shows the model used, and warns when fallback occurs.
    • Background agent manager accepts modelChain and logs chain length.
    • Default categories now include sensible fallback chains.
  • Bug Fixes

    • Prevent path traversal in getMessageDir by validating sessionID.
    • Guard lsp_symbols when limit <= 0 and when results can be empty.
    • Restore standard provider IDs (google/, openai/) in constants.
    • Update README beta link and correct LSP tool count in AGENTS.md.
    • Validate retry limits (no zero/negative attempts), tighten agent-not-found detection, and clear sessions on early errors.

Written for commit 7460340. Summary will update on new commits.

…a fields

Add new optional fields to AgentOverrideConfigSchema and CategoryConfigSchema:
- fallback: string[] - array of fallback model strings
- fallbackDelayMs: number (100-30000) - delay between retry attempts
- fallbackRetryCount: number (1-5) - maximum models to try

These fields enable automatic model fallback when primary model fails due to
rate limits, service unavailability, or other transient errors.
Implement core model fallback utilities in src/features/model-fallback/:
- parseModelString(): Parse 'provider/model' format into ModelSpec
- modelSpecToString(): Convert ModelSpec back to string
- buildModelChain(): Build ordered list of models from primary + fallbacks
- isModelError(): Detect retryable errors (rate limits, 429, 503, timeouts)
- withModelFallback(): Generic retry wrapper for simple use cases
- formatRetryErrors(): Format error list for user-friendly display

The module detects these transient errors for automatic retry:
- Rate limits (rate_limit, 429)
- Service unavailable (503, 502, unavailable, overloaded, capacity)
- Network errors (timeout, ECONNREFUSED, ENOTFOUND)

Non-model errors (auth failures, invalid input) are NOT retried.
Update sisyphus-task to use the model fallback system:

Integration changes:
- Import parseModelString, isModelError, RetryConfig from model-fallback
- Update resolveCategoryConfig() to return modelChain and retryConfig
- Use configurable fallbackDelayMs and fallbackRetryCount from config
- Build model chain from primary model + fallback array

User notification on fallback:
- Show which model was actually used in task completion message
- Display warning when fallback occurred with list of failed models
- Example: '⚠️ Model fallback occurred: Primary model failed, used
  anthropic/claude-haiku-4-5 instead. Failed models: deepseek/deepseek-v3-2'

Special handling:
- Agent configuration errors (agent.name) do NOT trigger fallback
- Only transient model errors (rate limits, timeouts) retry with fallback
- Inline retry logic preserved for session lifecycle management
Add 36 unit tests covering all model-fallback functions:

parseModelString() tests:
- Valid provider/model strings
- Nested paths (provider/sub/model)
- Invalid single-segment strings
- Empty string handling

buildModelChain() tests:
- Primary model only
- Primary with fallback array
- Invalid models filtered out
- Empty/undefined fallback handling

isModelError() tests:
- Rate limit detection (rate_limit, 429)
- Service errors (503, 502, unavailable, overloaded, capacity)
- Network errors (timeout, ECONNREFUSED, ENOTFOUND)
- Non-model errors return false
- Non-Error objects return false

withModelFallback() tests:
- Success on first model
- Fallback on model error
- No retry on non-model errors
- Exhausts all models on persistent failures
- Respects maxAttempts config
- Empty model chain handling
- Records usedModel on success/failure

formatRetryErrors() tests:
- Empty array handling
- Single error formatting
- Multiple errors as numbered list
- Fix P1: Remove duplicate session.prompt calls in manager.ts that caused
  double execution before task registration. Synced with origin/dev and
  properly integrated modelChain support.

- Fix P2: parseModelString now rejects empty provider/model segments,
  preventing malformed model specs from entering the fallback chain.

- Remove pr_790_body.txt which belonged to a different PR (HTTP transport).
- Restore constants.ts to use standard provider names (google/, openai/)
  instead of custom nextologies provider
- Add modelChain and retryConfig to resolveCategoryConfig return type
Security & Correctness Fixes:
- P1: Fix path traversal vulnerability in getMessageDir by validating sessionID
- P2: Fix lsp_symbols array access when limit <= 0 by enforcing min limit of 1
- P2: Add fallback chains to all DEFAULT_CATEGORIES to enable automatic model fallback

Documentation Fixes:
- P2: Update README.md beta release link from v3.0.0-beta.1 to v3.0.0-beta.6
- P3: Fix AGENTS.md LSP tool count from 11 to 7 to match actual count

All fixes verified with full test suite (1048/1048 passing).
Copy link

@cubic-dev-ai cubic-dev-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

5 issues found across 11 files

Confidence score: 2/5

  • Zero-attempt retry loop in src/tools/sisyphus-task/tools.ts leaves subagents without any prompt when no model is configured, so user requests silently fail.
  • maxAttempts in src/features/model-fallback/index.ts accepts 0/negative values, producing empty errors even when viable models exist, which risks regressions for fallback flows.
  • Multiple issues in src/tools/sisyphus-task/tools.ts (undefined model passed to build chain, over-broad agent-not-found detection, leaked subagentSessions) indicate fragile error handling that can crash or misreport before the user sees a useful message.
  • Pay close attention to src/tools/sisyphus-task/tools.ts, src/features/model-fallback/index.ts - unstable retry/mode selection logic and cleanup must be corrected.
Prompt for AI agents (all issues)

Check if these issues are valid — if so, understand the root cause of each and fix them.


<file name="src/features/model-fallback/index.ts">

<violation number="1" location="src/features/model-fallback/index.ts:75">
P2: maxAttempts accepts 0/negative values, causing zero attempts and empty errors despite available models</violation>
</file>

<file name="src/tools/sisyphus-task/tools.ts">

<violation number="1" location="src/tools/sisyphus-task/tools.ts:103">
P2: buildModelChain is called with possible undefined model, causing runtime crash before user-facing error when no model is configured</violation>

<violation number="2" location="src/tools/sisyphus-task/tools.ts:551">
P1: No prompt is sent for subagent requests because the retry loop stops at 0 attempts when no model is configured.</violation>

<violation number="3" location="src/tools/sisyphus-task/tools.ts:584">
P2: Over-broad agent-not-found detection: any error containing “undefined” is treated as agent missing, aborting fallbacks and misreporting errors.</violation>

<violation number="4" location="src/tools/sisyphus-task/tools.ts:588">
P2: Early prompt failure returns leak subagent session tracking (missing subagentSessions.delete)</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.

}

const fallbackList = userConfig?.fallback ?? defaultConfig?.fallback
const modelChain = buildModelChain(config.model, fallbackList)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2: buildModelChain is called with possible undefined model, causing runtime crash before user-facing error when no model is configured

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At src/tools/sisyphus-task/tools.ts, line 103:

<comment>buildModelChain is called with possible undefined model, causing runtime crash before user-facing error when no model is configured</comment>

<file context>
@@ -85,14 +99,22 @@ function resolveCategoryConfig(
   }
 
+  const fallbackList = userConfig?.fallback ?? defaultConfig?.fallback
+  const modelChain = buildModelChain(config.model, fallbackList)
+
+  const retryConfig: RetryConfig = {
</file context>
Suggested change
const modelChain = buildModelChain(config.model, fallbackList)
const modelChain = config.model ? buildModelChain(config.model, fallbackList) : []

- Fix maxAttempts accepting 0/negative values causing zero retry attempts
  in both withModelFallback() and sisyphus_task retry loop
- Remove over-broad "undefined" check in agent-not-found detection that
  could misclassify unrelated errors and abort fallbacks prematurely
- Add missing subagentSessions.delete() calls on early error returns to
  prevent memory leaks in session tracking

Fixes issues reported by cubic-dev-ai review:
- P1: No prompt sent for subagent requests when maxAttempts=0
- P2: maxAttempts accepts 0/negative values
- P2: Over-broad agent-not-found detection
- P2: Session tracking leak on early failures

Co-Authored-By: Claude Opus 4.5 <[email protected]>
Copy link

@cubic-dev-ai cubic-dev-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1 issue found across 2 files (changes from recent commits).

Prompt for AI agents (all issues)

Check if these issues are valid — if so, understand the root cause of each and fix them.


<file name="src/features/model-fallback/index.ts">

<violation number="1" location="src/features/model-fallback/index.ts:75">
P2: `maxAttempts` is not validated—`NaN` input causes the loop to skip all models and returns `attempts: NaN`</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.

Use Number.isFinite() to validate maxAttempts before using it.
If maxAttempts is NaN, Infinity, or not a number, fall back to
using the model chain length instead.

Adds test case for NaN input.

Co-Authored-By: Claude Opus 4.5 <[email protected]>
Copy link

@cubic-dev-ai cubic-dev-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1 issue found across 2 files (changes from recent commits).

Prompt for AI agents (all issues)

Check if these issues are valid — if so, understand the root cause of each and fix them.


<file name="src/features/model-fallback/index.test.ts">

<violation number="1" location="src/features/model-fallback/index.test.ts:415">
P2: Test does not exercise or assert chain-length fallback for NaN maxAttempts; passes even if NaN coerces to 1.</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.

The previous test succeeded on first attempt, so it couldn't
distinguish maxAttempts=1 from maxAttempts=chainLength. Now uses
3 models that all fail, verifying all 3 are tried (proving
maxAttempts correctly falls back to chain length, not 1).

Co-Authored-By: Claude Opus 4.5 <[email protected]>
@GollyJer
Copy link
Collaborator

This would be super cool. Are you going to keep working on it @stranger2904?

@stranger2904
Copy link
Contributor Author

@GollyJer Hi, I'm still testing a few things, it did work for my original use-case, but I want to try to break it a bit more... do you want me to convert it to a draft until than ?

@GollyJer GollyJer marked this pull request as draft January 15, 2026 19:11
@GollyJer
Copy link
Collaborator

@stranger2904 Thanks for the update. I converted to Draft for now. 👍
Keep us posted.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants