Hotfix Soniox fallback keys for staging #3220
);
return true;
}
Auth errors block fallback retry
Medium Severity
After this change, live Soniox transcription streams call recordCredentialFailure, which can disable or cool down the key that was in use. However, isRetryableError still returns false for Soniox 401/403, so the manager does not schedule a new stream that would run the provider’s credential loop and select a fallback key. Capacity-related failures were updated to retry for exactly that reason; auth-style failures on an already-open stream still leave the fallback keys unused.
Reviewed by Cursor Bugbot for commit ec26e83.
Leaving this open intentionally for follow-up. The ASAP prod hotfix keeps auth-style runtime errors non-retryable and focuses on the 402/budget quota fallback path that took prod down.
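For the follow-up, one possible shape of the retry check is sketched below. It assumes the Soniox status arrives as a number and that the pool can answer whether a different credential is usable right now; KeyPoolView, hasUsableCredentialOtherThan and isRetryableSonioxError are hypothetical names, not the PR's API. Auth-style statuses retry only when another key is available, and everything else defers to the existing predicate, so the 402/budget behavior of this hotfix would stay as it is.

```typescript
// Hypothetical view of the shared key pool; the PR's real SonioxKeyPool API is not shown here.
interface KeyPoolView {
  hasUsableCredentialOtherThan(credentialId: string): boolean;
}

// Statuses the review thread calls auth-style (assumed to arrive as numeric HTTP codes).
const AUTH_STATUSES = new Set<number>([401, 403]);

/**
 * Sketch of a follow-up retry check: auth-style failures on an open stream are
 * retried only when the pool can still hand out a different credential. All other
 * statuses defer to the existing check (e.g. the capacity/quota rules).
 */
function isRetryableSonioxError(
  status: number | undefined,
  failedCredentialId: string,
  pool: KeyPoolView,
  existingCheck: (status: number | undefined) => boolean,
): boolean {
  if (status !== undefined && AUTH_STATUSES.has(status)) {
    // The failing key was already disabled or cooled down by recordCredentialFailure,
    // so a new stream would run the credential loop and pick a fallback.
    return pool.hasUsableCredentialOtherThan(failedCredentialId);
  }
  return existingCheck(status);
}
```

Passing the existing predicate in keeps the change additive: only the 401/403 branch behaves differently.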
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: ec26e83126
this.logger,
this.config,
throw (
  lastError ??
Preserve retryable credential failures after fallback auth errors
When the preferred credential fails with a retryable capacity condition (for example a concurrent-stream limit) and a later fallback credential is misconfigured or invalid, this throws only the last fallback error. Because stream creation failures reach handleStreamError with no current provider, isRetryableError sees the final 401/403-style Soniox error and gives up, so the stream is never retried when the primary key’s cooldown expires. Consider surfacing a pool-level retryable error whenever any attempted credential is cooling down instead of letting a later permanent fallback error mask it.
Leaving this open intentionally for follow-up. This is a real edge case, but I kept the outage hotfix narrow after product guidance: 402/budget cooldown plus shared fallback key rotation.
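A minimal sketch of the Codex suggestion, assuming each iteration of the credential loop records the cooldown the pool assigned to that key (Infinity meaning disabled). CredentialAttempt, SonioxPoolCoolingDownError and errorAfterAllAttempts are hypothetical names; the manager's isRetryableError would still need to recognize the new error type.

```typescript
// One failed attempt from the credential loop (hypothetical shape).
interface CredentialAttempt {
  credentialId: string;
  error: unknown;
  // Cooldown the pool assigned to this key: finite = temporary, Infinity = disabled.
  cooldownMs: number;
}

// Hypothetical pool-level error that retry logic could treat as retryable.
class SonioxPoolCoolingDownError extends Error {
  readonly retryable = true;

  constructor(
    readonly retryAfterMs: number,
    readonly lastError: unknown,
  ) {
    super(`All Soniox credentials unavailable; earliest retry in ${retryAfterMs} ms`);
    this.name = "SonioxPoolCoolingDownError";
  }
}

/**
 * Picks what to throw after every credential failed. If any attempted key is only
 * cooling down, surface a retryable pool-level error instead of letting a permanent
 * fallback failure (e.g. a 401 on a misconfigured key) mask it.
 */
function errorAfterAllAttempts(attempts: CredentialAttempt[]): unknown {
  const lastError = attempts.length > 0 ? attempts[attempts.length - 1].error : undefined;
  const cooling = attempts.filter((a) => Number.isFinite(a.cooldownMs));
  if (cooling.length > 0) {
    // Retry when the soonest cooldown expires.
    const retryAfterMs = Math.min(...cooling.map((a) => a.cooldownMs));
    return new SonioxPoolCoolingDownError(retryAfterMs, lastError);
  }
  return lastError ?? new Error("No Soniox credentials configured");
}
```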
lower.includes("unauthorized")
) {
  return { kind: "auth", cooldownMs: Number.POSITIVE_INFINITY, disabled: true };
}
403 not treated as auth
Medium Severity
classifySonioxCredentialFailure disables credentials only for HTTP 401, while the stream retry logic in TranscriptionManager also treats Soniox 403 as non-retryable. A Soniox 403 response therefore gets a short transient cooldown instead of being disabled, so the key pool can select that credential again once the cooldown expires, despite the authorization-style failure.
Reviewed by Cursor Bugbot for commit 5a3d149.
Leaving this open intentionally for follow-up. We are not broadening 403 into process-disable behavior in the ASAP hotfix.
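A sketch of the broader auth branch the Bugbot comment describes, treating 403 the same way as 401. Only the auth and transient paths are shown; the quota, concurrency and rate-limit branches of the real classifier are omitted. classifyAuthOrTransient is a hypothetical stand-in for classifySonioxCredentialFailure, and both the 5-second transient cooldown and the "forbidden" message match are assumptions.

```typescript
type SonioxFailureKind = "auth" | "transient";

interface SonioxCredentialFailure {
  kind: SonioxFailureKind;
  cooldownMs: number;
  disabled: boolean;
}

// Assumed short cooldown for transient errors; the PR's actual value is not shown.
const TRANSIENT_COOLDOWN_MS = 5_000;

/**
 * Auth branch only: 401 and 403 (or matching messages) disable the key, mirroring the
 * TranscriptionManager rule that treats both statuses as non-retryable.
 */
function classifyAuthOrTransient(status: number | undefined, message: string): SonioxCredentialFailure {
  const lower = message.toLowerCase();
  if (
    status === 401 ||
    status === 403 ||
    lower.includes("unauthorized") ||
    lower.includes("forbidden")
  ) {
    // Disabled for the lifetime of the process, as in the existing 401 branch.
    return { kind: "auth", cooldownMs: Number.POSITIVE_INFINITY, disabled: true };
  }
  return { kind: "transient", cooldownMs: TRANSIENT_COOLDOWN_MS, disabled: false };
}
```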
Related Soniox fallback PRs:
Current prod/main hotfix head: e06781b.
Cursor Bugbot has reviewed your changes using default effort and found 1 potential issue.
There are 3 total unresolved issues (including 2 from previous reviews).
Reviewed by Cursor Bugbot for commit ac44ca0.
stream = this.createStreamForCredential(credential, language, options);
await stream.initialize();
this.keyPool.recordSuccess(credential.id);
return stream;
Pool success before stream ready
Medium Severity
createTranscriptionStream and createTranslationStream call keyPool.recordSuccess as soon as initialize() resolves, but TranscriptionManager and TranslationManager still run waitForStreamReady afterward. When initialization hangs (e.g. the SDK's connect() returns while the state stays INITIALIZING), the manager throws a timeout without calling recordFailure, so the pool keeps treating that credential as healthy and retries the primary instead of the fallbacks.
Reviewed by Cursor Bugbot for commit ac44ca0.
Leaving this open intentionally for follow-up. The latest hotfix prevents a success from clearing an active cooldown, but moving success recording later in the manager is a larger lifecycle change than I want in the prod outage patch.
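A sketch of the later success recording Bugbot proposes: recordSuccess runs only once the stream reports ready, and a ready timeout is recorded as a failure against that credential. SonioxStreamLike, KeyPoolLike, waitForReady and openStreamAndRecord are hypothetical; the real managers use waitForStreamReady, whose signature is not shown here, and the 50 ms polling interval is an assumption.

```typescript
// Minimal stand-ins for the PR's stream and key pool types (assumed shapes).
interface SonioxStreamLike {
  initialize(): Promise<void>;
  isReady(): boolean;
  close(): void;
}

interface KeyPoolLike {
  recordSuccess(credentialId: string): void;
  recordFailure(credentialId: string, error: unknown): void;
}

// Polls until the stream is ready or the timeout elapses.
async function waitForReady(stream: SonioxStreamLike, timeoutMs: number, pollMs = 50): Promise<void> {
  const deadline = Date.now() + timeoutMs;
  while (!stream.isReady()) {
    if (Date.now() >= deadline) throw new Error(`Soniox stream not ready after ${timeoutMs} ms`);
    await new Promise<void>((resolve) => setTimeout(resolve, pollMs));
  }
}

async function openStreamAndRecord(
  credentialId: string,
  stream: SonioxStreamLike,
  pool: KeyPoolLike,
  readyTimeoutMs: number,
): Promise<SonioxStreamLike> {
  try {
    await stream.initialize();
    await waitForReady(stream, readyTimeoutMs);
  } catch (error) {
    pool.recordFailure(credentialId, error); // a hang now counts against this key
    stream.close();
    throw error;
  }
  pool.recordSuccess(credentialId); // only once the stream is actually usable
  return stream;
}
```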


Summary
Backports the Soniox fallback key hotfix to staging for legacy Cloud v1.

Why
The primary Soniox org/key can hit account-level limits, causing every new transcription or translation stream to fail through the same exhausted credential. This branch mirrors the main hotfix so staging has the same operational behavior.

What changed
- SONIOX_FALLBACK_API_KEYS support for transcription and translation.
- SONIOX_API_KEY as primary while healthy.
- cloud/issues/108-soniox-fallback-api-keys/.

Validation
- bun test cloud/packages/cloud/src/services/session/soniox/__tests__/SonioxKeyPool.test.ts
- git diff --check HEAD~1..HEAD
- cd cloud && bun run build
- cd cloud/packages/cloud && bun run build

Note
Medium Risk
Touches live transcription/translation stream creation and retry paths; misconfigured fallbacks or aggressive cooldowns could delay or fail streams, but behavior is additive behind env config and mirrors an existing main hotfix.
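Since the behavior sits behind env config, here is a sketch of how the two variables could become an ordered credential list with one process-wide shared instance. parseSonioxCredentials, getSharedSonioxCredentials and the fallback-N ids are hypothetical; trimming, dropping empty entries and de-duplicating keys are assumptions, not confirmed PR behavior. In the PR itself the shared object is the key pool returned by getSharedSonioxKeyPool, so cooldowns are shared as well.

```typescript
interface SonioxCredential {
  id: string;
  apiKey: string;
}

/**
 * SONIOX_API_KEY first (primary), then each comma-separated SONIOX_FALLBACK_API_KEYS
 * entry in order. Empty entries and repeated keys are skipped (assumption).
 */
function parseSonioxCredentials(env: Record<string, string | undefined>): SonioxCredential[] {
  const credentials: SonioxCredential[] = [];
  const seenKeys = new Set<string>();
  const add = (id: string, apiKey: string): void => {
    if (apiKey.length === 0 || seenKeys.has(apiKey)) return;
    seenKeys.add(apiKey);
    credentials.push({ id, apiKey });
  };

  add("primary", (env.SONIOX_API_KEY ?? "").trim());
  (env.SONIOX_FALLBACK_API_KEYS ?? "")
    .split(",")
    .map((key) => key.trim())
    .forEach((key, index) => add(`fallback-${index + 1}`, key));
  return credentials;
}

// Module-level singleton: transcription and translation read the same instance.
let sharedCredentials: SonioxCredential[] | undefined;

function getSharedSonioxCredentials(env: Record<string, string | undefined>): SonioxCredential[] {
  if (sharedCredentials === undefined) {
    sharedCredentials = parseSonioxCredentials(env);
  }
  return sharedCredentials;
}
```

For example, SONIOX_FALLBACK_API_KEYS=key-b,key-c (placeholder values) yields the primary plus two fallbacks, and a missing variable leaves the primary-only setup unchanged.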
Overview
Adds SONIOX_FALLBACK_API_KEYS (comma-separated) alongside SONIOX_API_KEY so Cloud v1 can open Soniox streams on alternate credentials when the primary key hits limits or errors.

Introduces SonioxKeyPool: prefers primary, round-robins healthy fallbacks, classifies failures (concurrency, rate limit, quota, auth, transient) with per-key cooldowns, and logs SHA-256 fingerprints only. Cooldown state is shared in-process across transcription and translation via getSharedSonioxKeyPool.

Transcription and translation providers loop over credentials on stream creation, record failures back to the pool from SDK/WebSocket streams (credentialId), and managers retry when all keys are cooling down or on quota/concurrency/rate-limit errors so a new stream can pick another key.

Default Soniox model bumps from stt-rt-v4 to stt-rt-v5 in config and stream defaults. Issue docs and unit tests for the key pool are included.

Reviewed by Cursor Bugbot for commit ac44ca0.
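The key pool behavior described above can be sketched as a small in-memory class. This is not the PR's SonioxKeyPool: KeyPoolSketch is hypothetical, callers pass cooldown durations directly instead of going through the failure classifier, the clock is injectable for testing, and truncating the SHA-256 fingerprint to 12 hex characters is an assumption. It uses createHash from node:crypto, which Bun also provides.

```typescript
import { createHash } from "node:crypto";

interface Credential {
  id: string;
  apiKey: string;
}

interface KeyState {
  cooldownUntil: number; // epoch ms; the key is skipped until this time
  disabled: boolean; // permanent for this process (e.g. auth failure)
  consecutiveFailures: number;
}

// Short SHA-256 fingerprint so logs identify a key without containing it.
function fingerprint(apiKey: string): string {
  return createHash("sha256").update(apiKey).digest("hex").slice(0, 12);
}

class KeyPoolSketch {
  private readonly state = new Map<string, KeyState>();
  private nextFallback = 0;

  constructor(
    private readonly primary: Credential,
    private readonly fallbacks: Credential[],
    private readonly now: () => number = Date.now,
  ) {
    for (const credential of [primary, ...fallbacks]) {
      this.state.set(credential.id, { cooldownUntil: 0, disabled: false, consecutiveFailures: 0 });
    }
  }

  private isAvailable(credential: Credential): boolean {
    const s = this.state.get(credential.id)!;
    return !s.disabled && s.cooldownUntil <= this.now();
  }

  /** Primary while healthy, otherwise the next healthy fallback in round-robin order. */
  acquire(): Credential | null {
    if (this.isAvailable(this.primary)) return this.primary;
    const n = this.fallbacks.length;
    for (let i = 0; i < n; i++) {
      const index = (this.nextFallback + i) % n;
      const candidate = this.fallbacks[index];
      if (this.isAvailable(candidate)) {
        this.nextFallback = (index + 1) % n;
        return candidate;
      }
    }
    return null; // every key is cooling down or disabled
  }

  /** Pass Infinity with disabled=true for auth-style failures. */
  recordFailure(id: string, cooldownMs: number, disabled = false): void {
    const s = this.state.get(id);
    if (!s) return;
    s.consecutiveFailures += 1;
    s.disabled = s.disabled || disabled;
    // Never shorten a cooldown that is already longer (e.g. a quota cooldown).
    s.cooldownUntil = Math.max(s.cooldownUntil, this.now() + cooldownMs);
  }

  /** Resets the failure streak but deliberately leaves any active cooldown in place. */
  recordSuccess(id: string): void {
    const s = this.state.get(id);
    if (s) s.consecutiveFailures = 0;
  }

  describe(credential: Credential): string {
    return `${credential.id} (sha256:${fingerprint(credential.apiKey)})`;
  }
}
```

Taking the maximum in recordFailure and leaving cooldownUntil untouched in recordSuccess is one way to get the "a later success does not clear an active cooldown" property.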
Summary by cubic
Backports Soniox fallback API keys to staging for Cloud v1, adding automatic key failover with shared cooldowns so transcription and translation keep working when the primary key hits limits. Also defaults the Soniox model to stt-rt-v5 and tightens quota cooldown handling to prevent premature reuse.

New Features
- SONIOX_FALLBACK_API_KEYS (comma-separated) alongside SONIOX_API_KEY.
- SonioxKeyPool: prefers primary, round-robins healthy fallbacks, classifies failures (concurrency/rate/quota/auth/transient) with per-key cooldowns; quota cooldowns are long and a later success does not clear an active cooldown; cooldowns shared across providers in-process; logs SHA-256 key fingerprints only.
- credentialId failures back to the pool; one @soniox/node client per credential; managers retry on capacity errors and when no credentials are temporarily available; includes unit tests.

Migration
- SONIOX_FALLBACK_API_KEYS in the staging env and redeploy.
- SONIOX_API_KEY remains primary.

Written for commit ac44ca0.