
Distributed render: probe stage crashes on transient browser errors without retry #1687

Description

@miga-heygen

Problem

The distributed render plan() stage crashes when headless Chrome encounters a transient frame detachment error during the browser probe, with no retry logic. The plan tarball is never written, and downstream chunk workers fail.

Error: errorCode=INTERNAL_ERROR message=Navigating frame was detached

This is a Puppeteer/Chrome DevTools Protocol error that occurs when Chrome's main frame navigates away or is detached during an operation — typically due to memory pressure, compositor race conditions, or Chrome warmup timing.

Impact

  • All chunk workers fail (plan artifact missing)
  • The entire distributed render fails after the retry pass
  • The error is transient — a fresh browser session would succeed

Root Cause

probeStage.ts calls initializeSession() (which calls page.goto() in frameCapture.ts) with zero retry logic for transient browser errors. The error propagates straight through: initializeSession → probeStage → plan() → adapter → failure.

Known transient Puppeteer errors that should be retried:

  • Navigating frame was detached
  • Target closed
  • Protocol error ... Target closed
  • Session closed
  • Navigation failed because browser has disconnected
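
A minimal sketch of how the messages listed above could be matched, assuming classification by substring on the error message. The name isTransientBrowserError comes from the fix plan; the pattern table and everything else here is hypothetical and not taken from the engine's code.

```typescript
// Hypothetical pattern table built from the messages listed in this issue.
// Matching on message text is an assumption; a real implementation might
// also inspect error classes or protocol error codes.
const TRANSIENT_BROWSER_ERROR_PATTERNS: readonly RegExp[] = [
  /Navigating frame was detached/i,
  /Target closed/i, // also matches "Protocol error ... Target closed"
  /Session closed/i,
  /Navigation failed because browser has disconnected/i,
];

// Returns true when the error message matches a known transient pattern.
// Accepts unknown because a catch clause can receive any thrown value.
function isTransientBrowserError(err: unknown): boolean {
  const message = err instanceof Error ? err.message : String(err);
  return TRANSIENT_BROWSER_ERROR_PATTERNS.some((pattern) => pattern.test(message));
}
```

Because the match is a substring test, the wrapped form seen in the report (errorCode=INTERNAL_ERROR message=Navigating frame was detached) would be classified as transient as well.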

Fix Plan

  1. Retry logic in probeStage.ts: Wrap the browser session lifecycle (create → initialize → probe) with retry-on-transient-error. On failure: close the crashed session, create a fresh browser, retry once.
  2. Typed error classification: Add isTransientBrowserError() utility in the engine for classifying retryable vs. fatal browser errors.
  3. Observability: Structured logging at the retry boundary — error class, attempt number, elapsed time.
  4. Tests: Unit tests for the error classifier + integration test for probe retry.
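
Steps 1 and 3 could be sketched roughly as follows. This is an illustration under stated assumptions, not the actual probeStage.ts change: withFreshSessionRetry, RetryOptions, and RetryLogEntry are hypothetical names, and the session factory, close function, and classifier are passed in as parameters so the sketch does not depend on the real engine API.

```typescript
// Hypothetical shape of the structured log record emitted at the retry boundary.
interface RetryLogEntry {
  errorClass: string;
  attempt: number;
  elapsedMs: number;
  message: string;
}

interface RetryOptions {
  maxAttempts: number; // 2 means the initial attempt plus one retry
  isTransient: (err: unknown) => boolean;
  log: (entry: RetryLogEntry) => void;
}

// Runs `run` against a fresh session on each attempt. The session is always
// closed afterwards, whether the attempt succeeded or failed, so a crashed
// browser is never reused.
async function withFreshSessionRetry<S, T>(
  createSession: () => Promise<S>,
  closeSession: (session: S) => Promise<void>,
  run: (session: S) => Promise<T>,
  opts: RetryOptions,
): Promise<T> {
  const startedAt = Date.now();
  for (let attempt = 1; ; attempt++) {
    let session: S | undefined;
    try {
      session = await createSession();
      return await run(session);
    } catch (err) {
      // Fatal errors and exhausted attempts propagate unchanged.
      if (!opts.isTransient(err) || attempt >= opts.maxAttempts) throw err;
      opts.log({
        errorClass: err instanceof Error ? err.name : typeof err,
        attempt,
        elapsedMs: Date.now() - startedAt,
        message: err instanceof Error ? err.message : String(err),
      });
    } finally {
      if (session !== undefined) {
        // Closing a crashed session may itself fail; ignore that failure.
        await closeSession(session).catch(() => undefined);
      }
    }
  }
}
```

In the probe stage, `run` would presumably wrap the initializeSession() call and the probe itself, with maxAttempts set to 2 to match the "retry once" behavior described in step 1.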
