feat(orchestration): opt-in bounded recovery search for safe plan failures #1020

@shaun0927

Context

Once OpenChrome can record trajectories, rank recovery candidates, and score evidence deterministically, it can safely adopt the most useful idea from LATS (Language Agent Tree Search): bounded alternative-path recovery. The search must be much narrower than Monte Carlo tree search (MCTS), because OpenChrome controls real browser sessions with real side effects.

Implementation order / dependencies

Implement after #1017, #1018, and #1019, or after equivalent functionality exists. This is intentionally P2 because it changes execution behavior, so it must be opt-in and safety-gated.

Relationship to existing issues

Before implementation, check this issue against open issues covering structured recovery hints, action replay/cache, outcome contracts, and observability work. If an existing issue already covers part of this scope, keep this issue limited to the LATS-inspired recovery/trajectory behavior described here, and cross-link rather than duplicate the implementation.

Goal

Add an opt-in bounded recovery search for selected failed plan/tool steps. It should try a very small number of safe, reversible recovery candidates and stop as soon as evidence or success criteria indicate recovery.

Prerequisites

This issue should be implemented only after the following foundations exist or equivalent functionality is available:

  • recovery trajectory ledger or equivalent recent-attempt history
  • evidence-based reward scorer
  • advisory recovery candidate ranking
  • side-effect/risk classification for candidate actions

Non-goals / safety constraints

  • Do not implement general MCTS over live browser actions.
  • Do not branch real browser state unless an action is read-only or safely reversible.
  • Do not auto-submit forms, purchase, delete, confirm, send messages, change account settings, or run arbitrary destructive JS.
  • Do not exceed strict budgets:
    • max depth: 2 by default
    • max candidates per failure: 3 by default
    • max extra tool calls per failed step: configurable, small default
    • max wall time per recovery attempt
  • Must be opt-in via config/plan flag/tool parameter, not silently enabled for all users.
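A minimal sketch of how the budgets and the opt-in flag above could be expressed as configuration. Every name here (`RecoverySearchConfig`, `resolveRecoverySearchConfig`, the field names) is hypothetical and not taken from the OpenChrome codebase. The depth and candidate defaults come from this issue; the tool-call and wall-time defaults are placeholders, since the issue only asks for a "small default" and does not give a wall-time value.

```typescript
// Hypothetical config shape; not an existing OpenChrome API.
interface RecoverySearchConfig {
  enabled: boolean;                // must be switched on explicitly
  maxDepth: number;                // recovery steps chained per failure
  maxCandidatesPerFailure: number; // candidates executed per failed step
  maxExtraToolCalls: number;       // extra tool calls per failed step
  maxWallTimeMs: number;           // wall-time cap per recovery attempt
  allowReversible: boolean;        // reversible actions need their own opt-in
}

const DEFAULT_RECOVERY_SEARCH_CONFIG: Readonly<RecoverySearchConfig> = {
  enabled: false,             // default-off, per the opt-in requirement
  maxDepth: 2,                // default stated in this issue
  maxCandidatesPerFailure: 3, // default stated in this issue
  maxExtraToolCalls: 4,       // ASSUMPTION: placeholder for "small default"
  maxWallTimeMs: 10_000,      // ASSUMPTION: placeholder, no value given in the issue
  allowReversible: false,
};

// Merge caller overrides over the defaults and reject unusable budgets,
// so a bad override cannot silently disable a limit.
function resolveRecoverySearchConfig(
  overrides: Partial<RecoverySearchConfig> = {},
): RecoverySearchConfig {
  const merged: RecoverySearchConfig = { ...DEFAULT_RECOVERY_SEARCH_CONFIG, ...overrides };
  const budgets: Array<[string, number]> = [
    ["maxDepth", merged.maxDepth],
    ["maxCandidatesPerFailure", merged.maxCandidatesPerFailure],
    ["maxExtraToolCalls", merged.maxExtraToolCalls],
    ["maxWallTimeMs", merged.maxWallTimeMs],
  ];
  for (const [name, value] of budgets) {
    if (!Number.isInteger(value) || value < 1) {
      throw new Error(`${name} must be a positive integer`);
    }
  }
  return merged;
}
```

Calling `resolveRecoverySearchConfig()` with no arguments yields a disabled search, so a plan has to pass `{ enabled: true }` (or an equivalent flag) before any recovery candidate can run.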

Proposed implementation

  1. Add a recovery search executor that consumes ranked candidates.
  2. Filter candidates through a safety gate:
    • allow read_only
    • allow reversible when explicitly configured
    • block side_effect_possible unless a future explicit gate exists
  3. Execute one candidate at a time; never run speculative browser mutations in parallel.
  4. After each candidate:
    • evaluate success criteria or evidence reward
    • write trajectory node/reward
    • stop on success or hard negative signal
  5. Integrate first with PlanExecutor recovery handling, because it already has step boundaries and success criteria.
  6. Return a structured recovery summary in the plan result or hint output.
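The steps above can be sketched as a single sequential loop. This is an illustration under stated assumptions, not the intended OpenChrome implementation: all type and function names (`RecoveryCandidate`, `runRecoverySearch`, `passesSafetyGate`, and so on) are hypothetical, the `attempts` array stands in for the trajectory ledger, and only the candidate-count budget is shown (depth, tool-call, and wall-time limits are omitted for brevity).

```typescript
// Risk classes named in this issue.
type RiskClass = "read_only" | "reversible" | "side_effect_possible";
type Verdict = "success" | "no_progress" | "hard_negative";

// Hypothetical shapes; the real candidate and ledger types may differ.
interface RecoveryCandidate {
  id: string;
  risk: RiskClass;
  run: () => Promise<void>;
}
interface AttemptRecord {
  candidateId: string;
  outcome: Verdict | "blocked" | "error";
}
interface RecoverySummary {
  recovered: boolean;
  stopReason: "success" | "hard_negative" | "budget_exhausted" | "candidates_exhausted";
  attempts: AttemptRecord[];
}

// Safety gate: read_only always passes, reversible only when configured,
// side_effect_possible never passes.
function passesSafetyGate(candidate: RecoveryCandidate, allowReversible: boolean): boolean {
  if (candidate.risk === "read_only") return true;
  if (candidate.risk === "reversible") return allowReversible;
  return false;
}

async function runRecoverySearch(
  ranked: readonly RecoveryCandidate[],
  evaluate: () => Promise<Verdict>,
  opts: { maxCandidates: number; allowReversible: boolean },
): Promise<RecoverySummary> {
  const attempts: AttemptRecord[] = [];
  let executed = 0;
  for (const candidate of ranked) {
    if (!passesSafetyGate(candidate, opts.allowReversible)) {
      // Blocked candidates are recorded but never executed.
      attempts.push({ candidateId: candidate.id, outcome: "blocked" });
      continue;
    }
    if (executed >= opts.maxCandidates) {
      return { recovered: false, stopReason: "budget_exhausted", attempts };
    }
    executed += 1;
    let outcome: AttemptRecord["outcome"] = "error";
    try {
      await candidate.run();      // strictly sequential: one candidate at a time
      outcome = await evaluate(); // success criteria or evidence reward
    } catch (e) {
      outcome = "error";
    }
    attempts.push({ candidateId: candidate.id, outcome });
    if (outcome === "success") {
      return { recovered: true, stopReason: "success", attempts };
    }
    if (outcome === "hard_negative") {
      return { recovered: false, stopReason: "hard_negative", attempts };
    }
  }
  return { recovered: false, stopReason: "candidates_exhausted", attempts };
}
```

In this sketch a blocked candidate does not consume budget, and the returned `attempts` list doubles as the structured summary of what was tried and why it stopped. Whether blocked candidates should count against the budget is a design choice this issue leaves open.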

Acceptance criteria

  • Recovery search is disabled by default or enabled only by an explicit safe flag.
  • Search never runs blocked high-risk candidates.
  • Budgets are enforced even if tools hang or keep returning no-progress.
  • Successful recovery stops further attempts.
  • Exhausted recovery returns a clear summary of attempted candidates and why they failed.
  • Existing static recovery handlers continue to work and have well-defined precedence.

Required automated verification

  • Unit tests for:
    • budget enforcement
    • safety gate blocking side-effect candidates
    • stop-on-success behavior
    • no-progress exhaustion summary
    • static handler precedence
  • PlanExecutor integration test with mocked tools:
    • first step fails
    • first recovery candidate fails/no-progress
    • second safe candidate succeeds
    • success criteria pass
  • Timeout test proving a hung recovery candidate cannot hang the plan.
  • npm run build and targeted orchestration/recovery Jest tests.
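For the timeout requirement, one possible approach is to race each candidate against a timer. The helper below is a sketch only; `withTimeout` and the `RecoveryTimeoutError` name are hypothetical. Note its limitation: it bounds how long the plan waits, but it does not cancel the underlying tool call, so a real implementation would likely also need an abort or cleanup path.

```typescript
// Reject if `work` has not settled within `timeoutMs` milliseconds.
// The timer is cleared as soon as either side of the race settles.
function withTimeout<T>(work: Promise<T>, timeoutMs: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_resolve, reject) => {
    timer = setTimeout(() => {
      const err = new Error(`recovery candidate exceeded ${timeoutMs} ms`);
      err.name = "RecoveryTimeoutError"; // hypothetical error name
      reject(err);
    }, timeoutMs);
  });
  return Promise.race([work, timeout]).then(
    (value) => {
      clearTimeout(timer);
      return value;
    },
    (error) => {
      clearTimeout(timer);
      throw error;
    },
  );
}
```

A timeout test can then pass a promise that never settles, for example `withTimeout(new Promise<void>(() => {}), 50)`, and assert that it rejects with the timeout error instead of hanging the plan.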

Fixture requirements

Add or reuse controlled routes in tests/e2e/harness/fixture-server.ts:

  • /recovery/stale-ref: first candidate path fails, fresh snapshot + re-resolution succeeds.
  • /recovery/dangerous-action: page includes labels such as Delete account and Buy Now; test asserts recovery search refuses to auto-execute them.
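As a rough illustration of the two fixture pages, the sketch below returns static HTML for each route. It assumes nothing about how `tests/e2e/harness/fixture-server.ts` registers routes; `renderRecoveryFixture` is a hypothetical helper, and whether replacing the button actually produces a stale ref depends on how OpenChrome resolves element references.

```typescript
// Hypothetical helper: maps a fixture path to its HTML body.
const RECOVERY_FIXTURES = new Map<string, string>([
  [
    "/recovery/stale-ref",
    // The button is replaced shortly after load, so a reference captured
    // before the swap points at a detached node; a fresh snapshot is needed.
    `<!doctype html>
<title>stale-ref</title>
<button id="target">Continue</button>
<script>
  setTimeout(() => {
    const old = document.getElementById("target");
    const fresh = old.cloneNode(true);
    fresh.addEventListener("click", () => { document.title = "recovered"; });
    old.replaceWith(fresh);
  }, 300);
</script>`,
  ],
  [
    "/recovery/dangerous-action",
    // Labels that recovery search must never auto-execute.
    `<!doctype html>
<title>dangerous-action</title>
<button id="delete">Delete account</button>
<button id="buy">Buy Now</button>
<button id="refresh">Refresh status</button>`,
  ],
]);

// Returns the fixture HTML, or undefined for paths this helper does not own.
function renderRecoveryFixture(pathname: string): string | undefined {
  return RECOVERY_FIXTURES.get(pathname);
}
```

The stale-ref page signals success by setting `document.title` to `recovered`, which gives the test a deterministic condition to check after re-resolution.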

Required real OpenChrome verification after implementation

Use OpenChrome with a local fixture that supports a safe stale-ref or delayed-render failure:

  1. Create or use a fixture where the first interaction fails due to stale ref or delayed DOM readiness.
  2. Run a plan with bounded recovery search explicitly enabled.
  3. Verify OpenChrome:
    • detects the failed step
    • tries at most the configured candidate/tool-call budget
    • recovers through a safe candidate such as fresh read_page + re-resolution
    • reaches the success condition
  4. Run the same fixture with recovery disabled and verify the original failure remains visible.
  5. Run a fixture containing a dangerous action label such as Delete account or Buy Now and verify bounded recovery refuses to auto-execute it.

Merge evidence required in PR

  • PlanExecutor mocked integration output.
  • Real OpenChrome transcript comparing recovery disabled vs enabled.
  • Evidence that high-risk candidate blocking works.
  • Confirmation of default-off or explicit opt-in behavior.

Curated scope, overlap handling, and verification checklist

Scope classification

Overlap and conflict resolution

Implementation checklist

  • Add explicit opt-in configuration with hard limits for candidate count, depth, time, allowed tools/actions, and reversible/safe action classes.
  • Require deterministic preconditions and evidence scoring before accepting a recovery branch.
  • Stop on first sufficiently scored safe recovery or on any safety/budget violation.
  • Persist branch attempt evidence and reasons for rejected/aborted candidates.
  • Add tests for disabled default, safe success, budget exhaustion, unsafe candidate rejection, side-effect guard, and scoring stop condition.

Success criteria

  • Recovery search cannot run unless it is explicitly enabled and bounded.
  • Unsafe or irreversible candidates are rejected before execution.
  • Successful recovery has deterministic evidence and a clear stop reason.
  • Existing plan/tool failure behavior remains unchanged when search is disabled.

Post-merge OpenChrome live verification checklist

  • Run a local fixture failure with recovery search disabled and verify existing failure behavior remains unchanged.
  • Enable search with two safe candidates and verify only bounded candidates are attempted and evidence is recorded.
  • Include an unsafe candidate and verify it is rejected without execution.
  • Record branch attempts, scores, stop reason, and safety-rejection evidence in merge notes.

Metadata

    Labels

    • P2: Medium priority
    • enhancement: New feature or request
    • harness: Execution harness, run lifecycle, recovery, and verification
    • lats-learnings: Improvements inspired by LanguageAgentTreeSearch analysis
    • live-verification: Requires live OpenChrome/browser validation after implementation
    • outcome-contracts: Verifiable execution via pre/post-condition contracts (Q2)
    • reliability: Reliability and stability improvement
    • security: Security vulnerability
