feat(orchestration): opt-in bounded recovery search for safe plan failures #1020

## Context

Once OpenChrome can record trajectories, rank recovery candidates, and score evidence deterministically, it can safely adopt the most useful idea from LATS (Language Agent Tree Search): bounded alternative-path recovery. The search must be much narrower than Monte Carlo tree search (MCTS) because OpenChrome controls real browser sessions with real side effects.



## Implementation order / dependencies

Implement after #1017, #1018, and #1019 or equivalent functionality. This is intentionally P2 because it changes execution behavior; it must be opt-in and safety-gated.

## Relationship to existing issues

Before implementation, check this issue against open issues covering structured recovery hints, action replay/cache, outcome contracts, and observability. If an existing issue already covers part of this scope, limit this issue to the LATS-inspired recovery/trajectory behavior described here and cross-link instead of duplicating the implementation.

## Goal

Add an opt-in bounded recovery search for selected failed plan/tool steps. It should try a very small number of safe, reversible recovery candidates and stop as soon as evidence or success criteria indicate recovery.

## Prerequisites

This issue should be implemented only after the following foundations exist or equivalent functionality is available:

- recovery trajectory ledger or equivalent recent-attempt history
- evidence-based reward scorer
- advisory recovery candidate ranking
- side-effect/risk classification for candidate actions

## Non-goals / safety constraints

- Do not implement general MCTS over live browser actions.
- Do not branch real browser state unless an action is read-only or safely reversible.
- Do not auto-submit forms, purchase, delete, confirm, send messages, change account settings, or run arbitrary destructive JS.
- Do not exceed strict budgets:
  - max depth: 2 by default
  - max candidates per failure: 3 by default
  - max extra tool calls per failed step: configurable, small default
  - max wall time per recovery attempt
- Must be opt-in via config/plan flag/tool parameter, not silently enabled for all users.
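The budgets and opt-in requirement above could be captured in a single config object. The following is a minimal sketch only: `RecoverySearchConfig`, `DEFAULT_RECOVERY_SEARCH`, and `resolveRecoveryConfig` are hypothetical names, and the values for `maxExtraToolCalls` and `maxWallTimeMs` are placeholders, since this issue does not specify them.

```typescript
// Hypothetical config shape; names are illustrative, not OpenChrome's actual API.
interface RecoverySearchConfig {
  enabled: boolean;           // must be set explicitly; never on by default
  maxDepth: number;           // issue default: 2
  maxCandidates: number;      // issue default: 3
  maxExtraToolCalls: number;  // issue says "small default"; 4 is a placeholder
  maxWallTimeMs: number;      // issue gives no value; 10 s is a placeholder
  allowReversible: boolean;   // `reversible` candidates need explicit configuration
}

const DEFAULT_RECOVERY_SEARCH: RecoverySearchConfig = {
  enabled: false,
  maxDepth: 2,
  maxCandidates: 3,
  maxExtraToolCalls: 4,
  maxWallTimeMs: 10000,
  allowReversible: false,
};

// Merge caller overrides onto the defaults and reject budgets that are not
// positive integers, so a bad override cannot silently disable a limit.
function resolveRecoveryConfig(
  overrides: Partial<RecoverySearchConfig> = {},
): RecoverySearchConfig {
  const merged: RecoverySearchConfig = { ...DEFAULT_RECOVERY_SEARCH, ...overrides };
  const budgets: Array<[string, number]> = [
    ["maxDepth", merged.maxDepth],
    ["maxCandidates", merged.maxCandidates],
    ["maxExtraToolCalls", merged.maxExtraToolCalls],
    ["maxWallTimeMs", merged.maxWallTimeMs],
  ];
  for (const [name, value] of budgets) {
    if (!Number.isInteger(value) || value < 1) {
      throw new RangeError(`${name} must be a positive integer, got ${value}`);
    }
  }
  return merged;
}
```

Calling `resolveRecoveryConfig()` with no arguments leaves search disabled; a plan would have to pass `{ enabled: true }` explicitly to turn it on.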

## Proposed implementation

1. Add a recovery search executor that consumes ranked candidates.
2. Filter candidates through a safety gate:
   - allow `read_only`
   - allow `reversible` when explicitly configured
   - block `side_effect_possible` unless a future explicit gate exists
3. Execute one candidate at a time; never run speculative browser mutations in parallel.
4. After each candidate:
   - evaluate success criteria or evidence reward
   - write trajectory node/reward
   - stop on success or hard negative signal
5. Integrate first with `PlanExecutor` recovery handling, because it already has step boundaries and success criteria.
6. Return a structured recovery summary in the plan result or hint output.
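As a rough illustration of steps 1 to 4 and 6, the sketch below filters ranked candidates through a safety gate, runs them strictly one at a time, and stops on success or a hard negative signal. Only the three risk labels come from this issue; every type and function name (`RecoveryCandidate`, `passesSafetyGate`, `runRecoverySearch`, and so on) is an assumption, and trajectory writes are reduced to an in-memory `attempts` array.

```typescript
// All names below are hypothetical; only the three risk labels come from the issue text.
type RiskClass = "read_only" | "reversible" | "side_effect_possible";
type CandidateOutcome = "success" | "no_progress" | "hard_negative";

interface RecoveryCandidate {
  id: string;
  risk: RiskClass;
  // Runs the candidate and reports what the success criteria / evidence scorer concluded.
  run: () => Promise<CandidateOutcome>;
}

interface AttemptRecord {
  id: string;
  outcome: CandidateOutcome | "blocked" | "error";
}

interface RecoverySummary {
  recovered: boolean;
  stopReason: "success" | "hard_negative" | "budget_exhausted" | "candidates_exhausted";
  attempts: AttemptRecord[];
}

function passesSafetyGate(risk: RiskClass, allowReversible: boolean): boolean {
  if (risk === "read_only") return true;
  if (risk === "reversible") return allowReversible;
  return false; // side_effect_possible is always blocked in this sketch
}

async function runRecoverySearch(
  ranked: RecoveryCandidate[],
  opts: { maxCandidates: number; allowReversible: boolean },
): Promise<RecoverySummary> {
  const attempts: AttemptRecord[] = [];
  let executed = 0;
  for (const candidate of ranked) {
    // Blocked candidates are recorded but never executed.
    if (!passesSafetyGate(candidate.risk, opts.allowReversible)) {
      attempts.push({ id: candidate.id, outcome: "blocked" });
      continue;
    }
    if (executed >= opts.maxCandidates) {
      return { recovered: false, stopReason: "budget_exhausted", attempts };
    }
    executed += 1;
    let outcome: AttemptRecord["outcome"];
    try {
      // Awaiting inside the loop keeps execution strictly sequential.
      outcome = await candidate.run();
    } catch (_err) {
      outcome = "error";
    }
    attempts.push({ id: candidate.id, outcome });
    if (outcome === "success") {
      return { recovered: true, stopReason: "success", attempts };
    }
    if (outcome === "hard_negative") {
      return { recovered: false, stopReason: "hard_negative", attempts };
    }
  }
  return { recovered: false, stopReason: "candidates_exhausted", attempts };
}
```

In this sketch, blocked candidates do not count against `maxCandidates`, so a ranking that starts with unsafe candidates cannot exhaust the budget before any safe candidate runs. Whether the real executor should count them is an open design choice.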

## Acceptance criteria

- Recovery search is disabled by default or enabled only by an explicit safe flag.
- Search never runs blocked high-risk candidates.
- Budgets are enforced even if tools hang or keep returning no-progress.
- Successful recovery stops further attempts.
- Exhausted recovery returns a clear summary of attempted candidates and why they failed.
- Existing static recovery handlers continue to work and have well-defined precedence.

## Required automated verification

- Unit tests for:
  - budget enforcement
  - safety gate blocking side-effect candidates
  - stop-on-success behavior
  - no-progress exhaustion summary
  - static handler precedence
- PlanExecutor integration test with mocked tools:
  - first step fails
  - first recovery candidate fails/no-progress
  - second safe candidate succeeds
  - success criteria pass
- Timeout test proving a hung recovery candidate cannot hang the plan.
- `npm run build` and targeted orchestration/recovery Jest tests.
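One way to make the timeout test pass is to wrap each candidate's promise in a wall-time guard. The sketch below uses only standard `Promise` and `setTimeout` behavior; the `withTimeout` helper and `TimedResult` type are hypothetical names, not existing OpenChrome code.

```typescript
// Hypothetical helper: resolves with a tagged result instead of hanging forever.
type TimedResult<T> = { timedOut: false; value: T } | { timedOut: true };

function withTimeout<T>(work: Promise<T>, ms: number): Promise<TimedResult<T>> {
  return new Promise<TimedResult<T>>((resolve, reject) => {
    // If the timer fires first, report a timeout. A promise settles only once,
    // so a later resolve/reject from `work` is silently ignored.
    const timer = setTimeout(() => resolve({ timedOut: true }), ms);
    work.then(
      (value) => {
        clearTimeout(timer);
        resolve({ timedOut: false, value });
      },
      (err) => {
        clearTimeout(timer);
        reject(err);
      },
    );
  });
}
```

Note that this only stops the caller from waiting; it does not cancel the underlying tool call. A real implementation would likely also need to abort or discard the hung operation so it cannot mutate browser state after the plan has moved on.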

## Fixture requirements

Add or reuse controlled routes in `tests/e2e/harness/fixture-server.ts`:

- `/recovery/stale-ref`: first candidate path fails, fresh snapshot + re-resolution succeeds.
- `/recovery/dangerous-action`: page includes labels such as `Delete account` and `Buy Now`; test asserts recovery search refuses to auto-execute them.
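For the `/recovery/dangerous-action` assertion, a test could check candidate labels against a deny-list before anything is executed. This is a sketch under stated assumptions: the pattern list is derived loosely from the non-goals above and is not exhaustive, `isHighRiskLabel` is a hypothetical helper, and label matching would complement rather than replace the side-effect/risk classification listed under Prerequisites.

```typescript
// Hypothetical deny-list; patterns are illustrative and intentionally conservative.
const HIGH_RISK_LABEL_PATTERNS: RegExp[] = [
  /\bdelete\b/i,
  /\bbuy\b/i,
  /\bpurchase\b/i,
  /\bconfirm\b/i,
  /\bsubmit\b/i,
  /\bsend\b/i,
];

// Returns true when a visible label suggests a destructive or irreversible action.
function isHighRiskLabel(label: string): boolean {
  const normalized = label.trim();
  return HIGH_RISK_LABEL_PATTERNS.some((pattern) => pattern.test(normalized));
}

// Labels from the fixture description, plus one safe control.
const fixtureLabels = ["Delete account", "Buy Now", "Refresh snapshot"];
const refused = fixtureLabels.filter(isHighRiskLabel);
console.log(refused); // the two dangerous labels; "Refresh snapshot" passes
```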

## Required real OpenChrome verification after implementation

Use OpenChrome with a local fixture that supports a safe stale-ref or delayed-render failure:

1. Create or use a fixture where the first interaction fails due to stale ref or delayed DOM readiness.
2. Run a plan with bounded recovery search explicitly enabled.
3. Verify OpenChrome:
   - detects the failed step
   - tries at most the configured candidate/tool-call budget
   - recovers through a safe candidate such as fresh `read_page` + re-resolution
   - reaches the success condition
4. Run the same fixture with recovery disabled and verify the original failure remains visible.
5. Run a fixture containing a dangerous action label such as `Delete account` or `Buy Now` and verify bounded recovery refuses to auto-execute it.

## Merge evidence required in PR

- PlanExecutor mocked integration output.
- Real OpenChrome transcript comparing recovery disabled vs enabled.
- Evidence that high-risk candidate blocking works.
- Confirmation of default-off or explicit opt-in behavior.



## Curated scope, overlap handling, and verification checklist

### Scope classification
- **Canonical lane:** opt-in bounded recovery search for safe failed plan/tool steps.
- **Primary deliverable:** very small, safety-gated alternative-path search using deterministic candidates and reward/evidence scoring.
- **Open PR:** #1108 (`feat/1020-bounded-recovery-search`). Continue there.
- **Dependency gate:** requires #1017/#1018/#1019 or equivalent trajectory ledger, candidate ranking, and reward scoring.
- **Non-goal:** MCTS/tree-search framework, autonomous broad exploration, unsafe side-effect replay, or default recovery on all failures.

### Overlap and conflict resolution
- [ ] Keep separate from #1018: #1018 ranks candidates; this issue executes a bounded opt-in subset under safety gates.
- [ ] Keep separate from #1019: #1019 scores evidence; this issue consumes scores to stop/select.
- [ ] Keep separate from #1022: learned policies can bias candidates later but are not required to execute search.
- [ ] Coordinate with #1028 PlanExecutor frontier metadata so search starts only from invalidated/safe frontier when applicable.

### Implementation checklist
- [ ] Add explicit opt-in configuration with hard limits for candidate count, depth, time, allowed tools/actions, and reversible/safe action classes.
- [ ] Require deterministic preconditions and evidence scoring before accepting a recovery branch.
- [ ] Stop on first sufficiently scored safe recovery or on any safety/budget violation.
- [ ] Persist branch attempt evidence and reasons for rejected/aborted candidates.
- [ ] Add tests for disabled default, safe success, budget exhaustion, unsafe candidate rejection, side-effect guard, and scoring stop condition.

### Success criteria
- [ ] Recovery search cannot run unless it is explicitly enabled and bounded.
- [ ] Unsafe or irreversible candidates are rejected before execution.
- [ ] Successful recovery has deterministic evidence and a clear stop reason.
- [ ] Existing plan/tool failure behavior remains unchanged when search is disabled.

### Post-merge OpenChrome live verification checklist
- [ ] Run a local fixture failure with recovery search disabled and verify existing failure behavior remains unchanged.
- [ ] Enable search with two safe candidates and verify only bounded candidates are attempted and evidence is recorded.
- [ ] Include an unsafe candidate and verify it is rejected without execution.
- [ ] Record branch attempts, scores, stop reason, and safety-rejection evidence in merge notes.
