feat(memory): learn evidence-backed recovery policies across sessions #1022

## Context

OpenChrome already has PatternLearner and ActionCache, and LATS (Language Agent Tree Search) shows the benefit of feeding failed trajectories and rewards back into future attempts. Once deterministic trajectory and reward data exist, OpenChrome can improve recovery ordering across sessions without relying on LLM reflection or unsafe browser branching.

This issue is a focused follow-up: promote repeated, evidence-backed recovery outcomes into a domain/session policy that can bias future candidate ranking.



## Implementation order / dependencies

Implement after #1017, #1018, and #1019. It may optionally feed #1020 later, but the first version must only bias ranking and must not auto-execute learned actions.

## Relationship to existing issues

Before implementation, check this issue against open issues covering structured recovery hints, action replay/cache, outcome contracts, and observability. If an existing issue already covers part of this scope, keep this issue limited to the LATS-inspired recovery/trajectory behavior described here and cross-link it instead of duplicating the implementation.

## Goal

Extend existing learning surfaces so repeated recovery outcomes can adjust candidate ranking across sessions. The policy should learn “for this failure fingerprint/domain/tool context, these safe recovery candidates work best”.
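A minimal sketch of the keying idea, assuming the failure fingerprint is computed upstream and that the domain bucket is the page origin. `PolicyKeyInput`, `originBucket`, `buildPolicyKey`, the `no-origin` bucket, and the `|`-joined format are all hypothetical. The point is that only coarse, non-sensitive fields reach the key, so paths, query strings, and tool arguments are never learned.

```typescript
// Hypothetical input shape; field names are illustrative, not OpenChrome's API.
interface PolicyKeyInput {
  failureFingerprint: string; // assumed to be computed upstream, e.g. "stale-ref"
  pageUrl?: string;           // full page URL; only its origin is kept
  toolCategory: string;       // coarse category such as "interaction", never raw args
}

// Reduce a URL to its origin so paths, query strings, and fragments
// (which may carry tokens or personal data) never reach the key.
function originBucket(pageUrl?: string): string {
  if (!pageUrl) return "no-origin";
  try {
    return new URL(pageUrl).origin;
  } catch {
    // Missing or unparsable URLs share a bucket that never matches a real origin.
    return "no-origin";
  }
}

function buildPolicyKey(input: PolicyKeyInput): string {
  return [input.failureFingerprint, originBucket(input.pageUrl), input.toolCategory].join("|");
}

const key = buildPolicyKey({
  failureFingerprint: "stale-ref",
  pageUrl: "http://localhost:3000/recovery/stale-ref?session=abc123#top",
  toolCategory: "interaction",
});
console.log(key); // stale-ref|http://localhost:3000|interaction
```

Keying on the origin rather than the hostname also separates fixtures served on different ports, which matters for the cross-domain isolation check later in this issue.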

## Non-goals / safety constraints

- Do not create a new independent memory system if PatternLearner or DomainMemory can be extended cleanly.
- Do not learn from unverified or ambiguous outcomes as high-confidence success.
- Do not learn raw page content, credentials, cookies, or sensitive args.
- Do not allow learned policy to bypass safety gates.
- Learned policy must never auto-execute actions on its own; it only biases ranking.
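The last two constraints can be sketched as an ordering rule, assuming each candidate already carries a base score from the existing ranker and a safety class. The safety gate runs before any learned bias is applied, and the function only returns an ordering, so nothing in it can execute a browser action. `RecoveryCandidate`, `rankWithLearnedBias`, the `"high-risk"` class, the `navigate-away` tool, and the 0.2 bias cap are all hypothetical.

```typescript
type SafetyClass = "safe" | "guarded" | "high-risk";

interface RecoveryCandidate {
  tool: string;            // e.g. "fresh-read"
  baseScore: number;       // score from the existing ranker, assumed in [0, 1]
  safetyClass: SafetyClass;
}

const MAX_LEARNED_BIAS = 0.2; // hypothetical cap: learning nudges, never dominates

// Hypothetical gate: high-risk candidates are never eligible, whatever was learned.
function passesSafetyGate(c: RecoveryCandidate): boolean {
  return c.safetyClass !== "high-risk";
}

// Returns an ordering only; this function never executes a browser action.
function rankWithLearnedBias(
  candidates: RecoveryCandidate[],
  learnedBias: Record<string, number>, // tool -> learned bias
): RecoveryCandidate[] {
  const score = (c: RecoveryCandidate): number =>
    c.baseScore + Math.min(MAX_LEARNED_BIAS, Math.max(0, learnedBias[c.tool] ?? 0));
  return candidates
    .filter(passesSafetyGate) // gate first, so bias cannot resurrect a blocked candidate
    .sort((a, b) => score(b) - score(a));
}

const ranked = rankWithLearnedBias(
  [
    { tool: "retry-click", baseScore: 0.6, safetyClass: "safe" },
    { tool: "fresh-read", baseScore: 0.5, safetyClass: "safe" },
    { tool: "navigate-away", baseScore: 0.9, safetyClass: "high-risk" },
  ],
  { "fresh-read": 0.15, "navigate-away": 0.2 },
);
console.log(ranked.map((c) => c.tool).join(", ")); // fresh-read, retry-click
```

Capping the bias keeps a learned policy from overriding a much stronger base signal, which matches the "one scoring feature" framing below.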

## Proposed implementation

1. Define a compact learned policy record:
   - failure fingerprint
   - optional domain/origin bucket
   - triggering tool category
   - recovery candidate/tool
   - attempts, successes, failures
   - confidence
   - lastSeen/firstSeen
   - safety class
2. Feed it from trajectory ledger + reward scorer outcomes.
3. Promote only when:
   - minimum attempts threshold is met
   - confidence threshold is met
   - outcomes are evidence-backed or contract-backed
4. Use policy as one scoring feature in recovery candidate ranking.
5. Decay or downgrade policy on later failures.
6. Provide tests for persistence and cross-session loading.
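Steps 1, 3, and 5 could fit together roughly as below. This is a sketch under stated assumptions: the field names mirror the list above but are hypothetical, confidence uses the Wilson score lower bound (one reasonable choice, not something this issue prescribes), ambiguous or evidence-free "successes" count only as attempts, and the thresholds of 4 attempts and 0.5 confidence are placeholders. Because promotion is re-evaluated on every outcome, a later failure can demote a record.

```typescript
type SafetyClass = "safe" | "guarded" | "high-risk";
type Evidence = "evidence-backed" | "contract-backed" | "none";

// Compact learned policy record; field names mirror the list above but are hypothetical.
interface LearnedPolicyRecord {
  failureFingerprint: string;
  originBucket?: string;
  toolCategory: string;
  recoveryTool: string;
  attempts: number;
  successes: number;
  failures: number;
  confidence: number;
  firstSeen: number; // epoch ms
  lastSeen: number;  // epoch ms
  safetyClass: SafetyClass;
  promoted: boolean;
}

interface RecoveryOutcome {
  verdict: "success" | "failure" | "ambiguous";
  evidence: Evidence;
  at: number; // epoch ms
}

const MIN_ATTEMPTS = 4;     // placeholder threshold
const MIN_CONFIDENCE = 0.5; // placeholder threshold

// Wilson score lower bound (z = 1.96): a conservative success-rate estimate
// that stays low until enough attempts have accumulated.
function wilsonLowerBound(successes: number, attempts: number, z = 1.96): number {
  if (attempts === 0) return 0;
  const p = successes / attempts;
  const z2 = z * z;
  const centre = p + z2 / (2 * attempts);
  const margin = z * Math.sqrt((p * (1 - p)) / attempts + z2 / (4 * attempts * attempts));
  return Math.max(0, (centre - margin) / (1 + z2 / attempts));
}

function newRecord(failureFingerprint: string, toolCategory: string, recoveryTool: string, now: number): LearnedPolicyRecord {
  return {
    failureFingerprint, toolCategory, recoveryTool,
    attempts: 0, successes: 0, failures: 0, confidence: 0,
    firstSeen: now, lastSeen: now,
    safetyClass: "safe", // assumed default; the real class would come from the candidate
    promoted: false,
  };
}

// Fold one scored outcome into the record and re-evaluate promotion.
function recordOutcome(rec: LearnedPolicyRecord, o: RecoveryOutcome): LearnedPolicyRecord {
  const next = { ...rec, attempts: rec.attempts + 1, lastSeen: o.at };
  if (o.verdict === "success" && o.evidence !== "none") next.successes += 1;
  else if (o.verdict === "failure") next.failures += 1;
  // Ambiguous outcomes and unverified "successes" only add an attempt, lowering confidence.
  next.confidence = wilsonLowerBound(next.successes, next.attempts);
  next.promoted = next.attempts >= MIN_ATTEMPTS && next.confidence >= MIN_CONFIDENCE;
  return next;
}

let rec = newRecord("stale-ref", "interaction", "fresh-read", 0);
for (let i = 1; i <= 4; i++) {
  rec = recordOutcome(rec, { verdict: "success", evidence: "evidence-backed", at: i });
}
console.log(rec.promoted, rec.confidence.toFixed(2)); // true 0.51
rec = recordOutcome(rec, { verdict: "failure", evidence: "evidence-backed", at: 5 });
console.log(rec.promoted, rec.confidence.toFixed(2)); // false 0.38
```

A lower confidence bound, rather than the raw success ratio, means three lucky successes cannot promote a policy on their own.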

## Acceptance criteria

- Repeated evidence-backed recoveries promote a policy record.
- Ambiguous/no-evidence outcomes do not promote high-confidence policy.
- Later failures reduce confidence or rank contribution.
- Learned policy can improve candidate ranking but cannot bypass safety gates.
- Policies are persisted with bounded size and redacted data.
- Existing PatternLearner behavior remains compatible.
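Bounded, redacted persistence might look like the sketch below, assuming a versioned JSON file. `PolicyRecord`, `serializePolicies`, `loadPolicies`, and the 500-record cap are hypothetical. Redaction works by allowlist: only named fields are copied, so anything else attached upstream (raw args, page text, cookies) cannot be written, and a corrupt or unknown file loads as an empty policy set instead of breaking recovery.

```typescript
interface PolicyRecord {
  failureFingerprint: string;
  originBucket?: string;
  toolCategory: string;
  recoveryTool: string;
  attempts: number;
  successes: number;
  failures: number;
  confidence: number;
  firstSeen: number;
  lastSeen: number;
  safetyClass: "safe" | "guarded" | "high-risk";
}

const MAX_PERSISTED_RECORDS = 500; // hypothetical bound on file size

// Allowlist copy: only these fields can ever be written or loaded.
function redact(r: PolicyRecord): PolicyRecord {
  return {
    failureFingerprint: r.failureFingerprint,
    originBucket: r.originBucket,
    toolCategory: r.toolCategory,
    recoveryTool: r.recoveryTool,
    attempts: r.attempts,
    successes: r.successes,
    failures: r.failures,
    confidence: r.confidence,
    firstSeen: r.firstSeen,
    lastSeen: r.lastSeen,
    safetyClass: r.safetyClass,
  };
}

// Minimal shape check for records read back from disk.
function isPolicyRecord(x: unknown): x is PolicyRecord {
  if (typeof x !== "object" || x === null) return false;
  const r = x as Record<string, unknown>;
  return typeof r.failureFingerprint === "string" &&
    typeof r.recoveryTool === "string" &&
    typeof r.attempts === "number" &&
    typeof r.lastSeen === "number" &&
    (r.safetyClass === "safe" || r.safetyClass === "guarded" || r.safetyClass === "high-risk");
}

function serializePolicies(records: PolicyRecord[], max = MAX_PERSISTED_RECORDS): string {
  const kept = records
    .slice()
    .sort((a, b) => b.lastSeen - a.lastSeen) // keep the most recently seen entries
    .slice(0, max)
    .map(redact);
  return JSON.stringify({ version: 1, records: kept });
}

function loadPolicies(json: string): PolicyRecord[] {
  try {
    const parsed = JSON.parse(json);
    if (parsed?.version !== 1 || !Array.isArray(parsed.records)) return [];
    return parsed.records.filter(isPolicyRecord).map(redact);
  } catch {
    return []; // corrupt file: learning is effectively off, recovery still works
  }
}

const sample: PolicyRecord = {
  failureFingerprint: "stale-ref", originBucket: "http://localhost:3000",
  toolCategory: "interaction", recoveryTool: "fresh-read",
  attempts: 4, successes: 4, failures: 0, confidence: 0.51,
  firstSeen: 1, lastSeen: 4, safetyClass: "safe",
};
const withLeak = { ...sample, rawArgs: { password: "hunter2" } };
const file = serializePolicies([withLeak]);
console.log(file.indexOf("hunter2") === -1); // true
console.log(loadPolicies(file).length); // 1
```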

## Required automated verification

- Unit tests for:
  - promotion threshold
  - confidence calculation
  - failure decay/downgrade
  - persistence/load
  - safety gate precedence over learned ranking
- Integration test:
  - simulate repeated stale-ref → fresh-read recoveries
  - verify policy promotes and then ranks fresh-read higher in a later session
- Regression tests for existing PatternLearner and ActionCache if touched.
- `npm run build` and targeted learning/recovery tests.
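The integration test's shape can be sketched without a browser: record repeated verified stale-ref → fresh-read recoveries in one store, persist it, load it into a new store as a later session would, and compare rankings. `PolicyStore`, its simplified promotion rule (at least 3 attempts and at least 80% verified success), the 0.15 bias, and the base scores are hypothetical stand-ins for the real ledger, reward scorer, and ranker.

```typescript
// Hypothetical, in-memory stand-in for the learned-policy store.
interface Counts {
  attempts: number;
  successes: number;
}

class PolicyStore {
  private readonly counts: Record<string, Counts>;

  constructor(counts: Record<string, Counts> = {}) {
    this.counts = counts;
  }

  record(key: string, tool: string, verifiedSuccess: boolean): void {
    const id = `${key}::${tool}`;
    const c = this.counts[id] ?? { attempts: 0, successes: 0 };
    this.counts[id] = { attempts: c.attempts + 1, successes: c.successes + (verifiedSuccess ? 1 : 0) };
  }

  // Simplified promotion rule: at least 3 attempts and at least 80% verified success.
  bias(key: string, tool: string): number {
    const c = this.counts[`${key}::${tool}`];
    if (!c || c.attempts < 3) return 0;
    return c.successes / c.attempts >= 0.8 ? 0.15 : 0;
  }

  serialize(): string {
    return JSON.stringify(this.counts);
  }

  static load(json: string): PolicyStore {
    return new PolicyStore(JSON.parse(json));
  }
}

// Order candidate tools by base score plus learned bias (highest first).
function rank(store: PolicyStore, key: string, base: Record<string, number>): string[] {
  return Object.keys(base).sort(
    (a, b) => base[b] + store.bias(key, b) - (base[a] + store.bias(key, a)),
  );
}

const KEY = "stale-ref|http://localhost:3000|interaction";
const BASE = { "retry-click": 0.6, "fresh-read": 0.5 };

// Session 1: repeated stale-ref failures recovered by a verified fresh read.
const session1 = new PolicyStore();
const beforeLearning = rank(session1, KEY, BASE);
for (let i = 0; i < 3; i++) session1.record(KEY, "fresh-read", true);
const persisted = session1.serialize(); // stands in for the on-disk policy file

// Session 2: a new store loads the persisted policy and re-ranks.
const session2 = PolicyStore.load(persisted);
const afterLearning = rank(session2, KEY, BASE);
console.log(beforeLearning.join(", ")); // retry-click, fresh-read
console.log(afterLearning.join(", "));  // fresh-read, retry-click
```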

## Fixture requirements

Reuse the `/recovery/stale-ref` fixture from the earlier issues so that first-run learning and second-run ranking are deterministic and do not depend on external websites.

## Required real OpenChrome verification after implementation

Use OpenChrome against the same local stale-ref fixture across two server runs:

1. First run:
   - trigger the same stale-ref failure/recovery sequence enough times to meet the configured promotion threshold
   - verify a learned policy file/record is created without sensitive data
2. Restart OpenChrome.
3. Second run:
   - trigger the same failure once
   - verify the learned policy contributes to candidate ranking and the known-good recovery appears above alternatives
4. Verify that a deliberately failed recovery lowers confidence or prevents promotion.
5. Verify high-risk candidates remain blocked even if a learned policy record exists.

## Merge evidence required in PR

- Test output for promotion/persistence/ranking behavior.
- Real OpenChrome two-run transcript showing policy promotion and reuse.
- Confirmation that learned policy only biases ranking and does not auto-execute actions.



## Curated scope, overlap handling, and verification checklist

### Scope classification
- **Canonical lane:** evidence-backed recovery policy memory across sessions.
- **Primary deliverable:** learned policy bias that promotes repeated, successful recovery outcomes into future candidate ranking.
- **Open PR:** #1109 (`feat/1022-recovery-policy-learning`). Continue there.
- **Dependency gate:** should land after #1017/#1018/#1019 or equivalent trajectory, candidate, and reward data.
- **Non-goal:** LLM reflection, auto-executing learned actions, unsafe cross-domain replay, or replacing PatternLearner/ActionCache.

### Overlap and conflict resolution
- [ ] Keep separate from #1018: #1018 ranks current candidates; this issue learns persistent policy bias from repeated evidence.
- [ ] Keep separate from #1019: reward scorer supplies deterministic outcomes; this issue stores/uses learned policy.
- [ ] Keep separate from #1020: bounded search may consume learned ranking later, but this issue must remain advisory-only.
- [ ] Integrate with PatternLearner/ActionCache where appropriate instead of adding redundant memory stores.

### Implementation checklist
- [ ] Define safe policy keying by domain/session/task pattern with redacted inputs and expiry/decay to avoid stale behavior.
- [ ] Learn only from deterministic, evidence-backed outcomes and minimum support thresholds.
- [ ] Apply learned policy as a ranking bias or explanation field, never as automatic execution.
- [ ] Add controls for disabling/clearing learned policies and for ignoring low-confidence entries.
- [ ] Add tests for learning threshold, decay/expiry, redaction, cross-domain isolation, advisory-only application, and fallback when dependencies are absent.
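The expiry/decay, disable/clear, low-confidence, and missing-dependency items could collapse into one lookup. The sketch assumes exponential decay with a configurable half-life plus a hard maximum age; `StoredPolicy`, `PolicyControls`, `learnedBias`, `clearPolicies`, and the example durations are hypothetical. Returning 0 in every degenerate case means ranking silently falls back to its existing behavior.

```typescript
interface StoredPolicy {
  key: string;          // e.g. "stale-ref|http://localhost:3000|interaction"
  recoveryTool: string;
  confidence: number;   // 0..1
  lastSeen: number;     // epoch ms
}

interface PolicyControls {
  enabled: boolean;      // hypothetical opt-out switch
  minConfidence: number; // entries below this are ignored
  halfLifeMs: number;    // confidence halves every halfLifeMs without new evidence (> 0)
  maxAgeMs: number;      // entries older than this are treated as expired
}

// Exponential time decay: confidence * 0.5^(age / halfLife), or 0 once expired.
function effectiveConfidence(p: StoredPolicy, now: number, ctl: PolicyControls): number {
  const age = Math.max(0, now - p.lastSeen);
  if (age > ctl.maxAgeMs) return 0;
  return p.confidence * Math.pow(0.5, age / ctl.halfLifeMs);
}

// Returns 0 when learning is disabled, no policies are available (for example,
// trajectory/reward data is absent), nothing matches, or the entry is weak or stale.
function learnedBias(
  policies: StoredPolicy[] | undefined,
  key: string,
  tool: string,
  now: number,
  ctl: PolicyControls,
): number {
  if (!ctl.enabled || !policies) return 0;
  const match = policies.filter((p) => p.key === key && p.recoveryTool === tool)[0];
  if (!match) return 0;
  const c = effectiveConfidence(match, now, ctl);
  return c >= ctl.minConfidence ? c : 0;
}

// Clear everything, or only entries whose key starts with a prefix (e.g. one fingerprint).
function clearPolicies(policies: StoredPolicy[], keyPrefix?: string): StoredPolicy[] {
  return keyPrefix === undefined ? [] : policies.filter((p) => p.key.indexOf(keyPrefix) !== 0);
}

const DAY = 24 * 60 * 60 * 1000;
const controls: PolicyControls = { enabled: true, minConfidence: 0.3, halfLifeMs: 7 * DAY, maxAgeMs: 30 * DAY };
const policies: StoredPolicy[] = [
  { key: "stale-ref|http://localhost:3000|interaction", recoveryTool: "fresh-read", confidence: 0.6, lastSeen: 0 },
];
const k = policies[0].key;
console.log(learnedBias(policies, k, "fresh-read", 0, controls));        // 0.6
console.log(learnedBias(policies, k, "fresh-read", 14 * DAY, controls)); // 0 (decayed to 0.15)
console.log(learnedBias(policies, k, "fresh-read", 0, { ...controls, enabled: false })); // 0
```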

### Success criteria
- [ ] Repeated successful recovery outcomes can influence future ranking with clear evidence provenance.
- [ ] Learned policy never triggers browser actions by itself.
- [ ] Stale/low-confidence/cross-domain policies are bounded or ignored.
- [ ] Existing PatternLearner/ActionCache behavior remains compatible.

### Post-merge OpenChrome live verification checklist
- [ ] Run repeated local fixture recovery attempts that produce deterministic success evidence and verify a policy entry is learned after threshold.
- [ ] Run a new matching fixture attempt and verify candidate ranking includes the learned bias/explanation.
- [ ] Run a different domain/session fixture and verify the policy does not leak across boundaries.
- [ ] Inspect stored policy artifacts for redaction, support count, reward evidence link, and expiry metadata.
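For the provenance and isolation checks, a stored artifact could carry its support count, evidence links, and expiry, and produce an advisory explanation only for the exact origin it was learned on. `PolicyArtifact`, `explainLearnedBias`, and the `reward-N` evidence ID format are hypothetical; the explanation is data for the ranker and trace output, not an instruction to act.

```typescript
// Hypothetical stored artifact with the provenance fields the checklist asks for.
interface PolicyArtifact {
  failureFingerprint: string;
  originBucket: string;  // exact origin the policy was learned on
  recoveryTool: string;
  supportCount: number;  // verified recoveries backing the policy
  evidenceIds: string[]; // links to reward-scorer outcomes (ID format assumed)
  expiresAt: number;     // epoch ms
  bias: number;
}

interface LearnedExplanation {
  tool: string;
  bias: number;
  reason: string;
}

// Advisory only: returns an explanation for the ranker/trace, or undefined.
function explainLearnedBias(
  a: PolicyArtifact,
  ctx: { fingerprint: string; origin: string; tool: string; now: number },
): LearnedExplanation | undefined {
  if (a.failureFingerprint !== ctx.fingerprint || a.recoveryTool !== ctx.tool) return undefined;
  if (a.originBucket !== ctx.origin) return undefined; // exact match: no cross-domain reuse
  if (ctx.now >= a.expiresAt) return undefined;        // expired entries are ignored
  return {
    tool: a.recoveryTool,
    bias: a.bias,
    reason: `learned from ${a.supportCount} verified recoveries on ${a.originBucket} (evidence: ${a.evidenceIds.join(", ")})`,
  };
}

const artifact: PolicyArtifact = {
  failureFingerprint: "stale-ref",
  originBucket: "http://localhost:3000",
  recoveryTool: "fresh-read",
  supportCount: 4,
  evidenceIds: ["reward-1", "reward-2", "reward-3", "reward-4"],
  expiresAt: 1000,
  bias: 0.15,
};
const ctx = { fingerprint: "stale-ref", tool: "fresh-read", now: 500 };
console.log(explainLearnedBias(artifact, { ...ctx, origin: "http://localhost:3000" })?.reason);
console.log(explainLearnedBias(artifact, { ...ctx, origin: "http://localhost:4000" })); // undefined
```

Serving the second fixture on a different port is enough to exercise the isolation path, since the origin includes the port.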
