Context
OpenChrome already has PatternLearner and ActionCache, and LATS (Language Agent Tree Search) shows the benefit of feeding failed trajectories and rewards back into future attempts. Once deterministic trajectory and reward data exist, OpenChrome can improve recovery ordering across sessions without relying on LLM reflection or unsafe browser branching.
This issue is a focused follow-up: promote repeated, evidence-backed recovery outcomes into a domain/session policy that can bias future candidate ranking.
Implementation order / dependencies
Implement after #1017, #1018, and #1019. It may optionally feed #1020 later, but the first version must only bias ranking and must not auto-execute learned actions.
Relationship to existing issues
Before implementation, check this issue against open issues such as structured recovery hints, action replay/cache, outcome contracts, and observability work. If an existing issue already covers part of this scope, keep this issue limited to the LATS-inspired recovery/trajectory behavior described here and cross-link to it rather than duplicating the implementation.
Goal
Extend existing learning surfaces so repeated recovery outcomes can adjust candidate ranking across sessions. The policy should learn “for this failure fingerprint/domain/tool context, these safe recovery candidates work best”.
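One way to make the "failure fingerprint/domain/tool context" concrete is a composite lookup key. The sketch below assumes a hypothetical `RecoveryContext` shape and reduces the page URL to a scheme-and-host bucket, dropping userinfo, path, and query so that tokens or personal data cannot leak into the key. `RecoveryContext`, `originBucket`, `policyKey`, and the `*` wildcard for a missing origin are illustrative assumptions, not existing OpenChrome APIs.

```typescript
// Hypothetical context passed to the learner when a recovery is attempted.
// These names are illustrative, not existing OpenChrome APIs.
interface RecoveryContext {
  failureFingerprint: string; // assumed to be normalized already, e.g. "stale-ref"
  toolCategory: string;       // e.g. "interaction"
  url?: string;               // full page URL; only scheme and host are kept
}

// Reduce a URL to "scheme://host[:port]". Userinfo, path, query and fragment
// are discarded because they may carry credentials, tokens or personal data.
function originBucket(url: string | undefined): string {
  if (!url) return "*"; // "*" = no origin bucket (assumed wildcard)
  const match = /^([a-z][a-z0-9+.-]*):\/\/(?:[^/?#@]*@)?([^/?#]+)/i.exec(url);
  return match ? `${match[1]}://${match[2]}`.toLowerCase() : "*";
}

// Composite key "fingerprint|origin|toolCategory" for looking up policy records.
function policyKey(ctx: RecoveryContext): string {
  return [ctx.failureFingerprint, originBucket(ctx.url), ctx.toolCategory].join("|");
}

console.log(
  policyKey({
    failureFingerprint: "stale-ref",
    toolCategory: "interaction",
    url: "http://localhost:3000/recovery/stale-ref?token=secret",
  }),
); // prints "stale-ref|http://localhost:3000|interaction"
```

Keeping the origin optional (the `*` bucket) lets a fingerprint/tool pair learn a global default while still allowing per-origin records where the data supports them.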
Non-goals / safety constraints
- Do not create a new independent memory system if PatternLearner or DomainMemory can be extended cleanly.
- Do not learn from unverified or ambiguous outcomes as high-confidence success.
- Do not learn raw page content, credentials, cookies, or sensitive args.
- Do not allow learned policy to bypass safety gates.
- Do not let learned policy auto-execute actions; it only biases ranking.
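The redaction constraint is easiest to enforce with an allow-list at the point where ledger events enter the learner. A minimal sketch, assuming a hypothetical raw event object and illustrative `Evidence` and safety-class labels: only five whitelisted fields are copied, so args, page content, cookies, and credentials never reach a policy record. `LearnableOutcome` and `toLearnableOutcome` are hypothetical names.

```typescript
// Illustrative evidence and safety-class labels (assumptions, not existing enums).
type Evidence = "evidence" | "contract" | "ambiguous" | "none";
type SafetyClass = "safe" | "guarded" | "high-risk";

// The only fields a hypothetical learner is allowed to see.
interface LearnableOutcome {
  key: string;          // policy key: fingerprint|origin bucket|tool category
  candidate: string;    // recovery candidate/tool, e.g. "fresh-read"
  succeeded: boolean;
  evidence: Evidence;
  safetyClass: SafetyClass;
}

const EVIDENCE_KINDS: ReadonlySet<string> = new Set(["evidence", "contract", "ambiguous", "none"]);
const SAFETY_CLASSES: ReadonlySet<string> = new Set(["safe", "guarded", "high-risk"]);

// Allow-list copy: args, page text, cookies and credentials are never read,
// so they cannot end up in a policy record. Malformed events are rejected.
function toLearnableOutcome(raw: Record<string, unknown>): LearnableOutcome | null {
  const { key, candidate, succeeded, evidence, safetyClass } = raw;
  if (typeof key !== "string" || typeof candidate !== "string") return null;
  if (typeof succeeded !== "boolean") return null;
  if (typeof evidence !== "string" || !EVIDENCE_KINDS.has(evidence)) return null;
  if (typeof safetyClass !== "string" || !SAFETY_CLASSES.has(safetyClass)) return null;
  return {
    key,
    candidate,
    succeeded,
    evidence: evidence as Evidence,
    safetyClass: safetyClass as SafetyClass,
  };
}
```

An allow-list is preferred over a deny-list here because new sensitive fields added to ledger events later are excluded by default.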
Proposed implementation
- Define a compact learned policy record:
  - failure fingerprint
  - optional domain/origin bucket
  - triggering tool category
  - recovery candidate/tool
  - attempts, successes, failures
  - confidence
  - lastSeen/firstSeen
  - safety class
- Populate records from trajectory-ledger and reward-scorer outcomes.
- Promote only when:
  - minimum attempts threshold is met
  - confidence threshold is met
  - outcomes are evidence-backed or contract-backed
- Use policy as one scoring feature in recovery candidate ranking.
- Decay or downgrade policy on later failures.
- Provide tests for persistence and cross-session loading.
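The promotion and decay rules above could be combined into a single update function. This is a sketch under stated assumptions: `PolicyRecord`, `newRecord`, `recordOutcome`, the thresholds (`MIN_ATTEMPTS = 3`, `MIN_CONFIDENCE = 0.7`), and the Laplace-smoothed confidence formula are all hypothetical choices, not values fixed by this issue. Only evidence- or contract-backed successes count as successes; ambiguous outcomes add an attempt without a success, so they lower confidence instead of raising it.

```typescript
type Evidence = "evidence" | "contract" | "ambiguous" | "none";
type SafetyClass = "safe" | "guarded" | "high-risk";

// Hypothetical policy record mirroring the fields listed above.
interface PolicyRecord {
  key: string;          // failure fingerprint | origin bucket | tool category
  candidate: string;    // recovery candidate/tool
  attempts: number;
  successes: number;    // evidence- or contract-backed successes only
  failures: number;
  confidence: number;
  promoted: boolean;
  firstSeen: number;    // epoch milliseconds
  lastSeen: number;
  safetyClass: SafetyClass;
}

// Assumed thresholds; real values would come from configuration.
const MIN_ATTEMPTS = 3;
const MIN_CONFIDENCE = 0.7;

// Laplace-smoothed success rate (mean of a Beta(1, 1) posterior).
function confidenceOf(successes: number, attempts: number): number {
  return (successes + 1) / (attempts + 2);
}

function newRecord(key: string, candidate: string, safetyClass: SafetyClass, now: number): PolicyRecord {
  return {
    key, candidate, safetyClass,
    attempts: 0, successes: 0, failures: 0,
    confidence: confidenceOf(0, 0), promoted: false,
    firstSeen: now, lastSeen: now,
  };
}

// Returns an updated copy; the input record is not mutated.
function recordOutcome(r: PolicyRecord, succeeded: boolean, evidence: Evidence, now: number): PolicyRecord {
  const backed = evidence === "evidence" || evidence === "contract";
  const next: PolicyRecord = { ...r, attempts: r.attempts + 1, lastSeen: now };
  if (succeeded && backed) {
    next.successes += 1;
  } else if (!succeeded) {
    next.failures += 1;
  }
  // An unverified "success" falls through both branches: it counts as an
  // attempt only, which lowers confidence rather than raising it.
  next.confidence = confidenceOf(next.successes, next.attempts);
  // Promotion and downgrade share one rule, so later failures that push
  // confidence below the threshold demote a previously promoted record.
  next.promoted = next.attempts >= MIN_ATTEMPTS && next.confidence >= MIN_CONFIDENCE;
  return next;
}
```

With these assumed values, three evidence-backed successes give a confidence of 4/5 = 0.8 and promote the record; a single later failure drops it to 4/6 ≈ 0.67 and demotes it, and three ambiguous "successes" leave it at 1/5 = 0.2. A Wilson lower bound or time-based decay would be reasonable alternatives and could replace `confidenceOf` without changing the callers.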
Acceptance criteria
- Repeated evidence-backed recoveries promote a policy record.
- Ambiguous/no-evidence outcomes do not promote high-confidence policy.
- Later failures reduce confidence or rank contribution.
- Learned policy can improve candidate ranking but cannot bypass safety gates.
- Policies are persisted with bounded size and redacted data.
- Existing PatternLearner behavior remains compatible.
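The requirement that learned policy can reorder candidates but never override a safety gate can be expressed by ordering the pipeline: gate first, then score. A hypothetical sketch, assuming a `baseScore` produced by the existing heuristics, an illustrative `LEARNED_WEIGHT` of 0.5, and a gate that blocks the `high-risk` class; `Candidate`, `LearnedHint`, `passesSafetyGate`, `rankCandidates`, and the candidate names in the usage note are assumptions.

```typescript
type SafetyClass = "safe" | "guarded" | "high-risk";

interface Candidate {
  name: string;            // e.g. "fresh-read", "retry-click"
  baseScore: number;       // score from the existing, non-learned heuristics
  safetyClass: SafetyClass;
}

// Minimal view of a learned record needed for ranking.
interface LearnedHint {
  candidate: string;
  confidence: number;
  promoted: boolean;
}

// Assumed weight of the learned feature relative to the base score.
const LEARNED_WEIGHT = 0.5;

// Hypothetical safety gate, decided only from the candidate's own safety class.
function passesSafetyGate(c: Candidate): boolean {
  return c.safetyClass !== "high-risk";
}

// Returns candidate names in ranked order and never executes anything.
// The gate runs before learned scores are applied, so a learned record
// cannot re-admit a blocked candidate; unpromoted records contribute nothing.
function rankCandidates(candidates: Candidate[], hints: LearnedHint[]): string[] {
  const bonus = new Map<string, number>();
  for (const h of hints) {
    if (h.promoted) bonus.set(h.candidate, LEARNED_WEIGHT * h.confidence);
  }
  return candidates
    .filter(passesSafetyGate)
    .map((c) => ({ name: c.name, score: c.baseScore + (bonus.get(c.name) ?? 0) }))
    .sort((a, b) => b.score - a.score)
    .map((s) => s.name);
}
```

For example, with a safe `retry-click` at base score 0.6, a safe `fresh-read` at 0.4, and a `high-risk` `force-js-click` at 0.9, a promoted `fresh-read` record with confidence 0.8 lifts `fresh-read` to 0.8 so it ranks first, while `force-js-click` stays excluded even if it also has a promoted record. The function only returns names; whether to execute anything remains the caller's decision.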
Required automated verification
- Unit tests for:
  - promotion threshold
  - confidence calculation
  - failure decay/downgrade
  - persistence/load
  - safety gate precedence over learned ranking
- Integration test:
  - simulate repeated stale-ref → fresh-read recoveries
  - verify policy promotes and then ranks fresh-read higher in a later session
- Regression tests for existing PatternLearner and ActionCache if touched.
- npm run build and targeted learning/recovery tests.
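The persistence/load tests need a storage format that is bounded and tolerant of corrupt files. A minimal sketch, assuming a hypothetical versioned JSON document and an illustrative cap of 500 records; `StoredRecord`, `savePolicies`, `loadPolicies`, and the format version are assumptions. Derived fields such as confidence and promotion state are left out on the assumption that they are recomputed from the counts after loading.

```typescript
// Hypothetical on-disk shape: raw counts only; confidence and promotion
// state are assumed to be recomputed from the counts after loading.
interface StoredRecord {
  key: string;
  candidate: string;
  safetyClass: string;
  attempts: number;
  successes: number;
  failures: number;
  firstSeen: number;
  lastSeen: number;
}

const FORMAT_VERSION = 1;
const MAX_RECORDS = 500; // assumed cap; the issue only requires "bounded size"

// Keep the most recently seen records and drop the oldest beyond the cap.
function savePolicies(records: StoredRecord[], max: number = MAX_RECORDS): string {
  const kept = [...records].sort((a, b) => b.lastSeen - a.lastSeen).slice(0, max);
  return JSON.stringify({ version: FORMAT_VERSION, records: kept });
}

const NUMERIC_FIELDS = ["attempts", "successes", "failures", "firstSeen", "lastSeen"];

function isStoredRecord(x: unknown): x is StoredRecord {
  if (typeof x !== "object" || x === null) return false;
  const r = x as Record<string, unknown>;
  return (
    typeof r.key === "string" &&
    typeof r.candidate === "string" &&
    typeof r.safetyClass === "string" &&
    NUMERIC_FIELDS.every((f) => typeof r[f] === "number" && Number.isFinite(r[f] as number))
  );
}

// Load defensively: unparseable files, unknown versions and malformed
// entries degrade to "no learned policy" instead of being trusted.
function loadPolicies(json: string): StoredRecord[] {
  let parsed: unknown;
  try {
    parsed = JSON.parse(json);
  } catch {
    return [];
  }
  if (typeof parsed !== "object" || parsed === null) return [];
  const { version, records } = parsed as { version?: unknown; records?: unknown };
  if (version !== FORMAT_VERSION || !Array.isArray(records)) return [];
  return records.filter(isStoredRecord).slice(0, MAX_RECORDS);
}
```

Falling back to an empty policy on any load problem keeps the failure mode safe: ranking simply reverts to the non-learned base scores.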
Fixture requirements
Reuse the /recovery/stale-ref fixture from the earlier issues so that first-run learning and second-run ranking reuse are deterministic and do not depend on external websites.
Required real OpenChrome verification after implementation
Use OpenChrome against the same local stale-ref fixture across two server runs:
- First run:
  - trigger the same stale-ref failure/recovery sequence enough times to meet the configured promotion threshold
  - verify a learned policy file/record is created without sensitive data
- Restart OpenChrome.
- Second run:
  - trigger the same failure once
  - verify the learned policy contributes to candidate ranking and the known-good recovery appears above alternatives
- Verify a deliberately failed recovery lowers confidence or does not promote.
- Verify high-risk candidates remain blocked even if a learned policy record exists.
Merge evidence required in PR
- Test output for promotion/persistence/ranking behavior.
- Real OpenChrome two-run transcript showing policy promotion and reuse.
- Confirmation that learned policy only biases ranking and does not auto-execute actions.
Curated scope, overlap handling, and verification checklist
Scope classification
feat/1022-recovery-policy-learning). Continue there.
Overlap and conflict resolution
Implementation checklist
Success criteria
Post-merge OpenChrome live verification checklist