feat(memory): learn evidence-backed recovery policies across sessions #1022

@shaun0927

Description

Context

OpenChrome already has PatternLearner and ActionCache, and LATS (Language Agent Tree Search) shows the benefit of feeding failed trajectories and rewards back into future attempts. Once deterministic trajectory/reward data exists, OpenChrome can improve recovery ordering across sessions without relying on LLM reflection or unsafe browser branching.

This issue is a focused follow-up: promote repeated, evidence-backed recovery outcomes into a domain/session policy that can bias future candidate ranking.

Implementation order / dependencies

Implement after #1017, #1018, and #1019. It may optionally feed #1020 later, but the first version must only bias ranking and must not auto-execute learned actions.

Relationship to existing issues

This issue should be checked against open issues such as structured recovery hints, action replay/cache, outcome contracts, and observability work before implementation. If an existing issue already covers part of this scope, keep this issue limited to the LATS-inspired recovery/trajectory behavior described here and cross-link rather than duplicate implementation.

Goal

Extend existing learning surfaces so repeated recovery outcomes can adjust candidate ranking across sessions. The policy should learn “for this failure fingerprint/domain/tool context, these safe recovery candidates work best”.

Non-goals / safety constraints

  • Do not create a new independent memory system if PatternLearner or DomainMemory can be extended cleanly.
  • Do not learn from unverified or ambiguous outcomes as high-confidence success.
  • Do not learn raw page content, credentials, cookies, or sensitive args.
  • Do not allow learned policy to bypass safety gates.
  • Do not let the learned policy auto-execute actions; it only biases ranking.
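
The redaction constraint above could be enforced with an allowlist at the point where an outcome enters the learner. The following is a minimal sketch, not OpenChrome's actual API: the `RawRecoveryEvent` and `LearnableEvent` shapes and the `toLearnableEvent` helper are hypothetical names introduced for illustration.

```typescript
// Hypothetical raw event as it might come out of a trajectory ledger.
// All field names here are assumptions, not existing OpenChrome types.
interface RawRecoveryEvent {
  failureFingerprint: string;
  url: string;
  toolCategory: string;
  recoveryCandidate: string;
  args: Record<string, unknown>; // may hold typed text, tokens, selectors
  pageText?: string;             // raw page content, never learned
}

// The only fields the learner is allowed to see.
interface LearnableEvent {
  failureFingerprint: string;
  originBucket: string;
  toolCategory: string;
  recoveryCandidate: string;
}

// Allowlist projection: copy known-safe fields and drop everything else,
// instead of trying to strip sensitive keys from an open-ended object.
function toLearnableEvent(e: RawRecoveryEvent): LearnableEvent {
  let originBucket = "unknown";
  try {
    // Keep scheme + host + port only; path and query string are discarded.
    originBucket = new URL(e.url).origin;
  } catch {
    // Unparseable URL: fall back to the "unknown" bucket.
  }
  return {
    failureFingerprint: e.failureFingerprint,
    originBucket,
    toolCategory: e.toolCategory,
    recoveryCandidate: e.recoveryCandidate,
  };
}
```

An allowlist is chosen here over a denylist because new argument fields would then be excluded by default rather than learned by accident.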

Proposed implementation

  1. Define a compact learned policy record:
    • failure fingerprint
    • optional domain/origin bucket
    • triggering tool category
    • recovery candidate/tool
    • attempts, successes, failures
    • confidence
    • lastSeen/firstSeen
    • safety class
  2. Feed it from trajectory ledger + reward scorer outcomes.
  3. Promote only when:
    • minimum attempts threshold is met
    • confidence threshold is met
    • outcomes are evidence-backed or contract-backed
  4. Use policy as one scoring feature in recovery candidate ranking.
  5. Decay or downgrade policy on later failures.
  6. Provide tests for persistence and cross-session loading.
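
Steps 1, 3, and 5 above can be sketched as a record type plus a pure update function. This is an illustration under stated assumptions: the type names, the threshold constants `MIN_ATTEMPTS` and `MIN_CONFIDENCE`, and the Laplace-smoothed success rate used as the confidence formula are all hypothetical choices; the issue does not specify a formula or concrete thresholds.

```typescript
type SafetyClass = "safe" | "caution" | "high-risk";
type Evidence = "evidence-backed" | "contract-backed" | "ambiguous" | "none";

// Mirrors the fields listed in step 1; names are assumptions.
interface PolicyRecord {
  failureFingerprint: string;
  originBucket?: string;
  toolCategory: string;
  recoveryCandidate: string;
  attempts: number;
  successes: number;
  failures: number;
  confidence: number;
  firstSeen: number;
  lastSeen: number;
  safetyClass: SafetyClass;
  promoted: boolean;
}

// Hypothetical thresholds; real values would be configurable.
const MIN_ATTEMPTS = 3;
const MIN_CONFIDENCE = 0.7;

// Laplace-smoothed success rate, so a single success is not full confidence.
function smoothedConfidence(successes: number, attempts: number): number {
  return (successes + 1) / (attempts + 2);
}

function newRecord(
  failureFingerprint: string,
  toolCategory: string,
  recoveryCandidate: string,
  safetyClass: SafetyClass,
  now: number,
): PolicyRecord {
  return {
    failureFingerprint, toolCategory, recoveryCandidate, safetyClass,
    attempts: 0, successes: 0, failures: 0,
    confidence: smoothedConfidence(0, 0),
    firstSeen: now, lastSeen: now, promoted: false,
  };
}

function recordOutcome(
  rec: PolicyRecord,
  outcome: { success: boolean; evidence: Evidence },
  now: number,
): PolicyRecord {
  const verified =
    outcome.evidence === "evidence-backed" || outcome.evidence === "contract-backed";
  // A success without evidence is not counted. Failures always count,
  // which is the conservative direction for a ranking bias.
  if (outcome.success && !verified) return { ...rec, lastSeen: now };

  const attempts = rec.attempts + 1;
  const successes = rec.successes + (outcome.success ? 1 : 0);
  const failures = rec.failures + (outcome.success ? 0 : 1);
  const confidence = smoothedConfidence(successes, attempts);
  // Promotion is recomputed on every update, so later failures can demote.
  const promoted = attempts >= MIN_ATTEMPTS && confidence >= MIN_CONFIDENCE;
  return { ...rec, attempts, successes, failures, confidence, lastSeen: now, promoted };
}
```

Because promotion is re-evaluated on each outcome rather than latched, the decay requirement in step 5 falls out of the same function: with these example thresholds, three verified successes promote a record and a single later failure demotes it again.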

Acceptance criteria

  • Repeated evidence-backed recoveries promote a policy record.
  • Ambiguous/no-evidence outcomes do not promote high-confidence policy.
  • Later failures reduce confidence or rank contribution.
  • Learned policy can improve candidate ranking but cannot bypass safety gates.
  • Policies are persisted with bounded size and redacted data.
  • Existing PatternLearner behavior remains compatible.
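
The criterion that learned policy may improve ranking but cannot bypass safety gates could look like the following sketch. The `Candidate`, `LearnedBias`, and `rankCandidates` names and the `MAX_BIAS` cap are hypothetical; the sketch assumes the safety gate has already produced a per-candidate `blockedBySafetyGate` flag.

```typescript
interface Candidate {
  name: string;
  baseScore: number;            // score from existing, non-learned features
  blockedBySafetyGate: boolean; // decided elsewhere, before ranking
}

interface LearnedBias {
  candidate: string;
  confidence: number; // 0..1
  promoted: boolean;
}

interface RankedCandidate {
  name: string;
  score: number;
  learnedBias: number; // exposed so the ranking can be explained
}

// Hypothetical cap so the learned feature can reorder close candidates
// but cannot dominate the base score.
const MAX_BIAS = 0.25;

function rankCandidates(cands: Candidate[], biases: LearnedBias[]): RankedCandidate[] {
  const biasByName = new Map<string, number>();
  for (const b of biases) {
    if (b.promoted) biasByName.set(b.candidate, MAX_BIAS * b.confidence);
  }
  return cands
    // The safety gate is applied first; a learned record cannot restore
    // a blocked candidate, whatever its confidence.
    .filter((c) => !c.blockedBySafetyGate)
    .map((c) => {
      const learnedBias = biasByName.get(c.name) ?? 0;
      return { name: c.name, score: c.baseScore + learnedBias, learnedBias };
    })
    .sort((x, y) => y.score - x.score);
}
```

The function only returns an ordering. Nothing in it executes a candidate, which keeps the learned policy advisory.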

Required automated verification

  • Unit tests for:
    • promotion threshold
    • confidence calculation
    • failure decay/downgrade
    • persistence/load
    • safety gate precedence over learned ranking
  • Integration test:
    • simulate repeated stale-ref → fresh-read recoveries
    • verify policy promotes and then ranks fresh-read higher in a later session
  • Regression tests for existing PatternLearner and ActionCache if touched.
  • Run npm run build and the targeted learning/recovery tests.

Fixture requirements

Reuse /recovery/stale-ref from the earlier issues so first-run learning and second-run ranking reuse are deterministic and do not depend on external websites.

Required real OpenChrome verification after implementation

Use OpenChrome against the same local stale-ref fixture across two server runs:

  1. First run:
    • trigger the same stale-ref failure/recovery sequence enough times to meet the configured promotion threshold
    • verify a learned policy file/record is created without sensitive data
  2. Restart OpenChrome.
  3. Second run:
    • trigger the same failure once
    • verify the learned policy contributes to candidate ranking and the known-good recovery appears above alternatives
  4. Verify a deliberately failed recovery lowers confidence or does not promote.
  5. Verify high-risk candidates remain blocked even if a learned policy record exists.
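
The two-run check above depends on policies surviving a restart. A minimal sketch of bounded serialization and defensive loading follows; the `StoredPolicy` shape, the `MAX_STORED` limit, and the versioned envelope are assumptions for illustration, and actual file I/O is left out so the functions stay pure.

```typescript
// Hypothetical on-disk shape: only redacted, compact fields.
interface StoredPolicy {
  failureFingerprint: string;
  recoveryCandidate: string;
  confidence: number;
  lastSeen: number;
}

// Hypothetical upper bound on persisted records.
const MAX_STORED = 200;

// Keep only the most recently seen records so the store stays bounded.
function serializePolicies(policies: StoredPolicy[], maxStored: number = MAX_STORED): string {
  const newest = [...policies]
    .sort((a, b) => b.lastSeen - a.lastSeen)
    .slice(0, maxStored);
  return JSON.stringify({ version: 1, policies: newest });
}

function isStoredPolicy(v: unknown): v is StoredPolicy {
  if (typeof v !== "object" || v === null) return false;
  const o = v as Record<string, unknown>;
  return (
    typeof o.failureFingerprint === "string" &&
    typeof o.recoveryCandidate === "string" &&
    typeof o.confidence === "number" && Number.isFinite(o.confidence) &&
    typeof o.lastSeen === "number" && Number.isFinite(o.lastSeen)
  );
}

// Malformed or partially corrupt input yields fewer (or zero) policies
// rather than an exception, so a bad file never blocks startup.
function parsePolicies(text: string): StoredPolicy[] {
  try {
    const data: unknown = JSON.parse(text);
    if (typeof data !== "object" || data === null) return [];
    const list = (data as { policies?: unknown }).policies;
    return Array.isArray(list) ? list.filter(isStoredPolicy) : [];
  } catch {
    return [];
  }
}
```

In a second run, the loader would read the saved text, call `parsePolicies`, and hand the result to ranking; an empty result simply means ranking proceeds without learned bias.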

Merge evidence required in PR

  • Test output for promotion/persistence/ranking behavior.
  • Real OpenChrome two-run transcript showing policy promotion and reuse.
  • Confirmation that learned policy only biases ranking and does not auto-execute actions.

Curated scope, overlap handling, and verification checklist

Scope classification

Overlap and conflict resolution

Implementation checklist

  • Define safe policy keying by domain/session/task pattern with redacted inputs and expiry/decay to avoid stale behavior.
  • Learn only from deterministic, evidence-backed outcomes and minimum support thresholds.
  • Apply learned policy as a ranking bias or explanation field, never as automatic execution.
  • Add controls for disabling/clearing learned policies and for ignoring low-confidence entries.
  • Add tests for learning threshold, decay/expiry, redaction, cross-domain isolation, advisory-only application, and fallback when dependencies are absent.
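
The keying, expiry, and disable controls in this checklist can be sketched as a single lookup filter. The `ScopedPolicy`, `LookupContext`, and `LookupOptions` types and the `applicablePolicies` function are hypothetical names, and exact-match scoping on origin bucket is one possible isolation rule, not a confirmed design.

```typescript
interface ScopedPolicy {
  failureFingerprint: string;
  originBucket: string;
  toolCategory: string;
  recoveryCandidate: string;
  confidence: number;
  lastSeen: number; // epoch milliseconds
}

interface LookupContext {
  failureFingerprint: string;
  originBucket: string;
  toolCategory: string;
}

// Hypothetical controls: a global switch, a confidence floor, and a max age.
interface LookupOptions {
  enabled: boolean;
  minConfidence: number;
  maxAgeMs: number;
}

function applicablePolicies(
  records: ScopedPolicy[],
  ctx: LookupContext,
  now: number,
  opts: LookupOptions,
): ScopedPolicy[] {
  // Disabled learning behaves exactly like having no policies at all.
  if (!opts.enabled) return [];
  return records.filter(
    (r) =>
      // Exact match on all key parts keeps policies from leaking across domains.
      r.failureFingerprint === ctx.failureFingerprint &&
      r.originBucket === ctx.originBucket &&
      r.toolCategory === ctx.toolCategory &&
      // Low-confidence and expired entries are ignored rather than applied weakly.
      r.confidence >= opts.minConfidence &&
      now - r.lastSeen <= opts.maxAgeMs,
  );
}
```

Returning an empty list in every excluded case gives ranking a single fallback path: no applicable policy means no learned bias.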

Success criteria

  • Repeated successful recovery outcomes can influence future ranking with clear evidence provenance.
  • Learned policy never triggers browser actions by itself.
  • Stale/low-confidence/cross-domain policies are bounded or ignored.
  • Existing PatternLearner/ActionCache behavior remains compatible.

Post-merge OpenChrome live verification checklist

  • Run repeated local fixture recovery attempts that produce deterministic success evidence and verify a policy entry is learned after threshold.
  • Run a new matching fixture attempt and verify candidate ranking includes the learned bias/explanation.
  • Run a different domain/session fixture and verify the policy does not leak across boundaries.
  • Inspect stored policy artifacts for redaction, support count, reward evidence link, and expiry metadata.

Metadata

Assignees

No one assigned

Labels

  • P2: Medium priority
  • enhancement: New feature or request
  • harness: Execution harness, run lifecycle, recovery, and verification
  • lats-learnings: Improvements inspired by LanguageAgentTreeSearch analysis
  • live-verification: Requires live OpenChrome/browser validation after implementation
  • observability: Observability
  • reliability: Reliability and stability improvement
  • verified-skill-memory: Contract-backed skill auto-curation (Q3-Q4)
