feat(recovery): LATS-inspired trajectory ledger for failed/recovered attempts #1017

@shaun0927

Context

The LATS review found one safe, high-value idea for OpenChrome: preserve failed and successful attempt trajectories as structured data, but do not run browser-tree branching. OpenChrome already has HintEngine, PatternLearner, ProgressTracker, EvidenceBundle, and session snapshot/resume. The missing foundation is a durable, bounded recovery trajectory ledger that records what was attempted, what evidence changed, and why a branch failed or recovered.

This should be telemetry-only in the first implementation so it cannot change browser behavior or harm existing workflows.

Implementation order / dependencies

This is the safest foundation issue and should be done before #1018, #1019, #1020, and #1022 when possible. It is telemetry-only and must not change browser behavior.

Relationship to existing issues

This issue should be checked against open issues such as structured recovery hints, action replay/cache, outcome contracts, and observability work before implementation. If an existing issue already covers part of this scope, keep this issue limited to the LATS-inspired recovery/trajectory behavior described here and cross-link rather than duplicate implementation.

Goal

Add a persistent, bounded RecoveryTrajectoryLedger that records tool-call attempt nodes for a session/workflow and can be read after restart/compaction for debugging, recovery hints, and future scoring.

Non-goals / safety constraints

  • Do not implement MCTS or speculative browser branching.
  • Do not replay actions automatically.
  • Do not store raw secrets, cookies, headers, form values, screenshots, or full DOM by default.
  • Do not increase normal tool latency by more than a small constant overhead; writes should be best-effort and bounded.
  • Do not replace existing ActivityTracker, PatternLearner, ActionCache, or EvidenceBundle; integrate with them.
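
The storage constraint above (no raw secrets, cookies, headers, or form values) can be illustrated with a default-deny args summary. This is only a minimal sketch: `summarizeArgs`, `hashValue`, `CLEAR_KEYS`, and the 64-character limit are hypothetical names and values chosen for illustration, not existing OpenChrome code. Only allowlisted keys with short primitive values are kept in clear text; everything else is reduced to a truncated SHA-256 fingerprint.

```typescript
import { createHash } from "node:crypto";

// Hypothetical allowlist: keys assumed safe to store in clear text.
const CLEAR_KEYS = new Set(["tabId", "ref", "action"]);
const MAX_CLEAR_LENGTH = 64;

// Reduce any value to a short fingerprint so equal inputs can still be correlated.
function hashValue(value: unknown): string {
  let text: string;
  try {
    text = JSON.stringify(value) ?? String(value);
  } catch {
    text = String(value); // e.g. circular structures or BigInt
  }
  return "sha256:" + createHash("sha256").update(text).digest("hex").slice(0, 12);
}

// Default-deny: a value is stored as-is only if its key is allowlisted
// and it is a short string, number, or boolean.
function summarizeArgs(args: Record<string, unknown>): Record<string, string> {
  const summary: Record<string, string> = {};
  for (const [key, value] of Object.entries(args)) {
    const isShortPrimitive =
      (typeof value === "string" || typeof value === "number" || typeof value === "boolean") &&
      String(value).length <= MAX_CLEAR_LENGTH;
    summary[key] = CLEAR_KEYS.has(key) && isShortPrimitive ? String(value) : hashValue(value);
  }
  return summary;
}
```

An unsalted hash of a low-entropy value (a short password, for instance) can still be guessed offline, so a real implementation may prefer a keyed hash or dropping such fields entirely.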

Proposed implementation

  1. Add a small module, likely under src/recovery/ or src/orchestration/, with an append-only JSONL ledger.
  2. Each node should include at least:
    • sessionId, optional workflowId, optional tabId
    • nodeId, optional parentNodeId
    • timestamp
    • toolName
    • redacted/hashed args summary, not raw args
    • resultStatus: success | error | no_progress | recovered
    • progressStatus from ProgressTracker when available
    • optional failureFingerprint
    • optional recoveryTool
    • optional evidenceHandle or evidence metadata, not inline evidence payload
    • bounded observationSummary
    • numeric reward if a scorer is available; otherwise omit/null
  3. Add hard caps:
    • max nodes per session/workflow
    • max bytes per node
    • max file size or rotation policy
  4. Wire it initially at the same boundary that HintEngine/ActivityTracker sees completed tool calls.
  5. Make persistence opt-in or default-on only when the existing harness logging directory is configured; document the chosen behavior.
  6. Expose a read path for tests and future tools, but avoid creating a new public MCP API unless required.
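
The steps above can be sketched as a small append-only JSONL ledger with hard caps and a read path. The field names follow the node list in step 2; everything else (`TrajectoryNode`, `LedgerCaps`, `argsSummary`, `evidenceHandle` as a string, the `"[truncated]"` marker, the cap values, and the temp-directory demo path) is an assumption made for illustration. Synchronous file calls keep the sketch short; a real implementation would more likely queue writes off the tool-call path.

```typescript
import { appendFileSync, existsSync, mkdtempSync, readFileSync, statSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

type ResultStatus = "success" | "error" | "no_progress" | "recovered";

interface TrajectoryNode {
  sessionId: string;
  workflowId?: string;
  tabId?: string;
  nodeId: string;
  parentNodeId?: string;
  timestamp: number;
  toolName: string;
  argsSummary: Record<string, string>; // redacted/hashed, never raw args
  resultStatus: ResultStatus;
  progressStatus?: string;
  failureFingerprint?: string;
  recoveryTool?: string;
  evidenceHandle?: string; // reference only, no inline payload
  observationSummary: string;
  reward?: number | null;
}

interface LedgerCaps {
  maxNodes: number;
  maxNodeBytes: number;
  maxFileBytes: number;
}

class RecoveryTrajectoryLedger {
  private readonly filePath: string;
  private readonly caps: LedgerCaps;
  private written: number;

  constructor(filePath: string, caps: LedgerCaps) {
    this.filePath = filePath;
    this.caps = caps;
    this.written = this.readAll().length; // keep the node cap across restarts
  }

  // Best-effort append: returns false instead of throwing when a cap or an I/O error blocks the write.
  append(node: TrajectoryNode): boolean {
    try {
      if (this.written >= this.caps.maxNodes) return false;
      let line = JSON.stringify(node);
      if (Buffer.byteLength(line, "utf8") > this.caps.maxNodeBytes) {
        // Drop the free-text field first; skip the node if it is still too large.
        line = JSON.stringify({ ...node, observationSummary: "[truncated]" });
        if (Buffer.byteLength(line, "utf8") > this.caps.maxNodeBytes) return false;
      }
      const currentSize = existsSync(this.filePath) ? statSync(this.filePath).size : 0;
      if (currentSize + Buffer.byteLength(line, "utf8") + 1 > this.caps.maxFileBytes) return false;
      appendFileSync(this.filePath, line + "\n");
      this.written += 1;
      return true;
    } catch {
      return false;
    }
  }

  // Read path for tests and future tools; an unreadable file yields an empty list.
  readAll(): TrajectoryNode[] {
    try {
      return readFileSync(this.filePath, "utf8")
        .split("\n")
        .filter((line) => line.trim() !== "")
        .map((line) => JSON.parse(line) as TrajectoryNode);
    } catch {
      return [];
    }
  }
}

// Usage sketch: a deliberately tiny node cap makes the bound easy to observe.
const demoCaps: LedgerCaps = { maxNodes: 2, maxNodeBytes: 2048, maxFileBytes: 64 * 1024 };
const demoPath = join(mkdtempSync(join(tmpdir(), "ledger-")), "trajectory.jsonl");
const demoLedger = new RecoveryTrajectoryLedger(demoPath, demoCaps);
```

This sketch stops writing once a cap is reached. The rotation policy mentioned in step 3 is the alternative when older nodes should be discarded in favor of newer ones.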

Acceptance criteria

  • A tool success, tool error, and stuck/non-progress sequence each produce a bounded ledger node.
  • Secrets and high-risk values are redacted or hashed in stored args/result summaries.
  • Ledger write failure is non-fatal and does not fail the original tool call.
  • Ledger storage is bounded and cannot grow without limit during marathon/endurance sessions.
  • Existing HintEngine and PatternLearner tests continue to pass.
  • A restart/compaction-style test can read previously written nodes.

Required automated verification

  • Unit tests for:
    • node serialization and size bounds
    • redaction/hashing of args and result snippets
    • max-node or max-file cap behavior
    • non-fatal write failures
  • Integration test with mocked tool events:
    • a sequence of one success, one failure, and one recovery produces the expected JSONL records
  • Existing checks:
    • npm run build
    • targeted Jest tests for recovery/hints/observability modules

Fixture requirements

If no existing fixture covers this, add a controlled route in tests/e2e/harness/fixture-server.ts, for example /recovery/stale-ref, that exposes a button, mutates/removes it after the first snapshot, and provides a safe replacement button. The fixture must not require external network access.
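
A standalone sketch of such a route is shown below. It does not use the real API of tests/e2e/harness/fixture-server.ts, which is not described here; `renderStaleRefPage`, `routeFixture`, `createFixtureServer`, the element ids, and the 500 ms delay are all hypothetical. The page also approximates "after the first snapshot" with a timer, which a real fixture might replace with an explicit trigger.

```typescript
import { createServer } from "node:http";
import type { Server } from "node:http";

// Page with a button that disappears after a delay and a safe replacement that appears in its place.
// The inline script loads nothing from the network.
function renderStaleRefPage(removeAfterMs: number): string {
  const delay = Number.isFinite(removeAfterMs) ? Math.max(0, Math.floor(removeAfterMs)) : 0;
  return `<!doctype html>
<html><body>
  <button id="stale-target">Submit</button>
  <p id="status">idle</p>
  <script>
    setTimeout(function () {
      var old = document.getElementById("stale-target");
      if (old) old.remove();
      var next = document.createElement("button");
      next.id = "replacement";
      next.textContent = "Submit";
      next.onclick = function () {
        document.getElementById("status").textContent = "recovered";
      };
      document.body.appendChild(next);
    }, ${delay});
  </script>
</body></html>`;
}

// Pure routing function so the fixture can be unit-tested without opening a socket.
function routeFixture(pathname: string): { status: number; body: string } {
  if (pathname === "/recovery/stale-ref") {
    return { status: 200, body: renderStaleRefPage(500) };
  }
  return { status: 404, body: "not found" };
}

// Thin HTTP wrapper; the caller decides when and where to listen.
function createFixtureServer(): Server {
  return createServer((req, res) => {
    const { pathname } = new URL(req.url ?? "/", "http://127.0.0.1");
    const { status, body } = routeFixture(pathname);
    res.writeHead(status, {
      "Content-Type": status === 200 ? "text/html; charset=utf-8" : "text/plain",
    });
    res.end(body);
  });
}
```

Calling `createFixtureServer().listen(0, "127.0.0.1")` would bind an ephemeral local port, which keeps the fixture free of external network access.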

Required real OpenChrome verification after implementation

Use OpenChrome itself against a local fixture server or controlled test page:

  1. Start the built server with ledger persistence enabled.
  2. Use an MCP client or existing E2E harness to:
    • navigate to a fixture page
    • call read_page
    • intentionally perform one invalid element interaction or stale-ref-like failure
    • recover with a fresh read_page or valid interaction
  3. Verify that:
    • the ledger contains at least 3 ordered nodes
    • the failed node has a failure fingerprint or an error status
    • the recovery node references the prior failure or records a recovered status
    • no raw cookie, header, secret, or form value is leaked
  4. Restart the MCP server and verify the ledger is still readable.
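
The checks in step 3 can be expressed as a small script run against the ledger file contents. This is a sketch under assumptions: `verifyLedger` and `LedgerRecord` are hypothetical, ordering is judged by `timestamp`, and leakage is detected by searching for known canary strings that the test itself planted in cookies, headers, or form fields.

```typescript
interface LedgerRecord {
  nodeId: string;
  parentNodeId?: string;
  timestamp: number;
  resultStatus: string;
  failureFingerprint?: string;
  [extra: string]: unknown;
}

// Returns a list of problems; an empty list means every check passed.
function verifyLedger(jsonl: string, forbiddenValues: string[]): string[] {
  const problems: string[] = [];
  const nodes: LedgerRecord[] = [];
  for (const line of jsonl.split("\n")) {
    if (line.trim() === "") continue;
    try {
      nodes.push(JSON.parse(line) as LedgerRecord);
    } catch {
      problems.push("unparseable ledger line");
    }
  }

  if (nodes.length < 3) problems.push(`expected at least 3 nodes, found ${nodes.length}`);

  // Nodes must appear in non-decreasing timestamp order.
  let previous: LedgerRecord | undefined;
  for (const node of nodes) {
    if (previous !== undefined && node.timestamp < previous.timestamp) {
      problems.push(`node ${node.nodeId} is out of order`);
    }
    previous = node;
  }

  // A failed node carries an error status or a failure fingerprint.
  const failed = nodes.filter(
    (n) => n.resultStatus === "error" || n.failureFingerprint !== undefined,
  );
  if (failed.length === 0) problems.push("no failed node recorded");

  // A recovery node either records "recovered" or points at a failed node.
  const failedIds = new Set(failed.map((n) => n.nodeId));
  const hasRecovery = nodes.some(
    (n) =>
      n.resultStatus === "recovered" ||
      (n.parentNodeId !== undefined && failedIds.has(n.parentNodeId)),
  );
  if (!hasRecovery) problems.push("no recovery node recorded");

  // Report leaks by index so the secret is not echoed into the report.
  forbiddenValues.forEach((value, index) => {
    if (jsonl.includes(value)) problems.push(`forbidden value #${index} found in ledger`);
  });

  return problems;
}
```

Running the same function before and after the server restart in step 4 would also confirm that the ledger stays readable.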

Merge evidence required in PR

  • Link to unit/integration test output.
  • Include the real OpenChrome verification transcript or log excerpt.
  • Include a short statement of measured ledger overhead and storage cap behavior.

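OpenChrome live verification checklist

Re-verified after applying the latest merged version as of 2026-05-14. Only items that can be proven directly from OpenChrome responses, the local fixture, and build/test artifacts are kept as pass criteria. Conditions such as human review, external site stability, and unconfirmed PR status are excluded from the pass criteria.

Verification targets

Latest version / common runtime verification

  • Applied the latest develop source and confirmed that npm run build passes.
  • Confirmed that npm run lint:tier passes.
  • Confirmed that npm test -- --runInBand reported 504/507 suites passed with 3 skipped, and 6429/6525 tests passed with 96 skipped. The Jest open-handle warning was recorded as a separate runtime risk.
  • oc_connection_health returned a connected status.
  • On the local fixture, observed DOM state changes through the OpenChrome navigate/read_page/interact/javascript_tool path.
  • Confirmed that the key results are reproducible with the same fixture and the same settings.

Per-issue resolution evidence

  • Implementation PRs linked to the latest develop: 1216, 1073
  • Related test/source evidence exists in the latest tree: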
    • src/mcp-server.ts
    • docs/recovery/trajectory-ledger.md
    • src/recovery/trajectory-ledger.ts
    • src/core/trace/recovery-feedback.ts
    • src/harness/task-ledger.ts
    • src/hints/hint-engine.ts
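  • The checklist retains no pass criteria that cannot be reproduced from OpenChrome responses, the fixture, or local artifacts.

Failure / hold criteria

  • If any check is unmet, do not close the issue.
  • If a failure reproduces as a defect in the latest code, record the failed OpenChrome call, the response excerpt, and the fixture state as evidence, and open a separate fix PR.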

Metadata

    Labels

      • P1: P1 high
      • enhancement: New feature or request
      • harness: Execution harness, run lifecycle, recovery, and verification
      • lats-learnings: Improvements inspired by LanguageAgentTreeSearch analysis
      • live-verification: Requires live OpenChrome/browser validation after implementation
      • observability: Observability
      • reliability: Reliability and stability improvement
