feat(recovery): LATS-inspired trajectory ledger for failed/recovered attempts

## Context

The LATS review found one safe, high-value idea for OpenChrome: preserve failed and successful attempt trajectories as structured data, but do **not** run browser-tree branching. OpenChrome already has HintEngine, PatternLearner, ProgressTracker, EvidenceBundle, and session snapshot/resume. The missing foundation is a durable, bounded recovery trajectory ledger that records what was attempted, what evidence changed, and why a branch failed or recovered.

This should be telemetry-only in the first implementation so it cannot change browser behavior or harm existing workflows.



## Implementation order / dependencies

This is the safest foundation issue and should be done before #1018, #1019, #1020, and #1022 when possible. It is telemetry-only and must not change browser behavior.

## Relationship to existing issues

This issue should be checked against open issues such as structured recovery hints, action replay/cache, outcome contracts, and observability work before implementation. If an existing issue already covers part of this scope, keep this issue limited to the LATS-inspired recovery/trajectory behavior described here and cross-link rather than duplicate implementation.

## Goal

Add a persistent, bounded **RecoveryTrajectoryLedger** that records tool-call attempt nodes for a session/workflow and can be read after restart/compaction for debugging, recovery hints, and future scoring.

## Non-goals / safety constraints

- Do not implement MCTS or speculative browser branching.
- Do not replay actions automatically.
- Do not store raw secrets, cookies, headers, form values, screenshots, or full DOM by default.
- Do not increase normal tool latency by more than a small constant overhead; writes should be best-effort and bounded.
- Do not replace existing ActivityTracker, PatternLearner, ActionCache, or EvidenceBundle; integrate with them.
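One way to enforce the redaction constraint is to never pass raw args to the ledger, only a summary. The sketch below is hypothetical and assumes tool args arrive as a plain JSON-like object. `summarizeArgs`, the `SENSITIVE_KEY` deny-list, and the 12-character hash prefix are illustrative choices, not existing OpenChrome code. It hashes sensitive keys and any non-scalar value, and it strips query strings and fragments from http(s) URLs.

```typescript
import { createHash } from "node:crypto";

// Hypothetical deny-list of argument keys whose values must never be stored raw.
const SENSITIVE_KEY = /(password|passwd|secret|token|cookie|authorization|header|api[-_]?key|value|text)/i;

// Short, non-reversible fingerprint: equal inputs correlate across nodes without being stored.
function hashValue(value: unknown): string {
  const json = JSON.stringify(value) ?? "undefined";
  return "sha256:" + createHash("sha256").update(json).digest("hex").slice(0, 12);
}

// Drops query strings and fragments (which often carry tokens) from http(s) URLs.
function stripUrlSecrets(value: string): string {
  if (!/^https?:\/\//i.test(value)) return value;
  try {
    const url = new URL(value);
    return url.origin + url.pathname;
  } catch {
    return hashValue(value); // malformed URL: do not keep it verbatim
  }
}

export type ArgsSummary = Record<string, string | number | boolean | null>;

// Hashes sensitive keys and non-scalar values; keeps short, non-sensitive scalars readable.
export function summarizeArgs(args: Record<string, unknown>, maxStringLength = 64): ArgsSummary {
  const summary: ArgsSummary = {};
  for (const [key, value] of Object.entries(args)) {
    if (SENSITIVE_KEY.test(key)) {
      summary[key] = hashValue(value);
    } else if (typeof value === "number" || typeof value === "boolean" || value === null) {
      summary[key] = value;
    } else if (typeof value === "string") {
      const cleaned = stripUrlSecrets(value);
      summary[key] = cleaned.length > maxStringLength ? cleaned.slice(0, maxStringLength) + "..." : cleaned;
    } else {
      // Objects, arrays, and undefined (e.g. header maps, cookie lists) are never stored inline.
      summary[key] = hashValue(value);
    }
  }
  return summary;
}
```

A deny-list errs on the side of over-redaction. For example, any key containing `text` is hashed, which keeps typed form input out of the ledger at the cost of some debuggability.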

## Proposed implementation

1. Add a small module, likely under `src/recovery/` or `src/orchestration/`, with an append-only JSONL ledger.
2. Each node should include at least:
   - `sessionId`, optional `workflowId`, optional `tabId`
   - `nodeId`, optional `parentNodeId`
   - `timestamp`
   - `toolName`
   - redacted/hashed args summary, not raw args
   - `resultStatus`: `success | error | no_progress | recovered`
   - `progressStatus` from `ProgressTracker` when available
   - optional `failureFingerprint`
   - optional `recoveryTool`
   - optional `evidenceHandle` or evidence metadata, not inline evidence payload
   - bounded `observationSummary`
   - numeric `reward` if a scorer is available; otherwise omit/null
3. Add hard caps:
   - max nodes per session/workflow
   - max bytes per node
   - max file size or rotation policy
4. Wire it initially at the same boundary that HintEngine/ActivityTracker sees completed tool calls.
5. Make persistence either opt-in, or default-on only when the existing harness logging directory is configured; document the chosen behavior.
6. Expose a read path for tests and future tools, but avoid creating a new public MCP API unless required.
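The node fields in step 2 and the per-node byte cap in step 3 could be expressed roughly as follows. This is a sketch under assumptions. The `TrajectoryNode` name, the `argsSummary` field name, the 4 KiB `MAX_NODE_BYTES` value, and the halve-the-summary truncation strategy are all hypothetical. The issue only requires that some per-node limit exists.

```typescript
export type ResultStatus = "success" | "error" | "no_progress" | "recovered";

export interface TrajectoryNode {
  sessionId: string;
  workflowId?: string;
  tabId?: string;
  nodeId: string;
  parentNodeId?: string;
  timestamp: string; // ISO-8601
  toolName: string;
  argsSummary: Record<string, string | number | boolean | null>; // already redacted/hashed
  resultStatus: ResultStatus;
  progressStatus?: string; // from ProgressTracker when available
  failureFingerprint?: string;
  recoveryTool?: string;
  evidenceHandle?: string; // a reference only, never an inline evidence payload
  observationSummary: string;
  reward?: number | null; // omitted or null when no scorer is available
}

// Hypothetical cap, chosen for illustration.
export const MAX_NODE_BYTES = 4096;

// Serializes one node as a single JSONL line, halving observationSummary until it fits.
// Returns null when the remaining fields alone exceed the cap, so the caller can drop the node.
export function serializeNode(node: TrajectoryNode, maxBytes = MAX_NODE_BYTES): string | null {
  let summary = node.observationSummary;
  for (;;) {
    const line = JSON.stringify({ ...node, observationSummary: summary });
    if (Buffer.byteLength(line, "utf8") <= maxBytes) return line;
    if (summary.length === 0) return null;
    summary = summary.slice(0, Math.floor(summary.length / 2));
  }
}
```

`JSON.stringify` escapes embedded newlines, so each serialized node is guaranteed to occupy exactly one JSONL line.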

## Acceptance criteria

- A tool success, tool error, and stuck/non-progress sequence each produce a bounded ledger node.
- Secrets and high-risk values are redacted or hashed in stored args/result summaries.
- Ledger write failure is non-fatal and does not fail the original tool call.
- Ledger storage is bounded and cannot grow without limit during marathon/endurance sessions.
- Existing HintEngine and PatternLearner tests continue to pass.
- A restart/compaction-style test can read previously written nodes.
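The non-fatal, bounded-storage, and restart-readable criteria can all be satisfied by a small append-and-rotate writer. The sketch below is hypothetical. It assumes a single active JSONL file plus one rotated generation (`<file>.1`). The `BoundedJsonlLedger` class and its option names are illustrative, and concurrent writers are not handled.

```typescript
import { appendFileSync, existsSync, readFileSync, renameSync, statSync } from "node:fs";

export interface LedgerOptions {
  filePath: string; // active JSONL file
  maxFileBytes: number; // rotate before the active file would exceed this size
}

// Hypothetical writer: one active file plus one rotated generation (filePath + ".1"),
// so total disk usage stays at roughly 2 * maxFileBytes.
export class BoundedJsonlLedger {
  writeFailures = 0;
  private readonly opts: LedgerOptions;

  constructor(opts: LedgerOptions) {
    this.opts = opts;
  }

  // Best-effort append: never throws, so a ledger problem cannot fail the original tool call.
  append(line: string): boolean {
    try {
      const { filePath, maxFileBytes } = this.opts;
      const incoming = Buffer.byteLength(line, "utf8") + 1; // +1 for the newline
      if (incoming > maxFileBytes) {
        this.writeFailures += 1; // a single oversized line is dropped, not written
        return false;
      }
      const current = existsSync(filePath) ? statSync(filePath).size : 0;
      if (current + incoming > maxFileBytes) {
        renameSync(filePath, filePath + ".1"); // replaces the previous rotated generation
      }
      appendFileSync(filePath, line + "\n", "utf8");
      return true;
    } catch {
      this.writeFailures += 1;
      return false;
    }
  }

  // Read path for tests and post-restart inspection: rotated generation first, then the active file.
  readAll(): unknown[] {
    const records: unknown[] = [];
    for (const p of [this.opts.filePath + ".1", this.opts.filePath]) {
      if (!existsSync(p)) continue;
      for (const raw of readFileSync(p, "utf8").split("\n")) {
        if (raw.trim() === "") continue;
        try {
          records.push(JSON.parse(raw));
        } catch {
          // A torn final line (e.g. after a crash) is skipped instead of failing the whole read.
        }
      }
    }
    return records;
  }
}
```

A fresh instance pointed at the same path (as after a server restart) reads back whatever the previous process wrote. Rotation discards only data older than one generation.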

## Required automated verification

- Unit tests for:
  - node serialization and size bounds
  - redaction/hashing of args and result snippets
  - max-node or max-file cap behavior
  - non-fatal write failures
- Integration test with mocked tool events:
  - one success, one failure, one recovery sequence produces expected JSONL records
- Existing checks:
  - `npm run build`
  - targeted Jest tests for recovery/hints/observability modules

## Fixture requirements

If no existing fixture covers this, add a controlled route in `tests/e2e/harness/fixture-server.ts`, for example `/recovery/stale-ref`, that exposes a button, mutates/removes it after the first snapshot, and provides a safe replacement button. The fixture must not require external network access.
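A minimal page for such a route might look like the sketch below. Everything here is hypothetical: `renderStaleRefFixture`, the element ids, and the `window.__removeStaleTarget` hook are illustrative names. Registering the page under `/recovery/stale-ref` depends on the routing API of `fixture-server.ts`, which is not shown. Making the mutation an explicit hook that the test calls (for example via `javascript_tool`) after the first `read_page` avoids timing races, unlike removing the button on a timer.

```typescript
// Hypothetical fixture page for a stale-ref recovery scenario; no external resources are loaded.
export function renderStaleRefFixture(): string {
  return `<!doctype html>
<html>
  <head><meta charset="utf-8"><title>Recovery: stale ref</title></head>
  <body>
    <button id="original-action" type="button">Submit</button>
    <p id="status">idle</p>
    <script>
      // Called by the test after the first read_page snapshot, so any ref captured
      // for #original-action becomes stale and a safe replacement button appears.
      window.__removeStaleTarget = function () {
        document.getElementById("original-action")?.remove();
        const replacement = document.createElement("button");
        replacement.id = "replacement-action";
        replacement.type = "button";
        replacement.textContent = "Submit (replacement)";
        replacement.addEventListener("click", function () {
          document.getElementById("status").textContent = "recovered";
        });
        document.body.insertBefore(replacement, document.getElementById("status"));
        return true;
      };
    </script>
  </body>
</html>`;
}
```

The `#status` text changing to `recovered` gives the E2E test an observable DOM signal that the recovery interaction actually succeeded.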

## Required real OpenChrome verification after implementation

Use OpenChrome itself against a local fixture server or controlled test page:

1. Start the built server with ledger persistence enabled.
2. Use an MCP client or existing E2E harness to:
   - `navigate` to a fixture page
   - call `read_page`
   - intentionally perform one invalid element interaction or stale-ref-like failure
   - recover with a fresh `read_page` or valid interaction
3. Verify the ledger contains:
   - at least 3 ordered nodes
   - the failed node has a failure fingerprint or error status
   - the recovery node references the prior failure or records `recovered`
   - no raw cookie/header/secret/form-value leakage
4. Restart the MCP server and verify the ledger is still readable.
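The checks in step 3 can be run mechanically over the ledger's JSONL lines rather than inspected by eye. The sketch below assumes nodes carry the fields listed under "Proposed implementation". `verifyLedger` and `LEAK_PATTERNS` are hypothetical. The patterns are illustrative rather than an exhaustive secret scanner, so the test should also pass in the fixture's own known secret values.

```typescript
interface LedgerRecord {
  nodeId: string;
  parentNodeId?: string;
  timestamp: string;
  resultStatus: string;
  failureFingerprint?: string;
}

// Illustrative leak patterns only; pair them with the fixture's known secret values.
const LEAK_PATTERNS = [/set-cookie/i, /authorization:\s*bearer/i, /\bsid=/i, /password=/i];

// Returns a list of problems; an empty list means every check in step 3 passed.
export function verifyLedger(lines: string[], knownSecrets: string[] = []): string[] {
  const problems: string[] = [];
  const nodes = lines.filter((l) => l.trim() !== "").map((l) => JSON.parse(l) as LedgerRecord);
  if (nodes.length < 3) problems.push(`expected >= 3 nodes, got ${nodes.length}`);
  for (let i = 1; i < nodes.length; i++) {
    if (Date.parse(nodes[i].timestamp) < Date.parse(nodes[i - 1].timestamp)) {
      problems.push(`node ${nodes[i].nodeId} is out of order`);
    }
  }
  // The failed node needs an error status or a failure fingerprint.
  const failedIndex = nodes.findIndex((n) => n.resultStatus === "error" || n.failureFingerprint !== undefined);
  if (failedIndex === -1) {
    problems.push("no failed node with error status or failure fingerprint");
  } else {
    // A later node must reference the failure or record `recovered`.
    const failedId = nodes[failedIndex].nodeId;
    const recovered = nodes
      .slice(failedIndex + 1)
      .some((n) => n.parentNodeId === failedId || n.resultStatus === "recovered");
    if (!recovered) problems.push(`no recovery node after failed node ${failedId}`);
  }
  for (const line of lines) {
    if (LEAK_PATTERNS.some((p) => p.test(line)) || knownSecrets.some((s) => s !== "" && line.includes(s))) {
      problems.push("possible secret leakage in ledger line");
      break;
    }
  }
  return problems;
}
```

Running the same function before and after the restart in step 4 shows that the reopened ledger still satisfies every check.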

## Merge evidence required in PR

- Link to unit/integration test output.
- Include the real OpenChrome verification transcript or log excerpt.
- Include a short statement of measured ledger overhead and storage cap behavior.
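The overhead statement could be backed by a simple micro-benchmark around whatever append function the ledger exposes. This is a hypothetical helper. `measureAppendOverhead` and the chosen statistics (mean, p95, max) are illustrative. It times only the synchronous append call, not end-to-end tool latency.

```typescript
import { performance } from "node:perf_hooks";

export interface OverheadStats {
  iterations: number;
  meanMs: number;
  p95Ms: number;
  maxMs: number;
}

// Times `iterations` synchronous calls of `append` and summarizes the per-call latency.
export function measureAppendOverhead(
  append: (line: string) => unknown,
  sampleLine: string,
  iterations = 1000,
): OverheadStats {
  if (!Number.isInteger(iterations) || iterations <= 0) {
    throw new RangeError("iterations must be a positive integer");
  }
  const samples: number[] = [];
  for (let i = 0; i < iterations; i++) {
    const start = performance.now();
    append(sampleLine);
    samples.push(performance.now() - start);
  }
  samples.sort((a, b) => a - b);
  const meanMs = samples.reduce((sum, s) => sum + s, 0) / samples.length;
  const p95Ms = samples[Math.min(samples.length - 1, Math.ceil(samples.length * 0.95) - 1)];
  return { iterations, meanMs, p95Ms, maxMs: samples[samples.length - 1] };
}
```

Passing a realistic serialized node as `sampleLine` and the real file-backed append as `append` yields numbers that can be quoted directly in the PR.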



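## OpenChrome real-environment verification checklist

> Re-verified on 2026-05-14 after applying the latest merged version. Only items that can be proven directly from OpenChrome responses, a local fixture, or build/test artifacts are kept as pass criteria. Conditions such as human review, external-site stability, or unconfirmed PR status are excluded from the pass criteria.

### Verification target
- **Issue:** #1017 (feat(recovery): LATS-inspired trajectory ledger for failed/recovered attempts)
- **Applied version:** origin/develop @ f1facb8f (f1facb8f6a7b84756fba1dcdb8fa9b7e9a85293a), package 1.11.0
- **Local fixture:** http://127.0.0.1:18765/smoke.html
- **Primary OpenChrome surface:** (none)
- **Verdict:** VERIFIED. Implementation, tests, and runtime evidence are all confirmed on the latest develop, so the issue can be closed.

### Latest-version / common runtime verification
- [x] Applied the latest develop source and confirmed that `npm run build` passes.
- [x] Confirmed that `npm run lint:tier` passes.
- [x] Confirmed `npm test -- --runInBand` results: 504/507 suites passed (3 skipped) and 6429/6525 tests passed (96 skipped). The Jest open-handle warning was recorded separately as a runtime risk.
- [x] `oc_connection_health` returned a connected status.
- [x] Observed DOM state changes on the local fixture through the OpenChrome `navigate/read_page/interact/javascript_tool` path.
- [x] Confirmed that the key results are reproducible with the same fixture and the same configuration.

### Issue-specific resolution evidence
- [x] Implementation PRs linked to the latest develop: 1216, 1073
- [x] Related test/source evidence exists in the latest tree: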
  - src/mcp-server.ts
  - docs/recovery/trajectory-ledger.md
  - src/recovery/trajectory-ledger.ts
  - src/core/trace/recovery-feedback.ts
  - src/harness/task-ledger.ts
  - src/hints/hint-engine.ts
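- [x] The checklist contains no pass criteria that cannot be reproduced from OpenChrome responses, the fixture, or local artifacts.

### Failure / hold criteria
- If any check is unmet, do not close the issue.
- If a failure reproduces as a defect in the latest code, record the failing OpenChrome call, a response excerpt, and the fixture state as evidence, and open a separate fix PR.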