Context
LATS (Language Agent Tree Search) uses reward/value estimates to select better branches. For OpenChrome, a raw LLM value function is too risky. OpenChrome already has better primitives: outcome contracts, DOM/network/screenshot evidence, progress tracking, and tool result classification. This issue defines a deterministic, evidence-based reward scorer that future recovery ranking and bounded recovery search can use.
Implementation order / dependencies
This should land before #1020 and #1022. #1018 can initially use simple heuristics, but should prefer this scorer once available. The scorer must stay pure/deterministic so it is safe for hot paths and tests.
Relationship to existing issues
This issue should be checked against open issues such as structured recovery hints, action replay/cache, outcome contracts, and observability work before implementation. If an existing issue already covers part of this scope, keep this issue limited to the LATS-inspired recovery/trajectory behavior described here and cross-link rather than duplicate implementation.
Goal
Introduce a deterministic RecoveryRewardScorer that converts tool outcomes and evidence changes into a bounded numeric score for progress, no-op, failure, and recovery. This scorer should be reusable by HintEngine, PlanExecutor recovery, PatternLearner, and future trajectory ledger analysis.
Non-goals / safety constraints
- Do not call an LLM for reward scoring.
- Do not change tool success/failure semantics in this issue.
- Do not auto-execute recovery paths.
- Do not make screenshot/DOM capture mandatory for every tool call.
- Do not store large evidence payloads in memory solely for scoring.
Proposed scoring shape
The exact constants can be adjusted during implementation, but the scorer should support these categories:
- strong positive:
  - outcome contract passed
  - target URL/DOM/network state reached
  - expected data extracted
- weak positive:
  - page state changed in the intended direction
  - fresh actionable refs discovered after stale-ref failure
- neutral / low:
  - observation-only call with new information
- negative:
  - repeated observation without new information
  - stale ref, element not found, timeout
  - auth redirect, blocking page, CAPTCHA
  - repeated same failed tool/ref
- hard negative / blocked:
  - destructive or transactional action attempted without required gate
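The categories above could be encoded as a small lookup table. The following is a minimal sketch, not the intended implementation: the names `RewardClass`, `Signal`, `BASE_SCORE`, and `classify` are hypothetical, the numeric constants are placeholders, and the precedence used when several signals fire at once is an assumption this issue does not specify.

```typescript
// Hypothetical names and constants: the issue leaves exact values open.
type RewardClass =
  | "strong_positive"
  | "weak_positive"
  | "neutral"
  | "negative"
  | "hard_negative";

// Illustrative base scores, bounded to [-1, 1].
const BASE_SCORE: Record<RewardClass, number> = {
  strong_positive: 1.0,
  weak_positive: 0.4,
  neutral: 0.1,
  negative: -0.5,
  hard_negative: -1.0,
};

// One signal per bullet in the category list above.
type Signal =
  | "contract_passed"
  | "target_state_reached"
  | "expected_data_extracted"
  | "state_changed_toward_goal"
  | "fresh_refs_after_stale_ref"
  | "observation_new_info"
  | "observation_no_new_info"
  | "stale_ref"
  | "element_not_found"
  | "timeout"
  | "auth_redirect"
  | "blocking_page"
  | "captcha"
  | "repeated_failed_call"
  | "ungated_destructive_action";

const SIGNAL_CLASS: Record<Signal, RewardClass> = {
  contract_passed: "strong_positive",
  target_state_reached: "strong_positive",
  expected_data_extracted: "strong_positive",
  state_changed_toward_goal: "weak_positive",
  fresh_refs_after_stale_ref: "weak_positive",
  observation_new_info: "neutral",
  observation_no_new_info: "negative",
  stale_ref: "negative",
  element_not_found: "negative",
  timeout: "negative",
  auth_redirect: "negative",
  blocking_page: "negative",
  captcha: "negative",
  repeated_failed_call: "negative",
  ungated_destructive_action: "hard_negative",
};

// Assumed precedence when several signals fire at once: a missing gate
// always wins, then contract-level success, then failures, then heuristics.
const PRECEDENCE: RewardClass[] = [
  "hard_negative",
  "strong_positive",
  "negative",
  "weak_positive",
  "neutral",
];

function classify(signals: Signal[]): RewardClass {
  const present = signals.map((s) => SIGNAL_CLASS[s]);
  for (const cls of PRECEDENCE) {
    if (present.indexOf(cls) !== -1) return cls;
  }
  return "neutral"; // no signals: treated as neutral in this sketch
}

function baseScore(signals: Signal[]): number {
  return BASE_SCORE[classify(signals)];
}
```

Keeping the mapping in plain tables means the constants can be tuned during implementation without touching the classification logic.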
Proposed implementation
- Add a small scorer module with typed inputs and outputs:
  - input: previous/current lightweight page/evidence metadata, tool result, progress status, optional contract result, recent-call summary
  - output: score, classification, reasons[], confidence
- Prefer existing contract evaluator results when available.
- Use hashes/metadata for DOM/screenshot/network deltas rather than raw payloads.
- Make the scorer pure and easy to unit test.
- Wire it into telemetry only at first: the trajectory ledger and recovery ranking can consume it, but normal tool behavior should not change.
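As an illustration of the typed input/output shape listed above, here is a minimal sketch of a pure scorer. All names (`EvidenceMeta`, `ScorerInput`, `RewardResult`, `scoreRecovery`), field names, and constants are hypothetical. The sketch also assumes that any hash or URL change counts as a weak positive, which is cruder than "changed in the intended direction", and that a failed contract is scored as a mild failure.

```typescript
// Hypothetical types: field names and constants are illustrative assumptions.
type Classification = "progress" | "recovery" | "no_op" | "failure";

interface EvidenceMeta {
  url?: string;
  domHash?: string;
  screenshotHash?: string;
  networkHash?: string;
}

interface ScorerInput {
  previous?: EvidenceMeta;
  current?: EvidenceMeta;
  toolOk: boolean;
  errorKind?: string; // e.g. "stale_ref", "timeout", "captcha"
  contractPassed?: boolean; // undefined means no contract was evaluated
  previousCallFailed?: boolean; // from the recent-call summary
  repeatedSameCall?: boolean; // same tool/ref as the previous call
}

interface RewardResult {
  score: number; // bounded to [-1, 1]
  classification: Classification;
  reasons: string[];
  confidence: number; // 0..1
}

const EVIDENCE_KEYS = ["url", "domHash", "screenshotHash", "networkHash"] as const;

// Lists the metadata fields that are present on both sides and differ.
function changedKeys(prev: EvidenceMeta, cur: EvidenceMeta): string[] {
  return EVIDENCE_KEYS.filter(
    (k) => prev[k] !== undefined && cur[k] !== undefined && prev[k] !== cur[k],
  );
}

// Pure function: no I/O, clock, or randomness, so equal inputs give equal outputs.
function scoreRecovery(input: ScorerInput): RewardResult {
  const { previous, current } = input;
  const hasEvidence = previous !== undefined && current !== undefined;
  const confidence = hasEvidence ? 0.8 : 0.4; // missing evidence lowers confidence

  // An existing contract result takes priority over heuristic deltas.
  if (input.contractPassed === true) {
    return { score: 1, classification: "progress", reasons: ["contract:passed"], confidence: 1 };
  }
  if (!input.toolOk) {
    const reasons = [`error:${input.errorKind ?? "unknown"}`];
    if (input.repeatedSameCall) reasons.push("repeat:same-failed-call");
    const score = input.repeatedSameCall ? -0.8 : -0.5;
    return { score, classification: "failure", reasons, confidence };
  }
  if (input.contractPassed === false) {
    return { score: -0.3, classification: "failure", reasons: ["contract:failed"], confidence };
  }
  if (previous === undefined || current === undefined) {
    // Degrade gracefully instead of throwing.
    return { score: 0, classification: "no_op", reasons: ["evidence:missing"], confidence };
  }
  const changed = changedKeys(previous, current);
  if (changed.length > 0) {
    return {
      score: 0.4,
      classification: input.previousCallFailed ? "recovery" : "progress",
      reasons: changed.map((k) => `delta:${k}`),
      confidence,
    };
  }
  return {
    score: input.repeatedSameCall ? -0.2 : 0,
    classification: "no_op",
    reasons: ["delta:none"],
    confidence,
  };
}
```

Because the function takes only hashes and flags, callers never need to keep raw DOM or screenshot payloads in memory for scoring.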
Acceptance criteria
- The scorer returns deterministic scores for the same input.
- Contract pass outranks heuristic page-change signals.
- Repeated no-progress observations are penalized.
- Known blocking/auth/CAPTCHA signals receive negative classification.
- Missing evidence is handled gracefully with lower confidence, not thrown errors.
- The scorer can be consumed without creating cycles across contracts/hints/orchestration modules.
Required automated verification
- Unit tests for:
  - contract pass/fail scoring
  - DOM/content delta positive scoring
  - repeated no-progress negative scoring
  - stale ref/timeout/auth/blocking classifications
  - missing evidence fallback
- Integration test where a failed action followed by a successful fresh read produces a higher recovery score than repeating the failed action.
- Dependency-cruiser or existing tier lint remains clean if applicable.
- npm run build and targeted Jest tests pass.
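The integration expectation above (recovering with a fresh read should outrank retrying the same failed action) can be sketched as a comparison of two short trajectories. This is only an illustration under assumptions: `Step`, `stepScore`, and `trajectoryScore` are hypothetical names, the per-step constants are placeholders, and `ref_12` is a made-up element ref.

```typescript
// Hypothetical step summary; field names are assumptions for this sketch.
interface Step {
  tool: string;
  ref?: string;
  ok: boolean;
  newInfo: boolean; // did the call surface state not seen before?
}

function sameCall(a: Step, b: Step): boolean {
  return a.tool === b.tool && a.ref === b.ref;
}

// Scores one step given only the step immediately before it.
function stepScore(step: Step, prev?: Step): number {
  const prevFailed = prev !== undefined && !prev.ok;
  if (!step.ok) {
    // Retrying the exact call that just failed is penalized more heavily.
    const repeated = prev !== undefined && !prev.ok && sameCall(prev, step);
    return repeated ? -0.8 : -0.5;
  }
  if (prevFailed && step.newInfo) return 0.4; // recovery after a failure
  return step.newInfo ? 0.1 : -0.2; // plain observation vs. no-progress loop
}

function trajectoryScore(steps: Step[]): number {
  let total = 0;
  let prev: Step | undefined;
  for (const step of steps) {
    total += stepScore(step, prev);
    prev = step;
  }
  return total;
}

const failedClick: Step = { tool: "interact", ref: "ref_12", ok: false, newInfo: false };
const freshRead: Step = { tool: "read_page", ok: true, newInfo: true };

const recoverScore = trajectoryScore([failedClick, freshRead]);
const repeatScore = trajectoryScore([failedClick, failedClick]);
console.log(recoverScore > repeatScore); // true: the fresh read outranks the retry
```

A Jest integration test would assert the same inequality against scores collected from the real telemetry path rather than from hand-built steps.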
Fixture requirements
Add or reuse controlled routes in tests/e2e/harness/fixture-server.ts:
- /recovery/progress-positive: safe button changes DOM text or URL fragment.
- /recovery/no-progress: repeated observation returns same state.
- /recovery/blocking-page: auth/blocking signal page.
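One way to keep these routes deterministic is to serve static HTML from a pure lookup. The sketch below is an assumption-heavy illustration: `recoveryFixturePage` and `RECOVERY_PAGES` are hypothetical names, the markup is invented, and how the result gets wired into the existing fixture server is left to that file's own routing conventions, which this issue does not describe.

```typescript
// Hypothetical helper: returns the response for a recovery fixture path.
interface FixturePage {
  status: number;
  body: string;
}

const RECOVERY_PAGES: Record<string, string> = {
  // A safe button that changes both DOM text and the URL fragment.
  "/recovery/progress-positive": `<!doctype html>
<title>progress-positive</title>
<p id="status">idle</p>
<button id="advance" onclick="document.getElementById('status').textContent = 'done'; location.hash = 'done';">Advance</button>`,

  // Fully static page: repeated reads observe the same state.
  "/recovery/no-progress": `<!doctype html>
<title>no-progress</title>
<p id="status">unchanged</p>`,

  // Page carrying an auth/blocking signal (illustrative markers only).
  "/recovery/blocking-page": `<!doctype html>
<title>Sign in required</title>
<form id="login"><input type="password" name="password"><button>Sign in</button></form>`,
};

function recoveryFixturePage(path: string): FixturePage {
  // Own-property check so keys such as "constructor" do not match.
  if (!Object.prototype.hasOwnProperty.call(RECOVERY_PAGES, path)) {
    return { status: 404, body: "not found" };
  }
  return { status: 200, body: RECOVERY_PAGES[path] };
}
```

Serving fixed strings keeps the no-progress route byte-identical across reads, which is what the repeated-observation test depends on.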
Required real OpenChrome verification after implementation
Use OpenChrome against controlled fixture pages:
- Positive path:
  - navigate to fixture
  - perform an action that visibly changes DOM or URL
  - run/collect the scorer output through the integrated telemetry path
  - verify positive classification and reasons mention the observed evidence type
- Negative path:
  - repeat an observation-only loop or stale interaction
  - verify score decreases or classifies as no-progress/failure
- Contract path:
  - run an existing outcome contract assertion that passes
  - verify contract pass dominates heuristic scoring
Merge evidence required in PR
- Test output for scorer unit/integration tests.
- A real OpenChrome transcript/log showing positive, negative, and contract-backed scoring.
- A note confirming no LLM calls and no automatic recovery behavior were added.
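OpenChrome real-verification checklist
Re-verified on 2026-05-14 after applying the latest merged version. Only items that can be proven directly from OpenChrome responses, local fixtures, and build/test artifacts remain as pass criteria. Conditions such as human review, external site stability, and unconfirmed PR status are excluded from the pass criteria.
Verification targets
Latest version / common runtime verification
Per-issue resolution evidence
Failure / hold criteria
- If even one check is unmet, do not close the issue.
- If a failure reproduces as a defect in the latest code, record the failed OpenChrome call, a response excerpt, and the fixture state as evidence, and open a separate fix PR.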
Recorded results for the latest version / common runtime verification:
- npm run build passed.
- npm run lint:tier passed.
- npm test -- --runInBand: 504/507 suites passed (3 skipped); 6429/6525 tests passed (96 skipped). Jest open-handle warnings were recorded as a separate runtime risk.
- oc_connection_health returned a connected state.
- DOM state changes were observed through the navigate / read_page / interact / javascript_tool path.