feat(hints): rank evidence-scored recovery candidates in stuck hints

## Context

OpenChrome already detects stalling/stuck behavior through `ProgressTracker` and emits hints through `HintEngine`. The LATS review suggests the next safe improvement is not automatic tree search, but **ranking concrete recovery candidates** so the LLM receives a specific next-best action instead of a vague “try something different”.

This issue should depend on, or gracefully use, the recovery trajectory ledger if available. It should also avoid duplicating existing structured recovery hint work: integrate with the current `HintEngine`/`PatternLearner` rules rather than adding a parallel rule path.



## Implementation order / dependencies

This issue can be implemented using recent ActivityTracker/ProgressTracker state alone, but it should consume #1017 trajectory nodes and #1019 reward scores when those are available. It must remain advisory-only; automatic execution belongs to #1020.

## Relationship to existing issues

This issue should be checked against open issues such as structured recovery hints, action replay/cache, outcome contracts, and observability work before implementation. If an existing issue already covers part of this scope, keep this issue limited to the LATS-inspired recovery/trajectory behavior described here and cross-link rather than duplicate implementation.

## Goal

Add an evidence-scored recovery candidate ranking path for stuck/stalling and deterministic tool failures. The output should remain advisory at first: rank candidate next actions, but do not auto-execute them.

## Non-goals / safety constraints

- Do not auto-run recovery actions in this issue.
- Do not ask an LLM to choose by free-form reflection.
- Do not produce candidates that are destructive, transactional, or irreversible.
- Do not override existing high-priority safety/security hints.
- Do not require full trajectory ledger support; the ranker should still work from recent calls alone when the ledger is unavailable.

## Candidate examples

For stale ref / DOM mutation:
1. `read_page` for fresh refs
2. retry element resolution by accessible name/role
3. fallback to `find`

For click did not change state:
1. verify hit target / current page state
2. try AX-based interaction if not already used
3. try keyboard activation if element is focused/focusable
4. escalate to existing Ralph fallback path if available

For auth/blocking page:
1. classify blocking/auth state
2. suggest headed/profile handoff when applicable
3. stop further blind retries
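
The three lists above could be held as a static lookup that the ranker draws its initial proposals from. This is only a sketch: the `FailureClass` names and the `candidateSteps` helper are hypothetical and are not existing OpenChrome identifiers; the step descriptions are copied from the lists above.

```typescript
// Hypothetical failure classes; the names are illustrative only.
type FailureClass =
  | "stale_ref"
  | "click_no_state_change"
  | "auth_or_blocking_page";

// Ordered recovery steps per failure class, taken from the lists above.
const CANDIDATE_STEPS: Record<FailureClass, readonly string[]> = {
  stale_ref: [
    "read_page for fresh refs",
    "retry element resolution by accessible name/role",
    "fallback to find",
  ],
  click_no_state_change: [
    "verify hit target / current page state",
    "try AX-based interaction if not already used",
    "try keyboard activation if element is focused/focusable",
    "escalate to existing Ralph fallback path if available",
  ],
  auth_or_blocking_page: [
    "classify blocking/auth state",
    "suggest headed/profile handoff when applicable",
    "stop further blind retries",
  ],
};

// Return at most `limit` steps, preserving the documented order.
function candidateSteps(failure: FailureClass, limit = 3): string[] {
  return CANDIDATE_STEPS[failure].slice(0, limit);
}
```

Keeping the order in data rather than in scoring logic makes the default ranking easy to review; evidence-based scoring can then reorder or drop entries.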

## Proposed implementation

1. Add a small scoring module that receives:
   - current tool name/result/error
   - recent calls from ActivityTracker
   - progress status
   - optional trajectory ledger snippets
   - optional PatternLearner/domain memory confidence
2. Produce a bounded list of candidates:
   - `tool`
   - safe args template or description, without sensitive values
   - `reason`
   - `score`
   - `risk`: `read_only | reversible | side_effect_possible`
   - `blockedReason` when excluded by safety gate
3. Integrate the top 1-3 candidates into `HintResult.suggestion` or a backwards-compatible structured field.
4. Keep first-match-wins rule behavior stable unless the matched rule explicitly opts into candidate ranking.
5. Add tests ensuring high-risk candidates are filtered and repeated failed tools are down-ranked.
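
A minimal sketch of the scoring step described above, under stated assumptions: the candidate fields mirror the list in step 2, while `RankerInput`, `rankCandidates`, and all penalty weights are hypothetical placeholders rather than the actual implementation.

```typescript
type Risk = "read_only" | "reversible" | "side_effect_possible";

interface RecoveryCandidate {
  tool: string;
  argsTemplate: string; // description only, never raw user values
  reason: string;
  score: number; // base evidence score before penalties
  risk: Risk;
  blockedReason?: string; // set when a safety gate excluded the candidate
}

// Hypothetical input shape; recentTools is assumed to come from ActivityTracker.
interface RankerInput {
  failedTool: string;
  recentTools: string[];
  maxCandidates?: number;
}

// Assumed weights: prefer read-only, then reversible, then side-effect-possible.
const RISK_PENALTY: Record<Risk, number> = {
  read_only: 0,
  reversible: 0.1,
  side_effect_possible: 0.4,
};

function rankCandidates(
  input: RankerInput,
  proposals: RecoveryCandidate[],
): { ranked: RecoveryCandidate[]; blocked: RecoveryCandidate[] } {
  const limit = input.maxCandidates ?? 3;
  const ranked: RecoveryCandidate[] = [];
  const blocked: RecoveryCandidate[] = [];

  for (const p of proposals) {
    // Gated candidates are reported separately and never ranked.
    if (p.blockedReason) {
      blocked.push(p);
      continue;
    }
    // Down-rank the tool that just failed, more so if it was tried repeatedly.
    const repeats = input.recentTools.filter((t) => t === p.tool).length;
    const repeatPenalty = p.tool === input.failedTool ? 0.3 + 0.1 * repeats : 0;
    const score = Math.max(0, p.score - RISK_PENALTY[p.risk] - repeatPenalty);
    ranked.push({ ...p, score });
  }

  ranked.sort((a, b) => b.score - a.score);
  return { ranked: ranked.slice(0, limit), blocked };
}
```

The function is pure and bounded by `maxCandidates`, so it stays advisory: it only returns data for `HintEngine` to render and never invokes a tool.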

## Acceptance criteria

- Stuck/stalling hints include a concrete ranked candidate list when enough context exists.
- Repeating the same failed tool/ref is penalized.
- Read-only or reversible candidates are preferred before side-effect-possible candidates.
- Auth/CAPTCHA/blocking-page cases do not suggest blind retries.
- Existing hint text remains readable for current MCP clients.
- No candidate includes raw secrets or full user-entered form values.

## Required automated verification

- Unit tests for candidate scoring:
  - stale ref → fresh `read_page` ranked first
  - repeated same error → same tool down-ranked
  - auth/blocking page → blind interaction candidates excluded
  - risk ordering → read-only and reversible candidates rank ahead of side-effect-possible ones
- HintEngine integration test showing structured candidates appear for stuck/stalling.
- Regression tests for existing static error recovery rules.
- `npm run build` and targeted hint/recovery Jest tests.

## Fixture requirements

Add or reuse controlled routes in `tests/e2e/harness/fixture-server.ts`:

- `/recovery/stale-ref`: element ref becomes stale after DOM mutation; recovery should prefer fresh `read_page`/re-resolution.
- `/recovery/blocking-page`: static page containing blocking/auth/CAPTCHA-like signals; recovery should avoid blind retries.
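
As an illustration of what the `/recovery/stale-ref` route might serve, the sketch below returns a page that exposes a function for swapping the target element on demand. Everything here is hypothetical: `staleRefFixtureHtml` and `window.invalidateTarget` are illustrative names, how the HTML is registered in `fixture-server.ts` is not shown, and it is an assumption that replacing the node invalidates previously captured refs.

```typescript
// Returns the HTML body for a hypothetical stale-ref fixture page.
function staleRefFixtureHtml(): string {
  return `<!doctype html>
<html>
  <head><title>stale-ref fixture</title></head>
  <body>
    <button id="target">Submit</button>
    <script>
      // Called by the test (for example through a script-evaluation tool)
      // after refs were captured: swaps the button for a clone so the
      // originally referenced node is detached from the document.
      window.invalidateTarget = function () {
        var old = document.getElementById("target");
        var fresh = old.cloneNode(true);
        old.replaceWith(fresh);
      };
    </script>
  </body>
</html>`;
}
```

Triggering the mutation explicitly, rather than on a timer, keeps the fixture deterministic across repeated runs.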

## Required real OpenChrome verification after implementation

Use OpenChrome against a fixture page with a DOM mutation that invalidates a previously captured ref:

1. `navigate` to the fixture.
2. `read_page` to obtain refs.
3. Mutate the DOM or navigate so a ref becomes stale.
4. Try the stale interaction.
5. Verify the tool result/hint includes a ranked candidate where fresh `read_page` or re-resolution is top-ranked.
6. Follow the top candidate manually through OpenChrome and verify it recovers the task.

Also verify a blocking/auth fixture or simulated blocked page:

1. Navigate to a controlled blocking page fixture.
2. Attempt a non-progress interaction.
3. Verify the hint does **not** recommend repeated blind click/type attempts and instead suggests classification/headed/auth handoff as appropriate.
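
The expected behavior in step 3 can be pictured as a gate that runs before ranking. This is a hedged sketch: `applyBlockingPageGate`, the `pageIsBlocking` flag, and the contents of `BLIND_INTERACTION_TOOLS` are assumptions for illustration and would need to match the real tool list and page classifier.

```typescript
interface GateCandidate {
  tool: string;
  reason: string;
  blockedReason?: string;
}

// Hypothetical set of tools treated as blind interaction on a blocked page.
const BLIND_INTERACTION_TOOLS = new Set<string>(["click", "type", "interact"]);

// Mark blind-interaction candidates as blocked when the page is classified
// as an auth/CAPTCHA/blocking page; other candidates pass through unchanged.
function applyBlockingPageGate(
  candidates: GateCandidate[],
  pageIsBlocking: boolean,
): GateCandidate[] {
  if (!pageIsBlocking) return candidates;
  return candidates.map((c) =>
    BLIND_INTERACTION_TOOLS.has(c.tool)
      ? { ...c, blockedReason: "auth/blocking page: blind retries suppressed" }
      : c,
  );
}
```

Recording a `blockedReason` instead of silently dropping the candidate keeps the exclusion visible in test output and in the merge evidence.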

## Merge evidence required in PR

- Before/after hint output for the stale-ref fixture.
- Test output for candidate scoring and HintEngine integration.
- Confirmation that no auto-execution occurs in this issue.



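## OpenChrome real-verification checklist

> Re-verified after applying the latest merged version as of 2026-05-14. Only items that can be proven directly from OpenChrome responses, the local fixture, or build/test artifacts remain as pass criteria. Conditions such as human review, external site stability, and unconfirmed PR status are excluded from the pass criteria.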

### Verification target
- **Issue:** #1018, feat(hints): rank evidence-scored recovery candidates in stuck hints
- **Version applied:** origin/develop @ f1facb8f (f1facb8f6a7b84756fba1dcdb8fa9b7e9a85293a), package 1.11.0
- **Local fixture:** http://127.0.0.1:18765/smoke.html
- **Primary OpenChrome surface:** (no surface)
- **Verdict:** VERIFIED. Implementation, test, and runtime evidence were all confirmed on the latest develop, so the issue can be closed.

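### Latest version / common runtime verification
- [x] Applied the latest develop source and confirmed that `npm run build` passes.
- [x] Confirmed that `npm run lint:tier` passes.
- [x] Confirmed `npm test -- --runInBand` results: 504/507 suites passed with 3 skipped, and 6429/6525 tests passed with 96 skipped. The Jest open-handle warning was recorded as a separate runtime risk.
- [x] `oc_connection_health` returned a connected state.
- [x] Observed DOM state changes on the local fixture through the OpenChrome `navigate/read_page/interact/javascript_tool` path.
- [x] Confirmed that the key results are reproducible with the same fixture and the same settings.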

### Issue-specific resolution evidence
- [x] Implementation PR linked to the latest develop: 1215
- [x] Related test/source evidence exists in the latest tree:
  - docs/recovery/trajectory-ledger.md
  - src/hints/recovery-candidates.ts
  - docs/recovery/reward-scorer.md
  - src/core/trace/recovery-feedback.ts
  - src/hints/hint-engine.ts
  - src/journal/task-journal.ts
- [x] The checklist retains no pass criteria that cannot be reproduced from OpenChrome responses, the fixture, or local artifacts.

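### Failure / hold criteria
- If any check is unmet, do not close the issue.
- If a failure reproduces as a defect in the latest code, record the failing OpenChrome call, a response excerpt, and the fixture state as evidence, and open a separate fix PR.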
