feat(hints): rank evidence-scored recovery candidates in stuck hints #1018

@shaun0927

Context

OpenChrome already detects stalling/stuck behavior through ProgressTracker and emits hints through HintEngine. The LATS review suggests the next safe improvement is not automatic tree search, but ranking concrete recovery candidates so the LLM receives a specific next-best action instead of a vague “try something different”.

This issue should depend on, or gracefully use, the recovery trajectory ledger if available. It should also avoid duplicating existing structured recovery hint work; integrate with current HintEngine/PatternLearner rules.

Implementation order / dependencies

This can be implemented using recent ActivityTracker/ProgressTracker state alone, but it should consume #1017 trajectory nodes and #1019 reward scores when those are available. It must remain advisory-only; automatic execution belongs to #1020.

Relationship to existing issues

This issue should be checked against open issues such as structured recovery hints, action replay/cache, outcome contracts, and observability work before implementation. If an existing issue already covers part of this scope, keep this issue limited to the LATS-inspired recovery/trajectory behavior described here and cross-link rather than duplicate implementation.

Goal

Add an evidence-scored recovery candidate ranking path for stuck/stalling states and deterministic tool failures. The output should remain advisory at first: rank candidate next actions, but do not auto-execute them.

Non-goals / safety constraints

  • Do not auto-run recovery actions in this issue.
  • Do not ask an LLM to choose by free-form reflection.
  • Do not produce candidates that are destructive, transactional, or irreversible.
  • Do not override existing high-priority safety/security hints.
  • Do not require full trajectory ledger support; the ranker should still work from recent calls alone when the ledger is unavailable.

Candidate examples

For stale ref / DOM mutation:

  1. read_page for fresh refs
  2. retry element resolution by accessible name/role
  3. fallback to find

For click did not change state:

  1. verify hit target / current page state
  2. try AX-based interaction if not already used
  3. try keyboard activation if element is focused/focusable
  4. escalate to existing Ralph fallback path if available

For auth/blocking page:

  1. classify blocking/auth state
  2. suggest headed/profile handoff when applicable
  3. stop further blind retries
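The candidate examples above could be captured as a static lookup table keyed by failure class. The following is a minimal sketch, not existing OpenChrome code: the FailureClass names, the CandidateTemplate shape, the action labels other than read_page and find, and the candidatesFor helper are all hypothetical and chosen only for illustration.

```typescript
// Hypothetical failure classes; these are illustrative labels, not OpenChrome identifiers.
type FailureClass = "stale_ref" | "click_no_state_change" | "auth_or_blocking_page";

interface CandidateTemplate {
  action: string;          // tool name or short action label (assumed naming)
  description: string;     // human-readable text taken from the examples above
  skipIfAlreadyUsed?: boolean; // drop the candidate when it was already tried
}

// Order within each list is the suggested priority from the issue text.
const CANDIDATE_TABLE: Record<FailureClass, readonly CandidateTemplate[]> = {
  stale_ref: [
    { action: "read_page", description: "Read the page again to obtain fresh refs" },
    { action: "resolve_by_role_name", description: "Retry element resolution by accessible name/role" },
    { action: "find", description: "Fall back to find" },
  ],
  click_no_state_change: [
    { action: "verify_target", description: "Verify hit target and current page state" },
    { action: "ax_interaction", description: "Try AX-based interaction", skipIfAlreadyUsed: true },
    { action: "keyboard_activation", description: "Try keyboard activation if the element is focused/focusable" },
    { action: "ralph_fallback", description: "Escalate to the existing Ralph fallback path if available" },
  ],
  auth_or_blocking_page: [
    { action: "classify_blocking_state", description: "Classify the blocking/auth state" },
    { action: "suggest_handoff", description: "Suggest headed/profile handoff when applicable" },
    { action: "stop_blind_retries", description: "Stop further blind retries" },
  ],
};

// Returns the ordered candidates for a failure class, skipping opt-in entries already used.
function candidatesFor(
  failure: FailureClass,
  alreadyUsed: readonly string[] = [],
): CandidateTemplate[] {
  return CANDIDATE_TABLE[failure].filter(
    (c) => !(c.skipIfAlreadyUsed && alreadyUsed.indexOf(c.action) !== -1),
  );
}
```

Keeping the table as data rather than branching logic would make it straightforward to review each list against the safety constraints before any scoring is layered on top.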

Proposed implementation

  1. Add a small scoring module that receives:
    • current tool name/result/error
    • recent calls from ActivityTracker
    • progress status
    • optional trajectory ledger snippets
    • optional PatternLearner/domain memory confidence
  2. Produce a bounded list of candidates:
    • tool
    • safe args template or description, without sensitive values
    • reason
    • score
    • risk: read_only | reversible | side_effect_possible
    • blockedReason when excluded by safety gate
  3. Integrate the top 1-3 candidates into HintResult.suggestion or a backwards-compatible structured field.
  4. Keep first-match-wins rule behavior stable unless the matched rule explicitly opts into candidate ranking.
  5. Add tests ensuring high-risk candidates are filtered and repeated failed tools are down-ranked.
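A minimal sketch of the scoring module described in the steps above, under stated assumptions: the Proposal, RecoveryCandidate, and RecentCall shapes, the penalty weights, the blockedReason strings, and the choice of "interact" as the blind-interaction tool are illustrative guesses, not the actual OpenChrome types or tuning. Trajectory ledger snippets and PatternLearner confidence are assumed to be folded into baseScore by the caller.

```typescript
type Risk = "read_only" | "reversible" | "side_effect_possible";

// Hypothetical input: an unscored proposal plus caller-supplied evidence.
interface Proposal {
  tool: string;
  argsTemplate: string;  // safe template or description, no sensitive values
  reason: string;
  risk: Risk;
  baseScore: number;     // assumed to already include ledger/memory confidence
  destructive?: boolean; // assumed flag set by an upstream safety classifier
}

interface RecoveryCandidate {
  tool: string;
  argsTemplate: string;
  reason: string;
  score: number;
  risk: Risk;
  blockedReason?: string; // present only when excluded by the safety gate
}

interface RecentCall {
  tool: string;
  ok: boolean;
}

// Assumed weights, chosen only so the ordering rules are visible.
const RISK_PENALTY: Record<Risk, number> = {
  read_only: 0,
  reversible: 0.1,
  side_effect_possible: 0.3,
};
const REPEAT_PENALTY = 0.25;
const BLIND_INTERACTION_TOOLS = ["interact"]; // assumption

function rankCandidates(
  proposals: Proposal[],
  recentCalls: RecentCall[],
  blockingPage: boolean,
  limit = 3,
): { ranked: RecoveryCandidate[]; blocked: RecoveryCandidate[] } {
  const ranked: RecoveryCandidate[] = [];
  const blocked: RecoveryCandidate[] = [];
  for (const p of proposals) {
    const base = { tool: p.tool, argsTemplate: p.argsTemplate, reason: p.reason, risk: p.risk };
    // Safety gate: destructive candidates never reach the ranked list.
    if (p.destructive) {
      blocked.push({ ...base, score: 0, blockedReason: "destructive_or_irreversible" });
      continue;
    }
    // On auth/blocking pages, blind interaction retries are excluded outright.
    if (blockingPage && BLIND_INTERACTION_TOOLS.indexOf(p.tool) !== -1) {
      blocked.push({ ...base, score: 0, blockedReason: "blind_retry_on_blocking_page" });
      continue;
    }
    // Down-rank tools that recently failed, once per recorded failure.
    const failures = recentCalls.filter((c) => !c.ok && c.tool === p.tool).length;
    const score = p.baseScore - RISK_PENALTY[p.risk] - REPEAT_PENALTY * failures;
    ranked.push({ ...base, score });
  }
  ranked.sort((a, b) => b.score - a.score);
  return { ranked: ranked.slice(0, limit), blocked };
}
```

Returning the blocked list alongside the ranked one keeps the safety gate observable in tests without exposing blocked candidates in the hint itself.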

Acceptance criteria

  • Stuck/stalling hints include a concrete ranked candidate list when enough context exists.
  • Repeating the same failed tool/ref is penalized.
  • Read-only or reversible candidates are preferred before side-effect-possible candidates.
  • Auth/CAPTCHA/blocking-page cases do not suggest blind retries.
  • Existing hint text remains readable for current MCP clients.
  • No candidate includes raw secrets or full user-entered form values.
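One way to satisfy the readability and no-secrets criteria together is to append the ranked list to the existing hint text as plain lines, redacting argument values by key name. This is a sketch under assumptions: RankedCandidate, the SENSITIVE_KEY pattern, and the output layout are hypothetical, and a real implementation may need a stricter redaction policy than key matching.

```typescript
// Hypothetical shape of an already-scored candidate.
interface RankedCandidate {
  tool: string;
  args: Record<string, string>;
  reason: string;
  score: number;
}

// Assumed heuristic: argument keys that may carry secrets or user-entered values.
const SENSITIVE_KEY = /pass|secret|token|otp|text|value/i;

// Replaces values of sensitive-looking keys with a placeholder.
function redactArgs(args: Record<string, string>): Record<string, string> {
  const out: Record<string, string> = {};
  for (const key of Object.keys(args)) {
    out[key] = SENSITIVE_KEY.test(key) ? "<redacted>" : (args[key] ?? "");
  }
  return out;
}

// Appends up to `limit` candidates, best first, to the unchanged existing hint text.
function formatSuggestion(
  existingText: string,
  candidates: RankedCandidate[],
  limit = 3,
): string {
  const top = candidates.slice().sort((a, b) => b.score - a.score).slice(0, limit);
  if (top.length === 0) return existingText; // no context: keep the hint as-is
  const lines = top.map((c, i) => {
    const args = redactArgs(c.args);
    const argText = Object.keys(args).map((k) => `${k}=${args[k]}`).join(", ");
    return `${i + 1}. ${c.tool}(${argText}): ${c.reason}`;
  });
  return `${existingText}\nRanked recovery candidates:\n${lines.join("\n")}`;
}
```

Because the original text is left untouched as the first line, clients that only display the suggestion string would still show a readable hint.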

Required automated verification

  • Unit tests for candidate scoring:
    • stale ref → fresh read_page ranked first
    • repeated same error → same tool down-ranked
    • auth/blocking page → blind interaction candidates excluded
    • side-effect risk ordering works
  • HintEngine integration test showing structured candidates appear for stuck/stalling.
  • Regression tests for existing static error recovery rules.
  • npm run build and targeted hint/recovery Jest tests.

Fixture requirements

Add or reuse controlled routes in tests/e2e/harness/fixture-server.ts:

  • /recovery/stale-ref: element ref becomes stale after DOM mutation; recovery should prefer fresh read_page/re-resolution.
  • /recovery/blocking-page: static page containing blocking/auth/CAPTCHA-like signals; recovery should avoid blind retries.
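As a rough illustration of what the two routes might serve, the sketch below returns the fixture HTML for each path. The renderRecoveryFixture helper, the element ids, and the page wording are hypothetical; how routes are actually registered in tests/e2e/harness/fixture-server.ts is not shown here and would need to follow that file's existing conventions.

```typescript
// Hypothetical helper: returns the HTML body for a recovery fixture route,
// or undefined when the path is not one of the two routes described above.
function renderRecoveryFixture(path: string): string | undefined {
  switch (path) {
    case "/recovery/stale-ref":
      // Clicking "Mutate DOM" replaces #target with a new node, so any ref
      // captured before the click points at a detached element.
      return `<!doctype html>
<title>stale-ref fixture</title>
<main id="root"><button id="target">Submit</button></main>
<button id="mutate">Mutate DOM</button>
<output id="status">idle</output>
<script>
  document.getElementById("mutate").addEventListener("click", function () {
    var fresh = document.createElement("button");
    fresh.id = "target";
    fresh.textContent = "Submit";
    fresh.addEventListener("click", function () {
      document.getElementById("status").textContent = "recovered";
    });
    document.getElementById("target").replaceWith(fresh);
  });
</script>`;
    case "/recovery/blocking-page":
      // Static page with auth/CAPTCHA-like signals and no progress path.
      return `<!doctype html>
<title>Sign in required</title>
<h1>Sign in to continue</h1>
<p id="captcha-notice">Verify you are human to proceed.</p>
<form><input type="password" name="password" autocomplete="off"></form>`;
    default:
      return undefined;
  }
}
```

In the stale-ref page, a successful recovery is observable because the status element only changes to "recovered" when the replacement button is clicked.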

Required real OpenChrome verification after implementation

Use OpenChrome against a fixture page with a DOM mutation that invalidates a previously captured ref:

  1. Call navigate to open the fixture.
  2. Call read_page to obtain refs.
  3. Mutate the DOM or navigate so a ref becomes stale.
  4. Try the stale interaction.
  5. Verify the tool result/hint includes a ranked candidate where fresh read_page or re-resolution is top-ranked.
  6. Follow the top candidate manually through OpenChrome and verify it recovers the task.

Also verify a blocking/auth fixture or simulated blocked page:

  1. Navigate to a controlled blocking page fixture.
  2. Attempt a non-progress interaction.
  3. Verify the hint does not recommend repeated blind click/type attempts and instead suggests classification/headed/auth handoff as appropriate.

Merge evidence required in PR

  • Before/after hint output for the stale-ref fixture.
  • Test output for candidate scoring and HintEngine integration.
  • Confirmation that no auto-execution occurs in this issue.

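OpenChrome live verification checklist

Re-verified after applying the latest merged version as of 2026-05-14. Only items that can be proven directly from OpenChrome responses, local fixtures, and build/test artifacts are kept as pass criteria. Conditions such as human review, external site stability, or unconfirmed PR status are excluded from the pass criteria.

Verification targets

Latest version / common runtime verification

  • Applied the latest develop source and confirmed that npm run build passes.
  • Confirmed that npm run lint:tier passes.
  • Confirmed the npm test -- --runInBand result: 504/507 suites passed, 3 skipped; 6429/6525 tests passed, 96 skipped. The Jest open-handle warning was recorded as a separate runtime risk.
  • oc_connection_health returned a connected status.
  • Observed DOM state changes on the local fixture through the OpenChrome navigate/read_page/interact/javascript_tool path.
  • Confirmed that the key results are reproducible with the same fixture and the same settings.

Per-issue resolution evidence

  • Implementation PR linked to the latest develop: 1215
  • Related test/source evidence exists in the latest tree: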
    • docs/recovery/trajectory-ledger.md
    • src/hints/recovery-candidates.ts
    • docs/recovery/reward-scorer.md
    • src/core/trace/recovery-feedback.ts
    • src/hints/hint-engine.ts
    • src/journal/task-journal.ts
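  • The checklist retains no pass criteria that cannot be reproduced from OpenChrome responses, fixtures, or local artifacts.

Failure / hold criteria

  • If even one check is unmet, do not close the issue.
  • If a failure is reproduced as a defect in the latest code, record the failed OpenChrome call, the response excerpt, and the fixture state as evidence, and open a separate fix PR.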

Metadata

Assignees

No one assigned

    Labels

      • P1: P1 high
      • enhancement: New feature or request
      • harness: Execution harness, run lifecycle, recovery, and verification
      • lats-learnings: Improvements inspired by LanguageAgentTreeSearch analysis
      • live-verification: Requires live OpenChrome/browser validation after implementation
      • observability: Observability
      • performance: Performance, latency, throughput, or resource-use improvement
      • reliability: Reliability and stability improvement
