feat(core): deterministic selector-chain replay layer for skill memory

**Tier**: `core` (additive; new opt-in tool, augmented `oc_skill_record` schema with backfill)
**PR target**: `develop`

## Why

`oc_skill_record` / `oc_skill_recall` (#785, #807) store skill *intent* — step ordering and the contract id — but **not the artifacts needed to re-execute deterministically**. Today the only way to re-run a recalled skill is for the host LLM to read the steps, re-resolve each target via `read_page` + `find`, and re-issue every action. Every successful re-run therefore costs the same token budget as the first run.

HyperAgent (`@hyperbrowser/agent`) addresses the same gap with an "action cache": after a successful agent run it persists `(xpath, frameIndex, method, args, elementId)` per step, and `runFromActionCache()` replays via XPath without LLM calls, falling back to LLM only when XPath fails. The result is **0 LLM calls** on the happy path for recurring tasks.

OpenChrome cannot embed an LLM (P3) and cannot orchestrate (P1), so it cannot port the HyperAgent loop verbatim. The portable piece is **the artifact format and a stateless replay executor that returns structured errors when resolution fails**, letting the host LLM decide what to do next.

This issue closes a measurable gap distinct from #820 (graph state-hash skipper, *higher* layer) and #824 (auto-recall on navigate, *recall* surface). #820 decides "which steps to skip"; this issue decides "given a step, how to re-execute it without re-resolving from scratch".

## What

Three additive changes inside the existing skill-memory module, all tier:core, all opt-in by per-call argument:

1. Enrich every recorded step with a `replay_artifact`: an ordered list of selector strategies captured at record time, plus optional `backendNodeId` hint.
2. Add a new tool `oc_skill_replay` that, given a `skill_id`, executes each step's artifact deterministically — no host round-trip per step — and returns a structured result envelope. On any artifact-resolution failure, it returns control to the host with `code: "ARTIFACT_RESOLUTION_FAILED"` and the offending step index, so the host LLM can fall back to standard `read_page` + `interact`.
3. Wire `oc_assert` into the replay loop so that, if the recorded contract id is reachable, the run terminates with a verdict regardless of replay path.

`oc_skill_record` schema gains the optional `replay_artifact` per step (default absent → existing behaviour). Recording the artifact is opt-in via a new `capture_artifact: true` arg on the existing actions that already feed the recorder (`interact`, `fill_form`, `form_input`, `navigate`, `tabs_create`), or via the upcoming codegen aggregator from #836 when that lands.

## Background — verified facts in repo

- `src/tools/oc-skill-record.ts` — idempotent on `(domain, name)`, JSON-per-domain store at `~/.openchrome/skill-memory/<encodedDomain>/skills.json` (`src/core/skill-memory/store.ts`).
- `src/tools/oc-skill-recall.ts` — recency-sorted, no LLM ranking.
- `src/core/contracts/evidence-bundle.ts` — already supplies a structured failure-capture format usable here.
- `src/utils/ralph/ralph-engine.ts` — S1–S7 strategy fallback; the artifact format MUST be a strict subset of the strategies Ralph can issue, so replay reuses the same execution path.
- `src/core/perception/backend-node-registry.ts` (introduced by #844) — replay artifact MAY reference a `nodeRef` for the first attempt; on miss, falls through to the selector chain.

## Contract

```ts
// src/core/skill-memory/replay-artifact.ts (new)
export interface ReplayArtifactStep {
  /** Strict subset of Ralph S1–S6; HITL (S7) is never persisted. */
  kind: 'click' | 'fill' | 'navigate' | 'press' | 'select' | 'submit' | 'scroll';
  /** Tried in order. First successful resolution wins. */
  selectors: Array<
    | { type: 'node_ref'; value: string }          // #844 nodeRef hint
    | { type: 'xpath'; value: string }
    | { type: 'css'; value: string }
    | { type: 'role_name'; role: string; name: string }
    | { type: 'accessible_name'; value: string }
    | { type: 'text'; value: string }
  >;
  /**
   * 0 = main frame; non-zero values are per-target ordinals assigned at first
   * observation and used only internally by the replay artifact (NOT surfaced
   * in tools/list responses). If a follow-up issue formalizes frame addressing
   * in the public surface, this field re-aligns there.
   */
  frameOrdinal?: number;
  args?: Record<string, unknown>;
  /** Optional inline contract check; if present, evaluated via oc_assert after the step. */
  post_assert?: { contract_id: string };
}

export interface ReplayArtifact {
  schema_version: 1;
  recorded_at: number;
  recorder: { openchrome_version: string };
  steps: ReplayArtifactStep[];
}
```
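
The implementation plan asks for a validator alongside these types. A minimal sketch of one, written as a hand-rolled structural check rather than the JSON-Schema validator the proposal calls for, might look like the following. The function names (`isReplayArtifact`, `isStep`, `isSelector`) are hypothetical, and the rule that `frameOrdinal` is a non-negative integer is an assumption; only the field names come from the contract above.

```ts
// Hypothetical structural validator for the ReplayArtifact shape above.
// Field names follow the contract; helper names are illustrative only.
const STEP_KINDS: readonly string[] = [
  'click', 'fill', 'navigate', 'press', 'select', 'submit', 'scroll',
];
// Selector types that carry a single `value` string.
const VALUE_SELECTOR_TYPES: readonly string[] = [
  'node_ref', 'xpath', 'css', 'accessible_name', 'text',
];

function isObject(v: unknown): v is Record<string, unknown> {
  return typeof v === 'object' && v !== null && !Array.isArray(v);
}

function isSelector(v: unknown): boolean {
  if (!isObject(v)) return false;
  const t = v.type;
  if (t === 'role_name') {
    return typeof v.role === 'string' && typeof v.name === 'string';
  }
  return (
    typeof t === 'string' &&
    VALUE_SELECTOR_TYPES.indexOf(t) !== -1 &&
    typeof v.value === 'string'
  );
}

function isStep(v: unknown): boolean {
  if (!isObject(v)) return false;
  const kind = v.kind;
  if (typeof kind !== 'string' || STEP_KINDS.indexOf(kind) === -1) return false;
  const selectors = v.selectors;
  if (!Array.isArray(selectors) || !selectors.every(isSelector)) return false;
  // Assumption: ordinals are finite, non-negative integers (0 = main frame).
  const f = v.frameOrdinal;
  if (f !== undefined) {
    if (typeof f !== 'number' || !isFinite(f) || f < 0 || Math.floor(f) !== f) return false;
  }
  if (v.args !== undefined && !isObject(v.args)) return false;
  const pa = v.post_assert;
  if (pa !== undefined && !(isObject(pa) && typeof pa.contract_id === 'string')) {
    return false;
  }
  return true;
}

function isReplayArtifact(v: unknown): boolean {
  if (!isObject(v)) return false;
  if (v.schema_version !== 1) return false;
  if (typeof v.recorded_at !== 'number') return false;
  const recorder = v.recorder;
  if (!isObject(recorder) || typeof recorder.openchrome_version !== 'string') return false;
  const steps = v.steps;
  return Array.isArray(steps) && steps.every(isStep);
}
```

Rejecting unknown `kind` values at validation time keeps the persisted artifact inside the step kinds listed in the contract, so a malformed or hand-edited file fails before any action is dispatched.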

```ts
// src/tools/oc-skill-replay.ts (new)
interface OcSkillReplayArgs {
  skill_id: string;
  /** Optional override; default = all steps. */
  step_range?: { from: number; to: number };
  /** If true, stop on first step that fails the embedded contract check. Default true. */
  stop_on_contract_failure?: boolean;
  /** Default 5s per step; honors src/config/defaults.ts existing budgets. */
  step_timeout_ms?: number;
}

interface OcSkillReplayResult {
  ok: boolean;
  steps_executed: number;
  steps_total: number;
  /** Set when ok === false. */
  failure?: {
    code:
      | 'ARTIFACT_MISSING'
      | 'ARTIFACT_RESOLUTION_FAILED'
      | 'CONTRACT_FAILED'
      | 'STEP_TIMEOUT'
      | 'TARGET_NAVIGATED_AWAY'
      | 'DISABLED';
    step_index: number;
    detail: string;
    evidence_bundle_path?: string; // populated for CONTRACT_FAILED via existing evidence-bundle module
  };
  /** Per-step resolution telemetry, for curator promote signals. */
  step_results: Array<{
    index: number;
    resolved_via: 'node_ref' | 'xpath' | 'css' | 'role_name' | 'accessible_name' | 'text';
    selector_attempts: number;
    elapsed_ms: number;
  }>;
}
```

**Invariants**
1. `oc_skill_replay` MUST NOT call any LLM and MUST NOT orchestrate beyond the persisted step list (P1, P3).
2. Replaying a skill recorded under v1.11 (no `replay_artifact`) returns `code: "ARTIFACT_MISSING"` immediately; no implicit upgrade is attempted.
3. The selector list is tried *in order* and stops on first resolution; this guarantees deterministic replay across runs given an unchanged DOM.
4. The artifact format never carries raw secrets — `args` for `fill` actions store the substituted-placeholder form `${SECRET:NAME}` whenever the existing secrets layer (#834) is in play.
5. When the feature flag is off, `tools/list` includes `oc_skill_replay` (P2 schema parity) but invocations return a `{ ok: false, failure: { code: "DISABLED" } }` fact. New `replay_artifact` fields on `oc_skill_record` responses are `null` at runtime when off.
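
Invariant 3 can be illustrated with a small resolver that walks the selector list in order and reports which strategy won. This is a sketch only: `resolveFirst` and the `Probe` callback are hypothetical names, and the probe is synchronous to keep the example short (a real lookup over CDP would be asynchronous). The proposal itself routes resolution through the existing Ralph building blocks rather than new locator code.

```ts
// Selector union copied from the contract above.
type Selector =
  | { type: 'node_ref'; value: string }
  | { type: 'xpath'; value: string }
  | { type: 'css'; value: string }
  | { type: 'role_name'; role: string; name: string }
  | { type: 'accessible_name'; value: string }
  | { type: 'text'; value: string };

interface Resolution<T> {
  element: T;
  resolved_via: Selector['type'];
  selector_attempts: number;
}

// Hypothetical probe: returns a handle for the selector, or null on a miss.
type Probe<T> = (selector: Selector) => T | null;

/**
 * Try each selector in recorded order and stop at the first hit.
 * Returns null when every selector misses.
 */
function resolveFirst<T>(
  selectors: readonly Selector[],
  probe: Probe<T>,
): Resolution<T> | null {
  let attempts = 0;
  for (const selector of selectors) {
    attempts += 1;
    const element = probe(selector);
    if (element !== null) {
      return { element, resolved_via: selector.type, selector_attempts: attempts };
    }
  }
  return null;
}
```

A `null` return corresponds to the `ARTIFACT_RESOLUTION_FAILED` case. On a hit, the attempt count and winning type map onto `selector_attempts` and `resolved_via` in `step_results`.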

## Proposed Implementation

1. **Artifact module** (`src/core/skill-memory/replay-artifact.ts`):
   - Pure types + validator. Schema version pinned at `1`.
   - JSON-Schema dump used by both `oc_skill_record` and `oc_skill_replay` for input validation.

2. **Recorder hook** (modifications to `src/tools/interact.ts`, `src/tools/fill-form.ts`, `src/tools/form-input.ts`, `src/tools/navigate.ts`):
   - New optional input arg `capture_artifact: true`.
   - When set, after a successful Ralph resolution, write the **winning** selector strategy + 2 sibling candidates (in order of robustness: role+name → accessible name → xpath → css) into a session-scoped buffer.
   - **Buffer scope**: per CDP target. **Bounded**: at most 100 step entries (FIFO eviction beyond that). **Flushed destructively** by `oc_skill_record` or on target close — no cross-skill leakage.

3. **Replay tool** (`src/tools/oc-skill-replay.ts`):
   - For each step, resolve via the artifact's selector list using **existing** `src/utils/ralph/ralph-engine.ts` building blocks (no new locator code).
   - On success, dispatch the action via the *same* CDP path the original tool uses.
   - On embedded `post_assert`, call `oc_assert` against the recorded `contract_id`.
   - Emit step telemetry via `src/core/trace/storage.ts` so the pilot curator (`src/pilot/curator/`) can read replay-success rate per skill and use it as a promote-pass signal — this connects the artifact loop to the existing curator without changing curator code in this PR.

4. **Schema migration**: bump the `SkillRecord` JSON shape from `schema_version: 1` to `schema_version: 2` (additive `replay_artifact` on each step). Existing files load as `schema_version: 1` and remain read-compatible; they upgrade in place on the next idempotent re-record. No destructive migration script.

5. **Registration**: register `oc_skill_replay` in `src/tools/index.ts`. Add to `tools/list` unconditionally. No new env flag needed for the recorder side (per-call arg); for the replay tool, gate full execution behind `OPENCHROME_SKILL_REPLAY` (default **on**, opt-out for parity testing) via the new `isCoreFeatureEnabled` helper introduced by #844. When the flag is off the tool returns a `DISABLED` fact.
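
The recorder buffer described in step 2 (per-target scope, 100-entry FIFO bound, destructive flush) could be sketched as below. The class name `ArtifactBuffer`, its method names, and the use of a string target id as the key are assumptions for illustration, not the proposed implementation.

```ts
// Hypothetical session-scoped buffer for captured replay steps.
const MAX_STEPS_PER_TARGET = 100; // bound taken from the proposal

class ArtifactBuffer<Step> {
  // One queue per CDP target, keyed by an assumed string target id.
  private readonly byTarget = new Map<string, Step[]>();

  /** Append a step; evict the oldest entry once the bound is exceeded. */
  push(targetId: string, step: Step): void {
    const queue = this.byTarget.get(targetId) ?? [];
    queue.push(step);
    if (queue.length > MAX_STEPS_PER_TARGET) queue.shift();
    this.byTarget.set(targetId, queue);
  }

  /** Destructive flush: return the buffered steps and clear the queue. */
  flush(targetId: string): Step[] {
    const queue = this.byTarget.get(targetId) ?? [];
    this.byTarget.delete(targetId);
    return queue;
  }

  /** Drop everything for a closed target so nothing leaks into a later skill. */
  discard(targetId: string): void {
    this.byTarget.delete(targetId);
  }
}
```

In this sketch `oc_skill_record` would call `flush` for the active target and a target-close handler would call `discard`. Because `flush` deletes the queue, a second record call on the same target starts from an empty buffer.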

## Boundary

**New**:
- `src/core/skill-memory/replay-artifact.ts`
- `src/tools/oc-skill-replay.ts`
- `tests/core/skill-memory/replay-artifact.test.ts`
- `tests/tools/oc-skill-replay.test.ts`
- `tests/e2e/scenarios/skill-replay.e2e.ts`
- `scripts/verify/skill-replay.mjs`
- `tests/fixtures/skill-replay/index.html`

**Modified**:
- `src/core/skill-memory/store.ts` (schema_version bump, validator)
- `src/tools/oc-skill-record.ts` (accept `replay_artifact` per step; backfill from recorder buffer)
- `src/tools/oc-skill-recall.ts` (return `replay_artifact` when present)
- `src/tools/interact.ts`, `src/tools/fill-form.ts`, `src/tools/form-input.ts`, `src/tools/navigate.ts` (capture_artifact arg + session buffer)
- `src/tools/index.ts` (register `oc_skill_replay`)
- `src/harness/flags.ts` (reuse `isCoreFeatureEnabled` from #844; if #844 unmerged, copy the helper inline and remove on merge)

## Acceptance Criteria

- [ ] `replay-artifact.ts` exports types + a strict JSON-Schema validator; unit tests reject malformed artifacts.
- [ ] `oc_skill_record` accepts `replay_artifact` per step; idempotent re-record preserves `skill_id` and **updates** the artifact when supplied; back-compat with v1 records (no `replay_artifact` field).
- [ ] `oc_skill_replay` lands; calls return one of `{ ok: true, ... }` or `{ ok: false, failure: {...} }`. Never throws.
- [ ] `interact` / `fill_form` / `form_input` / `navigate` accept `capture_artifact`; default false; when false, responses are byte-identical to v1.11.0 on a frozen fixture (P2 zero-impact test).
- [ ] `OPENCHROME_SKILL_REPLAY=0` → `oc_skill_replay` returns `code: "DISABLED"`; `tools/list` includes the tool either way (P2 schema parity).
- [ ] No outbound HTTP, no LLM API (P3) — covered by `tests/core/skill-memory/no-network.test.ts` blocking `fetch` / `http` / `https`.
- [ ] Trace storage records `{ skill_id, step_index, resolved_via, selector_attempts, elapsed_ms, ok }` per step.
- [ ] `npm run build && npm test && npm run lint && npm run lint:tier` green.
- [ ] PR targets `develop`.

## Verification (post-merge, via openchrome MCP)

### Setup
Bundled fixture page at `tests/fixtures/skill-replay/index.html` — a 4-step form (name → email → captcha-text → submit). Served via `npm run fixture-serve` (port 4173). Avoids public-site flakiness.

### Scenario 1 — record-then-replay happy path
1. `mcp__openchrome__navigate { url: "http://localhost:4173/skill-replay/" }`
2. `mcp__openchrome__interact { action: "fill", target: { text: "Name" }, value: "Alice", capture_artifact: true }`
3. `mcp__openchrome__interact { action: "fill", target: { text: "Email" }, value: "a@b.co", capture_artifact: true }`
4. `mcp__openchrome__interact { action: "fill", target: { text: "Captcha" }, value: "1234", capture_artifact: true }`
5. `mcp__openchrome__interact { action: "click", target: { text: "Submit" }, capture_artifact: true }`
6. `mcp__openchrome__oc_skill_record { domain: "localhost", name: "form-flow", contract_id: "<id>" }`
7. **Reset**: `mcp__openchrome__page_reload`, then `mcp__openchrome__navigate` back to the fixture.
8. `mcp__openchrome__oc_skill_recall { domain: "localhost", name: "form-flow" }` → returns the skill with `replay_artifact` populated for all 4 steps.
9. `mcp__openchrome__oc_skill_replay { skill_id: <id> }` → `{ ok: true, steps_executed: 4, steps_total: 4 }`. Each `step_results[i].resolved_via` is **not** `text` (proves a more-robust strategy than the recorded text-match was used).

**Pass**: step 9 returns `ok: true`; the form is filled and submitted; **zero `mcp__openchrome__find` and zero `mcp__openchrome__read_page` calls between step 8 and the end** (verified via `mcp__openchrome__oc_journal filter=tool=read_page,find since=<step9_start>`).

### Scenario 2 — artifact-resolution failure returns control to host
1. Repeat Scenario 1 steps 1–8.
2. Mutate the fixture page: rename the "Captcha" label to "Verification code".
3. `mcp__openchrome__oc_skill_replay { skill_id: <id> }`.

**Pass**: returns `{ ok: false, failure: { code: "ARTIFACT_RESOLUTION_FAILED", step_index: 2 } }`. Steps 0–1 executed (form has "Alice" and "a@b.co"). `evidence_bundle_path` populated. **No exception thrown.**
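
How a host might branch on the result envelope in this scenario is sketched below. The `planHostAction` function and the `HostAction` variants are hypothetical and describe host-side policy, which lives outside OpenChrome; only the failure codes and result fields come from the contract. The handling of `STEP_TIMEOUT` and `TARGET_NAVIGATED_AWAY` in particular is an assumption, since the proposal does not prescribe it.

```ts
type FailureCode =
  | 'ARTIFACT_MISSING'
  | 'ARTIFACT_RESOLUTION_FAILED'
  | 'CONTRACT_FAILED'
  | 'STEP_TIMEOUT'
  | 'TARGET_NAVIGATED_AWAY'
  | 'DISABLED';

// Subset of OcSkillReplayResult that the host needs for this decision.
interface ReplayResultLike {
  ok: boolean;
  steps_executed: number;
  failure?: {
    code: FailureCode;
    step_index: number;
    detail: string;
    evidence_bundle_path?: string;
  };
}

// Hypothetical host-side actions.
type HostAction =
  | { kind: 'done' }
  | { kind: 'resume_with_read_page'; from_step: number }
  | { kind: 'run_recalled_steps' }
  | { kind: 'inspect_evidence'; path: string | undefined };

function planHostAction(result: ReplayResultLike): HostAction {
  if (result.ok) return { kind: 'done' };
  const failure = result.failure;
  // Defensive: a failed result without details resumes after the last executed step.
  if (!failure) return { kind: 'resume_with_read_page', from_step: result.steps_executed };
  switch (failure.code) {
    case 'ARTIFACT_RESOLUTION_FAILED':
    case 'STEP_TIMEOUT': // assumption: treat like a resolution miss
    case 'TARGET_NAVIGATED_AWAY': // assumption: re-read the page and continue
      return { kind: 'resume_with_read_page', from_step: failure.step_index };
    case 'ARTIFACT_MISSING':
    case 'DISABLED':
      // No replay possible; fall back to the recalled step list.
      return { kind: 'run_recalled_steps' };
    case 'CONTRACT_FAILED':
      return { kind: 'inspect_evidence', path: failure.evidence_bundle_path };
  }
}
```

For the result in this scenario, the sketch yields a resume from step 2 using the standard `read_page` + `interact` path, leaving the two already-filled fields untouched.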

### Scenario 3 — P3 compliance (zero outbound)
1. Block all egress at the OS level (`pf` on macOS / `iptables` on Linux to a 127.0.0.1-only allow list).
2. Replay the skill from Scenario 1 (cached storage is local).

**Pass**: completes successfully. No DNS lookup, no TCP to non-loopback (verify via `tcpdump -n -i any` capture pinned in PR description as evidence).

### Scenario 4 — P2 byte-parity when capture_artifact omitted
1. Record a baseline trace running an automation **without** `capture_artifact` on any call.
2. Run the same automation against the merged PR build, also without `capture_artifact`.

**Pass**: `diff` on the `result.*` payload sections of the JSONL traces is empty.

### Scenario 5 — kill-switch
1. `OPENCHROME_SKILL_REPLAY=0 node dist/cli/index.js serve`.
2. `mcp__openchrome__oc_skill_replay { skill_id: <id> }`.

**Pass**: `{ ok: false, failure: { code: "DISABLED" } }`. `tools/list` still includes `oc_skill_replay` (schema parity).

### Scenario 6 — schema migration (v1 → v2 records)
1. Pre-seed `~/.openchrome/skill-memory/localhost/skills.json` with a v1.11 record (no `replay_artifact`).
2. `mcp__openchrome__oc_skill_recall { domain: "localhost" }` → returns the record with `replay_artifact: null` per step (no synthesis).
3. `mcp__openchrome__oc_skill_replay { skill_id: <v1_id> }`.

**Pass**: step 3 returns `code: "ARTIFACT_MISSING"`. Storage file remains valid JSON; no destructive write.
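
The read-compatible behaviour exercised here (a v1 record surfaces `replay_artifact: null` per step and is never rewritten on read) can be sketched as a pure normalizer. The `StoredSkill` shape, the function names, the `action` field used in the usage example, and treating a missing `schema_version` as `1` are all assumptions; the actual `SkillRecord` shape is not shown in this issue.

```ts
// Hypothetical on-disk shapes; real records carry more fields.
interface StoredStep {
  [key: string]: unknown;
}
interface StoredSkill {
  schema_version?: number;
  steps: StoredStep[];
  [key: string]: unknown;
}
interface ReadSkill extends StoredSkill {
  schema_version: number;
}

/**
 * Build the in-memory view of a stored record without mutating it.
 * Steps that lack an artifact get an explicit null (no synthesis).
 */
function normalizeForRead(skill: StoredSkill): ReadSkill {
  return {
    ...skill,
    schema_version: skill.schema_version ?? 1, // assumption: absent means v1
    steps: skill.steps.map((step) => ({
      ...step,
      replay_artifact: step.replay_artifact ?? null,
    })),
  };
}

/**
 * Assumption: a skill is replayable only if every step carries an artifact;
 * otherwise the replay tool would answer ARTIFACT_MISSING.
 */
function isReplayable(skill: ReadSkill): boolean {
  return (
    skill.steps.length > 0 &&
    skill.steps.every((s) => s.replay_artifact !== null && s.replay_artifact !== undefined)
  );
}
```

Usage, with illustrative step fields:

```ts
const legacy = { steps: [{ action: 'fill' }, { action: 'click' }] };
const view = normalizeForRead(legacy);
// view.steps[0].replay_artifact is null and `legacy` is unchanged,
// so isReplayable(view) is false.
```

Keeping the normalizer pure is what makes the "no destructive write" part of the pass condition easy to hold: nothing is written back unless a later idempotent re-record does so explicitly.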

A reproducer for all six scenarios lives at `scripts/verify/skill-replay.mjs` and is referenced in the PR description.

## Out of scope

- Curator scoring change to use replay-success rate (separate small follow-up issue; this PR only emits the telemetry).
- Automatic artifact synthesis from v1 records (intentionally absent — keep migration trivial).
- Cross-domain artifact reuse.
- Multi-tab replay (single tab per replay call; multi-tab is host orchestration).
- Codegen export of replay scripts — owned by #836; the artifact format is intentionally a strict subset of what #836 can emit.

## Dependencies

- Soft-depends on #844 `nodeRef`: if landed, artifacts include a `node_ref` selector as the first try. If #844 unmerged at PR time, omit that strategy; artifact remains valid.
- Sibling: #820 (graph state-hash skipper). #820 decides *which* step list to execute; this issue executes *one* step deterministically. They compose: #820 → `oc_skill_replay` → host fallback.
- Sibling: #824 (auto-recall on navigate). Recall returns skills that this tool can immediately replay.

## Effort

M (~5–7 dev days). Artifact validator + recorder buffer + new tool + 4 modified action tools + 6 verification scenarios. No new native dep, no Chrome-launch changes.

## References

- HyperAgent action cache: `@hyperbrowser/agent` `src/agent/shared/run-cached-action.ts` (AGPL-3.0 — implementation idea only; no code copied).
- Internal comparison analysis: HyperAgent ↔ openchrome (chat thread, 2026-05-12).
- Sibling issues: #820 graph executor, #824 auto-recall, #836 codegen, #844 nodeRef contract.
- `docs/roadmap/portability-harness-contract.md` (P1–P5 portability-harness contract).



## Curated scope, overlap handling, and verification checklist

### Scope classification
- **Canonical lane:** deterministic skill replay artifacts.
- **Primary deliverable:** selector-chain replay layer for skill memory with recorded stable selectors and evidence-backed replay.
- **Open PR:** #925 (`feat/875-replay-layer`). Continue there; avoid duplicate work.
- **Non-goal:** blind replay, replacing skill intent memory, unsafe destructive re-execution, or requiring host LLM to re-resolve every target.

### Overlap and conflict resolution
- [ ] Coordinate with #824 auto-recall: recall may surface replayable skills, but this issue owns replay artifacts/execution layer.
- [ ] Coordinate with #836 codegen/export so replay scripts can be generated from the same artifacts.
- [ ] Coordinate with #834/#844 only for selector/ref durability; keep scope to skill-memory replay.

### Implementation checklist
- [ ] Extend skill record schema/backfill with selector chains, page signatures, action metadata, and evidence handles needed for deterministic replay.
- [ ] Add opt-in replay tool/path with safety checks for page signature, selector resolution, and risky actions.
- [ ] Reject replay with clear reason when selectors/page signatures do not match.
- [ ] Add tests for record/backfill, successful replay, stale selector rejection, page mismatch, risky action gate, and token savings evidence.
- [ ] Document replay eligibility and fallback to normal recall steps.

### Success criteria
- [ ] A recalled skill can replay deterministically when page and safety evidence match.
- [ ] Replay refuses stale/mismatched/risky conditions instead of guessing.
- [ ] Existing skill recall remains available for non-replayable skills.
- [ ] Token/latency cost of successful repeat workflows is reduced.

### Post-merge OpenChrome live verification checklist
- [ ] Record a local fixture skill, replay it on the same page, and verify deterministic success.
- [ ] Change page structure and verify replay is rejected with selector/page mismatch.
- [ ] Exercise a risky action fixture and verify safety gate/fallback.
- [ ] Capture replay artifact, reject reason, and before/after tool-call count.
