
test(harness): live-verification playbook pack for browser recipes #1044

@shaun0927


Why

The Goose comparison highlighted recipes/sub-recipes as a useful way to reduce repeated planning and make long-running work reproducible. OpenChrome already has the right implementation issue for this: #854 (oc playbook — declarative YAML scenario runner with inline Outcome Contracts). Creating a second recipe runner would be duplicate scope.

This issue adds the missing hardening layer around #854: a small, merge-blocking live-verification pack that proves browser recipes are useful as deterministic OpenChrome harness artifacts after implementation.

Scope

After #854 lands, add runnable browser recipe fixtures and verification documentation that exercise the playbook runner through a real OpenChrome MCP server.

Required fixtures

Add a small local fixture site under tests/fixtures/playbook/ or equivalent:

  • index.html: links to a details page and contains a safe form.
  • details.html: has stable text and at least one interactive control.
  • optional submit.html: receives local-only form navigation; no external network side effects.
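One way to serve these pages without external network access is a small static server bound to the loopback interface. The sketch below is only an illustration: the fixture root path, the function names, and the choice of Node's built-in http module are assumptions, not part of this issue or of #854.

```typescript
import * as http from "node:http";
import * as path from "node:path";
import { promises as fs } from "node:fs";

// Assumed fixture root; the issue allows "tests/fixtures/playbook/ or equivalent".
export const FIXTURE_ROOT = path.resolve("tests/fixtures/playbook");

// Map a request URL to a file under the fixture root.
// Returns null when the path is malformed or escapes the root.
export function resolveFixturePath(root: string, requestUrl: string): string | null {
  const base = path.resolve(root);
  let relative: string;
  try {
    const pathname = new URL(requestUrl, "http://127.0.0.1").pathname;
    relative = pathname === "/" ? "index.html" : decodeURIComponent(pathname).replace(/^\/+/, "");
  } catch {
    return null; // malformed URL or percent-encoding
  }
  const resolved = path.resolve(base, relative);
  return resolved.startsWith(base + path.sep) ? resolved : null;
}

// Hypothetical helper: serves fixture files for any HTTP method, so a form
// submit to submit.html is answered with the static page and has no side effects.
export function createFixtureServer(root: string): http.Server {
  return http.createServer(async (req, res) => {
    const filePath = resolveFixturePath(root, req.url ?? "/");
    if (filePath === null) {
      res.writeHead(403).end("forbidden");
      return;
    }
    try {
      const body = await fs.readFile(filePath);
      res.writeHead(200, { "Content-Type": "text/html; charset=utf-8" });
      res.end(body);
    } catch {
      res.writeHead(404).end("not found");
    }
  });
}
```

A test could start it with `createFixtureServer(FIXTURE_ROOT).listen(0, "127.0.0.1")` and read the assigned port from `server.address()`, which keeps the fixture local-only and avoids port collisions.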

Required playbooks

Add 3 reviewable YAML playbooks under docs/recipes/ or tests/fixtures/playbook/recipes/:

  1. basic-navigation.yaml — navigate, read/assert title/text, follow a link, assert URL/text.
  2. safe-form.yaml — fill a local form, submit to a local fixture endpoint/page, assert post-submit state.
  3. failure-recovery.yaml — intentionally assert a missing element first, then demonstrate fail-fast output with enough evidence for the host to recover.

Each playbook must use inline Outcome Contracts instead of LLM judgement.

Non-goals

Acceptance criteria

  • This issue references #854 and does not duplicate its runner implementation scope.
  • At least 3 runnable playbooks exist and are committed as text fixtures.
  • Each playbook has deterministic expected outcomes through oc_assert or equivalent contracts.
  • A documented command or test path exists for serving the local fixture and running the playbooks.
  • Failure output includes step index, tool name, assertion failure details, and enough page state to reproduce.
  • npm run build, targeted playbook tests, and npm run lint:tier pass.
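The failure-output criterion above can be pictured as a structured record plus a formatter. This is a minimal sketch: the `StepFailure` shape and its field names are hypothetical and are not taken from #854; only the listed contents (step index, tool name, assertion details, page state) come from this issue.

```typescript
// Hypothetical evidence shape; field names are illustrative only.
export interface StepFailure {
  playbook: string;
  stepIndex: number; // zero-based index of the failing step
  tool: string; // tool invoked by the failing step, e.g. oc_assert
  assertion: { expected: string; actual: string };
  pageState: { url: string; title: string };
}

// Render one failure as a short, greppable block of text.
export function formatFailure(f: StepFailure): string {
  return [
    `FAIL ${f.playbook} step=${f.stepIndex} tool=${f.tool}`,
    `  expected: ${f.assertion.expected}`,
    `  actual:   ${f.assertion.actual}`,
    `  page:     ${f.pageState.url} (title: ${JSON.stringify(f.pageState.title)})`,
  ].join("\n");
}
```

Keeping the record structured and formatting it only at the edge would let a host consume the same evidence programmatically when deciding how to recover.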

Merge-blocking live verification with OpenChrome

After #854 and this issue are implemented, the PR must include live verification against a real OpenChrome MCP server:

  1. Build OpenChrome.
  2. Start the local fixture server.
  3. Run basic-navigation.yaml through oc playbook run and confirm all assertions pass.
  4. Run safe-form.yaml and confirm the local-only submit path succeeds without external network or account side effects.
  5. Run failure-recovery.yaml and confirm it fails at the intended step with structured evidence.
  6. Re-run basic-navigation.yaml to prove the failure playbook did not corrupt the browser/session state.

Attach console output or an MCP transcript showing all three playbooks, including the step 6 re-run of basic-navigation.yaml.
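Steps 3 to 6 imply a fixed sequence of expected outcomes: pass, pass, intentional fail, pass. A reviewer-side check of that sequence might look like the sketch below. The `RunResult` shape and `checkRunSequence` are hypothetical names; how results are actually collected from oc playbook run depends on #854.

```typescript
type Outcome = "pass" | "fail";

// Hypothetical per-run summary collected by whatever drives the playbooks.
export interface RunResult {
  playbook: string;
  outcome: Outcome;
}

// Expected order and outcomes from steps 3 to 6 of the procedure.
const EXPECTED: ReadonlyArray<RunResult> = [
  { playbook: "basic-navigation.yaml", outcome: "pass" },
  { playbook: "safe-form.yaml", outcome: "pass" },
  { playbook: "failure-recovery.yaml", outcome: "fail" }, // intentional failure
  { playbook: "basic-navigation.yaml", outcome: "pass" }, // proves state was not corrupted
];

// Return a list of human-readable problems; an empty list means the
// sequence matches the live-verification procedure.
export function checkRunSequence(results: RunResult[]): string[] {
  const problems: string[] = [];
  if (results.length !== EXPECTED.length) {
    problems.push(`expected ${EXPECTED.length} runs, got ${results.length}`);
  }
  EXPECTED.forEach((want, i) => {
    const got = results[i];
    if (!got) return; // already reported by the length check
    if (got.playbook !== want.playbook) {
      problems.push(`run ${i + 1}: expected ${want.playbook}, got ${got.playbook}`);
    } else if (got.outcome !== want.outcome) {
      problems.push(`run ${i + 1}: ${got.playbook} should ${want.outcome}, but it did ${got.outcome}`);
    }
  });
  return problems;
}
```

Treating the failure playbook's "fail" as the expected result means a passing failure-recovery.yaml is itself reported as a problem.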

Self-review checklist for implementer

Curated scope, overlap handling, and verification checklist

Scope classification

Overlap and conflict resolution

Implementation checklist

  • Add a minimal set of YAML playbooks covering navigation/read/assertion, form or interaction flow, and a failure/diagnostic path.
  • Add local fixture pages/data needed by those playbooks.
  • Wire the playbooks into the repo's live-verification or harness command only after the runner surface from #854 exists.
  • Document how to run the pack locally and what artifacts/logs indicate success.
  • Add tests/static validation for playbook schema and fixture availability.
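The static validation item could start as a structural check over an already-parsed playbook document. The sketch below assumes a hypothetical minimal shape (a name plus a list of steps, each with a tool field); the real schema is defined by #854 and may differ. It also checks only for oc_assert, although the acceptance criteria allow equivalent contracts.

```typescript
// Hypothetical minimal playbook shape; the real schema belongs to #854.
export interface PlaybookStep {
  tool: string;
  args?: Record<string, unknown>;
}
export interface Playbook {
  name: string;
  steps: PlaybookStep[];
}

function isRecord(v: unknown): v is Record<string, unknown> {
  return typeof v === "object" && v !== null && !Array.isArray(v);
}

// Validate a parsed YAML document; returns a list of errors (empty when valid).
export function validatePlaybook(doc: unknown): string[] {
  if (!isRecord(doc)) return ["playbook must be a mapping"];
  const errors: string[] = [];
  if (typeof doc.name !== "string" || doc.name.length === 0) {
    errors.push("name must be a non-empty string");
  }
  const steps = doc.steps;
  if (!Array.isArray(steps) || steps.length === 0) {
    errors.push("steps must be a non-empty list");
    return errors;
  }
  steps.forEach((step: unknown, i: number) => {
    if (!isRecord(step) || typeof step.tool !== "string") {
      errors.push(`steps[${i}].tool must be a string`);
    }
  });
  // Simplification: treat oc_assert as the only recognised contract step.
  const hasContract = steps.some((s: unknown) => isRecord(s) && s.tool === "oc_assert");
  if (!hasContract) errors.push("playbook needs at least one oc_assert step");
  return errors;
}
```

Running such a check in the regular test suite would catch a malformed or contract-free playbook before any browser is launched.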

Success criteria

Post-merge OpenChrome live verification checklist

  • Run the documented oc playbook/harness command against the local fixture pack.
  • Verify all expected passing playbooks succeed and the intentional diagnostic case reports the expected failure shape.
  • Confirm artifacts include playbook name, fixture URL/path, assertion results, and failure diagnostics.
  • Add command output summary and artifact paths to merge verification notes.

Metadata


Labels

  • P2: Medium priority
  • enhancement: New feature or request
  • harness: Execution harness, run lifecycle, recovery, and verification
  • host-integration: Wires module cores into host (CDP, MCP, tools, transports, OS APIs)
  • live-verification: Requires live OpenChrome/browser validation after implementation
