test(harness): local playbook verification pack (#1044) by shaun0927 · Pull Request #1112 · shaun0927/openchrome

shaun0927 · 2026-05-12T15:36:57Z

Progress / Review status

Auto-refreshed 2026-05-13 — owner comments cleaned up to reduce review noise.

| Field | Value |
| --- | --- |
| Branch | `test/1044-playbook-live-pack` → `develop` |
| Draft | no |
| CI | ⏳ 8/9 passing, 1 pending |
| Mergeable | ✅ MERGEABLE |
| Review decision | - |
| Codex (latest) | - |
| Other reviewers (latest) | - |
| Head | `d3a8cb2`: Make browser playbooks reproducible evidence |
| Commits | 1 |

_Owner comment cleanup: 0 issue + 0 inline review comments deleted. Outstanding feedback from automated/external reviewers above is unchanged._

Summary

Closes #1044.

This PR keeps the #854 playbook runner as the single recipe surface and adds the missing local live-verification pack around it:

- local fixture site under `tests/fixtures/playbook/site/`
- three deterministic YAML recipes under `tests/fixtures/playbook/recipes/`
- documented merge-time commands in `docs/recipes/live-verification-playbooks.md`
- assert expansion aligned with `oc_assert`'s `{ contract, evidence }` input shape
- same-tab `tabId` reuse in `oc playbook run` after a tool returns `tabId`, so reviewable playbooks do not hard-code ephemeral browser tab IDs
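
The assert expansion mentioned in the list can be pictured roughly as follows. This is a minimal sketch, not the runner's actual code: only the outer `{ contract, evidence }` pairing comes from this PR, while `AssertStep`, `ToolCall`, `expandAssert`, and every field inside `contract` and `evidence` are hypothetical names chosen for illustration.

```typescript
// All field names inside `contract` and `evidence` are illustrative assumptions;
// only the outer { contract, evidence } pairing comes from the PR description.
interface AssertStep {
  assert: { selector: string; contains: string };
  evidence: { text_preview: string };
}

interface ToolCall {
  tool: "oc_assert";
  args: {
    contract: { selector: string; contains: string };
    evidence: { text_preview: string };
  };
}

/** Expand a parsed YAML assert step into a single oc_assert tool call. */
function expandAssert(step: AssertStep): ToolCall {
  return {
    tool: "oc_assert",
    // Copy both halves so the tool call does not alias the parsed YAML objects.
    args: { contract: { ...step.assert }, evidence: { ...step.evidence } },
  };
}

const call = expandAssert({
  assert: { selector: "h1", contains: "Playbook Fixture Home" },
  evidence: { text_preview: "Playbook Fixture Home" },
});
```

Keeping the contract and its evidence snapshot side by side in one call is what lets a reviewer check the verdict without rerunning the browser.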

Direction / duplication review

- #854 (`oc playbook`, the declarative YAML scenario runner with inline Outcome Contracts) is already merged and owns the declarative YAML runner; this PR does not introduce another runner, LLM step generator, or server-side orchestration tier.
- An open-PR scan before implementation found no existing playbook live pack for #1044. The closest work was the already-merged #933/#854 runner.
- The only runner behavior change is narrowly required for live browser verification: a `tabId` returned by a tool is reused for later same-tab verbs when the YAML omits `tabId`; an explicit `tabId` remains authoritative.
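
The `tabId` rule described above can be sketched as a small piece of state threaded through the run. This is an illustration under assumptions, not the code in this PR: `TabContext`, `StepArgs`, `ToolResult`, and the sample tab IDs are hypothetical, and the real runner's types may differ.

```typescript
// Hypothetical shapes: the real runner's types in openchrome may differ.
type StepArgs = Record<string, unknown> & { tabId?: string };
interface ToolResult {
  tabId?: string;
}

/** Tracks the most recent tabId returned by a tool call. */
class TabContext {
  private lastTabId: string | undefined;

  /** Fill in tabId only when the YAML step omitted it; explicit values win. */
  resolveArgs(args: StepArgs): StepArgs {
    if (args.tabId !== undefined || this.lastTabId === undefined) {
      return args;
    }
    return { ...args, tabId: this.lastTabId };
  }

  /** Remember a tabId returned by a tool so later steps can reuse it. */
  observe(result: ToolResult): void {
    if (typeof result.tabId === "string") {
      this.lastTabId = result.tabId;
    }
  }
}

const ctx = new TabContext();
const first = ctx.resolveArgs({ url: "http://localhost:8765/" }); // no tabId known yet
ctx.observe({ tabId: "tab-1" }); // a tool returned a tabId
const reused = ctx.resolveArgs({ selector: "h1" }); // inherits "tab-1"
const explicit = ctx.resolveArgs({ selector: "h1", tabId: "tab-9" }); // stays "tab-9"
```

Because the fallback only applies when `tabId` is absent, a recipe that pins a tab explicitly behaves exactly as it did before the change.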

Success criteria covered

- References and strengthens #854 without duplicating runner scope.
- Adds 3 runnable local playbooks: basic navigation, safe form, intentional failure/fail-fast.
- Each playbook uses deterministic inline `oc_assert` contracts and explicit evidence snapshots.
- Documents the local fixture server and playbook commands.
- Failure output includes the step index, tool name, assertion failure details, and the skipped downstream step.
- Build, targeted tests, dependency-tier lint, and ESLint pass.
- Live OpenChrome verification was run against a local fixture server.

Validation

Automated:

npm test -- --runTestsByPath tests/cli/playbook/expand.test.ts tests/cli/playbook/parse.test.ts tests/cli/playbook/run.test.ts tests/cli/playbook/live-fixtures.test.ts
npm run build
npm run lint:tier
npm run lint -- --quiet

Live OpenChrome smoke against local fixture server:

python3 -m http.server 8765 --directory tests/fixtures/playbook/site
node dist/cli/index.js playbook run tests/fixtures/playbook/recipes/basic-navigation.yaml --json
node dist/cli/index.js playbook run tests/fixtures/playbook/recipes/safe-form.yaml --json
node dist/cli/index.js playbook run tests/fixtures/playbook/recipes/failure-recovery.yaml --json
node dist/cli/index.js playbook run tests/fixtures/playbook/recipes/basic-navigation.yaml --json

Observed summaries:

basic:       ok=true  total=4 passed=4 failed=0 skipped=0
safe-form:  ok=true  total=5 passed=5 failed=0 skipped=0
failure:    ok=false total=3 passed=1 failed=1 skipped=1
basic rerun:ok=true  total=4 passed=4 failed=0 skipped=0

Intentional failure evidence included:

- failed step index `1`
- tool `oc_assert`
- error `Step 1 (assert): assert verdict="fail"`
- failed assertion expected `selector: h1, contains: This Text Is Intentionally Missing`
- actual `text_preview: Playbook Fixture Home`
- downstream navigate step skipped
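
The fail-fast accounting behind the `failure` summary (`total=3 passed=1 failed=1 skipped=1`) can be reproduced with a short sketch. It assumes a simplified step model: `Step`, `Summary`, and `runFailFast` are hypothetical names, the tool name of step 0 is a guess, and each step is reduced to a boolean outcome rather than a real tool call.

```typescript
// Hypothetical, simplified model: each step just reports pass/fail.
interface Step {
  tool: string;
  run: () => boolean;
}

interface Summary {
  ok: boolean;
  total: number;
  passed: number;
  failed: number;
  skipped: number;
  failedIndex?: number;
  failedTool?: string;
}

/** Run steps in order; after the first failure, count the rest as skipped. */
function runFailFast(steps: Step[]): Summary {
  const summary: Summary = { ok: true, total: steps.length, passed: 0, failed: 0, skipped: 0 };
  steps.forEach((step, index) => {
    if (!summary.ok) {
      summary.skipped += 1; // a previous step failed, so this one never runs
      return;
    }
    if (step.run()) {
      summary.passed += 1;
    } else {
      summary.ok = false;
      summary.failed += 1;
      summary.failedIndex = index;
      summary.failedTool = step.tool;
    }
  });
  return summary;
}

const heading = "Playbook Fixture Home"; // actual text_preview from the evidence above
const result = runFailFast([
  { tool: "navigate", run: () => true }, // assumed tool name for step 0
  { tool: "oc_assert", run: () => heading.includes("This Text Is Intentionally Missing") },
  { tool: "navigate", run: () => true }, // downstream step, skipped
]);
```

With these inputs the sketch reports the failure at index 1 on `oc_assert` and one skipped downstream step, matching the observed summary line.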

Non-goals preserved

- No external websites or real accounts.
- No LLM-based recipe interpretation.
- No new harness/orchestration tier beyond the existing playbook CLI.

gemini-code-assist · 2026-05-12T15:37:03Z

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

qodo-code-review · 2026-05-12T15:37:03Z

Qodo reviews are paused for this user.


chatgpt-codex-connector · 2026-05-12T15:37:07Z

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.
Repo admins can enable using credits for code reviews in their settings.

* Make browser playbooks reproducible evidence

  Add a local-only fixture pack and align playbook assertions with oc_assert's contract/evidence input shape so post-merge verification can exercise pass, safe-form, and fail-fast paths without external sites or secrets.

  Constraint: #1044 must validate the existing playbook runner without adding a second harness or LLM judgement path.
  Rejected: create a new recipe runner | duplicates the merged #854 playbook surface and would broaden maintenance risk.
  Confidence: high
  Scope-risk: narrow
  Directive: keep playbook verification fixtures local, deterministic, and explicit about oc_assert evidence snapshots.
  Tested: npm test -- --runTestsByPath tests/cli/playbook/expand.test.ts tests/cli/playbook/parse.test.ts tests/cli/playbook/run.test.ts tests/cli/playbook/live-fixtures.test.ts; npm run build; npm run lint:tier; npm run lint -- --quiet
  Not-tested: manual browser smoke with a live Chrome instance; documented commands are included for merge-time verification.

* test(cli/replay): canonicalize tmpDir to fix Windows short-path mismatch

  On Windows GH Actions runners, os.tmpdir() can return either the short (C:\Users\RUNNER~1\...) or long (C:\Users\runneradmin\...) form depending on the runner image. path.resolve() in cli/replay.ts is purely lexical and preserves whatever form it received, so when --out happens to be in a different form than what stdout returns, the replay-report test fails on `expect(stdout.trim()).toBe(destPath)`. Wrap mkdtempSync in fs.realpathSync so destPath is always in canonical form. Resolves the Windows-18 build-and-test failure on PR #1112.
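
The canonicalization described in the second commit can be sketched as below. This is an assumption-laden illustration rather than the test's actual code: `makeCanonicalTmpDir`, the `replay-report-` prefix, and the `report.json` file name are hypothetical; only the `fs.realpathSync(fs.mkdtempSync(...))` wrapping comes from the commit message.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

/** Create a temp directory and return the path as canonicalized by fs.realpathSync. */
function makeCanonicalTmpDir(prefix: string): string {
  // mkdtempSync returns whatever form os.tmpdir() produced; realpathSync
  // resolves it so later string comparisons use one consistent form.
  return fs.realpathSync(fs.mkdtempSync(path.join(os.tmpdir(), prefix)));
}

const tmpDir = makeCanonicalTmpDir("replay-report-"); // hypothetical prefix
const destPath = path.join(tmpDir, "report.json"); // hypothetical file name

// path.resolve is purely lexical, so an absolute, already-normalized path
// passes through unchanged and can be compared with toBe-style equality.
const resolved = path.resolve(destPath);

// Clean up the directory created for this illustration.
fs.rmSync(tmpDir, { recursive: true, force: true });
```

Canonicalizing once at creation keeps the fix in the test setup, so `cli/replay.ts` itself does not need to change how it resolves `--out`.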

shaun0927 mentioned this pull request May 13, 2026

test(harness): live-verification playbook pack for browser recipes #1044

Closed

26 tasks

shaun0927 force-pushed the test/1044-playbook-live-pack branch from fea2fac to d3a8cb2 May 13, 2026 09:34

shaun0927 merged commit 423a92a into develop May 13, 2026
9 checks passed

shaun0927 mentioned this pull request May 13, 2026

feat(cli): oc playbook — declarative YAML scenario runner with inline Outcome Contracts #854

Closed

48 tasks

shaun0927 merged 2 commits into develop from test/1044-playbook-live-pack
