
test(harness): local playbook verification pack (#1044) #1112

Merged
shaun0927 merged 2 commits into develop from test/1044-playbook-live-pack on May 13, 2026
shaun0927 (Owner) commented on May 12, 2026

Progress / Review status

Auto-refreshed 2026-05-13 — owner comments cleaned up to reduce review noise.

Field                     Value
Branch                    test/1044-playbook-live-pack → develop
Draft                     no
CI                        ⏳ 8/9 passing, 1 pending
Mergeable                 ✅ MERGEABLE
Review decision
Codex (latest)
Other reviewers (latest)
Head                      d3a8cb2: Make browser playbooks reproducible evidence
Commits                   1

Owner comment cleanup: 0 issue + 0 inline review comments deleted. Outstanding feedback from automated/external reviewers above is unchanged.


Summary

Closes #1044.

This PR keeps the #854 playbook runner as the single recipe surface and adds the missing local live-verification pack around it:

  • local fixture site under tests/fixtures/playbook/site/
  • three deterministic YAML recipes under tests/fixtures/playbook/recipes/
  • documented merge-time commands in docs/recipes/live-verification-playbooks.md
  • assert expansion aligned with oc_assert's { contract, evidence } input shape
  • same-tab tabId reuse in oc playbook run after a tool returns tabId, so reviewable playbooks do not hard-code ephemeral browser tab IDs
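The tabId reuse described in the last bullet can be pictured with a small sketch. This is not the runner's actual code: the `Step`, `ToolArgs`, and `ToolResult` shapes, the helper names, and the assumption that `tabId` is a string are all hypothetical and only illustrate the idea of remembering a returned `tabId` and passing it to later steps.

```typescript
// Hypothetical shapes: the runner's real types are not shown in this PR.
type ToolArgs = Record<string, unknown>;
type ToolResult = Record<string, unknown>;
interface Step { tool: string; args: ToolArgs }

// Adds the remembered tabId unless the step already sets one explicitly.
function injectTabId(args: ToolArgs, currentTabId: string | undefined): ToolArgs {
  if (currentTabId === undefined || args["tabId"] !== undefined) {
    return { ...args };
  }
  return { ...args, tabId: currentTabId };
}

// Remembers a tabId returned by a tool; otherwise keeps the previous one.
function nextTabId(currentTabId: string | undefined, result: ToolResult): string | undefined {
  const returned = result["tabId"];
  return typeof returned === "string" ? returned : currentTabId;
}

// Runs steps in order, threading the tabId from earlier results into later steps.
async function runSteps(
  steps: Step[],
  callTool: (tool: string, args: ToolArgs) => Promise<ToolResult>,
): Promise<ToolResult[]> {
  let tabId: string | undefined;
  const results: ToolResult[] = [];
  for (const step of steps) {
    const result = await callTool(step.tool, injectTabId(step.args, tabId));
    tabId = nextTabId(tabId, result);
    results.push(result);
  }
  return results;
}
```

Under this scheme a recipe never has to name a tab: the first step that returns a `tabId` fixes it for the rest of the run, and a step that sets its own `tabId` is left untouched.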

Direction / duplication review

Success criteria covered

  • References and strengthens #854 (feat(cli): oc playbook, the declarative YAML scenario runner with inline Outcome Contracts) without duplicating runner scope.
  • Adds 3 runnable local playbooks: basic navigation, safe form, intentional failure/fail-fast.
  • Each playbook uses deterministic inline oc_assert contracts and explicit evidence snapshots.
  • Documents local fixture server and playbook commands.
  • Failure output includes step index, tool name, assertion failure details, and skipped downstream step.
  • Build, targeted tests, dependency-tier lint, and ESLint pass.
  • Live OpenChrome verification was run against a local fixture server.

Validation

Automated:

npm test -- --runTestsByPath tests/cli/playbook/expand.test.ts tests/cli/playbook/parse.test.ts tests/cli/playbook/run.test.ts tests/cli/playbook/live-fixtures.test.ts
npm run build
npm run lint:tier
npm run lint -- --quiet

Live OpenChrome smoke against local fixture server:

python3 -m http.server 8765 --directory tests/fixtures/playbook/site
node dist/cli/index.js playbook run tests/fixtures/playbook/recipes/basic-navigation.yaml --json
node dist/cli/index.js playbook run tests/fixtures/playbook/recipes/safe-form.yaml --json
node dist/cli/index.js playbook run tests/fixtures/playbook/recipes/failure-recovery.yaml --json
node dist/cli/index.js playbook run tests/fixtures/playbook/recipes/basic-navigation.yaml --json

Observed summaries:

basic:       ok=true  total=4 passed=4 failed=0 skipped=0
safe-form:   ok=true  total=5 passed=5 failed=0 skipped=0
failure:     ok=false total=3 passed=1 failed=1 skipped=1
basic rerun: ok=true  total=4 passed=4 failed=0 skipped=0
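The counts in these summaries are what a fail-fast loop would produce: once a step fails, every later step is counted as skipped without being run. A minimal sketch of that tallying follows, assuming each step can be modelled as a function returning `true` on success. `runFailFast`, `Summary`, and `failedIndex` are hypothetical names, not the playbook CLI's real output schema.

```typescript
// Hypothetical summary shape mirroring the fields printed above.
interface Summary {
  ok: boolean;
  total: number;
  passed: number;
  failed: number;
  skipped: number;
  failedIndex: number | null; // zero-based index of the first failed step
}

// Runs steps in order and stops executing after the first failure.
function runFailFast(steps: Array<() => boolean>): Summary {
  let passed = 0;
  let failed = 0;
  let skipped = 0;
  let failedIndex: number | null = null;

  for (let index = 0; index < steps.length; index++) {
    if (failedIndex !== null) {
      skipped++; // downstream steps are recorded but never executed
      continue;
    }
    if (steps[index]()) {
      passed++;
    } else {
      failed++;
      failedIndex = index;
    }
  }

  return { ok: failed === 0, total: steps.length, passed, failed, skipped, failedIndex };
}
```

With three steps where the second one fails, this yields `total=3 passed=1 failed=1 skipped=1` and a failed index of 1, matching the shape of the `failure` line above.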

Intentional failure evidence included:

  • failed step index 1
  • tool oc_assert
  • error Step 1 (assert): assert verdict="fail"
  • failed assertion expected selector: h1, contains: This Text Is Intentionally Missing
  • actual text_preview: Playbook Fixture Home
  • downstream navigate step skipped
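The expected/actual pair in this evidence suggests a purely deterministic check: a contract names a selector and a substring, and the evidence snapshot supplies the text actually captured for that selector. The sketch below illustrates that idea only. `TextContract`, `EvidenceSnapshot`, `Verdict`, and `evaluate` are hypothetical; beyond the `{ contract, evidence }` input shape, oc_assert's real schema is not shown in this PR.

```typescript
// Hypothetical shapes, loosely named after the fields in the failure output.
interface TextContract { selector: string; contains: string }
interface EvidenceSnapshot { [selector: string]: { text_preview: string } }
interface Verdict {
  verdict: "pass" | "fail";
  expected: TextContract;
  actual: { text_preview: string | null };
}

// Compares the contract against the snapshot without any LLM judgement:
// the verdict is "pass" only if the captured text contains the substring.
function evaluate(contract: TextContract, evidence: EvidenceSnapshot): Verdict {
  const entry = evidence[contract.selector];
  const text = entry ? entry.text_preview : null; // null when the selector was not captured
  const pass = text !== null && text.includes(contract.contains);
  return {
    verdict: pass ? "pass" : "fail",
    expected: contract,
    actual: { text_preview: text },
  };
}
```

Because the evidence is an explicit snapshot, the same inputs always produce the same verdict, which is what makes the intentional failure reproducible.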

Non-goals preserved

  • No external websites or real accounts.
  • No LLM-based recipe interpretation.
  • No new harness/orchestration tier beyond the existing playbook CLI.


Make browser playbooks reproducible evidence

Add a local-only fixture pack and align playbook assertions with oc_assert's contract/evidence input shape so post-merge verification can exercise pass, safe-form, and fail-fast paths without external sites or secrets.

Constraint: #1044 must validate the existing playbook runner without adding a second harness or LLM judgement path.

Rejected: create a new recipe runner | duplicates the merged #854 playbook surface and would broaden maintenance risk.

Confidence: high

Scope-risk: narrow

Directive: keep playbook verification fixtures local, deterministic, and explicit about oc_assert evidence snapshots.

Tested: npm test -- --runTestsByPath tests/cli/playbook/expand.test.ts tests/cli/playbook/parse.test.ts tests/cli/playbook/run.test.ts tests/cli/playbook/live-fixtures.test.ts; npm run build; npm run lint:tier; npm run lint -- --quiet

Not-tested: manual browser smoke with a live Chrome instance; documented commands are included for merge-time verification.
shaun0927 force-pushed the test/1044-playbook-live-pack branch from fea2fac to d3a8cb2 on May 13, 2026 09:34

test(cli/replay): canonicalize tmpDir to fix Windows short-path mismatch

On Windows GH Actions runners, os.tmpdir() can return either the short
(C:\Users\RUNNER~1\...) or long (C:\Users\runneradmin\...) form depending
on the runner image. path.resolve() in cli/replay.ts is purely lexical
and preserves whatever form it received, so when --out happens to be in
a different form than what stdout returns, the replay-report test fails
on `expect(stdout.trim()).toBe(destPath)`.

Wrap mkdtempSync in fs.realpathSync so destPath is always in canonical
form. Resolves the Windows-18 build-and-test failure on PR #1112.
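
The fix described above amounts to one wrapped call in the test setup. A minimal sketch, assuming the test builds its output path from a fresh temp directory; `makeCanonicalTmpDir` and the prefix are hypothetical names, not the ones used in the repository:

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Creates a unique temp directory and returns the path as resolved by
// fs.realpathSync, so later string comparisons use one consistent form.
function makeCanonicalTmpDir(prefix: string): string {
  const created = fs.mkdtempSync(path.join(os.tmpdir(), prefix));
  return fs.realpathSync(created);
}
```

A test would then derive `destPath` from the returned directory before comparing it with the CLI's stdout. Node also provides `fs.realpathSync.native`, which resolves the path through the operating system; whether it is the better choice on a given runner image is not something this PR establishes.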
shaun0927 merged commit 423a92a into develop on May 13, 2026
9 checks passed
shaun0927 added a commit that referenced this pull request May 13, 2026
* Make browser playbooks reproducible evidence

* test(cli/replay): canonicalize tmpDir to fix Windows short-path mismatch
