feat(benchmarks): Add Claude UI benchmark harness #145

Triggered via pull request May 23, 2026 11:30
Status Success
Total duration 16m 48s

warden.yml

on: pull_request

Annotations

4 errors and 5 warnings
Hardcoded local filesystem path for parser script will silently fail on all other machines: src/benchmarks/claude-ui/harness.ts#L34
The `parserPath` constant is hardcoded to `/Volumes/Developer/parse_claude_conversation.py`, an absolute path specific to one developer's machine; on any other machine `python3` will exit non-zero and `runParser` will silently return a failure code rather than throwing, causing benchmark runs to proceed with no parsed transcript data.
[ZCJ-JRN] Hardcoded local filesystem path for parser script will silently fail on all other machines (additional location): src/benchmarks/claude-ui/types.ts#L34
The `parserPath` constant is hardcoded to `/Volumes/Developer/parse_claude_conversation.py`, an absolute path specific to one developer's machine; on any other machine `python3` will exit non-zero and `runParser` will silently return a failure code rather than throwing, causing benchmark runs to proceed with no parsed transcript data.
Parser script path is hardcoded to a single developer's local filesystem: src/benchmarks/claude-ui/harness.ts#L436
`parserPath` is set to `'/Volumes/Developer/parse_claude_conversation.py'` (line 34), so `runParser` will silently produce a non-zero exit code on every other contributor's machine, causing every benchmark run to report `parserExitCode ≠ 0` and mark the suite as failed.
[JKX-PK2] Parser script path is hardcoded to a single developer's local filesystem (additional location): src/benchmarks/claude-ui/harness.ts#L34
`parserPath` is set to `'/Volumes/Developer/parse_claude_conversation.py'` (line 34), so `runParser` will silently produce a non-zero exit code on every other contributor's machine, causing every benchmark run to report `parserExitCode ≠ 0` and mark the suite as failed.
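One possible way to address the hardcoded-path findings above, sketched under stated assumptions: the parser location is read from an environment variable with a repository-relative fallback, and a missing script or non-zero exit raises an error instead of being returned silently. The `CLAUDE_PARSER_PATH` variable, the `scripts/` fallback directory, and both function signatures are hypothetical and do not come from the PR.

```typescript
import { spawnSync } from "node:child_process";
import { existsSync } from "node:fs";
import * as path from "node:path";

// Hypothetical: CLAUDE_PARSER_PATH and the scripts/ fallback are illustrative
// names only; the PR does not define them.
function resolveParserPath(
  env: Record<string, string | undefined> = process.env,
  repoRoot: string = process.cwd(),
): string {
  const candidate = env.CLAUDE_PARSER_PATH
    ? path.resolve(env.CLAUDE_PARSER_PATH)
    : path.join(repoRoot, "scripts", "parse_claude_conversation.py");
  if (!existsSync(candidate)) {
    // Fail loudly before any benchmark starts, rather than at parse time.
    throw new Error(`Parser script not found at ${candidate}; set CLAUDE_PARSER_PATH`);
  }
  return candidate;
}

// Hypothetical signature: runs the parser and throws on failure so a run
// cannot proceed without parsed transcript data.
function runParser(parserPath: string, transcriptPath: string): string {
  const result = spawnSync("python3", [parserPath, transcriptPath], { encoding: "utf8" });
  if (result.error) throw result.error; // python3 itself could not be started
  if (result.status !== 0) {
    throw new Error(`Parser exited with code ${result.status}: ${result.stderr}`);
  }
  return result.stdout;
}
```

Resolving the path once at startup, for example `const parserPath = resolveParserPath();`, would surface a misconfigured machine before any simulator is created.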
Temporary simulator leaked if boot/bootstatus/open fails after create succeeds: src/benchmarks/claude-ui/simulator-lifecycle.ts#L269
If `simctl create` succeeds but any subsequent step (`boot`, `bootstatus`, or `open`) throws, `prepareTemporarySimulator` propagates the error without returning the simulator object, so `temporarySimulator` in the harness stays `undefined` and the `finally` cleanup block never calls `deleteTemporarySimulator`, leaving the simulator orphaned.
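The leak described above could be closed inside `prepareTemporarySimulator` itself, so the caller's `finally` block is not the only cleanup path. The following is a minimal sketch, not the harness's actual code: the `RunCommand` runner, the `TemporarySimulator` shape, the `runOrThrow` helper, and the `createArgs` parameter are all hypothetical stand-ins for types the annotations do not show.

```typescript
// Hypothetical shapes; the real harness types are not shown in the annotations.
interface TemporarySimulator { udid: string }
interface CommandResult { exitCode: number; stdout: string }
type RunCommand = (argv: string[]) => Promise<CommandResult>;

// Treat a non-zero exit the same way as a thrown error.
async function runOrThrow(run: RunCommand, argv: string[]): Promise<CommandResult> {
  const result = await run(argv);
  if (result.exitCode !== 0) {
    throw new Error(`${argv.join(" ")} exited with code ${result.exitCode}`);
  }
  return result;
}

async function prepareTemporarySimulator(
  run: RunCommand,
  createArgs: string[], // name, device type, runtime (supplied by the caller)
): Promise<TemporarySimulator> {
  const created = await runOrThrow(run, ["xcrun", "simctl", "create", ...createArgs]);
  const simulator: TemporarySimulator = { udid: created.stdout.trim() };
  try {
    await runOrThrow(run, ["xcrun", "simctl", "boot", simulator.udid]);
    await runOrThrow(run, ["xcrun", "simctl", "bootstatus", simulator.udid, "-b"]);
    await runOrThrow(run, ["open", "-a", "Simulator"]);
    return simulator;
  } catch (error) {
    // Best-effort delete: the caller never received the simulator object,
    // so this is the only place that still knows the UDID.
    await run(["xcrun", "simctl", "delete", simulator.udid]).catch(() => undefined);
    throw error;
  }
}
```

The original error is rethrown unchanged, so the harness still sees why setup failed; a failure of the delete itself is swallowed here, which a real implementation might prefer to log.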
Timeout error thrown after successful prompt dismissal when deadline is crossed during last operation: src/benchmarks/claude-ui/first-run-preflight.ts#L199
The `if (timing.now() >= deadline)` check after the `while` loop fires even when the loop exited via `break` (the success path), so if the last `describe-ui` call or `tap` + `sleep(500)` pushes `timing.now()` past the deadline, a spurious timeout error is thrown despite all prompts being dismissed successfully. Use a flag to distinguish a deadline-expired exit from a `not-found` break.
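The flag-based fix suggested above can be sketched as follows. This is an illustration under assumptions, not the code in `first-run-preflight.ts`: the `Timing` interface, the `probe` and `tap` callbacks, and the function name `dismissFirstRunPrompts` are hypothetical.

```typescript
// Hypothetical interfaces standing in for the harness's timing and UI helpers.
interface Timing {
  now(): number;
  sleep(ms: number): Promise<void>;
}
type PromptProbe = () => Promise<"found" | "not-found">;

async function dismissFirstRunPrompts(
  timing: Timing,
  deadline: number,
  probe: PromptProbe,       // stands in for the describe-ui call
  tap: () => Promise<void>, // stands in for the tap on the prompt
): Promise<void> {
  let allDismissed = false;
  while (timing.now() < deadline) {
    if ((await probe()) === "not-found") {
      allDismissed = true; // success path: record it before leaving the loop
      break;
    }
    await tap();
    await timing.sleep(500);
  }
  // Decide on the flag, not on the clock: the last probe may itself have
  // pushed the clock past the deadline even though it succeeded.
  if (!allDismissed) {
    throw new Error("Timed out dismissing first-run prompts");
  }
}
```

Checking the flag instead of re-reading the clock means a slow final probe can no longer turn a successful dismissal into a timeout.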
Log write failure causes `runLoggedCommand` to discard a successful command result: src/benchmarks/claude-ui/simulator-lifecycle.ts#L186
If `appendLifecycleLog` throws (e.g. disk full, bad permissions), the `.catch(reject)` on line 187 rejects the entire promise even though the subprocess already succeeded — the caller (e.g. `prepareTemporarySimulator`) will see an I/O error and abort the benchmark instead of receiving the valid `exitCode`/`stdout`. Consider using `.then(() => resolve(result)).catch(() => resolve(result))` or the existing `tryAppendLifecycleLog` helper so a log failure never discards the command result.
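A minimal sketch of the behaviour recommended above, where a failed log write never discards the command result. It uses `async`/`await` rather than the promise-constructor style the annotation implies, and the two-callback signature is a hypothetical simplification, not the real `runLoggedCommand` signature.

```typescript
interface CommandResult { exitCode: number; stdout: string }

// Hypothetical signature: the command and the log writer are injected so the
// sketch stays self-contained.
async function runLoggedCommand(
  runCommand: () => Promise<CommandResult>,
  appendLog: (entry: string) => Promise<void>,
): Promise<CommandResult> {
  // A failure of the command itself still propagates to the caller.
  const result = await runCommand();
  try {
    await appendLog(`exitCode=${result.exitCode}`);
  } catch (logError) {
    // Logging is best-effort: report the problem, keep the result.
    console.warn("lifecycle log write failed:", logError);
  }
  return result;
}
```

With this shape, a full disk or a permissions error on the log file degrades to a warning, and the caller still receives the `exitCode` and `stdout` of the command that already ran.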
Simulator leaked when post-create steps fail inside `prepareTemporarySimulator`: src/benchmarks/claude-ui/simulator-lifecycle.ts#L270
If `simctl boot`, `simctl bootstatus`, or `open Simulator.app` throws or exits non-zero, the function propagates the error before returning, leaving `temporarySimulator` unassigned in `runSuite`; the `finally` block's `if (temporarySimulator)` guard then skips cleanup and the newly created simulator is never deleted.
review
Node.js 20 actions are deprecated. The following actions are running on Node.js 20 and may not work as expected: actions/checkout@v4. Actions will be forced to run with Node.js 24 by default starting June 2nd, 2026. Node.js 20 will be removed from the runner on September 16th, 2026. Please check if updated versions of these actions are available that support Node.js 24. To opt into Node.js 24 now, set the FORCE_JAVASCRIPT_ACTIONS_TO_NODE24=true environment variable on the runner or in your workflow file. Once Node.js 24 becomes the default, you can temporarily opt out by setting ACTIONS_ALLOW_USE_UNSECURE_NODE_VERSION=true. For more information see: https://github.blog/changelog/2025-09-19-deprecation-of-node-20-on-github-actions-runners/