feat(benchmarks): Add Claude UI benchmark harness · getsentry/XcodeBuildMCP@93b21e7

Triggered via pull request May 23, 2026 11:30

cameroncooke

opened #427

cam/feat/claude-ui-benchmark-harness

Status Success

Total duration 16m 48s

Artifacts –

warden.yml

on: pull_request

review

16m 45s

Annotations

4 errors and 5 warnings

Hardcoded local filesystem path for parser script will silently fail on all other machines: src/benchmarks/claude-ui/harness.ts#L34

The `parserPath` constant is hardcoded to `/Volumes/Developer/parse_claude_conversation.py`, an absolute path specific to one developer's machine; on any other machine `python3` will exit non-zero and `runParser` will silently return a failure code rather than throwing, causing benchmark runs to proceed with no parsed transcript data.

[ZCJ-JRN] Hardcoded local filesystem path for parser script will silently fail on all other machines (additional location): src/benchmarks/claude-ui/types.ts#L34

The `parserPath` constant is hardcoded to `/Volumes/Developer/parse_claude_conversation.py`, an absolute path specific to one developer's machine; on any other machine `python3` will exit non-zero and `runParser` will silently return a failure code rather than throwing, causing benchmark runs to proceed with no parsed transcript data.

Parser script path is hardcoded to a single developer's local filesystem: src/benchmarks/claude-ui/harness.ts#L436

`parserPath` is set to `'/Volumes/Developer/parse_claude_conversation.py'` (line 34), so `runParser` will silently produce a non-zero exit code on every other contributor's machine, causing every benchmark run to report `parserExitCode ≠ 0` and mark the suite as failed.

[JKX-PK2] Parser script path is hardcoded to a single developer's local filesystem (additional location): src/benchmarks/claude-ui/harness.ts#L34

`parserPath` is set to `'/Volumes/Developer/parse_claude_conversation.py'` (line 34), so `runParser` will silently produce a non-zero exit code on every other contributor's machine, causing every benchmark run to report `parserExitCode ≠ 0` and mark the suite as failed.
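
One possible shape for a fix to the parser-path annotations above, as a minimal sketch rather than the harness's actual code: resolve the parser location from an environment override with a repo-relative fallback, and fail loudly when the parser exits non-zero. The names `PARSER_PATH_ENV`, `CLAUDE_UI_BENCHMARK_PARSER`, `resolveParserPath`, and `assertParserSucceeded` are hypothetical and do not come from the pull request.

```typescript
// Hypothetical helper names; the real harness may structure this differently.
type Env = Record<string, string | undefined>;

// Assumed environment variable name, chosen only for illustration.
const PARSER_PATH_ENV = "CLAUDE_UI_BENCHMARK_PARSER";

// Prefer an explicit override, otherwise fall back to a path inside the repo
// instead of an absolute path that exists on one machine only.
function resolveParserPath(env: Env, repoRelativeDefault: string): string {
  const override = env[PARSER_PATH_ENV]?.trim();
  return override ? override : repoRelativeDefault;
}

// Throw instead of returning a failure code, so a missing or broken parser
// cannot be mistaken for a run that simply produced no transcript data.
function assertParserSucceeded(exitCode: number | null, parserPath: string): void {
  if (exitCode !== 0) {
    throw new Error(`Transcript parser failed (exit code ${exitCode}) at ${parserPath}`);
  }
}
```

Passing the environment in as a parameter, rather than reading `process.env` inside the helper, keeps the sketch testable without touching the real environment.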

Temporary simulator leaked if boot/bootstatus/open fails after create succeeds: src/benchmarks/claude-ui/simulator-lifecycle.ts#L269

If `simctl create` succeeds but any subsequent step (`boot`, `bootstatus`, or `open`) throws, `prepareTemporarySimulator` propagates the error without returning the simulator object, so `temporarySimulator` in the harness stays `undefined` and the `finally` cleanup block never calls `deleteTemporarySimulator`, leaving the simulator orphaned.
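
A sketch of one way to close this leak, assuming the cleanup moves inside the preparation function itself: once `create` has returned an identifier, any failure in a later step triggers a best-effort delete before the original error is rethrown. The `SimulatorSteps` interface and `prepareTemporarySimulatorSafely` are hypothetical stand-ins, not the module's real API.

```typescript
// Hypothetical shapes; the real module's types and helpers may differ.
interface TemporarySimulator {
  udid: string;
}

interface SimulatorSteps {
  create(): Promise<string>; // resolves to the new simulator's UDID
  boot(udid: string): Promise<void>;
  waitForBoot(udid: string): Promise<void>;
  openApp(udid: string): Promise<void>;
  remove(udid: string): Promise<void>;
}

async function prepareTemporarySimulatorSafely(
  steps: SimulatorSteps,
): Promise<TemporarySimulator> {
  const udid = await steps.create();
  try {
    await steps.boot(udid);
    await steps.waitForBoot(udid);
    await steps.openApp(udid);
    return { udid };
  } catch (error) {
    // Best-effort cleanup: a failed delete must not mask the original error.
    await steps.remove(udid).catch(() => undefined);
    throw error;
  }
}
```

With this shape the caller's `finally` block only has to handle simulators that were returned successfully.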

Timeout error thrown after successful prompt dismissal when deadline is crossed during last operation: src/benchmarks/claude-ui/first-run-preflight.ts#L199

The `if (timing.now() >= deadline)` check after the `while` loop fires even when the loop exited via `break` (the success path), so if the last `describe-ui` call or `tap` + `sleep(500)` pushes `timing.now()` past the deadline, a spurious timeout error is thrown despite all prompts being dismissed successfully. Use a flag to distinguish a deadline-expired exit from a `not-found` break.
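
The flag suggested above could look roughly like this. It is a simplified sketch: `dismissPrompts`, `probeAndDismiss`, and the `PromptProbe` type are hypothetical, and the real loop also calls `describe-ui`, taps, and sleeps between iterations.

```typescript
// Hypothetical types; only the control flow is meant to mirror the annotation.
type PromptProbe = "found" | "not-found";

interface Timing {
  now(): number;
}

async function dismissPrompts(
  timing: Timing,
  deadline: number,
  probeAndDismiss: () => Promise<PromptProbe>,
): Promise<void> {
  let allDismissed = false;
  while (timing.now() < deadline) {
    if ((await probeAndDismiss()) === "not-found") {
      // Success path: record it so the deadline is not re-checked afterwards.
      allDismissed = true;
      break;
    }
  }
  // Only a loop that ran out of time without a "not-found" counts as a timeout.
  if (!allDismissed) {
    throw new Error("Timed out dismissing first-run prompts");
  }
}
```

Because the error depends on the flag rather than on a second clock read, a final operation that overruns the deadline no longer turns a successful dismissal into a failure.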

Log write failure causes `runLoggedCommand` to discard a successful command result: src/benchmarks/claude-ui/simulator-lifecycle.ts#L186

If `appendLifecycleLog` throws (e.g. disk full, bad permissions), the `.catch(reject)` on line 187 rejects the entire promise even though the subprocess already succeeded. The caller (e.g. `prepareTemporarySimulator`) then sees an I/O error and aborts the benchmark instead of receiving the valid `exitCode`/`stdout`. Consider using `.then(() => resolve(result)).catch(() => resolve(result))` or the existing `tryAppendLifecycleLog` helper so a log failure never discards the command result.
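
The same idea written with `async`/`await`, as a sketch only: the log write is attempted, any failure is handed to an optional callback, and the command result is returned either way. `resolveWithBestEffortLog`, `appendLog`, and `onLogError` are hypothetical names; the signature of the existing `tryAppendLifecycleLog` helper is not shown in the annotation and is not assumed here.

```typescript
// Hypothetical result shape, based on the exitCode/stdout fields named above.
interface CommandResult {
  exitCode: number;
  stdout: string;
}

async function resolveWithBestEffortLog(
  result: CommandResult,
  appendLog: (result: CommandResult) => Promise<void>,
  onLogError: (error: unknown) => void = () => undefined,
): Promise<CommandResult> {
  try {
    await appendLog(result);
  } catch (error) {
    // Logging is diagnostic only: report the failure, keep the result.
    onLogError(error);
  }
  return result;
}
```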

Simulator leaked when post-create steps fail inside `prepareTemporarySimulator`: src/benchmarks/claude-ui/simulator-lifecycle.ts#L270

If `simctl boot`, `simctl bootstatus`, or `open Simulator.app` throws or exits non-zero, the function propagates the error before returning, leaving `temporarySimulator` unassigned in `runSuite`; the `finally` block's `if (temporarySimulator)` guard then skips cleanup and the newly created simulator is never deleted.

review

Node.js 20 actions are deprecated. The following actions are running on Node.js 20 and may not work as expected: actions/checkout@v4. Actions will be forced to run with Node.js 24 by default starting June 2nd, 2026. Node.js 20 will be removed from the runner on September 16th, 2026. Please check if updated versions of these actions are available that support Node.js 24. To opt into Node.js 24 now, set the FORCE_JAVASCRIPT_ACTIONS_TO_NODE24=true environment variable on the runner or in your workflow file. Once Node.js 24 becomes the default, you can temporarily opt out by setting ACTIONS_ALLOW_USE_UNSECURE_NODE_VERSION=true. For more information see: https://github.blog/changelog/2025-09-19-deprecation-of-node-20-on-github-actions-runners/

feat(benchmarks): Add Claude UI benchmark harness #145
