test(harness): add bounded parallel runner with straggler reporting#1097
Conversation
|
Warning You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again! |
ⓘ You've reached your Qodo monthly free-tier limit. Reviews pause until next month — upgrade your plan to continue now, or link your paid account if you already have one. |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: bf1951c9f2
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| const timer = setTimeout(resolve, ms); | ||
| const abort = () => { | ||
| clearTimeout(timer); | ||
| reject(new Error('aborted')); | ||
| }; | ||
| signal?.addEventListener('abort', abort, { once: true }); |
There was a problem hiding this comment.
Detach abort handler when sleep settles
The sleep helper adds an abort listener but never removes it when the timer resolves normally, so repeated sleep(..., sameSignal) calls accumulate listeners until the signal is eventually aborted (or never aborted), which can trigger MaxListenersExceededWarning and unnecessary memory growth in long-running harness scenarios. This is especially relevant for polling/retry loops that reuse one AbortSignal across many sleeps.
Useful? React with 👍 / 👎.
Add a test-only parallel runner and smoke command so long-running harness suites can report stragglers, preserve partial evidence, and stop after error budgets without changing production OpenChrome behavior. Constraint: Issue #1051 requires a dev/test harness only, with no DSPy/Python dependency and no production scheduling changes. Rejected: Reusing production workflow/session orchestration | would couple certification mechanics to server runtime behavior. Confidence: high Scope-risk: narrow Directive: Keep this runner under tests/harness; do not import it from src or production CLI paths. Tested: npx jest tests/harness/parallel-runner.test.ts --runInBand; npm run harness:parallel-smoke -- --concurrency 3 --include-straggler-fixture --output artifacts/harness-parallel/latest.json; timeout and max-errors smoke jq checks; npm run build; npm run lint:tier. Not-tested: Full npm test suite. Co-authored-by: OmX <omx@oh-my-codex.dev>
bf1951c to
07f4508
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 07f450838a
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| if (!this.options.partialWritePath) return; | ||
| const filePath = this.options.partialWritePath; | ||
| const snapshot = JSON.stringify({ parallel: result }, null, 2); | ||
| this.writeChain = this.writeChain.then(async () => { |
There was a problem hiding this comment.
Recover write queue after partial-write failures
A single filesystem error during writePartial permanently poisons this.writeChain: once one write rejects, this.writeChain.then(...) at this line never runs future write callbacks, so later partial writes are skipped and the final await this.writePartial(result) in run() can fail even after scenarios completed. This is especially problematic because intermediate write failures are intentionally swallowed in the runner loop, so a transient write error can still surface as a terminal run failure at the end.
Useful? React with 👍 / 👎.
Address Codex P2: when sleep() resolved normally, the abort listener was never detached, leaking listeners across repeated sleep(..., signal) calls reusing the same AbortSignal. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 3d0fd4855e
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| }; | ||
|
|
||
| try { | ||
| const value = await scenario.run(controller.signal); |
There was a problem hiding this comment.
Enforce scenario timeout by racing scenario promise
The timeout logic only calls controller.abort() but still awaits scenario.run(controller.signal) directly, so a scenario that does not observe the signal (for example, any long promise that ignores abort) will never transition to timedOut and can block the whole runner indefinitely. This breaks the advertised per-scenario timeout and can stall bounded parallel runs in real harness failures where user code is non-cooperative.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 3d0fd4855e
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| }; | ||
|
|
||
| try { | ||
| const value = await scenario.run(controller.signal); |
There was a problem hiding this comment.
Enforce scenario timeout independently of cooperative abort
The runner's timeout only triggers controller.abort(), but this line still awaits scenario.run(...) directly, so any scenario that ignores the AbortSignal (or is stuck in non-abortable work) can run far past scenarioTimeoutMs and even be recorded as completed rather than timedOut. In those cases maxErrors cancellation may never make progress and the entire harness run can hang; this needs a hard timeout race that settles the scenario result on deadline even when the scenario does not cooperate.
Useful? React with 👍 / 👎.
Progress / Review status
Auto-refreshed 2026-05-13 — owner comments cleaned up to reduce review noise.
test/1051-harness-parallel-runner→develop3d0fd48— fix(harness): remove sleep abort listener on settleOwner comment cleanup: 1 issue + 0 inline review comments deleted. Outstanding feedback from automated/external reviewers above is unchanged.
Summary
HarnessParallelRunnerwith bounded concurrency, per-scenario timeouts, max-error cancellation, straggler reporting, active-scenario abort, cleanup hooks, and atomic partial JSON writes.npm run harness:parallel-smokewith deterministic fast, straggler, timeout, and failure fixtures for post-merge OpenChrome validation.Direction / scope fit
tests/harnessplus one package script; it does not affect production OpenChrome runtime paths.Closes #1051
Verification
npx jest tests/harness/parallel-runner.test.ts --runInBandnpm run harness:parallel-smoke -- --concurrency 3 --include-straggler-fixture --output artifacts/harness-parallel/latest.jsonjqchecks for straggler evidence and no timeout inlatest.jsonnpm run harness:parallel-smoke -- --include-timeout-fixture --scenario-timeout-ms 3000 --output artifacts/harness-parallel/timeout.json || truejqtimeout evidence checknpm run harness:parallel-smoke -- --include-failing-fixtures --max-errors 1 --output artifacts/harness-parallel/max-errors.json || truejqfailed/cancelled evidence checknpm run buildnpm run lint:tier