feat: add benchmark framework for collection mount performance #7915

chirag-bruno wants to merge 6 commits into usebruno:main
Conversation
- Playwright benchmark tests measuring collection mount time across bru/yml formats and sizes (50-5000 requests)
- IPC listener approach for a precise mount-complete signal (sketched below)
- Generic benchmark utils: stats, results I/O, baseline comparison, PR commenting
- Collection generator using @usebruno/filestore serializers
- CI workflow running on ubuntu, macos, and windows with PR comment reporting
- Regression detection against committed baselines with configurable threshold
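The IPC-listener bullet above could work roughly like the sketch below. This is a minimal sketch under assumptions: the channel name `main:collection-mounted` and the `openCollection()` helper are hypothetical, though the `measureCollectionMount` signature matches the one referenced later in this review.

```js
// Hedged sketch of the IPC-listener timing approach described in the PR summary.
// The channel name and openCollection() helper are hypothetical, not the PR's code.
async function measureCollectionMount(page, electronApp, collectionDir, collectionName) {
  // Arm a promise in Electron's main process that settles when the
  // (assumed) mount-complete IPC event fires.
  const mounted = electronApp.evaluate(({ ipcMain }) =>
    new Promise((resolve) => ipcMain.once('main:collection-mounted', () => resolve()))
  );

  const start = Date.now();
  await openCollection(page, collectionDir, collectionName); // assumed UI helper that triggers the mount
  await mounted; // resolves on the precise mount-complete signal rather than a UI poll
  return Date.now() - start;
}
```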
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior in the CodeRabbit settings.
Walkthrough

This PR introduces a comprehensive benchmarking infrastructure for measuring collection mount performance across operating systems. It adds GitHub Actions workflows, Playwright benchmark configuration, collection generation utilities, statistical helpers, result comparison tooling, PR integration, and baseline metrics for Ubuntu, macOS, and Windows.

Changes

Benchmark Infrastructure & Test Suite
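Among the pieces listed above are statistical helpers that reduce raw per-iteration timings to the mean/p50 figures stored in the baseline files. A rough, illustrative sketch only — the PR's real implementation lives in tests/benchmarks/utils/stats.ts and may differ:

```js
// Illustrative sketch, not the PR's stats.ts: summarize raw timings (ms)
// into the mean/p50 shape used by the baseline files.
function summarize(timings) {
  const sorted = [...timings].sort((a, b) => a - b);
  const mean = sorted.reduce((sum, t) => sum + t, 0) / sorted.length;
  const p50 = sorted[Math.floor(sorted.length / 2)]; // median (upper middle for even n)
  return { mean: Math.round(mean), p50 };
}

// summarize([590, 612, 655]) -> { mean: 619, p50: 612 }
```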
Sequence Diagram

```mermaid
sequenceDiagram
    participant GHA as GitHub Actions
    participant CAct as Composite Action
    participant Pw as Playwright Runner
    participant Bench as Benchmark Test
    participant CmpJS as compare.js
    participant CommentJS as pr-comment.js
    participant GHApi as GitHub API

    GHA->>CAct: Trigger with os & update-baseline
    CAct->>Pw: Run benchmarks (with xvfb on Linux)
    Pw->>Bench: Execute mount tests
    Bench->>Bench: Measure timing, generate results.json
    CAct->>CmpJS: Load results & baseline
    alt update-baseline == true
        CmpJS->>CmpJS: Compute new baseline from results
        CmpJS->>Pw: Write baseline.json
    else compare mode
        CmpJS->>CmpJS: Compare against baseline
        CmpJS->>CmpJS: Check for regressions
    end
    GHA->>GHA: Upload results & baseline artifacts
    alt Pull Request event
        GHA->>CommentJS: Load results & baseline
        CommentJS->>CommentJS: Build comparison table
        CommentJS->>GHApi: Post/update PR comment
        GHApi->>GHApi: Display benchmark results
    end
```
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~30 minutes
🚥 Pre-merge checks: ✅ 4 passed | ❌ 1 failed (1 warning)
Actionable comments posted: 5
🧹 Nitpick comments (5)
tests/benchmarks/mounting/collection-mount.bench.ts (2)
81-81: ⚡ Quick win — Replace `page.waitForTimeout(500)` with a deterministic wait.

Per coding guidelines, `page.waitForTimeout()` should only be used when no suitable `expect()` locator-based wait is available. If `closeAllCollections` leaves a detectable UI change (e.g., the sidebar collection list becomes empty), prefer:

```diff
- await page.waitForTimeout(500);
+ await expect(page.locator('#sidebar-collection-name')).toHaveCount(0);
```

If there's genuinely no stable locator to await after close, add a brief comment justifying the timeout. As per coding guidelines: "Try to reduce usage of `page.waitForTimeout();` in code unless absolutely necessary and the locator cannot be found using existing `expect()` playwright calls."

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/benchmarks/mounting/collection-mount.bench.ts` at line 81, Replace the non-deterministic sleep (page.waitForTimeout(500)) in the collection-mount benchmark with a locator-based expect or a justified comment: after calling closeAllCollections() wait for a concrete UI change (e.g., expect(sidebarCollectionList).toBeEmpty() or expect(collectionItem.locator('text=...')).not.toBeVisible()) that signals completion; if there truly is no stable DOM change to await, keep a very brief comment explaining why a fixed timeout is required and reduce the timeout to the minimum necessary while noting the reason.
70-95: ⚡ Quick win — Wrap test body steps with `test.step` for clearer reports.

The inner test body performs distinct phases — generate, measure (× N), record, log — but has no `test.step` wrapping, making failure traces harder to read.

♻️ Example restructuring with test.step

```diff
 test(`mount ${format} collection with ${size} requests`, async ({ page, electronApp, createTmpDir }) => {
   test.setTimeout((2 + Math.ceil(size / 100) * 2) * 60_000);
   const timings: number[] = [];
   for (let i = 0; i < ITERATIONS_PER_SIZE; i++) {
-    const collectionName = `bench-${format}-${size}-iter-${i}`;
-    const collectionDir = await createTmpDir(`bench-${format}-${size}-${i}`);
-    generateCollection({ dir: collectionDir, name: collectionName, requestCount: size, format });
-
-    const elapsed = await measureCollectionMount(page, electronApp, collectionDir, collectionName);
-    timings.push(Math.round(elapsed));
-    await page.waitForTimeout(500);
+    await test.step(`iteration ${i + 1}`, async () => {
+      const collectionName = `bench-${format}-${size}-iter-${i}`;
+      const collectionDir = await createTmpDir(`bench-${format}-${size}-${i}`);
+      await test.step('generate collection', () => {
+        generateCollection({ dir: collectionDir, name: collectionName, requestCount: size, format });
+      });
+      const elapsed = await test.step('measure mount', () =>
+        measureCollectionMount(page, electronApp, collectionDir, collectionName)
+      );
+      timings.push(Math.round(elapsed));
+    });
   }
```

As per coding guidelines: "Promote the use of `test.step` as much as possible so the generated reports are easier to read."

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/benchmarks/mounting/collection-mount.bench.ts` around lines 70 - 95, The test body for the `test("mount ${format} collection with ${size} requests", ...)` case should be split into explicit Playwright `test.step` calls to improve report clarity: wrap the collection generation loop (calls to `generateCollection` and `createTmpDir`) in a "generate collections" step, wrap each measurement call to `measureCollectionMount` (and the per-iteration wait) inside a "measure mount (iteration i)" step or a single "measure mounts" step, and wrap result aggregation/annotation (uses of `timings`, `resultKey`, `summarize`, and `test.info().annotations.push`) and the final console log into a "record results" step; update the test body (the anonymous async function passed to `test(...)`) to call `await test.step(name, async () => { ... })` around those phases so failures show as separate steps while preserving existing logic and variables.
tests/benchmarks/utils/compare.js (1)

62-62: ⚡ Quick win — `baseline.collections` fallback is undocumented dead code.

The baseline schema only defines `entries`; the `baseline.collections` fallback has no corresponding writer and will silently swallow a missing `entries` key instead of failing loudly.

```diff
-const baselineEntries = baseline.entries || baseline.collections || {};
+const baselineEntries = baseline.entries || {};
```

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/benchmarks/utils/compare.js` at line 62, Replace the undocumented fallback that uses baseline.collections with a strict requirement for baseline.entries: change the initialization of baselineEntries (the variable named baselineEntries) to only read baseline.entries and add a guard that throws (or returns a clear error) when baseline.entries is missing so the function fails loudly instead of silently using a dead-path; this ensures the code path in compare.js validates the expected schema rather than swallowing a missing entries key.
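For context on the entry comparison this nitpick touches, the per-entry regression check in compare.js presumably looks something like the sketch below — the threshold constant and return shape are assumptions for illustration, not the PR's actual code:

```js
// Hedged sketch of a baseline comparison like the one compare.js performs.
// THRESHOLD_PERCENT and the return shape are assumptions, not the real file.
const THRESHOLD_PERCENT = 10;

function findRegressions(resultEntries, baselineEntries) {
  const regressions = [];
  for (const [key, result] of Object.entries(resultEntries)) {
    const base = baselineEntries[key];
    if (!base) continue; // no baseline recorded for this format/size yet
    const changePercent = ((result.mean - base.mean) / base.mean) * 100;
    if (changePercent > THRESHOLD_PERCENT) regressions.push({ key, changePercent });
  }
  return regressions;
}
```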
tests/benchmarks/utils/pr-comment.js (1)

29-33: ⚡ Quick win — `pct` is a string; parse it before comparing to numbers.

`.toFixed(1)` returns a string, so all comparisons on lines 30–31 and 33 (`pct > threshold`, `pct < -threshold`, `pct > 0`) silently rely on JS coercion. When `base.mean` is `0`, `pct` becomes `"Infinity"` (or potentially `"NaN"`) and the coercion is unreliable. Keep `pct` numeric; format only at the point of interpolation.

♻️ Suggested fix

```diff
- const pct = ((data.mean - base.mean) / base.mean * 100).toFixed(1);
- const status = pct > threshold ? '🔴 REGRESSION' : pct < -threshold ? '🟢 IMPROVED' : '✅ OK';
- if (pct > threshold) hasRegression = true;
-
- body += `| ${key} | ${Math.round(data.mean)} | ${base.mean} | ${pct > 0 ? '+' : ''}${pct}% | ${status} |\n`;
+ const pct = (data.mean - base.mean) / base.mean * 100;
+ const pctStr = pct.toFixed(1);
+ const status = pct > threshold ? '🔴 REGRESSION' : pct < -threshold ? '🟢 IMPROVED' : '✅ OK';
+ if (pct > threshold) hasRegression = true;
+
+ body += `| ${key} | ${Math.round(data.mean)} | ${base.mean} | ${pct > 0 ? '+' : ''}${pctStr}% | ${status} |\n`;
```

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/benchmarks/utils/pr-comment.js` around lines 29 - 33, pct is being set to a string via .toFixed(1) and then compared numerically; change the code so pct is a numeric value used for comparisons (compute pct = (data.mean - base.mean) / base.mean * 100 without .toFixed), perform numeric comparisons against threshold and -threshold (affecting the checks using pct in the hasRegression assignment and status calculation), and only format pct with toFixed(1) when building the output interpolation for body (e.g., use a formattedPct variable for the string). Ensure the same named symbols are updated: pct (numeric), hasRegression, status, and the body interpolation where `${pct}` is inserted.
tests/benchmarks/mounting/baseline.json (1)

1-46: ⚡ Quick win — Placeholder baseline values: `bru-*` and `yml-*` entries are identical, which may mask format-specific regressions.

All ten entries share the same `mean`/`p50` figures (e.g., both `bru-50` and `yml-50` have `mean: 2000`). Committing equal baselines for two formats that likely have distinct I/O characteristics means the comparison is not meaningful until real numbers replace these. Consider running the suite once on a reference machine and replacing these with the actual measured medians before merging, or at minimum add a prominent `"status": "placeholder"` field and fail CI baseline comparison when that flag is present.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/benchmarks/mounting/baseline.json` around lines 1 - 46, The baseline file currently uses placeholder identical values for all entries (see entries "bru-50"..."bru-5000" and "yml-50"..."yml-5000"), which masks format-specific regressions; replace these placeholders by running the benchmark on a reference machine and updating the entries with the real measured mean/p50 (use the provided script node tests/benchmarks/mounting/compare.js --update-baseline) so each "bru-*" and "yml-*" has accurate numbers, or if you cannot run benchmarks yet add a top-level "status":"placeholder" flag in the JSON and update CI to fail baseline comparisons when status is "placeholder" to prevent merging until real values are recorded.
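The `"status": "placeholder"` guard suggested above could be as small as the following — a hypothetical sketch, since that field does not exist in the PR's baseline schema yet:

```js
// Hypothetical guard for the baseline comparison, per the suggestion above.
// The "status" field would have to be added to baseline.json first.
function assertBaselineIsReal(baseline) {
  if (baseline.status === 'placeholder') {
    console.error('Baseline still contains placeholder values; refusing to compare.');
    process.exit(1); // fail the CI step until real measurements are committed
  }
}
```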
📒 Files selected for processing (13)

- .github/actions/tests/run-benchmark-tests/action.yml
- .github/workflows/benchmarks.yml
- .gitignore
- package.json
- playwright.benchmark.config.ts
- playwright.config.ts
- tests/benchmarks/mounting/baseline.json
- tests/benchmarks/mounting/collection-mount.bench.ts
- tests/benchmarks/utils/collection-generator.ts
- tests/benchmarks/utils/compare.js
- tests/benchmarks/utils/pr-comment.js
- tests/benchmarks/utils/results.ts
- tests/benchmarks/utils/stats.ts
```ts
const ymlContent = stringifyCollection({ name } as any, { name, type: 'collection', opencollection: '1.0.0' }, { format: 'yml' });
fs.writeFileSync(path.join(dir, 'opencollection.yml'), ymlContent);
```
Missing fallback for yml `stringifyCollection` result.

Line 44 guards against a falsy bru return with ``|| `meta {...}` ``, but line 46 writes `ymlContent` directly. If `stringifyCollection` returns `null`/`undefined` for the yml format, `writeFileSync` will write the string `"undefined"` or `"null"`, producing an unreadable fixture file.
🛡️ Proposed fix

```diff
- const ymlContent = stringifyCollection({ name } as any, { name, type: 'collection', opencollection: '1.0.0' }, { format: 'yml' });
- fs.writeFileSync(path.join(dir, 'opencollection.yml'), ymlContent);
+ const ymlContent = stringifyCollection({ name } as any, { name, type: 'collection', opencollection: '1.0.0' }, { format: 'yml' })
+   || `name: ${name}\ntype: collection\nopencollection: '1.0.0'\n`;
+ fs.writeFileSync(path.join(dir, 'opencollection.yml'), ymlContent);
```

📝 Committable suggestion

‼️ IMPORTANT: Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

```ts
const ymlContent = stringifyCollection({ name } as any, { name, type: 'collection', opencollection: '1.0.0' }, { format: 'yml' })
  || `name: ${name}\ntype: collection\nopencollection: '1.0.0'\n`;
fs.writeFileSync(path.join(dir, 'opencollection.yml'), ymlContent);
```
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@tests/benchmarks/utils/collection-generator.ts` around lines 46 - 47, The
ymlContent produced by stringifyCollection may be null/undefined and is written
directly via fs.writeFileSync, so add a fallback like the existing one for the
'bru' output: compute ymlContent = stringifyCollection(...) || `meta: { name:
"${name}", type: "collection", opencollection: "1.0.0" }` (or a minimal valid
YAML string), then pass that safe ymlContent to fs.writeFileSync; update
references to stringifyCollection and ymlContent accordingly so
opencollection.yml never contains "undefined"/"null".
```js
 * --update-baseline
 */

import { existsSync, readFileSync, writeFileSync } from 'fs';
```
🧩 Analysis chain
🏁 Script executed:

```bash
#!/bin/bash
# Confirm root package.json has no "type": "module"
cat package.json | python3 -c "import sys, json; d=json.load(sys.stdin); print('type:', d.get('type', '<not set>'))"
```

Repository: usebruno/bruno
Length of output: 74

🏁 Script executed:

```bash
# Check if the file exists and view its imports
cat tests/benchmarks/utils/compare.js | head -30
```

Repository: usebruno/bruno
Length of output: 1057

🏁 Script executed:

```bash
# Check the GitHub action file for how compare.js is invoked
find .github -name "*benchmark*" -type f
```

Repository: usebruno/bruno
Length of output: 91

🏁 Script executed:

```bash
# If action file found, check the invocation
cat .github/actions/tests/run-benchmark-tests/action.yml | grep -A 5 -B 5 "compare.js" || echo "Pattern not found in action.yml"
```

Repository: usebruno/bruno
Length of output: 325
ESM import in .js file breaks CI — GitHub Actions benchmark comparison will fail with SyntaxError
Node.js treats `.js` files as CommonJS by default. Without `"type": "module"` in package.json, the `node tests/benchmarks/utils/compare.js` command will immediately throw `SyntaxError: Cannot use import statement outside a module`, killing the "Compare Against Baseline" CI step.
Choose one fix:
Option A: Rename to `.mjs`
Rename compare.js → compare.mjs and update .github/workflows/benchmarks.yml:
```diff
- node tests/benchmarks/utils/compare.js \
+ node tests/benchmarks/utils/compare.mjs \
```

Option B: Convert to CommonJS

```diff
-import { existsSync, readFileSync, writeFileSync } from 'fs';
+const { existsSync, readFileSync, writeFileSync } = require('fs');
```

📝 Committable suggestion

‼️ IMPORTANT: Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

```js
const { existsSync, readFileSync, writeFileSync } = require('fs');
```
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@tests/benchmarks/utils/compare.js` at line 21, The file uses ESM import
syntax (import { existsSync, readFileSync, writeFileSync } from 'fs') inside
tests/benchmarks/utils/compare.js which causes a SyntaxError under default
CommonJS Node; either rename compare.js → compare.mjs and update
.github/workflows/benchmarks.yml to invoke the .mjs path (Option A), or convert
the file to CommonJS by replacing the import with const { existsSync,
readFileSync, writeFileSync } = require('fs') and ensure any other ESM syntax is
changed accordingly (Option B); pick one option and make the matching change to
the CI workflow if you rename the file.
```js
const { data: comments } = await github.rest.issues.listComments({
  owner: context.repo.owner,
  repo: context.repo.repo,
  issue_number: context.issue.number
});

const existing = comments.find((c) => c.body.startsWith(marker));
```
listComments is not paginated — duplicate benchmark comments will accumulate on long-lived PRs.
The GitHub API returns at most 100 comments per page by default. On a PR with more than 100 comments the existing benchmark comment may lie on a later page, causing comments.find(...) to return undefined and a brand-new comment to be posted every benchmark run. Add per_page: 100 as a start; for full robustness, paginate with github.paginate.
🛠️ Suggested minimal fix

```diff
- const { data: comments } = await github.rest.issues.listComments({
-   owner: context.repo.owner,
-   repo: context.repo.repo,
-   issue_number: context.issue.number
- });
+ const comments = await github.paginate(github.rest.issues.listComments, {
+   owner: context.repo.owner,
+   repo: context.repo.repo,
+   issue_number: context.issue.number,
+   per_page: 100
+ });
```

Update the find call accordingly (no `.data` unwrap needed after paginate):

```diff
- const existing = comments.find((c) => c.body.startsWith(marker));
+ const existing = comments.find((c) => c.body && c.body.startsWith(marker));
```
+ const existing = comments.find((c) => c.body && c.body.startsWith(marker));🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@tests/benchmarks/utils/pr-comment.js` around lines 46 - 52, The current call
using github.rest.issues.listComments(...) and then searching comments.find(...)
can miss the existing marker on PRs with >100 comments; update the call to
either include per_page: 100 and handle pagination, or replace the listComments
call with github.paginate(github.rest.issues.listComments, { owner:
context.repo.owner, repo: context.repo.repo, issue_number: context.issue.number,
per_page: 100 }) and then search the returned array for the marker (remove the
.data unwrap since paginate returns items directly); adjust the variable
currently named comments and the comments.find(...) usage accordingly so
duplicate benchmark comments are not created.
```js
const results = JSON.parse(fs.readFileSync(resultsPath, 'utf-8'));
const baseline = JSON.parse(fs.readFileSync(baselinePath, 'utf-8'));
```
baselinePath has no existence guard — crashes with an unhandled exception if the file is missing.
resultsPath is guarded (lines 72–75) but baselinePath is not. A missing or mistyped baseline path throws a synchronous ENOENT that will surface as an opaque CI failure instead of a helpful log message.
🛡️ Suggested fix

```diff
+ if (!fs.existsSync(baselinePath)) {
+   console.log(`No baseline found at ${baselinePath}, skipping comment.`);
+   return;
+ }
+
  const results = JSON.parse(fs.readFileSync(resultsPath, 'utf-8'));
  const baseline = JSON.parse(fs.readFileSync(baselinePath, 'utf-8'));
```
const baseline = JSON.parse(fs.readFileSync(baselinePath, 'utf-8'));🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@tests/benchmarks/utils/pr-comment.js` around lines 77 - 78, The code reads
baselinePath without checking existence, causing an uncaught ENOENT; wrap the
baseline read in the same guard/handling used for resultsPath — either check
fs.existsSync(baselinePath) before JSON.parse or perform the
fs.readFileSync/JSON.parse inside a try/catch, and on error emit a clear
log/error message and exit or return; update the lines that call
JSON.parse(fs.readFileSync(baselinePath, 'utf-8')) (and any surrounding logic
using baseline) to use this existence check or try/catch so missing or invalid
baseline files produce a helpful message instead of an unhandled exception.
…fix review issues

- Same collection mounted/unmounted across iterations for cold vs cached comparison
- workflow_dispatch has update-baseline boolean input for manual baseline updates
- Fix string comparison bug in pr-comment.js (pct was string from toFixed)
- Remove dead baseline.collections fallback in compare.js and pr-comment.js
- Remove unnecessary waitForTimeout between iterations
- Rename pct/pctChange to changePercent/percentChange for readability
…mmit on update-baseline

- Reduce max collection size from 5000 to 3000 to keep CI runtime reasonable
- Update baseline values from actual CI run data (worst case across ubuntu/macos/windows)
- Auto-commit updated baseline.json when update-baseline is triggered via workflow_dispatch
- Reuse same collection across iterations for cold vs cached comparison
- Fix string comparison bug and remove dead code from review feedback
- Rename pct variables to changePercent for readability
- Remove unnecessary waitForTimeout between iterations
♻️ Duplicate comments (1)
.github/workflows/benchmarks.yml (1)
75-87: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win — Prevent fork PR permission errors from failing the benchmark job

Lines 75–87 can still fail on fork-originated `pull_request` runs when PR comment write is blocked. Make the comment step non-blocking so benchmark execution/artifacts remain available.

Suggested patch

```diff
   - name: Comment Benchmark Results on PR
     if: github.event_name == 'pull_request' && !cancelled() && matrix.os-name == 'ubuntu'
+    continue-on-error: true
     uses: actions/github-script@v7
     with:
       script: |
         const run = require('./tests/benchmarks/utils/pr-comment.js');
         await run({
```

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.github/workflows/benchmarks.yml around lines 75 - 87, The "Comment Benchmark Results on PR" step currently calls the pr-comment script and can fail on forked PRs when comment permissions are blocked; make this step non-blocking by adding continue-on-error: true to that step (the step with name "Comment Benchmark Results on PR" that uses actions/github-script@v7 and invokes the run(...) call) so failures to post the comment won't fail the whole job and artifacts/results remain available.
📒 Files selected for processing (3)

- .github/workflows/benchmarks.yml
- tests/benchmarks/mounting/baseline.json
- tests/benchmarks/mounting/collection-mount.bench.ts
✅ Files skipped from review due to trivial changes (1)
- tests/benchmarks/mounting/baseline.json
🚧 Files skipped from review as they are similar to previous changes (1)
- tests/benchmarks/mounting/collection-mount.bench.ts
- Add continue-on-error to PR comment step since GITHUB_TOKEN lacks write access on cross-fork PRs
- Split baseline.json into baseline.ubuntu/macos/windows.json with real CI data
- Action and workflow dynamically reference baseline per OS
- PR comment posted per OS with OS-specific comparison
- Auto-commit updated baseline on workflow_dispatch with update-baseline flag
Actionable comments posted: 3
📒 Files selected for processing (5)

- .github/actions/tests/run-benchmark-tests/action.yml
- .github/workflows/benchmarks.yml
- tests/benchmarks/mounting/baseline.macos.json
- tests/benchmarks/mounting/baseline.ubuntu.json
- tests/benchmarks/mounting/baseline.windows.json
✅ Files skipped from review due to trivial changes (2)
- tests/benchmarks/mounting/baseline.macos.json
- tests/benchmarks/mounting/baseline.ubuntu.json
```yaml
- name: Commit Updated Baseline
  if: github.event.inputs.update-baseline == 'true'
  run: |
    git config user.name "github-actions[bot]"
    git config user.email "github-actions[bot]@users.noreply.github.com"
    git add tests/benchmarks/mounting/baseline.${{ matrix.os-name }}.json
    git diff --staged --quiet || git commit -m "chore: update ${{ matrix.os-name }} benchmark baseline" && git push
```
🧩 Analysis chain
🏁 Script executed:

```bash
#!/bin/bash
# Verify commit/push exists inside matrix benchmark job
rg -n -C3 --type=yaml 'strategy:|matrix:|Commit Updated Baseline|git push' .github/workflows/benchmarks.yml
```

Repository: usebruno/bruno
Length of output: 1003

🏁 Script executed:

```bash
cat -n .github/workflows/benchmarks.yml
```

Repository: usebruno/bruno
Length of output: 3702
Matrix jobs push concurrently to the same branch, risking non-fast-forward failures.
When update-baseline is true, all three OS jobs push simultaneously to the same branch. Even though each updates a different baseline file, any push that lands after another job's push is rejected as non-fast-forward, causing intermittent workflow failures.
Safer approach: Have matrix jobs upload per-OS baseline artifacts, then run a single follow-up job to download all baselines and perform one atomic commit/push.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In @.github/workflows/benchmarks.yml around lines 67 - 73, Replace the
per-matrix push in the "Commit Updated Baseline" step with an artifacts
workflow: in the matrix job (the step keyed by name "Commit Updated Baseline"
that currently checks github.event.inputs.update-baseline == 'true' and
references ${ { matrix.os-name } }), stop running git commit/push and instead
upload the updated baseline file as an artifact named with the OS (use
actions/upload-artifact, upload tests/benchmarks/mounting/baseline.${{
matrix.os-name }}.json). Add a single downstream job (e.g.,
"merge-baseline-updates") that runs once when
github.event.inputs.update-baseline == 'true', depends on the matrix job(s) via
needs, downloads all per-OS artifacts with actions/download-artifact, places
them into tests/benchmarks/mounting/, then performs git config, a single git
add/commit -m "chore: update baseline" and git push so one atomic commit handles
all baseline files and avoids concurrent non-fast-forward failures.
🧩 Analysis chain
🏁 Script executed:

```bash
#!/bin/bash
# Verify the risky pattern exists in workflow files
rg -n --type=yaml 'git diff --staged --quiet \|\| git commit .*&& git push'
```

Repository: usebruno/bruno
Length of output: 40
🏁 Script executed:

```bash
# Check if the file exists and read the relevant section
if [ -f ".github/workflows/benchmarks.yml" ]; then
  echo "=== File exists, showing lines 60-80 with context ==="
  sed -n '60,80p' .github/workflows/benchmarks.yml | cat -n
else
  echo "=== File not found, listing .github/workflows directory ==="
  ls -la .github/workflows/ 2>/dev/null || echo "Directory not found"
fi
```

Repository: usebruno/bruno
Length of output: 1106
🏁 Script executed:

```bash
# Create a test to demonstrate the operator precedence issue
cat > /tmp/test_operators.sh << 'EOF'
#!/bin/bash
echo "=== Scenario 1: File with no changes (git diff exits 0) ==="
mkdir -p /tmp/git_test1
cd /tmp/git_test1
git init
echo "initial" > file.txt
git add file.txt
git commit -m "initial"
# Simulate: git diff --staged --quiet (no staged changes)
echo "Running: git diff --staged --quiet || echo 'COMMIT' && echo 'PUSH'"
git diff --staged --quiet || echo "COMMIT executed" && echo "PUSH executed"
echo "Exit code of full chain: $?"
echo ""
echo "=== Scenario 2: File with staged changes (git diff exits 1) ==="
cd /tmp/git_test1
echo "modified" > file.txt
git add file.txt
# Simulate: git diff --staged --quiet (has staged changes)
echo "Running: git diff --staged --quiet || echo 'COMMIT' && echo 'PUSH'"
git diff --staged --quiet || echo "COMMIT executed" && echo "PUSH executed"
echo "Exit code of full chain: $?"
EOF
bash /tmp/test_operators.sh
```

Repository: usebruno/bruno
Length of output: 1387
🏁 Script executed:

```bash
# Test bash operator precedence with simple commands
cat > /tmp/test_precedence.sh << 'EOF'
#!/bin/bash
echo "=== Testing: false || echo 'B' && echo 'C' ==="
false || echo 'B' && echo 'C'
echo "---"
echo "=== Testing: true || echo 'B' && echo 'C' ==="
true || echo 'B' && echo 'C'
echo "---"
echo "=== Testing: (false || echo 'B') && echo 'C' [explicit grouping] ==="
(false || echo 'B') && echo 'C'
echo "---"
echo "=== Testing: (true || echo 'B') && echo 'C' [explicit grouping] ==="
(true || echo 'B') && echo 'C'
echo "---"
EOF
bash /tmp/test_precedence.sh
```

Repository: usebruno/bruno
Length of output: 316
🏁 Script executed:

```bash
cat > /tmp/test_fix.sh << 'EOF'
#!/bin/bash
echo "=== Simulating the FIXED version ==="
echo ""
echo "Case 1: No staged changes"
if ! true; then # git diff --staged --quiet would return 0 (true/success)
  echo "  commit would run"
  echo "  push would run"
else
  echo "  commit skipped"
  echo "  push skipped"
fi
echo ""
echo "Case 2: Staged changes exist"
if ! false; then # git diff --staged --quiet would return 1 (false/failure)
  echo "  commit would run"
  echo "  push would run"
else
  echo "  commit skipped"
  echo "  push skipped"
fi
EOF
bash /tmp/test_fix.sh
```

Repository: usebruno/bruno
Length of output: 220
Fix line 73 operator chaining to prevent pushing when nothing was committed.
The `git diff --staged --quiet || git commit ... && git push` pattern runs `git push` even when there are no staged changes — bash gives `||` and `&&` equal precedence, so the chain parses as `(a || b) && c` — making the step unreliable. With matrix jobs pushing simultaneously, this increases failure risk.
Suggested fix

```diff
- git diff --staged --quiet || git commit -m "chore: update ${{ matrix.os-name }} benchmark baseline" && git push
+ if ! git diff --staged --quiet; then
+   git commit -m "chore: update ${{ matrix.os-name }} benchmark baseline"
+   git push
+ fi
```

📝 Committable suggestion

‼️ IMPORTANT: Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

```yaml
- name: Commit Updated Baseline
  if: github.event.inputs.update-baseline == 'true'
  run: |
    git config user.name "github-actions[bot]"
    git config user.email "github-actions[bot]@users.noreply.github.com"
    git add tests/benchmarks/mounting/baseline.${{ matrix.os-name }}.json
    if ! git diff --staged --quiet; then
      git commit -m "chore: update ${{ matrix.os-name }} benchmark baseline"
      git push
    fi
```
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In @.github/workflows/benchmarks.yml around lines 67 - 73, The commit step named
"Commit Updated Baseline" currently uses the chained shell expression `git diff
--staged --quiet || git commit -m ... && git push`, which can run `git push`
even when no commit was created; replace this operator chaining with an explicit
conditional that checks the staged-diff result and only runs the commit-and-push
sequence when changes exist (i.e., test `git diff --staged --quiet` and, if it
indicates changes, run `git commit -m "chore: update ${{ matrix.os-name }}
benchmark baseline"` followed by `git push`), ensuring push is executed only
after a successful commit.
| "entries": { | ||
| "bru-50": { | ||
| "mean": 2700, | ||
| "p50": 800 | ||
| }, | ||
| "bru-200": { | ||
| "mean": 1500, | ||
| "p50": 1400 | ||
| }, | ||
| "bru-500": { | ||
| "mean": 3500, | ||
| "p50": 3500 | ||
| }, | ||
| "bru-1000": { | ||
| "mean": 9500, | ||
| "p50": 9400 | ||
| }, | ||
| "bru-3000": { | ||
| "mean": 195000, | ||
| "p50": 190000 | ||
| }, | ||
| "yml-50": { | ||
| "mean": 600, | ||
| "p50": 570 | ||
| }, | ||
| "yml-200": { | ||
| "mean": 1350, | ||
| "p50": 1300 | ||
| }, | ||
| "yml-500": { | ||
| "mean": 3800, | ||
| "p50": 3700 | ||
| }, | ||
| "yml-1000": { | ||
| "mean": 11000, | ||
| "p50": 11000 | ||
| }, | ||
| "yml-3000": { | ||
| "mean": 90000, | ||
| "p50": 88000 | ||
| } |
Missing 5000-size baseline entries reduces promised coverage
`entries` currently stops at `*-3000`, but the PR objective says mount benchmarks cover sizes 50–5000. Please add `bru-5000` and `yml-5000` baseline records (via a benchmark run) so regression detection includes the largest collection size too.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@tests/benchmarks/mounting/baseline.windows.json` around lines 3 - 43, The
baseline JSON "entries" currently only includes sizes up to "bru-3000" and
"yml-3000", but the benchmark coverage should span 50–5000; run the mount
benchmarks for the 5000-size collections and add new baseline records named
"bru-5000" and "yml-5000" to the entries object (including mean and p50 stats)
so regression detection covers the largest collection size.
…ults

- writeResults now accepts SuiteMeta with name, unit, and direction
- Results JSON includes suite field for the visualization dashboard to ingest (illustrated below)
- Mounting benchmark outputs unit: ms, direction: smaller
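For illustration, the results shape this commit describes might look like the following — only the suite fields (name, unit, direction) are named by the commit; the rest is an assumed example, not the PR's actual output:

```js
// Hypothetical results.json shape after the SuiteMeta change above; example values only.
const exampleResults = {
  suite: { name: 'collection-mount', unit: 'ms', direction: 'smaller' },
  entries: {
    'bru-50': { mean: 612, p50: 590 } // per-key stats, matching the baseline file shape
  }
};
```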
Description
Contribution Checklist:
Note: Keeping the PR small and focused helps make it easier to review and merge. If you have multiple changes you want to make, please consider submitting them as separate pull requests.
Summary by CodeRabbit

New Features
- `test:benchmark` npm script for running performance tests.

Chores
- Updated `.gitignore`.