perf(ci): wire StressPerf.KeyedList into /perf as a third macro leg#698
Merged
Conversation
Add the already-merged (#694) StressPerf.KeyedList workload to the /perf comparison as a third interleaved A/B macro leg, so /perf can resolve keyed-diff optimizations the positional StocksGrid cells never exercise. StocksGrid (StressPerf.ReactorOptimized) mutates cells in place by index, so its child diff always takes ChildReconciler.ReconcilePositional. The keyed LIS arm (ReconcileKeyed -> ReconcileKeyedMiddle) is invisible to it by construction -- the same blind spot that made the headline-only comparison unable to resolve keyed work (P2/#657, keyed structural-skip P4-C2). StressPerf.KeyedList renders ~500 stably keyed rows reordered/inserted/ removed each tick, driving that keyed arm on every tick. Harness changes (measurement tooling only; 0 src/Reactor): - Run-PerfBenchmark.ps1: AppRegistry KeyedList entry; -IncludeKeyedList (default on) + 'KeyedList' in the local -Apps set; best-effort build (try/catch -> omit on failure, like micro); third interleaved leg at the headline -Percent with the same pair-aligned drop-both collection and full Warmup+Reps budget; aggregation, ctx samples, result.json, and Format-PerfComment threading. - PerfLib.ps1: Format-PerfKeyedListSection (renders the 4 headline metrics with the same paired-delta 95% CI / direction-aware status as the headline table; empty when either side is null); Format-PerfComment -MainKeyed/ -PrKeyed params, rendered after the Allocation table and before the micro suite. - Tests: PerfLib.Tests.ps1 (216 assertions) + RunPerfBenchmark.Tests.ps1 (34 assertions) cover the new section + the static wiring contract. - Docs: README params table + "The comment" bullet; METHODOLOGY subsection; perf-compare.yml header + timeout 60 -> 75 (third macro leg + build). Verified: both test suites green; a local Run-PerfBenchmark.ps1 smoke (stub harness exes on this ARM64/OneDrive box) renders the keyed table in the right position with a direction-correct paired delta, and omits it under -IncludeKeyedList:$false. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Internal pr-review (8 dimensions + multi-model cross-check) on #698 surfaced 1 low + 3 medium, all confirmed by the cross-check, all in this PR's own tooling/tests. No critical/high. Addressed: - api-ergonomics (low): the run-mode log line printed `keyed-list={on/off}` from -IncludeKeyedList even in LOCAL mode, where the workload set actually comes from -Apps (so a default local run logged keyed-list=on while not running KeyedList). Make the mode line mode-aware: COMPARE keeps skip-floor/keyed-list tied to the include switches; LOCAL reports `apps=<-Apps>` instead. - test-coverage (medium x3) in the keyed leg's tests: * RunPerfBenchmark.Tests.ps1: lock the opt-out + best-effort build fallback (build guarded by -IncludeKeyedList; a build failure flips it $false; the run leg is skipped when off) and the one-sided-failure drop-both pair alignment ($mm -and $pm appends both; $mm -or $pm drops both). * PerfLib.Tests.ps1: assert keyed verdicts for ALL four headline metrics -- add Avg Diff (improvement) and Avg Memory (within noise). The keyed fixture now gives memory a small symmetric per-pair jitter so its paired CI straddles 0 (a proper within-noise case, not a degenerate zero-variance delta). PerfLib.Tests.ps1 218/218 (+2), RunPerfBenchmark.Tests.ps1 39/39 (+5); all four scripts parse clean; 0 src/Reactor. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Contributor
There was a problem hiding this comment.
Pull request overview
Wires the StressPerf.KeyedList macro workload into the /perf CI orchestration as a third interleaved A/B leg, so perf-compare can measure the reconciler’s keyed child-diff path (which the positional StocksGrid workload cannot exercise).
Changes:
- Add
-IncludeKeyedList(default$true) and registry wiring forStressPerf.KeyedList, with best-effort build and an interleaved main/pr run loop that preserves paired-sample alignment. - Add
PerfLibrendering for a new “Keyed-list workload” section (same paired-Δ 95% CI + direction-aware status as the headline table) and thread it into the sticky comment layout. - Update tests/docs and increase the perf workflow timeout to account for the added build + macro leg.
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated no comments.
Show a summary per file
| File | Description |
|---|---|
tests/stress_perf/METHODOLOGY.md |
Documents why a keyed-list macro leg is needed and how it’s measured/controlled. |
tests/stress_perf/ci/Run-PerfBenchmark.ps1 |
Adds KeyedList registry entry, -IncludeKeyedList, best-effort build, and third interleaved A/B leg; threads aggregates into comment + result.json. |
tests/stress_perf/ci/PerfLib.ps1 |
Adds Format-PerfKeyedListSection and renders it in Format-PerfComment (after alloc, before micro-suite). |
tests/stress_perf/ci/PerfLib.Tests.ps1 |
Adds coverage for keyed-list section rendering/omission, direction-awareness, and placement in the comment. |
tests/stress_perf/ci/RunPerfBenchmark.Tests.ps1 |
Adds AST/string-contract tests for the new param defaults, registry mapping, interleaved tags, and comment threading. |
tests/stress_perf/ci/README.md |
Updates CLI parameter docs and comment layout explanation to include the keyed-list table. |
.github/workflows/perf-compare.yml |
Updates header docs and increases timeout to 75 minutes to cover the added macro leg/build. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What
Wires the already-merged (#694)
StressPerf.KeyedListworkload into the/perfslash-command as a third interleaved A/B macro leg, so
/perfcan measure thekeyed child-diff path that the StocksGrid macro workload never exercises.
This is the next step in the
/perfharness fleet (after #691 alloc metric + reps/CI,#693 micro-suite, #697 skip-floor column) — it closes the last big macro blind spot.
Why
The headline StocksGrid workload (
StressPerf.ReactorOptimized) mutates cells inplace by index, so its child diff always takes
ChildReconciler.ReconcilePositional. The reconciler's keyed arm —ReconcileKeyed→ReconcileKeyedMiddle, the LIS-based minimal-move pass — isinvisible to it by construction. That is the same class of structural blind spot
the methodology validation (n=12 local analysis) flagged on the original 2-run-median
harness: optimizations the macro workload can't route through simply don't appear,
so a "within noise" verdict can't be trusted to mean "no win."
StressPerf.KeyedListrenders ~500 stably keyed rows and reorders / inserts /removes them each tick (deterministic: fixed RNG seed, constant N via paired
insert+remove, content-stable labels) — driving the keyed LIS diff on every tick.
That makes it the sensitive macro signal for the in-flight keyed work the positional
cells can never reach:
How
Measurement tooling only — 0
src/Reactorchanges (7 files, all undertests/stress_perf/**+.github/workflows/perf-compare.yml):Run-PerfBenchmark.ps1—AppRegistryKeyedListentry;-IncludeKeyedList(default on, so fleet PRs'
/perfauto-measures the keyed path) +'KeyedList'in the local
-Appsset; best-effort build (try/catch → omit the table onfailure, exactly like the micro leg, so a keyed build failure never breaks the
StocksGrid comparison); a third interleaved leg at the headline
-Percentwiththe same pair-aligned drop-both collection and the full Warmup+Reps budget; ctx
samples +
result.json+Format-PerfCommentthreading.PerfLib.ps1—Format-PerfKeyedListSectionrenders the four headline metricswith the same paired-Δ 95% CI / direction-aware status as the headline table
(empty array when either side is null → caller renders nothing);
Format-PerfCommentgains
-MainKeyed/-PrKeyed, rendered after the Allocation table, before themicro-suite.
PerfLib.Tests.ps1(216 assertions, +13) covers the new section,direction-awareness,
-Percentthreading, and comment placement;RunPerfBenchmark.Tests.ps1(34 assertions, +7) covers the static wiringcontract (param default, registry mapping, interleaved leg, comment threading).
ci/README.mdparams table + "The comment" bullet;METHODOLOGY.mdkeyed-list subsection;
perf-compare.ymlheader +timeout-minutes60 → 75(third macro leg adds a build + ~Reps runs alongside the Rust cold-build).
Verification
PerfLib.Tests.ps1216/216,RunPerfBenchmark.Tests.ps134/34.
Run-PerfBenchmark.ps1compare-mode smoke (stub harness exes, since thereal WinUI exe can't build on this ARM64/OneDrive box): the keyed-list table renders
in the correct position with a direction-correct paired Δ + 95% CI, and is omitted
under
-IncludeKeyedList:$false(skip-floor + the rest unaffected).Scope / discipline
Harness change — no
/perfrun needed on this PR (it changes/perfitself; we re-run/perfon the fleet after it lands). DRAFT until the gate is clean. I own theorchestration files (
perf-compare.yml,Run-PerfBenchmark.ps1,PerfLib.ps1) to keepthem conflict-free across the roadmap.