Skip to content

perf(ci): wire StressPerf.KeyedList into /perf as a third macro leg#698

Merged
azchohfi merged 2 commits into
mainfrom
azchohfi-perf-keyedlist
Jun 26, 2026
Merged

perf(ci): wire StressPerf.KeyedList into /perf as a third macro leg#698
azchohfi merged 2 commits into
mainfrom
azchohfi-perf-keyedlist

Conversation

@azchohfi

Copy link
Copy Markdown
Collaborator

What

Wires the already-merged (#694) StressPerf.KeyedList workload into the /perf
slash-command as a third interleaved A/B macro leg, so /perf can measure the
keyed child-diff path that the StocksGrid macro workload never exercises.

This is the next step in the /perf harness fleet (after #691 alloc metric + reps/CI,
#693 micro-suite, #697 skip-floor column) — it closes the last big macro blind spot.

Why

The headline StocksGrid workload (StressPerf.ReactorOptimized) mutates cells in
place by index
, so its child diff always takes
ChildReconciler.ReconcilePositional. The reconciler's keyed arm —
ReconcileKeyedReconcileKeyedMiddle, the LIS-based minimal-move pass — is
invisible to it by construction. That is the same class of structural blind spot
the methodology validation (n=12 local analysis) flagged on the original 2-run-median
harness: optimizations the macro workload can't route through simply don't appear,
so a "within noise" verdict can't be trusted to mean "no win."

StressPerf.KeyedList renders ~500 stably keyed rows and reorders / inserts /
removes them each tick (deterministic: fixed RNG seed, constant N via paired
insert+remove, content-stable labels) — driving the keyed LIS diff on every tick.
That makes it the sensitive macro signal for the in-flight keyed work the positional
cells can never reach:

How

Measurement tooling only — 0 src/Reactor changes (7 files, all under
tests/stress_perf/** + .github/workflows/perf-compare.yml):

  • Run-PerfBenchmark.ps1AppRegistry KeyedList entry; -IncludeKeyedList
    (default on, so fleet PRs' /perf auto-measures the keyed path) + 'KeyedList'
    in the local -Apps set; best-effort build (try/catch → omit the table on
    failure, exactly like the micro leg, so a keyed build failure never breaks the
    StocksGrid comparison); a third interleaved leg at the headline -Percent with
    the same pair-aligned drop-both collection and the full Warmup+Reps budget; ctx
    samples + result.json + Format-PerfComment threading.
  • PerfLib.ps1Format-PerfKeyedListSection renders the four headline metrics
    with the same paired-Δ 95% CI / direction-aware status as the headline table
    (empty array when either side is null → caller renders nothing); Format-PerfComment
    gains -MainKeyed/-PrKeyed, rendered after the Allocation table, before the
    micro-suite
    .
  • TestsPerfLib.Tests.ps1 (216 assertions, +13) covers the new section,
    direction-awareness, -Percent threading, and comment placement;
    RunPerfBenchmark.Tests.ps1 (34 assertions, +7) covers the static wiring
    contract (param default, registry mapping, interleaved leg, comment threading).
  • Docsci/README.md params table + "The comment" bullet; METHODOLOGY.md
    keyed-list subsection; perf-compare.yml header + timeout-minutes 60 → 75
    (third macro leg adds a build + ~Reps runs alongside the Rust cold-build).

Verification

  • Both perf-lib suites green: PerfLib.Tests.ps1 216/216, RunPerfBenchmark.Tests.ps1
    34/34.
  • Local Run-PerfBenchmark.ps1 compare-mode smoke (stub harness exes, since the
    real WinUI exe can't build on this ARM64/OneDrive box): the keyed-list table renders
    in the correct position with a direction-correct paired Δ + 95% CI, and is omitted
    under -IncludeKeyedList:$false (skip-floor + the rest unaffected).

Scope / discipline

Harness change — no /perf run needed on this PR (it changes /perf itself; we re-run
/perf on the fleet after it lands). DRAFT until the gate is clean. I own the
orchestration files (perf-compare.yml, Run-PerfBenchmark.ps1, PerfLib.ps1) to keep
them conflict-free across the roadmap.

Copilot AI added 2 commits June 26, 2026 03:38
Add the already-merged (#694) StressPerf.KeyedList workload to the /perf
comparison as a third interleaved A/B macro leg, so /perf can resolve
keyed-diff optimizations the positional StocksGrid cells never exercise.

StocksGrid (StressPerf.ReactorOptimized) mutates cells in place by index, so
its child diff always takes ChildReconciler.ReconcilePositional. The keyed
LIS arm (ReconcileKeyed -> ReconcileKeyedMiddle) is invisible to it by
construction -- the same blind spot that made the headline-only comparison
unable to resolve keyed work (P2/#657, keyed structural-skip P4-C2).
StressPerf.KeyedList renders ~500 stably keyed rows reordered/inserted/
removed each tick, driving that keyed arm on every tick.

Harness changes (measurement tooling only; 0 src/Reactor):
- Run-PerfBenchmark.ps1: AppRegistry KeyedList entry; -IncludeKeyedList
  (default on) + 'KeyedList' in the local -Apps set; best-effort build
  (try/catch -> omit on failure, like micro); third interleaved leg at the
  headline -Percent with the same pair-aligned drop-both collection and full
  Warmup+Reps budget; aggregation, ctx samples, result.json, and
  Format-PerfComment threading.
- PerfLib.ps1: Format-PerfKeyedListSection (renders the 4 headline metrics
  with the same paired-delta 95% CI / direction-aware status as the headline
  table; empty when either side is null); Format-PerfComment -MainKeyed/
  -PrKeyed params, rendered after the Allocation table and before the micro
  suite.
- Tests: PerfLib.Tests.ps1 (216 assertions) + RunPerfBenchmark.Tests.ps1
  (34 assertions) cover the new section + the static wiring contract.
- Docs: README params table + "The comment" bullet; METHODOLOGY subsection;
  perf-compare.yml header + timeout 60 -> 75 (third macro leg + build).

Verified: both test suites green; a local Run-PerfBenchmark.ps1 smoke (stub
harness exes on this ARM64/OneDrive box) renders the keyed table in the right
position with a direction-correct paired delta, and omits it under
-IncludeKeyedList:$false.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Internal pr-review (8 dimensions + multi-model cross-check) on #698 surfaced
1 low + 3 medium, all confirmed by the cross-check, all in this PR's own
tooling/tests. No critical/high. Addressed:

- api-ergonomics (low): the run-mode log line printed `keyed-list={on/off}`
  from -IncludeKeyedList even in LOCAL mode, where the workload set actually
  comes from -Apps (so a default local run logged keyed-list=on while not
  running KeyedList). Make the mode line mode-aware: COMPARE keeps
  skip-floor/keyed-list tied to the include switches; LOCAL reports
  `apps=<-Apps>` instead.
- test-coverage (medium x3) in the keyed leg's tests:
  * RunPerfBenchmark.Tests.ps1: lock the opt-out + best-effort build fallback
    (build guarded by -IncludeKeyedList; a build failure flips it $false; the
    run leg is skipped when off) and the one-sided-failure drop-both pair
    alignment ($mm -and $pm appends both; $mm -or $pm drops both).
  * PerfLib.Tests.ps1: assert keyed verdicts for ALL four headline metrics --
    add Avg Diff (improvement) and Avg Memory (within noise). The keyed fixture
    now gives memory a small symmetric per-pair jitter so its paired CI
    straddles 0 (a proper within-noise case, not a degenerate zero-variance
    delta).

PerfLib.Tests.ps1 218/218 (+2), RunPerfBenchmark.Tests.ps1 39/39 (+5); all
four scripts parse clean; 0 src/Reactor.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Wires the StressPerf.KeyedList macro workload into the /perf CI orchestration as a third interleaved A/B leg, so perf-compare can measure the reconciler’s keyed child-diff path (which the positional StocksGrid workload cannot exercise).

Changes:

  • Add -IncludeKeyedList (default $true) and registry wiring for StressPerf.KeyedList, with best-effort build and an interleaved main/pr run loop that preserves paired-sample alignment.
  • Add PerfLib rendering for a new “Keyed-list workload” section (same paired-Δ 95% CI + direction-aware status as the headline table) and thread it into the sticky comment layout.
  • Update tests/docs and increase the perf workflow timeout to account for the added build + macro leg.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated no comments.

Show a summary per file
File Description
tests/stress_perf/METHODOLOGY.md Documents why a keyed-list macro leg is needed and how it’s measured/controlled.
tests/stress_perf/ci/Run-PerfBenchmark.ps1 Adds KeyedList registry entry, -IncludeKeyedList, best-effort build, and third interleaved A/B leg; threads aggregates into comment + result.json.
tests/stress_perf/ci/PerfLib.ps1 Adds Format-PerfKeyedListSection and renders it in Format-PerfComment (after alloc, before micro-suite).
tests/stress_perf/ci/PerfLib.Tests.ps1 Adds coverage for keyed-list section rendering/omission, direction-awareness, and placement in the comment.
tests/stress_perf/ci/RunPerfBenchmark.Tests.ps1 Adds AST/string-contract tests for the new param defaults, registry mapping, interleaved tags, and comment threading.
tests/stress_perf/ci/README.md Updates CLI parameter docs and comment layout explanation to include the keyed-list table.
.github/workflows/perf-compare.yml Updates header docs and increases timeout to 75 minutes to cover the added macro leg/build.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@azchohfi azchohfi marked this pull request as ready for review June 26, 2026 11:07
@azchohfi azchohfi merged commit 41e41d7 into main Jun 26, 2026
21 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants