perf(ci): wire StressPerf.KeyedList into /perf as a third macro leg by azchohfi · Pull Request #698 · microsoft/microsoft-ui-reactor

azchohfi · 2026-06-26T10:39:10Z

What

Wires the already-merged (#694) StressPerf.KeyedList workload into the /perf
slash-command as a third interleaved A/B macro leg, so /perf can measure the
keyed child-diff path that the StocksGrid macro workload never exercises.

This is the next step in the /perf harness fleet (after #691 alloc metric + reps/CI,
#693 micro-suite, #697 skip-floor column) — it closes the last big macro blind spot.

Why

The headline StocksGrid workload (StressPerf.ReactorOptimized) mutates cells in
place by index, so its child diff always takes
ChildReconciler.ReconcilePositional. The reconciler's keyed arm —
ReconcileKeyed → ReconcileKeyedMiddle, the LIS-based minimal-move pass — is
invisible to it by construction. That is the same class of structural blind spot
the methodology validation (n=12 local analysis) flagged on the original 2-run-median
harness: optimizations the macro workload can't route through simply don't appear,
so a "within noise" verdict can't be trusted to mean "no win."
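
To make the positional/keyed distinction concrete, here is a minimal Python sketch of an LIS-based minimal-move computation. It is an illustration only, not the repository's `ReconcileKeyedMiddle`; the function names `lis_indices` and `keys_to_move` are hypothetical. It shows why a workload that never reorders keys gives such a pass nothing to do.

```python
from bisect import bisect_left


def lis_indices(seq):
    """Return indices into seq that form one longest strictly increasing subsequence."""
    tails = []      # tails[k]: index in seq of the smallest tail of a run of length k+1
    tail_vals = []  # values at those indices, kept sorted for bisect
    prev = [-1] * len(seq)
    for i, v in enumerate(seq):
        k = bisect_left(tail_vals, v)
        if k == len(tails):
            tails.append(i)
            tail_vals.append(v)
        else:
            tails[k] = i
            tail_vals[k] = v
        prev[i] = tails[k - 1] if k > 0 else -1
    out = []
    i = tails[-1] if tails else -1
    while i != -1:
        out.append(i)
        i = prev[i]
    return out[::-1]


def keys_to_move(old_keys, new_keys):
    """Surviving keys that must be moved; keys on the LIS stay in place."""
    old_pos = {k: i for i, k in enumerate(old_keys)}
    survivors = [k for k in new_keys if k in old_pos]  # ignore pure inserts
    positions = [old_pos[k] for k in survivors]
    keep = {survivors[i] for i in lis_indices(positions)}
    return [k for k in survivors if k not in keep]
```

With an unchanged key order (the StocksGrid-style case) `keys_to_move` returns an empty list, while rotating one row to the front yields exactly one move.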

StressPerf.KeyedList renders ~500 stably keyed rows and reorders / inserts /
removes them each tick (deterministic: fixed RNG seed, constant N via paired
insert+remove, content-stable labels) — driving the keyed LIS diff on every tick.
That makes it the sensitive macro signal for the in-flight keyed work the positional
cells can never reach:

- P2 / perf: eliminate per-diff allocations in ChildReconciler + KeyedListDiff #657 (keyed-list diff)
- keyed structural-skip P4-C2
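
The determinism properties described above (fixed RNG seed, constant N via paired insert+remove, labels derived from the key) can be sketched as follows. This is a hedged Python illustration, not the StressPerf.KeyedList source; `tick`, `run`, the swap count, and the `row-<id>` key format are all assumptions made for the example.

```python
import random


def tick(rows, rng, next_id, swaps=8):
    """One mutation tick: reorder, then a paired remove + insert so N stays constant."""
    rows = list(rows)
    for _ in range(swaps):  # reorder by swapping random pairs
        i, j = rng.randrange(len(rows)), rng.randrange(len(rows))
        rows[i], rows[j] = rows[j], rows[i]
    rows.pop(rng.randrange(len(rows)))  # remove one row ...
    rows.insert(rng.randrange(len(rows) + 1), f"row-{next_id}")  # ... insert a fresh key
    return rows, next_id + 1


def run(seed, n=500, ticks=20):
    """Replay the workload; the same seed always yields the same key sequences."""
    rng = random.Random(seed)  # fixed seed makes every run identical
    rows, next_id = [f"row-{i}" for i in range(n)], n
    history = []
    for _ in range(ticks):
        rows, next_id = tick(rows, rng, next_id)
        history.append(tuple(rows))
    return history
```

Because both sides of an A/B comparison replay the same sequence, any difference in the measured metrics comes from the code under test rather than from the workload.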

How

Measurement tooling only — 0 src/Reactor changes (7 files, all under
tests/stress_perf/** + .github/workflows/perf-compare.yml):

- Run-PerfBenchmark.ps1: AppRegistry KeyedList entry; -IncludeKeyedList
  (default on, so fleet PRs' /perf auto-measures the keyed path) + 'KeyedList'
  in the local -Apps set; best-effort build (try/catch → omit the table on
  failure, exactly like the micro leg, so a keyed build failure never breaks the
  StocksGrid comparison); a third interleaved leg at the headline -Percent with
  the same pair-aligned drop-both collection and the full Warmup+Reps budget; ctx
  samples + result.json + Format-PerfComment threading.
- PerfLib.ps1: Format-PerfKeyedListSection renders the four headline metrics
  with the same paired-Δ 95% CI / direction-aware status as the headline table
  (empty array when either side is null → caller renders nothing); Format-PerfComment
  gains -MainKeyed/-PrKeyed, rendered after the Allocation table, before the
  micro-suite.
- Tests: PerfLib.Tests.ps1 (216 assertions, +13) covers the new section,
  direction-awareness, -Percent threading, and comment placement;
  RunPerfBenchmark.Tests.ps1 (34 assertions, +7) covers the static wiring
  contract (param default, registry mapping, interleaved leg, comment threading).
- Docs: ci/README.md params table + "The comment" bullet; METHODOLOGY.md
  keyed-list subsection; perf-compare.yml header + timeout-minutes 60 → 75
  (third macro leg adds a build + ~Reps runs alongside the Rust cold-build).
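
The "pair-aligned drop-both collection" used by the interleaved leg can be sketched like this. It is a minimal Python illustration of the idea, not the PowerShell in Run-PerfBenchmark.ps1; `collect_pairs` and the callables `run_main` / `run_pr` are hypothetical stand-ins that return a sample or `None` on failure.

```python
def collect_pairs(run_main, run_pr, reps):
    """Interleave main/pr runs; keep a rep only when BOTH sides produced a sample."""
    main_samples, pr_samples = [], []
    for rep in range(reps):
        m = run_main(rep)  # A side of this rep
        p = run_pr(rep)    # B side, run right after A so both see similar machine state
        if m is not None and p is not None:
            main_samples.append(m)
            pr_samples.append(p)
        # A one-sided failure drops both, so index i in one list always
        # pairs with index i in the other.
    return main_samples, pr_samples
```

Keeping the two lists index-aligned matters because a paired Δ is computed per rep; appending only the surviving side would silently pair samples from different reps.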

Verification

- Both perf-lib suites green: PerfLib.Tests.ps1 216/216, RunPerfBenchmark.Tests.ps1
  34/34.
- Local Run-PerfBenchmark.ps1 compare-mode smoke (stub harness exes, since the
  real WinUI exe can't build on this ARM64/OneDrive box): the keyed-list table renders
  in the correct position with a direction-correct paired Δ + 95% CI, and is omitted
  under -IncludeKeyedList:$false (skip-floor + the rest unaffected).
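
For readers unfamiliar with the "direction-correct paired Δ + 95% CI" wording, the following Python sketch shows one common way to compute it. It assumes a Student-t interval on the per-pair differences and the three status labels shown; the actual statistic, thresholds, and labels in PerfLib.ps1 are not stated on this page, and `paired_ci` / `status` are hypothetical names.

```python
import math
from statistics import mean, stdev

# Two-sided 95% Student-t critical values by degrees of freedom (assumed method).
T95 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571, 6: 2.447, 7: 2.365,
       8: 2.306, 9: 2.262, 10: 2.228, 11: 2.201, 12: 2.179, 13: 2.160,
       14: 2.145, 15: 2.131}


def paired_ci(main, pr):
    """Mean of per-pair deltas (pr - main) with a 95% confidence interval."""
    if len(main) != len(pr) or len(main) < 2:
        raise ValueError("need at least two aligned pairs")
    deltas = [p - m for m, p in zip(main, pr)]
    n = len(deltas)
    # Beyond the table, fall back to the normal value (slightly narrow for small df).
    half = T95.get(n - 1, 1.96) * stdev(deltas) / math.sqrt(n)
    d = mean(deltas)
    return d, d - half, d + half


def status(main, pr, higher_is_better):
    """Direction-aware verdict: the sign of the delta is read per metric."""
    d, lo, hi = paired_ci(main, pr)
    if lo <= 0 <= hi:
        return "within noise"  # the interval straddles zero
    return "improvement" if (d > 0) == higher_is_better else "regression"
```

The same negative delta is an improvement for a time or memory metric and a regression for a throughput metric, which is what "direction-aware" refers to here.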

Scope / discipline

Harness change — no /perf run needed on this PR (it changes /perf itself; we re-run
/perf on the fleet after it lands). DRAFT until the gate is clean. I own the
orchestration files (perf-compare.yml, Run-PerfBenchmark.ps1, PerfLib.ps1) to keep
them conflict-free across the roadmap.

Add the already-merged (#694) StressPerf.KeyedList workload to the /perf
comparison as a third interleaved A/B macro leg, so /perf can resolve
keyed-diff optimizations the positional StocksGrid cells never exercise.

StocksGrid (StressPerf.ReactorOptimized) mutates cells in place by index, so
its child diff always takes ChildReconciler.ReconcilePositional. The keyed LIS
arm (ReconcileKeyed -> ReconcileKeyedMiddle) is invisible to it by
construction -- the same blind spot that made the headline-only comparison
unable to resolve keyed work (P2/#657, keyed structural-skip P4-C2).
StressPerf.KeyedList renders ~500 stably keyed rows reordered/inserted/removed
each tick, driving that keyed arm on every tick.

Harness changes (measurement tooling only; 0 src/Reactor):

- Run-PerfBenchmark.ps1: AppRegistry KeyedList entry; -IncludeKeyedList
  (default on) + 'KeyedList' in the local -Apps set; best-effort build
  (try/catch -> omit on failure, like micro); third interleaved leg at the
  headline -Percent with the same pair-aligned drop-both collection and full
  Warmup+Reps budget; aggregation, ctx samples, result.json, and
  Format-PerfComment threading.
- PerfLib.ps1: Format-PerfKeyedListSection (renders the 4 headline metrics
  with the same paired-delta 95% CI / direction-aware status as the headline
  table; empty when either side is null); Format-PerfComment
  -MainKeyed/-PrKeyed params, rendered after the Allocation table and before
  the micro suite.
- Tests: PerfLib.Tests.ps1 (216 assertions) + RunPerfBenchmark.Tests.ps1
  (34 assertions) cover the new section + the static wiring contract.
- Docs: README params table + "The comment" bullet; METHODOLOGY subsection;
  perf-compare.yml header + timeout 60 -> 75 (third macro leg + build).

Verified: both test suites green; a local Run-PerfBenchmark.ps1 smoke (stub
harness exes on this ARM64/OneDrive box) renders the keyed table in the right
position with a direction-correct paired delta, and omits it under
-IncludeKeyedList:$false.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Internal pr-review (8 dimensions + multi-model cross-check) on #698 surfaced
1 low + 3 medium, all confirmed by the cross-check, all in this PR's own
tooling/tests. No critical/high.

Addressed:

- api-ergonomics (low): the run-mode log line printed `keyed-list={on/off}`
  from -IncludeKeyedList even in LOCAL mode, where the workload set actually
  comes from -Apps (so a default local run logged keyed-list=on while not
  running KeyedList). Make the mode line mode-aware: COMPARE keeps
  skip-floor/keyed-list tied to the include switches; LOCAL reports
  `apps=<-Apps>` instead.
- test-coverage (medium x3) in the keyed leg's tests:
  * RunPerfBenchmark.Tests.ps1: lock the opt-out + best-effort build fallback
    (build guarded by -IncludeKeyedList; a build failure flips it $false; the
    run leg is skipped when off) and the one-sided-failure drop-both pair
    alignment ($mm -and $pm appends both; $mm -or $pm drops both).
  * PerfLib.Tests.ps1: assert keyed verdicts for ALL four headline metrics --
    add Avg Diff (improvement) and Avg Memory (within noise). The keyed
    fixture now gives memory a small symmetric per-pair jitter so its paired
    CI straddles 0 (a proper within-noise case, not a degenerate
    zero-variance delta).

PerfLib.Tests.ps1 218/218 (+2), RunPerfBenchmark.Tests.ps1 39/39 (+5); all
four scripts parse clean; 0 src/Reactor.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
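
The mode-aware log line from the api-ergonomics fix can be illustrated with a short Python sketch. The exact wording of the real log line is not shown on this page, so the `mode=` prefix, the field order, and the function name `mode_line` are assumptions; only the `keyed-list=on/off` and `apps=<-Apps>` fields come from the commit message.

```python
def mode_line(mode, include_skip_floor, include_keyed_list, apps):
    """Build the run-mode log line so it only reports switches that apply to the mode."""
    def on_off(flag):
        return "on" if flag else "off"

    if mode == "compare":
        # COMPARE: the legs are controlled by the include switches.
        return (f"mode=compare skip-floor={on_off(include_skip_floor)} "
                f"keyed-list={on_off(include_keyed_list)}")
    # LOCAL: the workload set comes from -Apps, so report that instead.
    return f"mode=local apps={','.join(apps)}"
```

The point of the fix is that a default local run no longer claims `keyed-list=on` when KeyedList is not in the `-Apps` set.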

Copilot

Pull request overview

Wires the StressPerf.KeyedList macro workload into the /perf CI orchestration as a third interleaved A/B leg, so perf-compare can measure the reconciler’s keyed child-diff path (which the positional StocksGrid workload cannot exercise).

Changes:

- Add -IncludeKeyedList (default $true) and registry wiring for StressPerf.KeyedList, with best-effort build and an interleaved main/pr run loop that preserves paired-sample alignment.
- Add PerfLib rendering for a new “Keyed-list workload” section (same paired-Δ 95% CI + direction-aware status as the headline table) and thread it into the sticky comment layout.
- Update tests/docs and increase the perf workflow timeout to account for the added build + macro leg.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated no comments.

Show a summary per file

| File | Description |
| --- | --- |
| `tests/stress_perf/METHODOLOGY.md` | Documents why a keyed-list macro leg is needed and how it’s measured/controlled. |
| `tests/stress_perf/ci/Run-PerfBenchmark.ps1` | Adds KeyedList registry entry, `-IncludeKeyedList`, best-effort build, and third interleaved A/B leg; threads aggregates into comment + result.json. |
| `tests/stress_perf/ci/PerfLib.ps1` | Adds `Format-PerfKeyedListSection` and renders it in `Format-PerfComment` (after alloc, before micro-suite). |
| `tests/stress_perf/ci/PerfLib.Tests.ps1` | Adds coverage for keyed-list section rendering/omission, direction-awareness, and placement in the comment. |
| `tests/stress_perf/ci/RunPerfBenchmark.Tests.ps1` | Adds AST/string-contract tests for the new param defaults, registry mapping, interleaved tags, and comment threading. |
| `tests/stress_perf/ci/README.md` | Updates CLI parameter docs and comment layout explanation to include the keyed-list table. |
| `.github/workflows/perf-compare.yml` | Updates header docs and increases timeout to 75 minutes to cover the added macro leg/build. |

Copilot AI added 2 commits June 26, 2026 03:38

azchohfi requested a review from Copilot June 26, 2026 10:51

Copilot started reviewing on behalf of azchohfi June 26, 2026 10:51

Copilot AI reviewed Jun 26, 2026

azchohfi marked this pull request as ready for review June 26, 2026 11:07

azchohfi requested a review from codemonkeychris as a code owner June 26, 2026 11:07

azchohfi merged commit 41e41d7 into main Jun 26, 2026
21 checks passed

azchohfi merged 2 commits into main from azchohfi-perf-keyedlist

3 participants