Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
17 changes: 11 additions & 6 deletions .github/workflows/perf-compare.yml
Original file line number Diff line number Diff line change
Expand Up @@ -16,10 +16,13 @@
#
# Each metric shows main, PR, a signed Δ (with a 95% CI), and an improvement/
# regression/within-noise status. Further tables follow: a low-mutation skip-floor
# leg, an allocation comparison, the reconciler micro-suite, and a cross-framework
# reference (vanilla WinUI3 (StressPerf.Direct) + the Rust `windows-reactor` port,
# test_reactor_perf from microsoft/windows-rs — both measured live on the same
# runner). See tests/stress_perf/ci/README.md for the authoritative comment layout.
# leg, an allocation comparison, a keyed-list leg (StressPerf.KeyedList — a ~500-row
# stably keyed list whose reorder/insert/remove ticks drive the reconciler's keyed
# LIS diff arm the positional StocksGrid cells never hit), the reconciler micro-suite,
# and a cross-framework reference (vanilla WinUI3 (StressPerf.Direct) + the Rust
# `windows-reactor` port, test_reactor_perf from microsoft/windows-rs — both measured
# live on the same runner). See tests/stress_perf/ci/README.md for the authoritative
# comment layout.
#
# ── Why `issue_comment` (not a label) ────────────────────────────────────────
# `issue_comment` always runs the workflow from the DEFAULT branch, so the perf
Expand Down Expand Up @@ -75,8 +78,10 @@ jobs:
contains(fromJSON('["OWNER","MEMBER","COLLABORATOR"]'), github.event.comment.author_association))
}}
runs-on: windows-latest
# 60 (not 40): the Rust cross-framework leg cold-builds the windows-rs port.
timeout-minutes: 60
# 75 (not 60): the Rust cross-framework leg cold-builds the windows-rs port, and
# the macro comparison now runs three interleaved A/B legs (headline + skip-floor
# + keyed-list) plus the micro-suite — the keyed leg adds a build + ~Reps runs.
timeout-minutes: 75
env:
GH_REPO: ${{ github.repository }}
# Pinned microsoft/windows-rs commit whose `test_reactor_perf` crate backs
Expand Down
22 changes: 22 additions & 0 deletions tests/stress_perf/METHODOLOGY.md
Original file line number Diff line number Diff line change
Expand Up @@ -240,6 +240,28 @@ each leg's delta independently cancels time-correlated drift); it is opt-out via
`-IncludeSkipFloor $false`. See
[`ci/README.md`](ci/README.md#the-comment).

## Keyed-list workload: the keyed child-diff path StocksGrid never hits

The StocksGrid macro workload (`StressPerf.ReactorOptimized`) renders a fixed grid
of cells mutated **in place by index**. Its child diff therefore always takes
`ChildReconciler.ReconcilePositional` — the positional re-walk. It never exercises
the reconciler's **keyed** arm, so keyed-diff optimizations (the keyed-list LIS
diff, keyed structural-skip) are invisible to it *by construction* — the same blind
spot that made the original headline-only comparison unable to resolve them.

So `/perf` runs a **third interleaved A/B leg** on `StressPerf.KeyedList`: a ~500-row
list of **stably keyed** children that are reordered / inserted / removed each tick.
Because every child carries a key, the child reconciler takes its keyed arm
(`ReconcileKeyed` → `ReconcileKeyedMiddle`, the LIS-based minimal-move pass) and runs
a real keyed diff every tick. The workload is deterministic (fixed RNG seed, constant
row count — insertions paired with removals) so `main` and PR compare identical edit
sequences, and its rows' labels are content-stable so a moved row's text never changes
— isolating the **structural** (keyed-diff) signal from per-cell property updates. It
reports the four headline metrics in its own table under the same interleaving, reps,
warm-up, and 95%-CI gating as the headline leg, and is opt-out via
`-IncludeKeyedList $false`. See
[`ci/README.md`](ci/README.md#the-comment).

## Reconciler micro-benchmarks: ns-resolution Core path

Every metric above is measured **across a live WinUI render pipeline**, which is
Expand Down
57 changes: 57 additions & 0 deletions tests/stress_perf/ci/PerfLib.Tests.ps1
Original file line number Diff line number Diff line change
Expand Up @@ -450,6 +450,63 @@ $floorComment1 = Format-PerfComment -Main $main -Pr $pr -WinUI3 $null -Rust $nul
Assert-Match $floorComment1 '--percent 1' 'skip-floor heading reflects Context.SkipFloorPercent'


# ── Format-PerfKeyedListSection + Format-PerfComment: keyed-list workload ──────
# 12 paired keyed-list runs exercising ALL FOUR headline metrics by direction AND by
# significance: rps/reconcile/diff move DOWN main->PR, while memory carries a small
# SYMMETRIC per-pair jitter (mean Δ ~0). So the verdicts must split: rps (higher-
# better) DOWN = regression; reconcile/diff (lower-better) DOWN = improvement; memory's
# paired CI straddles 0 = within noise — proving the keyed section reuses Table 1's
# direction-aware paired-CI machinery, not a hard-coded verdict. The small jitter on
# the directional metrics keeps each of their paired CIs off 0.
$keyedMainRuns = @(); $keyedPrRuns = @()
1..12 | ForEach-Object {
$j = ($_ % 4) * 0.05
$mj = ((($_ % 2) * 2) - 1) * 0.2 # alternating +0.2 / -0.2 so the paired memory Δ straddles 0
$keyedMainRuns += [pscustomobject]@{ RendersPerSec = 8.0 + $j; AvgReconcileMs = 9.0 + $j; AvgDiffMs = 7.0 + $j; AvgMemoryMB = 250 + $mj; TotalRenders = 80; DurationSeconds = 10 }
$keyedPrRuns += [pscustomobject]@{ RendersPerSec = 7.0 + $j; AvgReconcileMs = 7.0 + $j; AvgDiffMs = 5.0 + $j; AvgMemoryMB = 250 - $mj; TotalRenders = 70; DurationSeconds = 10 }
}
$keyedMain = Measure-PerfRuns -Runs $keyedMainRuns
$keyedPr = Measure-PerfRuns -Runs $keyedPrRuns

# Direct section renderer: empty when either side is null, populated when both present.
Assert-Equal 0 @(Format-PerfKeyedListSection -MainKeyed $null -PrKeyed $keyedPr -Percent 50).Count 'keyed section empty when main keyed null'
Assert-Equal 0 @(Format-PerfKeyedListSection -MainKeyed $keyedMain -PrKeyed $null -Percent 50).Count 'keyed section empty when pr keyed null'
$keyedSection = Format-PerfKeyedListSection -MainKeyed $keyedMain -PrKeyed $keyedPr -Percent 50
$keyedSectionText = $keyedSection -join "`n"
Assert-Match $keyedSectionText 'Keyed-list workload' 'keyed section has heading'
Assert-Match $keyedSectionText 'StressPerf.KeyedList' 'keyed heading names the workload'
Assert-Match $keyedSectionText 'Avg Reconcile' 'keyed section has reconcile row'
Assert-Match $keyedSectionText 'keyed arm' 'keyed preamble explains the keyed arm'
Assert-Match $keyedSectionText 'LIS' 'keyed preamble cites the LIS minimal-move pass'
# Direction-awareness: rps and reconcile both DECREASE main->PR, yet rps (higher-is-
# better) must read regression while reconcile (lower-is-better) reads improvement.
$keyedRpsRow = ($keyedSection | Where-Object { $_ -match 'Renders/sec' }) -join ' '
$keyedReconRow = ($keyedSection | Where-Object { $_ -match 'Avg Reconcile' }) -join ' '
$keyedDiffRow = ($keyedSection | Where-Object { $_ -match 'Avg Diff' }) -join ' '
$keyedMemRow = ($keyedSection | Where-Object { $_ -match 'Avg Memory' }) -join ' '
Assert-Match $keyedRpsRow 'regression' 'keyed: rps DOWN reads regression (higher-is-better honored)'
Assert-Match $keyedReconRow 'improvement' 'keyed: reconcile DOWN reads improvement (lower-is-better honored)'
Assert-Match $keyedDiffRow 'improvement' 'keyed: diff DOWN reads improvement (lower-is-better honored)'
Assert-Match $keyedMemRow 'within noise' 'keyed: symmetric memory Δ reads within noise (paired CI straddles 0)'
# -Percent threads into the heading independently of the methodology line.
$keyedSection75 = (Format-PerfKeyedListSection -MainKeyed $keyedMain -PrKeyed $keyedPr -Percent 75) -join "`n"
Assert-Match $keyedSection75 'Keyed-list workload*--percent 75' 'keyed heading reflects the -Percent argument'

# Threaded through Format-PerfComment: present when keyed aggregates present, sitting
# after the regression/skip-floor tables and before the cross-framework table.
$keyedComment = Format-PerfComment -Main $main -Pr $pr -WinUI3 $null -Rust $null -MainFloor $floorMain -PrFloor $floorPr -MainKeyed $keyedMain -PrKeyed $keyedPr -Context $ctx
Assert-Match $keyedComment 'Keyed-list workload' 'comment renders keyed-list table when keyed aggregates present'
$idxRegK = $keyedComment.IndexOf('Regression vs')
$idxFloorK = $keyedComment.IndexOf('Low-mutation skip-floor')
$idxKeyed = $keyedComment.IndexOf('Keyed-list workload')
$idxXfwK = $keyedComment.IndexOf('Cross-framework reference')
Assert-True (($idxRegK -lt $idxKeyed) -and ($idxFloorK -lt $idxKeyed) -and ($idxKeyed -lt $idxXfwK)) 'keyed-list table sits after the regression + skip-floor tables and before cross-framework'

# Omitted entirely when keyed aggregates are absent (keyed-list leg disabled / build omitted).
$noKeyedComment = Format-PerfComment -Main $main -Pr $pr -WinUI3 $null -Rust $null -MainKeyed $null -PrKeyed $null -Context $ctx
Assert-True (-not ($noKeyedComment -like '*Keyed-list workload*')) 'keyed-list table omitted when keyed aggregates null'


# ── Reconciler micro-suite: Read-MicroBenchResults / comparison / render ──────
function New-MicroRow {
param([string]$BenchId, [string]$Name, [string]$Variant, [int]$Rep, [double]$MeanNs, [double]$AllocBytes, [string]$Status = 'ok', [int]$Iterations = 1)
Expand Down
64 changes: 64 additions & 0 deletions tests/stress_perf/ci/PerfLib.ps1
Original file line number Diff line number Diff line change
Expand Up @@ -742,6 +742,58 @@ function Format-PerfSkipFloorSection {
return $lines.ToArray()
}

function Format-PerfKeyedListSection {
<#
.SYNOPSIS
Render the keyed-list workload table: the four headline metrics measured on
StressPerf.KeyedList — a ~500-row stably keyed list whose rows are reordered /
inserted / removed each tick. Empty array when there is nothing to show.
.DESCRIPTION
Unlike the positional StocksGrid headline/skip-floor legs (whose cells mutate
in place by index, always taking ChildReconciler.ReconcilePositional), this is
a SEPARATE macro workload that drives the child reconciler's KEYED arm
(ReconcileKeyed → ReconcileKeyedMiddle, the LIS-based minimal-move pass). It is
the sensitive macro measure for keyed-diff optimizations (keyed-list diff,
keyed structural-skip) that the StocksGrid workload can never exercise. Reuses
the same paired-Δ 95% CI machinery (Get-PerfDelta over the index-aligned
per-run samples) as the headline table. Returns an empty array when either
aggregate is $null (keyed-list leg disabled, build omitted, or one side
produced no metrics), so the caller renders nothing.
.PARAMETER MainKeyed Aggregated baseline keyed-list metrics (Measure-PerfRuns), or $null.
.PARAMETER PrKeyed Aggregated PR-head keyed-list metrics, or $null.
.PARAMETER Percent The mutation percent the keyed-list leg ran at (heading / preamble).
#>
param(
[AllowNull()][pscustomobject]$MainKeyed,
[AllowNull()][pscustomobject]$PrKeyed,
[double]$Percent = 50
)
if ($null -eq $MainKeyed -or $null -eq $PrKeyed) { return @() }

$lines = [System.Collections.Generic.List[string]]::new()
$lines.Add("### Keyed-list workload (``StressPerf.KeyedList``, ``--percent $Percent``)")
$lines.Add('')
$lines.Add("A separate macro workload: a ~500-row **stably keyed** list whose rows are reordered / inserted / removed each tick. Because every child carries a key, the child reconciler takes its **keyed arm** (``ReconcileKeyed`` → ``ReconcileKeyedMiddle``, the LIS-based minimal-move pass) instead of the positional re-walk the StocksGrid tables above measure &mdash; so this is the sensitive macro signal for **keyed-diff** work the positional cells can never reach. Same interleaved paired-Δ 95% CI as the headline table.")
$lines.Add('')
$lines.Add('| Metric | `main` (baseline) | This PR | Δ (95% CI) | Status |')
$lines.Add('|---|--:|--:|--:|:--|')
foreach ($m in $script:PerfMetricSpec) {
$bVal = $MainKeyed.($m.Key)
$pVal = $PrKeyed.($m.Key)
$spread = [math]::Max([double]$MainKeyed."$($m.Key)Spread", [double]$PrKeyed."$($m.Key)Spread")
$delta = Get-PerfDelta -Baseline $bVal -Candidate $pVal -LowerIsBetter $m.LowerIsBetter -SpreadPct $spread `
-BaselineSamples $MainKeyed."$($m.Key)Samples" -CandidateSamples $PrKeyed."$($m.Key)Samples"
$lines.Add(('| {0} {1} | {2} | {3} | {4} | {5} |' -f `
$m.Label, $m.Arrow, `
(Format-PerfNumber $bVal $m.Digits), `
(Format-PerfNumber $pVal $m.Digits), `
(Format-PerfDeltaCell $delta), `
(Get-PerfStatusGlyph $delta.Status)))
}
$lines.Add('')
return $lines.ToArray()
}

function Format-PerfComment {
<#
.SYNOPSIS
Expand All @@ -756,6 +808,8 @@ function Format-PerfComment {
(Get-PerfMicroComparison output), or $null when not run.
.PARAMETER MainFloor Aggregated baseline low-mutation skip-floor metrics, or $null.
.PARAMETER PrFloor Aggregated PR-head low-mutation skip-floor metrics, or $null.
.PARAMETER MainKeyed Aggregated baseline keyed-list workload metrics, or $null.
.PARAMETER PrKeyed Aggregated PR-head keyed-list workload metrics, or $null.
.PARAMETER Context Hashtable: Percent, Duration, Reps, Warmup, SkipFloorPercent,
BaseSha, HeadSha, Runner, Cpu, Cores, MemoryGB, RunUrl,
Timestamp, Note.
Expand All @@ -768,6 +822,8 @@ function Format-PerfComment {
[AllowNull()][object[]]$Micro,
[AllowNull()][pscustomobject]$MainFloor,
[AllowNull()][pscustomobject]$PrFloor,
[AllowNull()][pscustomobject]$MainKeyed,
[AllowNull()][pscustomobject]$PrKeyed,
[Parameter(Mandatory)][hashtable]$Context
)

Expand Down Expand Up @@ -842,6 +898,14 @@ function Format-PerfComment {
& $add ''
}

# ── Keyed-list workload table (StressPerf.KeyedList) ─────────────────────
# A separate macro workload driving the child reconciler's KEYED arm
# (ReconcileKeyed → ReconcileKeyedMiddle, the LIS minimal-move pass) that the
# positional StocksGrid cells above never reach. The sensitive macro signal for
# keyed-diff optimizations. Rendered only when both keyed aggregates are present.
$keyedPct = if ($Context.ContainsKey('Percent')) { [double]$Context.Percent } else { 50 }
foreach ($kline in (Format-PerfKeyedListSection -MainKeyed $MainKeyed -PrKeyed $PrKeyed -Percent $keyedPct)) { & $add $kline }

# ── Reconciler micro-benchmarks (ns-resolution, WinUI-undiluted) ──────────
# Rendered only when the PerfBench.ControlModel micro leg produced results for
# both sides. Resolves Core/Reconciler time + allocation deltas the macro
Expand Down
13 changes: 12 additions & 1 deletion tests/stress_perf/ci/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -150,7 +150,8 @@ git worktree remove ../main
| `-MicroIterations` | `10000` | Inner iterations per repetition inside each micro-bench (amortises timer resolution). |
| `-IncludeSkipFloor` | `$true` | Run a **second interleaved A/B leg** at `-SkipFloorPercent` and append a low-mutation skip-floor table (compare mode). Set `$false` to skip it (halves the macro runtime). |
| `-SkipFloorPercent` | `0` | Mutation percent for the skip-floor leg. At `0` the workload still mutates one cell/tick (`StockDataSource.Update` clamps the count to `Math.Max(1, …)`), so reconcile/diff isolate the O(n) per-tick child skip-walk floor the 50% leg dilutes. |
| `-Apps` | `ReactorOptimized,Direct` | Single-tree mode only: which harnesses to run. |
| `-IncludeKeyedList` | `$true` | Run a **third interleaved A/B leg** on `StressPerf.KeyedList` — a ~500-row stably keyed list reordered/inserted/removed each tick — and append its own table (compare mode). Drives the child reconciler's **keyed arm** (`ReconcileKeyed` → `ReconcileKeyedMiddle`, the LIS minimal-move pass) the positional StocksGrid cells never reach. Build is best-effort; set `$false` to skip the leg. |
| `-Apps` | `ReactorOptimized,Direct` | Single-tree mode only: which harnesses to run (`ReactorOptimized`, `Direct`, `KeyedList`). |
| `-Platform` | host arch | Target architecture (`x64` or `ARM64`). Defaults to your machine's native arch so the WinUI harness runs without emulation. |
| `-SelfContained` | `$true` | Build with the bundled WinApp runtime (no machine-wide install). |
| `-SkipBuild` | off | Reuse existing binaries (skip `dotnet build`). |
Expand Down Expand Up @@ -224,6 +225,16 @@ Several tables plus footnotes:
`main` vs PR with the same paired-CI band. Rendered only when the harness
reports the metric (n/a for pre-metric PR heads). This is the table that moves
for allocation-reduction PRs.
- **Keyed-list workload (`StressPerf.KeyedList`)** — the four headline metrics
from a **third interleaved A/B leg** on a separate ~500-row **stably keyed** list
whose rows are reordered / inserted / removed each tick. Because every child
carries a key, the child reconciler takes its **keyed arm** (`ReconcileKeyed` →
`ReconcileKeyedMiddle`, the LIS-based minimal-move pass) instead of the positional
re-walk the StocksGrid tables measure — so this is the sensitive macro signal for
**keyed-diff** optimizations (keyed-list diff, keyed structural-skip) that the
positional cells can never exercise. Same paired-CI gating as Table 1; omitted
when `-IncludeKeyedList $false`, the workload build fails, or a side produces no
metrics.
- **Reconciler micro-benchmarks** — per-bench `ns/op` and `B/op` from the
`PerfBench.ControlModel` micro-suite (M1–M13), `main` vs PR. ns-resolution and
WinUI-undiluted, so it resolves Core/Reconciler time and allocation deltas the
Expand Down
Loading
Loading