DO-NOT-MERGE: #657 re-measure on post-#665 main by azchohfi · Pull Request #724 · microsoft/microsoft-ui-reactor

azchohfi · 2026-06-27T03:51:27Z

Throwaway measurement PR. Rebases #657 (keyed-list diff alloc, head 13679c9) onto current origin/main (b9ace1e = #692 + M14 + #665 + #649) to re-measure the keyed-list block on the fresh baseline. DO NOT MERGE: origin #657 stays pristine. Closed after /perf completes.

Hot keyed list-diff path (grid steady state) allocated heavily and missed fast paths even when keys were unchanged. Eliminate per-diff allocations and add the missing fast path, preserving diff behavior exactly.

ChildReconciler.cs:
- Keyed prefix/suffix loops now take the Element.CanSkipUpdate early exit that the positional path has, so stable keyed rows no longer re-diff every tick (#30); cache children.Count once instead of re-reading the COM IVector.get_Size per suffix iteration (#37).
- Replace HashSet-returning ComputeLIS with allocation-free ComputeLISInto filling a pooled bool mask; pool tails/tailIndices/predecessors from ArrayPool (#31/#32). Keep a thin ComputeLIS(int[]) wrapper for tests.
- Pool ReconcileKeyedMiddle's working arrays (ArrayPool) and the two key->index maps (re-entrancy-safe ThreadStatic dict pool); buffers are cleared and returned on every exit (#33).
- Filter: count pass + single Element[] fill, no List+ToArray (#36).
- GetKey: cache Type.Name via ConcurrentDictionary (#38).

KeyedListDiff.cs:
- Rent newKeys from ArrayPool<string>, threading an explicit newCount (the rented array may be larger); returned with clearArray:true in a finally (#2).
- Move the no-op (SequenceEqual) and empty/empty fast paths ABOVE the duplicate scan so the steady-state grid case never allocates or scans a dup set (#1).
- Fold the churn-bailout decision into ApplyGeneral, computed from the same post-prefix/suffix scratch map the general walk builds (diff-range churn == full-range churn), removing the O(2n) null-marker pre-pass (#34).
- Pool the doomed ReactorRow[] (ArrayPool, descending IComparer sort over [0,removes), cleared+returned) (#3).
- Lazy-allocate movedRows only on an actual move under an ambient (#35).
- In-place SyncLastKeysToSource instead of LastKeys Clear()+Add (#10).
- HasDuplicates reuses state.Scratch (TryAdd) for 4+ keys instead of a fresh HashSet (#11); cached null-key diagnostic sample (#41).

Tests: add pooling non-corruption coverage (rented-buffer-larger-than-count, interleaved independent states, randomized stress vs oracle with survivor identity, pooled doomed removes, churn with shared prefix+suffix, Scratch reuse for the dup scan) and ComputeLISInto bool-mask coverage. Full Reactor.Tests suite green (9693 passed); core lib Release build is warning-clean (AOT/trim).

Closes #653

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
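The reordered KeyedListDiff fast paths (#1), the rented key buffer with an explicit count (#2) and the Scratch-based duplicate scan (#11) can be sketched as follows. This is a minimal illustration under stated assumptions, not the PR's code: `ClassifyDiff`, its string results and the pairwise check for fewer than four keys are hypothetical. `ArrayPool<T>.Shared`, `Rent`, `Return(array, clearArray)` and `Dictionary<TKey,TValue>.TryAdd` are standard .NET APIs.

```csharp
using System;
using System.Buffers;
using System.Collections.Generic;

// Hypothetical stand-in for the reusable per-state Scratch map (#11): one instance, emptied after each use.
var scratch = new Dictionary<string, int>(StringComparer.Ordinal);

// Hypothetical classifier for one keyed diff: fast paths first, then the duplicate scan.
string ClassifyDiff(IReadOnlyList<string> lastKeys, IReadOnlyList<string> nextKeys)
{
    int newCount = nextKeys.Count;
    // Rent may return a larger array, so every loop below is bounded by newCount (#2).
    string[] newKeys = ArrayPool<string>.Shared.Rent(Math.Max(newCount, 1));
    try
    {
        for (int i = 0; i < newCount; i++) newKeys[i] = nextKeys[i];

        // Fast paths run before any duplicate scan (#1), so the steady state never hashes keys.
        if (lastKeys.Count == 0 && newCount == 0) return "empty";
        if (SameSequence(lastKeys, newKeys, newCount)) return "no-op";

        return HasDuplicates(newKeys, newCount) ? "duplicates" : "general";
    }
    finally
    {
        // string[] holds references, so clear it before it goes back to the pool.
        ArrayPool<string>.Shared.Return(newKeys, clearArray: true);
    }
}

bool SameSequence(IReadOnlyList<string> a, string[] b, int bCount)
{
    if (a.Count != bCount) return false;
    for (int i = 0; i < bCount; i++)
        if (!string.Equals(a[i], b[i], StringComparison.Ordinal)) return false;
    return true;
}

bool HasDuplicates(string[] keys, int count)
{
    if (count < 4)
    {
        // Assumption: very short lists are compared pairwise instead of touching the map.
        for (int i = 0; i < count; i++)
            for (int j = i + 1; j < count; j++)
                if (string.Equals(keys[i], keys[j], StringComparison.Ordinal)) return true;
        return false;
    }
    try
    {
        for (int i = 0; i < count; i++)
            if (!scratch.TryAdd(keys[i], i)) return true; // TryAdd fails on a repeated key
        return false;
    }
    finally
    {
        scratch.Clear(); // leave the shared map empty for the next caller
    }
}

Console.WriteLine(ClassifyDiff(new[] { "a", "b", "c" }, new[] { "a", "b", "c" })); // prints no-op
```

Because `Rent` can hand back an array longer than requested, the sketch never reads `newKeys.Length`; every loop is bounded by `newCount`, which is the situation the rented-buffer-larger-than-count tests describe.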

Review follow-ups on #657 (keyed list-diff allocation work):

- ChildReconciler.ReconcileKeyedMiddle: return the pooled matched and inLis bool[] buffers with clearArray:false instead of true (Copilot review). Both are value-typed (no reference pinning) and have their used range fully (re)initialized before any read (matched via the Array.Clear on rent, inLis via ComputeLISInto's leading clear), so the full-array wipe on return was avoidable O(rented-capacity) work on the hot path. This matches how the int[] newToOld is already returned. Reference-typed pooled buffers (string[]/ReactorRow[] in KeyedListDiff) still clear.
- Tests (ChildReconcilerLisIntoTests): replace the circular oracle that compared ComputeLISInto against ComputeLIS (now a thin wrapper over it) with an independent brute-force LIS-length DP that honors the -1 unmapped sentinel, asserting that the mask marks a valid strictly increasing subsequence of maximal length.
- Tests (ChildReconcilerKeyedSkipTests, new): cover the #30 keyed CanSkipUpdate fast path: an identical keyed list skips every row with no ops and no child-control access, across repeated stable frames.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
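An independent oracle of the kind described for ChildReconcilerLisIntoTests might look like the sketch below: an O(n²) LIS-length DP that skips the -1 sentinel, plus a check that a mask marks a strictly increasing subsequence of that maximal length. `LisLengthOracle` and `IsMaximalLisMask` are hypothetical names; the real test helpers are not shown in this PR.

```csharp
using System;

// O(n^2) DP: best[i] = length of the longest strictly increasing subsequence ending at i.
// Entries equal to -1 (unmapped) never participate.
int LisLengthOracle(int[] a)
{
    var best = new int[a.Length];
    int overall = 0;
    for (int i = 0; i < a.Length; i++)
    {
        if (a[i] == -1) continue;
        best[i] = 1;
        for (int j = 0; j < i; j++)
            if (a[j] != -1 && a[j] < a[i] && best[j] + 1 > best[i])
                best[i] = best[j] + 1;
        overall = Math.Max(overall, best[i]);
    }
    return overall;
}

// A mask is accepted if it marks only mapped entries, the marked values are strictly
// increasing in position order, and their count equals the oracle's maximal length.
// Only mask[0..a.Length) is inspected, so a larger pooled mask is fine.
bool IsMaximalLisMask(int[] a, bool[] mask)
{
    int marked = 0, prev = int.MinValue;
    for (int i = 0; i < a.Length; i++)
    {
        if (!mask[i]) continue;
        if (a[i] == -1 || a[i] <= prev) return false;
        prev = a[i];
        marked++;
    }
    return marked == LisLengthOracle(a);
}

var newToOld = new[] { 2, -1, 0, 1, 3 };
Console.WriteLine(LisLengthOracle(newToOld)); // prints 3
```

Checking validity plus length, rather than comparing against one specific subsequence, keeps the oracle independent of the implementation under test: an LIS is generally not unique, and any maximal one is accepted.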

azchohfi · 2026-06-27T03:51:35Z

/perf



github-actions · 2026-06-27T04:31:15Z

⚡ Reactor perf comparison

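Workload: StressPerf.ReactorOptimized StocksGrid · --percent 50 --duration 10 · x64 Release · median of 12 paired runs (2 warmup dropped); Δ is the mean change with a 95% CI · PR head and main built and run interleaved on the same runner. Because the baseline and PR columns are medians while Δ is the mean of the paired changes, the two can differ in sign (for example Avg Reconcile below).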

Regression vs `main` baseline

| Metric | `main` (baseline) | This PR | Δ (95% CI) | Status |
|---|---|---|---|---|
| Renders/sec ↑ | 2.48 | 2.61 | +2.4% [-3.6, +8.5] | ≈ within noise |
| Avg Reconcile (ms) ↓ | 130.0 | 130.9 | -1.5% [-3.8, +0.8] | ≈ within noise |
| Avg Diff (ms) ↓ | 119.8 | 118.1 | -1.2% [-3.6, +1.2] | ≈ within noise |
| Avg Memory (MB) ↓ | 283.8 | 284.6 | 0.0% [-0.8, +0.8] | ≈ within noise |
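A minimal sketch of a paired Δ with a 95% confidence interval, assuming each paired run yields a percentage change (PR vs `main`) and the interval uses a Student-t critical value. The workflow's actual estimator is not shown in this PR; `PairedDelta` and the sample numbers are made up for illustration.

```csharp
using System;
using System.Linq;

// Hypothetical helper: mean of per-pair percentage changes with a two-sided interval
// mean ± t * s / sqrt(n). The caller supplies the Student-t critical value for n - 1 df.
(double Delta, double Low, double High) PairedDelta(double[] baseline, double[] candidate, double tCritical)
{
    if (baseline.Length != candidate.Length || baseline.Length < 2)
        throw new ArgumentException("need at least two equal-length paired samples");
    double[] pct = baseline.Zip(candidate, (b, c) => (c - b) / b * 100.0).ToArray();
    double avg = pct.Average();
    double variance = pct.Sum(d => (d - avg) * (d - avg)) / (pct.Length - 1); // sample variance
    double halfWidth = tCritical * Math.Sqrt(variance / pct.Length);
    return (avg, avg - halfWidth, avg + halfWidth);
}

// Five made-up pairs for illustration only; t(0.975, df = 4) ≈ 2.776.
double[] baselineRuns = { 130.2, 128.9, 131.5, 129.8, 130.4 };
double[] prRuns = { 128.7, 129.3, 129.0, 128.1, 130.9 };
var (delta, low, high) = PairedDelta(baselineRuns, prRuns, 2.776);
Console.WriteLine($"{delta:+0.00;-0.00}% (95% CI [{low:+0.00;-0.00}, {high:+0.00;-0.00}])");
```

Pairing matters because both builds run interleaved on the same runner: per-pair differences cancel runner drift that would widen an unpaired interval.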

Low-mutation skip-floor (`--percent 0`)

At --percent 0 the workload mutates only a few cells per tick (always at least one), so reconcile/diff isolate the O(n) per-tick child skip-walk floor that higher mutation rates dilute: ChildReconciler re-walks every child each tick even when nothing moved. The closer --percent is to 0, the more this floor dominates the measurement, so a structural-skip optimization shows up cleanly here while the headline table above buries it. Δ is the mean paired change with a 95% CI.

| Metric | `main` (baseline) | This PR | Δ (95% CI) | Status |
|---|---|---|---|---|
| Renders/sec ↑ | 16.36 | 15.91 | -1.6% [-10.7, +7.6] | ≈ within noise |
| Avg Reconcile (ms) ↓ | 37.5 | 37.9 | +5.4% [-8.0, +18.8] | ≈ within noise |
| Avg Diff (ms) ↓ | 35.4 | 35.9 | +5.5% [-8.2, +19.2] | ≈ within noise |
| Avg Memory (MB) ↓ | 267.0 | 266.3 | -0.1% [-0.4, +0.3] | ≈ within noise |
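The per-tick child walk that the skip-floor isolates, with the keyed CanSkipUpdate early exit (#30) and the cached count (#37), can be sketched as prefix/suffix trimming over keyed rows. Everything here is hypothetical: `TrimKeyed`, the `canSkip`/`update` delegates and the tuple rows stand in for ChildReconciler, Element.CanSkipUpdate and the real children collection.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch: walk the shared keyed prefix and suffix of old/new children.
// Rows whose elements report "can skip" take the early exit; only the rest call update.
(int Prefix, int Suffix, int Updated) TrimKeyed(
    IReadOnlyList<(string Key, object Element)> oldRows,
    IReadOnlyList<(string Key, object Element)> newRows,
    Func<object, object, bool> canSkip,
    Action<int> update)
{
    // Read both counts once; per the PR text, the real children.Count is a COM
    // IVector.get_Size call that used to be re-read on every suffix iteration (#37).
    int oldCount = oldRows.Count, newCount = newRows.Count;
    int prefix = 0, suffix = 0, updated = 0;

    while (prefix < oldCount && prefix < newCount && oldRows[prefix].Key == newRows[prefix].Key)
    {
        if (!canSkip(oldRows[prefix].Element, newRows[prefix].Element)) { update(prefix); updated++; }
        prefix++;
    }

    // The suffix may not overlap the prefix on either side.
    while (suffix < oldCount - prefix && suffix < newCount - prefix &&
           oldRows[oldCount - 1 - suffix].Key == newRows[newCount - 1 - suffix].Key)
    {
        int oi = oldCount - 1 - suffix, ni = newCount - 1 - suffix;
        if (!canSkip(oldRows[oi].Element, newRows[ni].Element)) { update(ni); updated++; }
        suffix++;
    }
    return (prefix, suffix, updated);
}

// A stable frame: same keys, same element instances, so every row skips.
var shared = new object();
var frame = new (string, object)[] { ("a", shared), ("b", shared), ("c", shared) };
var stable = TrimKeyed(frame, frame, (x, y) => object.ReferenceEquals(x, y), _ => { });
Console.WriteLine($"prefix={stable.Prefix} suffix={stable.Suffix} updated={stable.Updated}"); // prints prefix=3 suffix=0 updated=0
```

Even with every row skipping, the walk still visits all n rows, which is why the --percent 0 numbers stay at an O(n) floor rather than dropping to zero.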

Allocation (Reactor) — lower is better

| Metric | `main` (baseline) | This PR | Δ (95% CI) | Status |
|---|---|---|---|---|
| Alloc bytes/render ↓ | 5774436 | 5773474 | +0.1% [-1.0, +1.3] | ≈ within noise |
| Gen0 GC / 1k renders ↓ | 230.77 | 230.77 | -0.3% [-10.9, +10.2] | ≈ within noise |

Keyed-list workload (`StressPerf.KeyedList`, `--percent 50`)

A separate macro workload: a ~500-row stably keyed list whose rows are reordered / inserted / removed each tick. Because every child carries a key, the child reconciler takes its keyed arm (ReconcileKeyed → ReconcileKeyedMiddle, the LIS-based minimal-move pass) instead of the positional re-walk the StocksGrid tables above measure — so this is the sensitive macro signal for keyed-diff work the positional cells can never reach. Same interleaved paired-Δ 95% CI as the headline table.

| Metric | `main` (baseline) | This PR | Δ (95% CI) | Status |
|---|---|---|---|---|
| Renders/sec ↑ | 20.94 | 20.90 | -1.3% [-3.4, +0.7] | ≈ within noise |
| Avg Reconcile (ms) ↓ | 16.0 | 15.6 | +0.8% [-1.3, +2.9] | ≈ within noise |
| Avg Diff (ms) ↓ | 15.7 | 15.4 | +0.4% [-1.6, +2.5] | ≈ within noise |
| Avg Memory (MB) ↓ | 168.9 | 172.2 | +1.9% [+1.4, +2.4] | ⚠️ regression |

Allocation (keyed-list) — lower is better

| Metric | `main` (baseline) | This PR | Δ (95% CI) | Status |
|---|---|---|---|---|
| Alloc bytes/render ↓ | 313777 | 217985 | -30.5% [-30.8, -30.2] | ✅ improvement |
| Gen0 GC / 1k renders ↓ | 17.78 | 13.61 | -22.4% [-30.0, -14.7] | ✅ improvement |
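In a typical LIS-based minimal-move pass, the longest increasing subsequence of the new-to-old index map marks the rows that already sit in the right relative order, so only the remaining rows need moving. An allocation-free LIS-into-mask step in the spirit of ComputeLISInto can be sketched with patience sorting plus predecessor links. `MarkLis`, its span signature and the -1 handling are assumptions, not the PR's code; `ArrayPool<int>.Shared` is the standard .NET pool.

```csharp
using System;
using System.Buffers;

// Hypothetical sketch: mark one longest strictly increasing subsequence of `map`
// (new index -> old index, -1 = unmapped) in `mask`, using pooled int[] scratch.
static void MarkLis(ReadOnlySpan<int> map, Span<bool> mask)
{
    int n = map.Length;
    mask.Slice(0, n).Clear(); // leading clear: the mask may be a larger pooled buffer
    if (n == 0) return;

    int[] tails = ArrayPool<int>.Shared.Rent(n);        // tails[k]: smallest last value of a run of length k + 1
    int[] tailIndices = ArrayPool<int>.Shared.Rent(n);  // tailIndices[k]: position of that value in map
    int[] predecessors = ArrayPool<int>.Shared.Rent(n); // predecessors[i]: previous position in the run ending at i
    try
    {
        int length = 0;
        for (int i = 0; i < n; i++)
        {
            int v = map[i];
            if (v == -1) continue; // unmapped entries never join the subsequence

            // Binary search for the first tail >= v, which keeps runs strictly increasing.
            int lo = 0, hi = length;
            while (lo < hi)
            {
                int mid = lo + (hi - lo) / 2;
                if (tails[mid] < v) lo = mid + 1; else hi = mid;
            }

            tails[lo] = v;
            tailIndices[lo] = i;
            predecessors[i] = lo > 0 ? tailIndices[lo - 1] : -1;
            if (lo == length) length++;
        }

        // Walk back from the end of the longest run, marking each member.
        for (int k = length > 0 ? tailIndices[length - 1] : -1; k != -1; k = predecessors[k])
            mask[k] = true;
    }
    finally
    {
        // int[] holds no references, so the buffers go back without clearArray.
        ArrayPool<int>.Shared.Return(tails);
        ArrayPool<int>.Shared.Return(tailIndices);
        ArrayPool<int>.Shared.Return(predecessors);
    }
}

int[] sampleMap = { 2, -1, 0, 1, 3 };
bool[] sampleMask = new bool[sampleMap.Length];
MarkLis(sampleMap, sampleMask);
Console.WriteLine(string.Join(",", sampleMask));
```

The O(n log n) cost and the three rented buffers are the only work per call; no HashSet or result list is allocated, which is the kind of per-diff allocation the keyed-list alloc rows above are sensitive to.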

Reconciler micro-benchmarks (`PerfBench.ControlModel`)

Production --variant Reactor control-model path, measured at ns resolution and without WinUI dilution (spec-047 M1–M13); ↓ lower is better. Status tracks allocated bytes/op, the authoritative signal here. It is deterministic for structurally fixed benches, while dispatcher / background-thread benches carry a small process-to-process offset, so a bench is flagged only when its 95% CI clears a ±3% minimum-effect band (real structural alloc changes range from several percent to many times larger). ns/op is shown for context but is not auto-flagged: its paired CI is rep-interleaved, but the flag remains dormant pending a real-CI identical-binary band calibration. Δ is the mean paired change with a 95% CI.

| Bench | `main` ns/op | Δ ns (95% CI) | `main` B/op | Δ alloc (95% CI) | Status |
|---|---|---|---|---|---|
| `M1` Mount_Leaf_NoCallback | 148354.3 | +0.8% [-0.2, +1.9] | 1140.9 | 0.0% [0.0, 0.0] | ≈ within noise |
| `M2` Mount_Leaf_OneCallback | 108358.3 | -2.8% [-8.0, +2.4] | 3383.3 | 0.0% [0.0, 0.0] | ≈ within noise |
| `M3` Mount_Leaf_ThreeCallbacks | 217328.5 | 0.0% [-4.0, +4.1] | 8460.3 | +0.1% [-2.6, +2.7] | ≈ within noise |
| `M4` Dispatch_Switch_Cold | 104469.9 | -2.2% [-4.5, +0.2] | 1767.8 | 0.0% [0.0, 0.0] | ≈ within noise |
| `M5` Dispatch_Switch_Warm | 106110.0 | -2.8% [-7.0, +1.5] | 1766.0 | -0.4% [-1.8, +1.1] | ≈ within noise |
| `M6` Dispatch_ExternalType | 90500.7 | +3.1% [-0.7, +7.0] | 987.6 | -0.6% [-3.2, +2.0] | ≈ within noise |
| `M7` Update_NoChange | 55148.5 | +0.8% [-0.1, +1.6] | 452.1 | +0.7% [-7.1, +8.4] | ≈ within noise |
| `M8` Update_OneLeafChanged | 41393.3 | -0.2% [-1.8, +1.5] | 536.0 | 0.0% [0.0, 0.0] | ≈ within noise |
| `M9` Update_AllChanged | 2805582.0 | -0.3% [-1.4, +0.8] | 184278.1 | 0.0% [0.0, 0.0] | ≈ within noise |
| `M10` EventHandlerState_Alloc | 85233.1 | -1.6% [-2.8, -0.5] | 3095.2 | 0.0% [0.0, +0.1] | ≈ within noise |
| `M11` ModifierEHS_Frequency | 45870.8 | +0.4% [-1.0, +1.9] | 638.9 | 0.0% [0.0, 0.0] | ≈ within noise |
| `M12` Pool_Rent_HotPath | 116699.7 | +0.6% [-0.2, +1.5] | 1099.9 | 0.0% [0.0, 0.0] | ≈ within noise |
| `M13` Setters_Suppression_Scope | 96.8 | -0.3% [-9.6, +9.0] | 26.7 | 0.0% [0.0, 0.0] | ≈ within noise |
| `M14` Dsl_Rebuild_Cascade | 1515895.0 | -0.3% [-1.5, +1.0] | 2231828.9 | 0.0% [0.0, 0.0] | ≈ within noise |
| `C207` ChangeHandler_DpRead_Coalesce | 1228.7 | -5.0% [-8.9, -1.1] | 0.6 | 0.0% [0.0, 0.0] | ≈ within noise |
| `OAlloc` Optional_Element_Alloc | 216.6 | -2.4% [-7.6, +2.9] | 528.0 | 0.0% [0.0, 0.0] | ≈ within noise |
| `OUpdate` Optional_Reconciler_Update | 12072.2 | +1.1% [0.0, +2.2] | 2772.3 | 0.0% [0.0, 0.0] | ≈ within noise |
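The status rules stated in this report (flag only when the 95% CI excludes 0; for micro-benchmark allocations, only when it also clears a ±3% band) can be written as a small classifier. This is a reading of the prose, not the workflow's code: `Classify`, `lowerIsBetter` and `band` are hypothetical, "clears the band" is interpreted as the whole interval lying beyond ±band, and `band = 0` gives the plain excludes-zero rule.

```csharp
using System;

// Returns "improvement", "regression" or "within noise" for a paired Δ% confidence interval.
// lowerIsBetter: true for ms, MB, bytes and GC counts; false for Renders/sec.
// band: minimum effect in percent; 0 means the CI only has to exclude 0.
string Classify(double ciLow, double ciHigh, bool lowerIsBetter, double band = 0.0)
{
    if (ciLow > ciHigh) throw new ArgumentException("ciLow must not exceed ciHigh");
    bool clearlyUp = ciLow > band;     // whole interval above +band
    bool clearlyDown = ciHigh < -band; // whole interval below -band
    if (!clearlyUp && !clearlyDown) return "within noise";
    bool better = lowerIsBetter ? clearlyDown : clearlyUp;
    return better ? "improvement" : "regression";
}

// Keyed-list Avg Memory row: +1.9% [+1.4, +2.4], lower is better.
Console.WriteLine(Classify(1.4, 2.4, lowerIsBetter: true)); // prints regression
```

With `band: 3.0` the same memory interval would read as within noise, which shows how the minimum-effect band suppresses small process-to-process offsets.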

Cross-framework reference (same StocksGrid workload)

| Metric | vanilla WinUI3¹ | Rust `windows-reactor`² | Reactor (this PR) |
|---|---|---|---|
| Renders/sec ↑ | 3.27 | 4.90 | 2.61 |
| Avg Reconcile (ms) ↓ | n/a | 18.5 | 130.9 |
| Avg Diff (ms) ↓ | n/a | 16.1 | 118.1 |
| Avg Memory (MB) ↓ | 264.8 | 195.7 | 284.6 |

↑ higher is better · ↓ lower is better. Within noise = the 95% confidence interval of the paired Δ includes 0 (no change resolvable at this sample size); ✅ improvement / ⚠️ regression require the CI to exclude 0.

Allocation metrics (alloc bytes/render, Gen0 GC) are the sensitive signal for allocation-reduction work, where the mean-ms / memory figures are largely flat. They read n/a for a harness built from a revision that predates them (rebase the PR onto main to populate them).

Reconciler micro-benchmarks run PerfBench.ControlModel --variant Reactor (M1–M13) as a headless loop bracketed by per-thread alloc + GC counters: ns-resolution and free of WinUI render / working-set dilution, so they resolve Core/Reconciler allocation deltas the macro StocksGrid workload cannot. main and PR each link their own src/Reactor build and are rep-interleaved (a fresh alternated process per rep); Δ is the paired 95% CI over per-rep means. The Status column tracks allocated bytes/op (deterministic for identical code); ns/op is informational: its paired CI is now unbiased, but the flag stays dormant pending a real-CI identical-binary band calibration.

¹ vanilla WinUI3 = StressPerf.Direct (imperative; no virtual DOM, so it has no reconcile/diff phase and those cells read n/a). Measured live on this runner.

² Rust = test_reactor_perf from microsoft/windows-rs, a port of this harness (same StocksGrid, same --percent/--duration CLI). Built from source and measured live on this runner.

Absolute numbers are runner-dependent: trust the Δ vs main, not the absolute values. Memory (working set) is the noisiest metric.

Runner: CPU: AMD EPYC 7763 64-Core Processor · 4 logical cores · 16 GB RAM · runner: GitHub Actions 1042996823.

Generated by .github/workflows/perf-compare.yml · PR 03db34e vs main b9ace1e · 2026-06-27T04:31:11Z · run log.
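A headless loop "bracketed by per-thread alloc + GC counters", as the micro-benchmark note puts it, can be sketched with the real .NET APIs `GC.GetAllocatedBytesForCurrentThread()`, `GC.CollectionCount(0)` and `Stopwatch.GetTimestamp()`. The `Measure` helper, its warmup count and the result shape are assumptions, not PerfBench.ControlModel's harness.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper: runs `op` after a warmup, then reports ns/op, bytes allocated per op
// on the current thread, and Gen0 collections per 1k ops.
(double NsPerOp, double BytesPerOp, double Gen0Per1k) Measure(Action op, int iterations, int warmup = 100)
{
    for (int i = 0; i < warmup; i++) op(); // let JIT tiering and lazy initialization settle

    long bytesBefore = GC.GetAllocatedBytesForCurrentThread();
    int gen0Before = GC.CollectionCount(0);
    long start = Stopwatch.GetTimestamp();

    for (int i = 0; i < iterations; i++) op();

    long elapsedTicks = Stopwatch.GetTimestamp() - start;
    long bytes = GC.GetAllocatedBytesForCurrentThread() - bytesBefore;
    int gen0 = GC.CollectionCount(0) - gen0Before;

    double ns = elapsedTicks * (1e9 / Stopwatch.Frequency);
    return (ns / iterations, (double)bytes / iterations, gen0 * 1000.0 / iterations);
}

// The array is stored in a captured variable so it escapes and is really heap-allocated.
object sink = new object();
var allocating = Measure(() => sink = new byte[64], 10_000);
GC.KeepAlive(sink);
Console.WriteLine($"{allocating.BytesPerOp:F1} B/op, {allocating.NsPerOp:F1} ns/op");
```

Reading the allocation counter for the current thread only is what keeps bytes/op near-deterministic for single-threaded benches; work on dispatcher or background threads would not be counted, which is one plausible source of the process-to-process offset the report mentions.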

azchohfi and others added 2 commits June 26, 2026 20:48

github-code-quality Bot found potential problems Jun 27, 2026


Comment thread tests/Reactor.Tests/Internal/KeyedListDiffPoolingTests.cs, on lines +284 to +285:

    foreach (var k in next)
        if (s.ByKey.TryGetValue(k, out var row)) survivorsBefore[k] = row;

azchohfi closed this Jun 27, 2026

azchohfi deleted the temp-657-remeasure branch June 27, 2026 04:34

azchohfi wants to merge 2 commits into main from temp-657-remeasure
