DO-NOT-MERGE: official #665-vs-M14 re-measure on post-#692 main by azchohfi · Pull Request #719 · microsoft/microsoft-ui-reactor

azchohfi · 2026-06-27T01:01:42Z

DO NOT MERGE. Throwaway measurement PR only.

Purpose: official /perf re-measure of #665 against the M14 Dsl_Rebuild_Cascade micro, with the baseline now INCLUDING #692 (flagship reconciler self-merge guard, merged to main as c497c94) + M14 (#711). This is #665's 8 commits rebased cleanly onto current origin/main. Build 0/0, Reactor.Tests 9754 pass. Origin #665 stays pristine; this branch is deleted after the run.

Restores the steady-state ShallowEquals -> CanSkipUpdate fast path for interactive/selectable grid cells (it was defeated for any cell carrying an event handler or a fluent .Set chain) and removes the dominant per-cell, per-frame allocations in the Element/DSL layer. FLAGSHIP-1 (Element.cs): replace the "any handler present => never skip" rule in ModifiersEqual with ModifierCallbacksEqual, a per-slot ReferenceEquals over all 27 callback/gesture/drag-drop slots. Reference-stable handlers now skip; capturing closures still force Update. Per-slot reference equality (not a pure presence bitmask) is the safe predicate because modifier handlers dispatch through ModifierEventHandlerState.Current*, which is refreshed only on the non-skip Update path -- presence-only would strand a stale closure on a skipped cell. Also closes a latent hole where 21 of the slots were ignored. FLAGSHIP-2 (Element.cs): add SettersEqual<T> (length + element-wise ReferenceEquals) and route all 51 ShallowEquals arms through it instead of ReferenceEquals(Setters). The .Set helpers append to a fresh array each render ([.. Setters, configure]); element-wise compare lets an unchanged chain of non-capturing lambdas / method groups (compiler-cached statics) skip again. _emptyLayout/_emptyVisual sentinels; compare LayoutModifiers by exact record equality and VisualModifiers field-by-field (brushes via BrushesEqual). Also fixes latent skip bugs: RequestedTheme/logical-inset and Scale/Rotation/Translation/CenterPoint were previously not compared. SingleAttachedDictionary for the common one-attached-value case (one .Grid per cell), avoiding a full Dictionary<Type,object> alloc per cell per frame. Color via GetBrush(Color), replacing one new SolidColorBrush (a thread-affine WinRT DependencyObject) per Parse call. Single shared brush cache. Star-track loop; InterspersedGrid exact-sized arrays; FilterChildren two-pass count+fill; shared static s_oneStar cross-axis track. #160 already satisfied (ShallowEquals is type-switch-first). Behavior preserved exactly (same render output, same events fire). Validated: 9693 unit tests pass; Modifier/Reconciler/Skip/Event/DataGrid/Flex selftests 0 failures; core lib Release x64 clean (0 AOT/trim warnings-as-errors). New xUnit coverage in PerfDiffSkipPathTests.cs and DeclarativeModifierTests.cs. Closes #664 Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

…LAGSHIP-2 safety revert Expands the diff/memory PR into the ElementExtensions fluent-modifier layer and makes the FLAGSHIP-2 setter comparison safe-conservative. FLAGSHIP-2 (safety revert): SettersEqual now returns true only for the same array instance (memoized element) or both-empty. Element-wise delegate-identity compare was unsafe: a .Set(...) is an apply-time imperative write whose effect can read mutable external state, so identity-equality does NOT imply the write is unnecessary. Setters keep the element on the Update path whenever any are present. LazyVStack/HStack scroll/repeater setter compares routed through SettersEqual (both-empty now compares equal). FLAGSHIP-1 handler ref-equality is unaffected (handlers dispatch later via ModifierEventHandlerState.Current*). #165/#157 — ModifyLayout/ModifyVisual bucket-merge entry points. ~36 pure-layout and pure-visual fluent modifiers (Margin/Padding/Width/Height/Min*/Max*/Align/ Opacity/Scale/Rotation/CenterPoint/CornerRadius/Background/Foreground/WithBorder/ Translation) merge their delta straight into the Layout/Visual slot instead of allocating a throwaway parent ElementModifiers (and its bucket shim) per chained call. Semantics are identical to Modify(el, new ElementModifiers { <field> }): proven against ElementModifiers.Merge and asserted by *_Chain_Equals_HandBuilt tests. #168 (brush cache wiring) — public BrushHelper.Parse(string) still returns a fresh, caller-owned brush (unchanged vs main). New internal ParseShared(string) returns a shared per-thread per-ARGB brush; Background/Foreground/WithBorder(string) hot paths route through it, so thousands of identical cells reuse one brush instead of allocating a WinRT DependencyObject per cell per render. Pure allocation win — the diff compares brushes structurally (BrushesEqual = color+opacity), so behavior is unchanged. #174 (ApplyStyle cache) — cache the OnMount delegate per style name so an unchanged .ApplyStyle("X") chain reuses one delegate instead of a fresh capturing closure + display class per render. Memory-only: OnMountAction is excluded from ModifiersEqual (runs at mount only), so this is not a skip-path change. Out of scope / deferred: #167 ThemeBindings (Reconciler ripple); #158/#161-164. Validation: core lib Release x64 0 warnings/0 errors; Reactor.Tests 9704 pass/0 fail/ 64 skip (PerfDiffSkipPathTests 31/31); selftests Modifier/Skip/Event/Brush/DataGrid/ Flex 0 failures. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

…ernal; strengthen Dsl tests Addresses pr-review findings (multi-model cross-check + panel) on the draft: - correctness (HIGH, found by multi-model gpt-5.4 cross-check): UniformGrid wrote the positioned `.Grid(...)` cells back into the array returned by FilterChildren. After #173, FilterChildren's no-null fast path returns the caller's `items` array ALIASED (no copy), so an explicit `UniformGrid(orientation, myArray)` call corrupted the caller's array with Grid-positioned wrappers — a regression vs main (where FilterChildren always copied). UniformGrid now positions into its own Element[] and never mutates `filtered`. Other 16 FilterChildren consumers only pass the result as read-only Children, and InterspersedGrid already builds its own array, so this was the sole offender. Allocation-neutral on the hot (no-null) path vs main. + regression test asserting the source array is untouched. - api-ergonomics / packaging / alternative-solution / security (converged): make BrushHelper.GetBrush(Color) internal. It was a NEW public method (main never exposed it) returning a shared, mutable SolidColorBrush from a [ThreadStatic] cache — a public-surface + shared-mutable-brush footgun. Only ParseShared (same assembly) calls it; confirmed no external references repo-wide. Docs updated so the public entry point is Parse(string) (fresh, caller-owned). - test-coverage: #170 ForEach tests now assert rendered Content/index sequence and empty-input behavior (not just counts); added headless-safe BrushHelper.ParseColor coverage for the color path behind the shared-brush modifiers. Validation: core lib Release x64 0/0; Reactor.Tests 9711 pass/0 fail/64 skip (PerfDiffSkipPathTests 38/38); selftests Grid/Reconciler/Modifier/Skip/Event/Brush/ DataGrid/Flex 0 failures. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

…h sharing Review-pass fixes on PR #665 (pr-review skill + gpt-5.4 multi-model cross-check): H1 (correctness, HIGH) - ModifiersEqual let an element with a stable .OnTapped take the skip path without comparing OnUnmountAction. The skip path bypasses ApplyModifiers, which is what re-registers the latest teardown in the reconciler's _onUnmountActions on every Update; a changed .OnUnmount closure was therefore stranded and the stale first-render teardown would fire at unmount. Fixed by comparing OnUnmountAction by ReferenceEquals (mirrors the handler/.Ref slots). Zero grid-workload cost (cells have null teardown both renders). The plain-leaf no-handler case was a pre-existing latent bug on main; this also closes it (strictly more correct). Regression test added. H2 (multi-model, HIGH) - reverted #168 shared-brush sharing. main deliberately creates a fresh SolidColorBrush per call because a brush is a thread-affine DependencyObject with mutable Color/Opacity and cannot be safely shared across controls; an imperative .Set(tb => mutate tb.Background) would otherwise leak to every peer cell and poison the cache. Removed ParseShared/GetBrush/_brushCache; Background/Foreground/WithBorder(string) now use the fresh-brush Parse. Kept the safe ParseColor string->Color cache (pre-existing on main) and the #165/#157 ModifyVisual bucket-merge alloc win. Test-coverage (MEDIUM) - added a 27-slot ModifierCallbacksEqual sweep proving every event/gesture/drag-drop handler slot participates by reference (guards against a forgotten or mis-paired slot). Validation: src/Reactor Release x64 0 warn / 0 err (AOT-as-errors gate); tests/Reactor.Tests 9713 pass / 0 fail / 64 skip. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

…s header Copilot review (comment on 783030a) flagged the test file's header comment still claimed a '#168 brush-cache' validated by the selftest/AppTests tiers. After the shared-brush revert, BrushHelper caches only the parsed Color and Parse returns a fresh brush per call, so there is no shared brush instance to validate on a real thread. Reworded the header to match. Comment-only change; 9713 pass / 0 fail unchanged. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Locks the realistic DataGrid cell shape (reference-stable OnPointerPressed + OnTapped, all other routed slots null) at the ShallowEquals/element level: the cell stays skip-eligible across renders, but declines the skip the instant any secondary routed slot (OnDoubleTapped/OnGotFocus/OnLostFocus) changes identity, so that slot's ModifierEventHandlerState.Current* is refreshed instead of dispatching a stale closure. Complements the per-slot 27-way sweep, which already proves ModifierCallbacksEqual compares the full Current*-dispatched set. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

ModifierCallbacksEqual added 6 gesture/drag slots (Pan/Pinch/Rotate/ LongPress/DragSource/DropTarget) that the historical diff never compared. Those dispatch through separate per-element gesture/drag state, not ModifierEventHandlerState.Current*, so comparing them forced Update where the framework previously skipped — re-arming an in-flight gesture mid-interaction. For a held long-press this re-registered the handler between its Began and Ended phases, so the released callback fired against a refreshed closure and double-dispatched (E2E GestureTests .Interactive_OnLongPress_FiresAfterHold: count 2 vs expected 1). Narrow the predicate to the 21 routed-input handlers (the actual grid-cell skip lever), matching main's gesture behavior exactly and preserving observable behavior. Grid cells use only routed handlers, so the perf win is unaffected. Tests: replace the 27-slot sweep with a 21-slot sweep + a behavior-preservation guard proving gesture/drag closures that change identity stay skip-eligible (so a future contributor can't silently re-add them and reintroduce the double-fire). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot review on a6adfdd flagged that AttachedEqual's foreach iterates the IReadOnlyDictionary interface, so SingleAttachedDictionary's yield-based enumerator allocates an IEnumerator per diff — partially defeating #155's per-cell allocation cut for the common one-.Grid-per-cell case. Special-case SingleAttachedDictionary on either side and compare the lone slot via TryGetValue (Count is already known equal), so the hot path is allocation free. Behavior-preserving: same equality result the foreach produced. Tests: AttachedEqual_SingleEntry_FastPath_Matches_Semantics (incl. asymmetric b-is-Single branch) + AttachedEqual_SingleEntry_DoesNotAllocate (10k calls, asserts < 2KB via GC.GetAllocatedBytesForCurrentThread). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

azchohfi · 2026-06-27T01:01:51Z

/perf

+                foreach (var gc in group.Children)
+                {
+                    if (gc is not null and not EmptyElement)
+                        count++;
+                }


github-actions · 2026-06-27T01:43:20Z

⚡ Reactor perf comparison

Workload: StressPerf.ReactorOptimized StocksGrid · --percent 50 --duration 10 · x64 Release · median of 12 paired runs (2 warmup dropped); Δ is the mean change with a 95% CI · PR head and main built and run interleaved on the same runner.

Regression vs `main` baseline

Metric	`main` (baseline)	This PR	Δ (95% CI)	Status
Renders/sec ↑	2.55	2.47	-1.9% _{95% CI [-8.2, +4.4]}	≈ within noise
Avg Reconcile (ms) ↓	140.9	141.3	+1.4% _{95% CI [-2.5, +5.4]}	≈ within noise
Avg Diff (ms) ↓	124.5	127.6	+3.0% _{95% CI [-1.1, +7.2]}	≈ within noise
Avg Memory (MB) ↓	288.5	285.7	-0.4% _{95% CI [-1.3, +0.6]}	≈ within noise

Low-mutation skip-floor (`--percent 0`)

At --percent 0 the workload mutates few cells per tick (always at least one), so reconcile/diff isolate the O(n) per-tick child skip-walk floor that higher mutation rates dilute — ChildReconciler re-walks every child each tick even when nothing moved. The closer --percent is to 0, the more this floor is the signal, so a structural-skip optimization shows up cleanly where the headline table above buries it. Δ is the mean paired change with a 95% CI.

Metric	`main` (baseline)	This PR	Δ (95% CI)	Status
Renders/sec ↑	15.81	15.65	-2.1% _{95% CI [-9.5, +5.4]}	≈ within noise
Avg Reconcile (ms) ↓	40.3	38.7	+2.7% _{95% CI [-8.1, +13.4]}	≈ within noise
Avg Diff (ms) ↓	37.8	36.2	+2.8% _{95% CI [-8.2, +13.8]}	≈ within noise
Avg Memory (MB) ↓	266.8	266.5	-0.1% _{95% CI [-0.8, +0.5]}	≈ within noise

Allocation (Reactor) — lower is better

Metric	`main` (baseline)	This PR	Δ (95% CI)	Status
Alloc bytes/render ↓	5766814	5836460	+0.7% _{95% CI [-0.8, +2.1]}	≈ within noise
Gen0 GC / 1k renders ↓	285.71	262.93	-10.8% _{95% CI [-22.1, +0.4]}	≈ within noise

Keyed-list workload (`StressPerf.KeyedList`, `--percent 50`)

A separate macro workload: a ~500-row stably keyed list whose rows are reordered / inserted / removed each tick. Because every child carries a key, the child reconciler takes its keyed arm (ReconcileKeyed → ReconcileKeyedMiddle, the LIS-based minimal-move pass) instead of the positional re-walk the StocksGrid tables above measure — so this is the sensitive macro signal for keyed-diff work the positional cells can never reach. Same interleaved paired-Δ 95% CI as the headline table.

Metric	`main` (baseline)	This PR	Δ (95% CI)	Status
Renders/sec ↑	20.03	20.12	+1.0% _{95% CI [-2.0, +4.0]}	≈ within noise
Avg Reconcile (ms) ↓	16.6	16.5	-1.9% _{95% CI [-4.1, +0.2]}	≈ within noise
Avg Diff (ms) ↓	16.5	16.3	-1.8% _{95% CI [-4.0, +0.3]}	≈ within noise
Avg Memory (MB) ↓	168.9	168.9	0.0% _{95% CI [-0.9, +0.9]}	≈ within noise

Allocation (keyed-list) — lower is better

Metric	`main` (baseline)	This PR	Δ (95% CI)	Status
Alloc bytes/render ↓	314713	313059	-0.6% _{95% CI [-0.9, -0.2]}	✅ improvement
Gen0 GC / 1k renders ↓	15.19	15.12	+0.7% _{95% CI [-11.1, +12.4]}	≈ within noise

Reconciler micro-benchmarks (`PerfBench.ControlModel`)

Production --variant Reactor control-model path, ns-resolution and WinUI-undiluted (spec-047 M1–M13) — ↓ lower is better. Status tracks allocated bytes/op, the authoritative signal here; it is deterministic for structurally-fixed benches, while dispatcher / background-thread benches carry a small process-to-process offset, so a bench is flagged only when its 95% CI clears a ±3% minimum-effect band (real structural alloc changes are several percent to many-x). ns/op is shown for context but is not auto-flagged (its paired CI is rep-interleaved but the flag remains dormant pending a real-CI identical-binary band calibration). Δ is the mean paired change with a 95% CI.

Bench	`main` ns/op	Δ ns (95% CI)	`main` B/op	Δ alloc (95% CI)	Status
`M1` Mount_Leaf_NoCallback	152070.4	-1.1% _{95% CI [-4.3, +2.2]}	1140.9	0.0% _{95% CI [0.0, 0.0]}	≈ within noise
`M2` Mount_Leaf_OneCallback	111143.4	+2.1% _{95% CI [-2.6, +6.8]}	3383.3	0.0% _{95% CI [0.0, 0.0]}	≈ within noise
`M3` Mount_Leaf_ThreeCallbacks	234028.1	-0.3% _{95% CI [-2.8, +2.2]}	8485.6	-1.6% _{95% CI [-3.2, -0.1]}	≈ within noise
`M4` Dispatch_Switch_Cold	115800.3	-0.8% _{95% CI [-2.4, +0.8]}	1767.8	0.0% _{95% CI [0.0, 0.0]}	≈ within noise
`M5` Dispatch_Switch_Warm	119592.2	-0.5% _{95% CI [-6.9, +6.0]}	1766.0	+0.8% _{95% CI [-1.3, +2.8]}	≈ within noise
`M6` Dispatch_ExternalType	91811.4	+1.8% _{95% CI [-2.8, +6.4]}	987.6	+1.5% _{95% CI [-2.2, +5.2]}	≈ within noise
`M7` Update_NoChange	56826.5	-2.0% _{95% CI [-2.8, -1.1]}	452.1	-2.0% _{95% CI [-12.6, +8.6]}	≈ within noise
`M8` Update_OneLeafChanged	42156.9	+0.2% _{95% CI [-0.7, +1.2]}	536.0	0.0% _{95% CI [0.0, 0.0]}	≈ within noise
`M9` Update_AllChanged	2920899.4	-1.4% _{95% CI [-3.6, +0.7]}	184278.1	0.0% _{95% CI [0.0, 0.0]}	≈ within noise
`M10` EventHandlerState_Alloc	91718.8	+2.3% _{95% CI [+0.1, +4.4]}	3095.2	0.0% _{95% CI [-0.1, +0.1]}	≈ within noise
`M11` ModifierEHS_Frequency	47625.3	+0.5% _{95% CI [-0.5, +1.6]}	638.9	0.0% _{95% CI [0.0, 0.0]}	≈ within noise
`M12` Pool_Rent_HotPath	118637.6	+0.1% _{95% CI [-1.5, +1.7]}	1099.9	0.0% _{95% CI [0.0, 0.0]}	≈ within noise
`M13` Setters_Suppression_Scope	160.1	+3.0% _{95% CI [-6.3, +12.3]}	26.7	0.0% _{95% CI [0.0, 0.0]}	≈ within noise
`M14` Dsl_Rebuild_Cascade	1899232.5	-13.2% _{95% CI [-14.6, -11.9]}	2915828.9	-23.5% _{95% CI [-23.5, -23.5]}	✅ improvement
`C207` ChangeHandler_DpRead_Coalesce	1433.9	+1.4% _{95% CI [-2.5, +5.3]}	0.6	0.0% _{95% CI [0.0, 0.0]}	≈ within noise
`OAlloc` Optional_Element_Alloc	222.5	-2.6% _{95% CI [-8.2, +2.9]}	528.0	0.0% _{95% CI [0.0, 0.0]}	≈ within noise
`OUpdate` Optional_Reconciler_Update	12252.4	+4.0% _{95% CI [+2.1, +6.0]}	2676.3	+3.6% _{95% CI [+3.6, +3.6]}	⚠️ regression

Cross-framework reference (same StocksGrid workload)

Metric	vanilla WinUI3¹	Rust `windows-reactor`²	Reactor (this PR)
Renders/sec ↑	3.21	4.75	2.47
Avg Reconcile (ms) ↓	n/a	20.0	141.3
Avg Diff (ms) ↓	n/a	17.7	127.6
Avg Memory (MB) ↓	264.0	197.3	285.7

_{↑ higher is better · ↓ lower is better. Within noise = the 95% confidence interval of the paired Δ includes 0 (no change resolvable at this sample size); ✅ improvement / ⚠️ regression require the CI to exclude 0.}
_{Allocation metrics (alloc bytes/render, Gen0 GC) are the sensitive signal for allocation-reduction work, where the mean-ms / memory figures are largely flat. They read n/a for a harness built from a revision that predates them (rebase the PR onto main to populate them).}
_{Reconciler micro-benchmarks run PerfBench.ControlModel --variant Reactor (M1–M13) as a headless loop bracketed by per-thread alloc + GC counters — ns-resolution and free of WinUI render / working-set dilution, so they resolve Core/Reconciler allocation deltas the macro StocksGrid workload cannot. main and PR each link their own src/Reactor build and are rep-interleaved (a fresh alternated process per rep); Δ is the paired 95% CI over per-rep means. The Status column tracks allocated bytes/op (deterministic for identical code); ns/op is informational — its paired CI is now unbiased but the flag stays dormant pending a real-CI identical-binary band calibration.}
_{¹ vanilla WinUI3 = StressPerf.Direct (imperative; no virtual-DOM, so it has no reconcile/diff phase — those cells read n/a). Measured live on this runner.}
_{² Rust = test_reactor_perf from microsoft/windows-rs — a port of this harness (same StocksGrid, same --percent/--duration CLI). Built from source and measured live on this runner.}
_{Absolute numbers are runner-dependent — trust the Δ vs main, not the absolute values. Memory (working set) is the noisiest metric.}
_{Runner: CPU: AMD EPYC 7763 64-Core Processor · 4 logical cores · 16 GB RAM · runner: GitHub Actions 1042987674.}
_{Generated by .github/workflows/perf-compare.yml · PR bd16d0e vs main 023d879 · 2026-06-27T01:43:17Z · run log.}

azchohfi and others added 8 commits June 26, 2026 17:58

github-code-quality Bot found potential problems Jun 27, 2026

View reviewed changes

Comment thread src/Reactor/Elements/Dsl.cs

Comment on lines +1958 to +1962

foreach (var gc in group.Children)

{

if (gc is not null and not EmptyElement)

count++;

}

azchohfi closed this Jun 27, 2026

azchohfi deleted the temp-665-post692 branch June 27, 2026 01:47

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

DO-NOT-MERGE: official #665-vs-M14 re-measure on post-#692 main#719

DO-NOT-MERGE: official #665-vs-M14 re-measure on post-#692 main#719
azchohfi wants to merge 8 commits into
mainfrom
temp-665-post692

azchohfi commented Jun 27, 2026

Uh oh!

azchohfi commented Jun 27, 2026

Uh oh!

github-actions Bot commented Jun 27, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

azchohfi commented Jun 27, 2026

Uh oh!

azchohfi commented Jun 27, 2026

Uh oh!

github-actions Bot commented Jun 27, 2026

⚡ Reactor perf comparison

Regression vs main baseline

Low-mutation skip-floor (--percent 0)

Allocation (Reactor) — lower is better

Keyed-list workload (StressPerf.KeyedList, --percent 50)

Reconciler micro-benchmarks (PerfBench.ControlModel)

Cross-framework reference (same StocksGrid workload)

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Regression vs `main` baseline

Low-mutation skip-floor (`--percent 0`)

Keyed-list workload (`StressPerf.KeyedList`, `--percent 50`)

Reconciler micro-benchmarks (`PerfBench.ControlModel`)