DO-NOT-MERGE: re-validate #666 on post-flagship main by azchohfi · Pull Request #727 · microsoft/microsoft-ui-reactor

azchohfi · 2026-06-27T04:47:51Z

Throwaway re-validation of #666 on origin/main b9ace1e (=#692+M14+#665+#649). DO NOT MERGE; origin #666 pristine.

Reduces redundant WinRT/COM-RCW boundary crossings and per-element allocations on the diff/reconcile/patch hot path (data-grid stress workload, StocksGrid). Targets the dominant Reconcile gap. Reconciler.cs / Update.cs / Mount.cs: - #16 UnmountRecursive: read ReactorAttached.State once (was 5x GetValue). - #17 UnmountAndCollect: read element tag once (was 5x GetElementTag). - #14 Update: coalesce repeated `is FrameworkElement` RCW type-tests to one. - #15 Cache the component rerender delegate on ComponentNode; recreate only when the requestRerender identity changes. - #18 UpdateDefaultAutomationName: ref/ordinal short-circuit before the AutomationProperties.GetName COM read. - #19 ApplyModifiers accepts null = clear-all (no fresh ElementModifiers). - #20 PopulateDirtyAncestorPath iterates a tracked self-triggered set instead of the whole _componentNodes dict. - #21 UnmountAndPool reuses a borrowed scratch list (reentrant calls fall back to a fresh list). - #22 RefreshRealizedContainers reuses a cleared field list. - #23 ApplyStyleToElement ref-equal + ActualTheme guard; ApplyThemeBindings gated on the ShallowEquals fast path. - #24 BuildCacheKey: thread-static StringBuilder + pair buffer, single pass. - #25 HasConsumedContextChanged compares via Context<T>/EqualityComparer<T> (no boxing of value-type context). - #26 ApplyResourceOverrides: no ToHashSet/ToArray, ref-equal fast path. - #27 WireReferenceListEdge: static loop, no per-edge List/closures. - #28 ApplyStaggerDelays: hoisted static animation-key array. - #29 PushContextDisposable: single-pair push overload, pooled PopOnDispose. Reconciler.KeyedItemsBinding.cs: - #39 ref-equality cache; only realloc + SetItemViewSource on change. Reconciler.Mount.cs: - #40 LayoutFootgunDetector gated by GridElement pre-filter. AccessibilityScanner.cs (debug-only scan path): - #62 compute GetChildren once in Walk, pass into Push. - #63 cache sibling-text array in ScanContext, invalidate on Push. - #64 BuildContext single-pass with pooled scratch (no LINQ chains). - #65 IsLikelyEmoji explicit span loop (no LINQ). - #66 CheckTabIndexGaps sorts a rented buffer (no OrderBy/ToList). Behavior preserved exactly: coalesced COM reads observe the same values at the same times; unmount/cleanup ordering unchanged; scanner emits identical findings. AOT/trim clean (no new reflection). Adds tests/Reactor.Tests/ReconcilerInteropPerfTests.cs (11 tests: rerender-delegate caching, context-change compare, cache-key reuse, single-pair context push, scanner output parity). Closes #654 Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Review-pass fixes for PR #666 (issue #654), all behavior-preserving and within the assigned reconcile-path files: - Context #25 (H1): CurrentValueEquals now reproduces the old object.Equals semantics EXACTLY — reference types use the virtual object.Equals (so an IEquatable<T> type without an object.Equals override keeps reference-equality change detection instead of collapsing distinct instances and skipping a required context rerender), value types keep the no-box EqualityComparer<T> path but guard the unbox with `is T` so a same-slot context-type swap returns false (as before) rather than throwing InvalidCastException. - #26 ApplyResourceOverrides (CO2): removal is driven by `managed` vs new overrides, not an old-vs-new ref comparison; the shallow-skip fast path calls this with old===new yet still must reconcile away stale managed keys. - #20 self-triggered set (CO3): add an Unmounted tombstone so a cached rerender closure firing after teardown can't resurrect the node into _selfTriggeredNodes (which the dirty-path pass would never clear again). - #16/#17 unmount tag re-read (CO4): the V1/type unmount dispatch re-reads the attached tag AFTER the user .OnUnmount/RunCleanups callbacks (which may detach state via the public APIs), matching the pre-coalesce per-site reads; pre-callback branches still share the single top-of-method read. - #23 ClearThemeBindings (AS1): clear the companion effective-theme marker DP alongside the style marker so the pair stays consistent across pool/reuse. Tests (tests/Reactor.Tests/ReconcilerInteropPerfTests.cs): - #15 rerender-delegate caching: identity reuse vs source-identity recreate; invocation marks SelfTriggered and bubbles to the source; post-unmount tombstone refuses a late self-trigger (#20). - #25 (H1) reference-IEquatable keeps reference-equality; value-type type mismatch returns false without throwing. - Scanner output parity pinned to exact literal messages (#62-#66). GetOrCreateComponentRerender made internal (InternalsVisibleTo Reactor.Tests) as a test seam. Validation: dotnet build src/Reactor -c Release 0/0; tests 9687 passed. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Guards the boxed-Nullable<T> edge raised in Copilot review: a non-null int? boxes to a boxed underlying int, and the value-type `is T` guard (T = int?) still matches it, so an unchanged nullable context value is not spuriously treated as changed. Proves the H1 hybrid CurrentValueEquals preserves the old object.Equals nullable semantics against production code. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Adds Assert.NotNull(wrapped) before invoking the cached rerender delegate in the two CachedRerender_* tests. Asserts the GetOrCreateComponentRerender contract (never returns null) and resolves a CodeQL "useless assignment" false positive (its dataflow did not model the subsequent delegate invocation as a read). Behavior-neutral test hardening. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Review-pass fixes from the pr-review skill (correctness + alternative-solution + test-coverage dimensions) and the multi-model cross-check, all within the existing reconcile interop scope. No behavior change: - Context<T>.CurrentValueEquals (CO-H1, high): the value-type arm routed through EqualityComparer<T>.Default, which dispatches to IEquatable<T> and diverges from the original object.Equals for structs whose IEquatable<T>.Equals disagrees with their object.Equals override. Unify both arms to current.Equals(lastBoxed) — for two non-null operands that is exactly object.Equals — and keep the reference-type ReferenceEquals short-circuit so same-instance / non-reflexive cases stay exact. Still avoids boxing the freshly-read value for primitives/Nullable<T>. Drops the now-unused System.Collections.Generic using. - BuildCacheKey (AS-L1): release the thread-static scratch pair array when an outlier element grows it past a small cap, so it is not pinned for the thread's lifetime. - RefreshRealizedContainers (AS-L2): only retain the realized-scratch list when its capacity stays modest; an oversized one-off refresh is collected, not pinned. - Tests: assert scanner emission ORDER (the parity test sorts first), plus value-type IEquatable-vs-object.Equals divergence and the reference-equality fast-path. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Addresses a Copilot review comment on #666: a cached rerender closure (perf #15) handed to a long-lived or cross-thread subscriber can outlive unmount and keep its ComponentNode alive. Drop ComponentNode.Control in ClearSelfTriggered(unmounting:true) so a leaked closure can't also pin the unmounted visual subtree. The dirty-path pass (perf #20) is the only reader of Control and never visits an unmounted node, so this is observably a no-op — it only releases the subtree for GC, reinforcing the #20 tombstone. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

…eview) Addresses three Copilot re-review comments on the interop-perf PR, all the same scratch-hygiene pattern already applied to t_cacheKeyPairs (#24) and _realizedScratch (#22): a one-off outlier input can leave a thread-static / instance scratch buffer holding an oversized backing array (or stale element refs) for the rest of the thread/reconciler lifetime. Behavior-neutral — the computed outputs are unchanged, so existing parity tests cover them. - AccessibilityScanner.BuildContext: wrap the t_buildCtxChildren populate/use in try/finally so a mid-population exception can't leave element refs pinned on the thread-static list until the next call. Success-path timing of the clear is unchanged. - BuildCacheKey: mirror the pairs-array retain-cap for the thread-static StringBuilder (t_cacheKeySb) — release it after a one-off unusually large key instead of pinning the grown backing buffer. - UnmountAndPool: mirror the _realizedScratch capacity cap for _unmountPoolScratch — only retain the scratch list when it's still modestly sized, so a one-off large-subtree unmount doesn't pin an oversized array. Validation: dotnet build src/Reactor -c Release -> 0 warnings / 0 errors; dotnet test tests/Reactor.Tests -> 9691 passed / 0 failed / 64 skipped. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

…heck) A multi-model (gpt-5.4) cross-check of the four reconcile/scan scratch retain-caps flagged the BuildCacheKey site: at the degenerate combined-cap case (an element with >64 theme bindings AND a >1024-char key rendered every frame), discarding the thread-static buffers reallocates a KeyValuePair[] (2x the size of the pre-#24 baseline's string[]) plus a fresh StringBuilder — i.e. WORSE than origin/main, which the perf PR must never be. The size-cap's only job was bounding memory retention, but Array.Clear(pairs) / sb.Clear() already drop the referenced strings/ThemeRefs, so nothing is pinned regardless. Removing the caps makes BuildCacheKey unconditional- retain, which is strictly cheaper than baseline in every case (the dominant StringBuilder backing buffer is reused, not realloc'd) and provably never worse. The other two caps are unchanged — the cross-check CONFIRMED them: _realizedScratch and _unmountPoolScratch discard to EXACTLY the origin/main allocation shape (List<UIElement> / List<FrameworkElement>), so they cannot be worse than baseline, and both sit well above the 50-row StocksGrid steady-state high-water (~64 <= 256) so they never trim-and-regrow on the stress workload. Behavior-neutral: BuildCacheKey return values are unchanged (only buffer-retention differs), so existing parity tests cover it. Validation: dotnet build src/Reactor -c Release -> 0 warnings / 0 errors; dotnet test tests/Reactor.Tests -> 9691 passed / 0 failed / 64 skipped. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

azchohfi · 2026-06-27T04:47:53Z

/perf

+
+        // Same instance recorded and current: object.Equals(x, x) == true via the
+        // ReferenceEquals short-circuit even though x.Equals(x) == false. Match it exactly.
+        Assert.Equal(object.Equals(x, x), NonReflexiveCtx.CurrentValueEquals(scope, x));


+                foreach (var c in GetChildren(el))
+                {
+                    if (c is null or EmptyElement) continue;
+                    children.Add(c);
+                    if (childText is null && c is TextBlockElement tb)
+                        childText = tb.Content;
+                }


github-actions · 2026-06-27T05:27:00Z

⚡ Reactor perf comparison

Workload: StressPerf.ReactorOptimized StocksGrid · --percent 50 --duration 10 · x64 Release · median of 12 paired runs (2 warmup dropped); Δ is the mean change with a 95% CI · PR head and main built and run interleaved on the same runner.

Regression vs `main` baseline

Metric	`main` (baseline)	This PR	Δ (95% CI)	Status
Renders/sec ↑	2.62	2.74	+5.3% _{95% CI [-8.7, +19.2]}	≈ within noise
Avg Reconcile (ms) ↓	136.5	133.9	-1.7% _{95% CI [-11.8, +8.3]}	≈ within noise
Avg Diff (ms) ↓	124.2	121.4	-1.3% _{95% CI [-11.5, +8.9]}	≈ within noise
Avg Memory (MB) ↓	284.7	287.1	+0.7% _{95% CI [-0.4, +1.9]}	≈ within noise

Low-mutation skip-floor (`--percent 0`)

At --percent 0 the workload mutates few cells per tick (always at least one), so reconcile/diff isolate the O(n) per-tick child skip-walk floor that higher mutation rates dilute — ChildReconciler re-walks every child each tick even when nothing moved. The closer --percent is to 0, the more this floor is the signal, so a structural-skip optimization shows up cleanly where the headline table above buries it. Δ is the mean paired change with a 95% CI.

Metric	`main` (baseline)	This PR	Δ (95% CI)	Status
Renders/sec ↑	16.46	16.20	-1.1% _{95% CI [-4.6, +2.3]}	≈ within noise
Avg Reconcile (ms) ↓	32.4	31.4	-2.9% _{95% CI [-9.5, +3.7]}	≈ within noise
Avg Diff (ms) ↓	30.5	29.6	-3.0% _{95% CI [-9.8, +3.7]}	≈ within noise
Avg Memory (MB) ↓	268.7	268.7	0.0% _{95% CI [-0.4, +0.3]}	≈ within noise

Allocation (Reactor) — lower is better

Metric	`main` (baseline)	This PR	Δ (95% CI)	Status
Alloc bytes/render ↓	5758965	5722942	-0.4% _{95% CI [-2.2, +1.4]}	≈ within noise
Gen0 GC / 1k renders ↓	226.50	227.78	+0.2% _{95% CI [-11.9, +12.3]}	≈ within noise

Keyed-list workload (`StressPerf.KeyedList`, `--percent 50`)

A separate macro workload: a ~500-row stably keyed list whose rows are reordered / inserted / removed each tick. Because every child carries a key, the child reconciler takes its keyed arm (ReconcileKeyed → ReconcileKeyedMiddle, the LIS-based minimal-move pass) instead of the positional re-walk the StocksGrid tables above measure — so this is the sensitive macro signal for keyed-diff work the positional cells can never reach. Same interleaved paired-Δ 95% CI as the headline table.

Metric	`main` (baseline)	This PR	Δ (95% CI)	Status
Renders/sec ↑	23.11	23.15	-0.3% _{95% CI [-2.8, +2.1]}	≈ within noise
Avg Reconcile (ms) ↓	14.0	13.9	-0.2% _{95% CI [-2.0, +1.6]}	≈ within noise
Avg Diff (ms) ↓	13.8	13.7	-0.5% _{95% CI [-2.3, +1.3]}	≈ within noise
Avg Memory (MB) ↓	169.7	169.8	-0.2% _{95% CI [-0.8, +0.4]}	≈ within noise

Allocation (keyed-list) — lower is better

Metric	`main` (baseline)	This PR	Δ (95% CI)	Status
Alloc bytes/render ↓	312927	307416	-1.7% _{95% CI [-2.0, -1.5]}	✅ improvement
Gen0 GC / 1k renders ↓	17.13	16.91	-1.2% _{95% CI [-5.6, +3.2]}	≈ within noise

Reconciler micro-benchmarks (`PerfBench.ControlModel`)

Production --variant Reactor control-model path, ns-resolution and WinUI-undiluted (spec-047 M1–M13) — ↓ lower is better. Status tracks allocated bytes/op, the authoritative signal here; it is deterministic for structurally-fixed benches, while dispatcher / background-thread benches carry a small process-to-process offset, so a bench is flagged only when its 95% CI clears a ±3% minimum-effect band (real structural alloc changes are several percent to many-x). ns/op is shown for context but is not auto-flagged (its paired CI is rep-interleaved but the flag remains dormant pending a real-CI identical-binary band calibration). Δ is the mean paired change with a 95% CI.

Bench	`main` ns/op	Δ ns (95% CI)	`main` B/op	Δ alloc (95% CI)	Status
`M1` Mount_Leaf_NoCallback	147036.4	-1.2% _{95% CI [-2.2, -0.2]}	1140.9	0.0% _{95% CI [0.0, 0.0]}	≈ within noise
`M2` Mount_Leaf_OneCallback	104137.5	+1.0% _{95% CI [-3.6, +5.5]}	3383.3	0.0% _{95% CI [0.0, 0.0]}	≈ within noise
`M3` Mount_Leaf_ThreeCallbacks	204283.1	+0.7% _{95% CI [-2.5, +4.0]}	8460.3	+0.3% _{95% CI [-1.1, +1.7]}	≈ within noise
`M4` Dispatch_Switch_Cold	98706.6	-0.5% _{95% CI [-2.7, +1.8]}	1767.8	0.0% _{95% CI [0.0, 0.0]}	≈ within noise
`M5` Dispatch_Switch_Warm	98522.6	+1.9% _{95% CI [-3.2, +7.1]}	1766.0	+0.4% _{95% CI [-1.0, +1.9]}	≈ within noise
`M6` Dispatch_ExternalType	89757.6	+1.0% _{95% CI [-3.1, +5.0]}	987.6	+0.7% _{95% CI [-1.9, +3.4]}	≈ within noise
`M7` Update_NoChange	55017.6	+5.8% _{95% CI [-4.2, +15.8]}	452.1	+0.7% _{95% CI [-7.1, +8.4]}	≈ within noise
`M8` Update_OneLeafChanged	41080.2	0.0% _{95% CI [-1.2, +1.2]}	536.0	0.0% _{95% CI [0.0, 0.0]}	≈ within noise
`M9` Update_AllChanged	2787244.5	-1.1% _{95% CI [-2.1, -0.2]}	184278.1	0.0% _{95% CI [0.0, 0.0]}	≈ within noise
`M10` EventHandlerState_Alloc	82689.7	-1.4% _{95% CI [-2.9, +0.1]}	3095.2	0.0% _{95% CI [0.0, +0.1]}	≈ within noise
`M11` ModifierEHS_Frequency	44718.8	+0.6% _{95% CI [-0.1, +1.3]}	638.9	0.0% _{95% CI [0.0, 0.0]}	≈ within noise
`M12` Pool_Rent_HotPath	116404.1	-1.0% _{95% CI [-1.8, -0.3]}	1099.9	0.0% _{95% CI [0.0, 0.0]}	≈ within noise
`M13` Setters_Suppression_Scope	90.8	+5.5% _{95% CI [-0.5, +11.4]}	26.7	0.0% _{95% CI [0.0, 0.0]}	≈ within noise
`M14` Dsl_Rebuild_Cascade	1476794.6	-15.4% _{95% CI [-16.5, -14.3]}	2231828.9	0.0% _{95% CI [0.0, 0.0]}	≈ within noise
`C207` ChangeHandler_DpRead_Coalesce	1137.3	+0.5% _{95% CI [-3.7, +4.7]}	0.6	0.0% _{95% CI [0.0, 0.0]}	≈ within noise
`OAlloc` Optional_Element_Alloc	221.0	-2.4% _{95% CI [-8.4, +3.6]}	528.0	0.0% _{95% CI [0.0, 0.0]}	≈ within noise
`OUpdate` Optional_Reconciler_Update	11924.5	+0.3% _{95% CI [-0.7, +1.2]}	2772.3	0.0% _{95% CI [0.0, 0.0]}	≈ within noise

Cross-framework reference (same StocksGrid workload)

Metric	vanilla WinUI3¹	Rust `windows-reactor`²	Reactor (this PR)
Renders/sec ↑	3.45	4.80	2.74
Avg Reconcile (ms) ↓	n/a	18.2	133.9
Avg Diff (ms) ↓	n/a	16.5	121.4
Avg Memory (MB) ↓	264.2	195.5	287.1

_{↑ higher is better · ↓ lower is better. Within noise = the 95% confidence interval of the paired Δ includes 0 (no change resolvable at this sample size); ✅ improvement / ⚠️ regression require the CI to exclude 0.}
_{Allocation metrics (alloc bytes/render, Gen0 GC) are the sensitive signal for allocation-reduction work, where the mean-ms / memory figures are largely flat. They read n/a for a harness built from a revision that predates them (rebase the PR onto main to populate them).}
_{Reconciler micro-benchmarks run PerfBench.ControlModel --variant Reactor (M1–M13) as a headless loop bracketed by per-thread alloc + GC counters — ns-resolution and free of WinUI render / working-set dilution, so they resolve Core/Reconciler allocation deltas the macro StocksGrid workload cannot. main and PR each link their own src/Reactor build and are rep-interleaved (a fresh alternated process per rep); Δ is the paired 95% CI over per-rep means. The Status column tracks allocated bytes/op (deterministic for identical code); ns/op is informational — its paired CI is now unbiased but the flag stays dormant pending a real-CI identical-binary band calibration.}
_{¹ vanilla WinUI3 = StressPerf.Direct (imperative; no virtual-DOM, so it has no reconcile/diff phase — those cells read n/a). Measured live on this runner.}
_{² Rust = test_reactor_perf from microsoft/windows-rs — a port of this harness (same StocksGrid, same --percent/--duration CLI). Built from source and measured live on this runner.}
_{Absolute numbers are runner-dependent — trust the Δ vs main, not the absolute values. Memory (working set) is the noisiest metric.}
_{Runner: CPU: AMD EPYC 7763 64-Core Processor · 4 logical cores · 16 GB RAM · runner: GitHub Actions 1042999896.}
_{Generated by .github/workflows/perf-compare.yml · PR 5ef7a80 vs main e8572a0 · 2026-06-27T05:26:56Z · run log.}

azchohfi and others added 8 commits June 26, 2026 21:45

github-code-quality Bot found potential problems Jun 27, 2026

View reviewed changes

azchohfi closed this Jun 27, 2026

azchohfi deleted the sweep-666 branch June 27, 2026 05:44

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

DO-NOT-MERGE: re-validate #666 on post-flagship main#727

DO-NOT-MERGE: re-validate #666 on post-flagship main#727
azchohfi wants to merge 8 commits into
mainfrom
sweep-666

azchohfi commented Jun 27, 2026

Uh oh!

azchohfi commented Jun 27, 2026

Uh oh!

github-actions Bot commented Jun 27, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

azchohfi commented Jun 27, 2026

Uh oh!

azchohfi commented Jun 27, 2026

Uh oh!

github-actions Bot commented Jun 27, 2026

⚡ Reactor perf comparison

Regression vs main baseline

Low-mutation skip-floor (--percent 0)

Allocation (Reactor) — lower is better

Keyed-list workload (StressPerf.KeyedList, --percent 50)

Reconciler micro-benchmarks (PerfBench.ControlModel)

Cross-framework reference (same StocksGrid workload)

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Regression vs `main` baseline

Low-mutation skip-floor (`--percent 0`)

Keyed-list workload (`StressPerf.KeyedList`, `--percent 50`)

Reconciler micro-benchmarks (`PerfBench.ControlModel`)