Conversation
Reduces redundant WinRT/COM-RCW boundary crossings and per-element allocations on the diff/reconcile/patch hot path (data-grid stress workload, StocksGrid). Targets the dominant Reconcile gap. Reconciler.cs / Update.cs / Mount.cs: - #16 UnmountRecursive: read ReactorAttached.State once (was 5x GetValue). - #17 UnmountAndCollect: read element tag once (was 5x GetElementTag). - #14 Update: coalesce repeated `is FrameworkElement` RCW type-tests to one. - #15 Cache the component rerender delegate on ComponentNode; recreate only when the requestRerender identity changes. - #18 UpdateDefaultAutomationName: ref/ordinal short-circuit before the AutomationProperties.GetName COM read. - #19 ApplyModifiers accepts null = clear-all (no fresh ElementModifiers). - #20 PopulateDirtyAncestorPath iterates a tracked self-triggered set instead of the whole _componentNodes dict. - #21 UnmountAndPool reuses a borrowed scratch list (reentrant calls fall back to a fresh list). - #22 RefreshRealizedContainers reuses a cleared field list. - #23 ApplyStyleToElement ref-equal + ActualTheme guard; ApplyThemeBindings gated on the ShallowEquals fast path. - #24 BuildCacheKey: thread-static StringBuilder + pair buffer, single pass. - #25 HasConsumedContextChanged compares via Context<T>/EqualityComparer<T> (no boxing of value-type context). - #26 ApplyResourceOverrides: no ToHashSet/ToArray, ref-equal fast path. - #27 WireReferenceListEdge: static loop, no per-edge List/closures. - #28 ApplyStaggerDelays: hoisted static animation-key array. - #29 PushContextDisposable: single-pair push overload, pooled PopOnDispose. Reconciler.KeyedItemsBinding.cs: - #39 ref-equality cache; only realloc + SetItemViewSource on change. Reconciler.Mount.cs: - #40 LayoutFootgunDetector gated by GridElement pre-filter. AccessibilityScanner.cs (debug-only scan path): - #62 compute GetChildren once in Walk, pass into Push. - #63 cache sibling-text array in ScanContext, invalidate on Push. - #64 BuildContext single-pass with pooled scratch (no LINQ chains). - #65 IsLikelyEmoji explicit span loop (no LINQ). - #66 CheckTabIndexGaps sorts a rented buffer (no OrderBy/ToList). Behavior preserved exactly: coalesced COM reads observe the same values at the same times; unmount/cleanup ordering unchanged; scanner emits identical findings. AOT/trim clean (no new reflection). Adds tests/Reactor.Tests/ReconcilerInteropPerfTests.cs (11 tests: rerender-delegate caching, context-change compare, cache-key reuse, single-pair context push, scanner output parity). Closes #654 Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Review-pass fixes for PR #666 (issue #654), all behavior-preserving and within the assigned reconcile-path files: - Context #25 (H1): CurrentValueEquals now reproduces the old object.Equals semantics EXACTLY — reference types use the virtual object.Equals (so an IEquatable<T> type without an object.Equals override keeps reference-equality change detection instead of collapsing distinct instances and skipping a required context rerender), value types keep the no-box EqualityComparer<T> path but guard the unbox with `is T` so a same-slot context-type swap returns false (as before) rather than throwing InvalidCastException. - #26 ApplyResourceOverrides (CO2): removal is driven by `managed` vs new overrides, not an old-vs-new ref comparison; the shallow-skip fast path calls this with old===new yet still must reconcile away stale managed keys. - #20 self-triggered set (CO3): add an Unmounted tombstone so a cached rerender closure firing after teardown can't resurrect the node into _selfTriggeredNodes (which the dirty-path pass would never clear again). - #16/#17 unmount tag re-read (CO4): the V1/type unmount dispatch re-reads the attached tag AFTER the user .OnUnmount/RunCleanups callbacks (which may detach state via the public APIs), matching the pre-coalesce per-site reads; pre-callback branches still share the single top-of-method read. - #23 ClearThemeBindings (AS1): clear the companion effective-theme marker DP alongside the style marker so the pair stays consistent across pool/reuse. Tests (tests/Reactor.Tests/ReconcilerInteropPerfTests.cs): - #15 rerender-delegate caching: identity reuse vs source-identity recreate; invocation marks SelfTriggered and bubbles to the source; post-unmount tombstone refuses a late self-trigger (#20). - #25 (H1) reference-IEquatable keeps reference-equality; value-type type mismatch returns false without throwing. - Scanner output parity pinned to exact literal messages (#62-#66). GetOrCreateComponentRerender made internal (InternalsVisibleTo Reactor.Tests) as a test seam. Validation: dotnet build src/Reactor -c Release 0/0; tests 9687 passed. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Guards the boxed-Nullable<T> edge raised in Copilot review: a non-null int? boxes to a boxed underlying int, and the value-type `is T` guard (T = int?) still matches it, so an unchanged nullable context value is not spuriously treated as changed. Proves the H1 hybrid CurrentValueEquals preserves the old object.Equals nullable semantics against production code. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Adds Assert.NotNull(wrapped) before invoking the cached rerender delegate in the two CachedRerender_* tests. Asserts the GetOrCreateComponentRerender contract (never returns null) and resolves a CodeQL "useless assignment" false positive (its dataflow did not model the subsequent delegate invocation as a read). Behavior-neutral test hardening. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Review-pass fixes from the pr-review skill (correctness + alternative-solution + test-coverage dimensions) and the multi-model cross-check, all within the existing reconcile interop scope. No behavior change: - Context<T>.CurrentValueEquals (CO-H1, high): the value-type arm routed through EqualityComparer<T>.Default, which dispatches to IEquatable<T> and diverges from the original object.Equals for structs whose IEquatable<T>.Equals disagrees with their object.Equals override. Unify both arms to current.Equals(lastBoxed) — for two non-null operands that is exactly object.Equals — and keep the reference-type ReferenceEquals short-circuit so same-instance / non-reflexive cases stay exact. Still avoids boxing the freshly-read value for primitives/Nullable<T>. Drops the now-unused System.Collections.Generic using. - BuildCacheKey (AS-L1): release the thread-static scratch pair array when an outlier element grows it past a small cap, so it is not pinned for the thread's lifetime. - RefreshRealizedContainers (AS-L2): only retain the realized-scratch list when its capacity stays modest; an oversized one-off refresh is collected, not pinned. - Tests: assert scanner emission ORDER (the parity test sorts first), plus value-type IEquatable-vs-object.Equals divergence and the reference-equality fast-path. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Addresses a Copilot review comment on #666: a cached rerender closure (perf #15) handed to a long-lived or cross-thread subscriber can outlive unmount and keep its ComponentNode alive. Drop ComponentNode.Control in ClearSelfTriggered(unmounting:true) so a leaked closure can't also pin the unmounted visual subtree. The dirty-path pass (perf #20) is the only reader of Control and never visits an unmounted node, so this is observably a no-op — it only releases the subtree for GC, reinforcing the #20 tombstone. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…eview) Addresses three Copilot re-review comments on the interop-perf PR, all the same scratch-hygiene pattern already applied to t_cacheKeyPairs (#24) and _realizedScratch (#22): a one-off outlier input can leave a thread-static / instance scratch buffer holding an oversized backing array (or stale element refs) for the rest of the thread/reconciler lifetime. Behavior-neutral — the computed outputs are unchanged, so existing parity tests cover them. - AccessibilityScanner.BuildContext: wrap the t_buildCtxChildren populate/use in try/finally so a mid-population exception can't leave element refs pinned on the thread-static list until the next call. Success-path timing of the clear is unchanged. - BuildCacheKey: mirror the pairs-array retain-cap for the thread-static StringBuilder (t_cacheKeySb) — release it after a one-off unusually large key instead of pinning the grown backing buffer. - UnmountAndPool: mirror the _realizedScratch capacity cap for _unmountPoolScratch — only retain the scratch list when it's still modestly sized, so a one-off large-subtree unmount doesn't pin an oversized array. Validation: dotnet build src/Reactor -c Release -> 0 warnings / 0 errors; dotnet test tests/Reactor.Tests -> 9691 passed / 0 failed / 64 skipped. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…heck) A multi-model (gpt-5.4) cross-check of the four reconcile/scan scratch retain-caps flagged the BuildCacheKey site: at the degenerate combined-cap case (an element with >64 theme bindings AND a >1024-char key rendered every frame), discarding the thread-static buffers reallocates a KeyValuePair[] (2x the size of the pre-#24 baseline's string[]) plus a fresh StringBuilder — i.e. WORSE than origin/main, which the perf PR must never be. The size-cap's only job was bounding memory retention, but Array.Clear(pairs) / sb.Clear() already drop the referenced strings/ThemeRefs, so nothing is pinned regardless. Removing the caps makes BuildCacheKey unconditional- retain, which is strictly cheaper than baseline in every case (the dominant StringBuilder backing buffer is reused, not realloc'd) and provably never worse. The other two caps are unchanged — the cross-check CONFIRMED them: _realizedScratch and _unmountPoolScratch discard to EXACTLY the origin/main allocation shape (List<UIElement> / List<FrameworkElement>), so they cannot be worse than baseline, and both sit well above the 50-row StocksGrid steady-state high-water (~64 <= 256) so they never trim-and-regrow on the stress workload. Behavior-neutral: BuildCacheKey return values are unchanged (only buffer-retention differs), so existing parity tests cover it. Validation: dotnet build src/Reactor -c Release -> 0 warnings / 0 errors; dotnet test tests/Reactor.Tests -> 9691 passed / 0 failed / 64 skipped. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
/perf |
|
|
||
| // Same instance recorded and current: object.Equals(x, x) == true via the | ||
| // ReferenceEquals short-circuit even though x.Equals(x) == false. Match it exactly. | ||
| Assert.Equal(object.Equals(x, x), NonReflexiveCtx.CurrentValueEquals(scope, x)); |
| foreach (var c in GetChildren(el)) | ||
| { | ||
| if (c is null or EmptyElement) continue; | ||
| children.Add(c); | ||
| if (childText is null && c is TextBlockElement tb) | ||
| childText = tb.Content; | ||
| } |
⚡ Reactor perf comparisonWorkload: Regression vs
|
| Metric | main (baseline) |
This PR | Δ (95% CI) | Status |
|---|---|---|---|---|
| Renders/sec ↑ | 2.62 | 2.74 | +5.3% 95% CI [-8.7, +19.2] | ≈ within noise |
| Avg Reconcile (ms) ↓ | 136.5 | 133.9 | -1.7% 95% CI [-11.8, +8.3] | ≈ within noise |
| Avg Diff (ms) ↓ | 124.2 | 121.4 | -1.3% 95% CI [-11.5, +8.9] | ≈ within noise |
| Avg Memory (MB) ↓ | 284.7 | 287.1 | +0.7% 95% CI [-0.4, +1.9] | ≈ within noise |
Low-mutation skip-floor (--percent 0)
At --percent 0 the workload mutates few cells per tick (always at least one), so reconcile/diff isolate the O(n) per-tick child skip-walk floor that higher mutation rates dilute — ChildReconciler re-walks every child each tick even when nothing moved. The closer --percent is to 0, the more this floor is the signal, so a structural-skip optimization shows up cleanly where the headline table above buries it. Δ is the mean paired change with a 95% CI.
| Metric | main (baseline) |
This PR | Δ (95% CI) | Status |
|---|---|---|---|---|
| Renders/sec ↑ | 16.46 | 16.20 | -1.1% 95% CI [-4.6, +2.3] | ≈ within noise |
| Avg Reconcile (ms) ↓ | 32.4 | 31.4 | -2.9% 95% CI [-9.5, +3.7] | ≈ within noise |
| Avg Diff (ms) ↓ | 30.5 | 29.6 | -3.0% 95% CI [-9.8, +3.7] | ≈ within noise |
| Avg Memory (MB) ↓ | 268.7 | 268.7 | 0.0% 95% CI [-0.4, +0.3] | ≈ within noise |
Allocation (Reactor) — lower is better
| Metric | main (baseline) |
This PR | Δ (95% CI) | Status |
|---|---|---|---|---|
| Alloc bytes/render ↓ | 5758965 | 5722942 | -0.4% 95% CI [-2.2, +1.4] | ≈ within noise |
| Gen0 GC / 1k renders ↓ | 226.50 | 227.78 | +0.2% 95% CI [-11.9, +12.3] | ≈ within noise |
Keyed-list workload (StressPerf.KeyedList, --percent 50)
A separate macro workload: a ~500-row stably keyed list whose rows are reordered / inserted / removed each tick. Because every child carries a key, the child reconciler takes its keyed arm (ReconcileKeyed → ReconcileKeyedMiddle, the LIS-based minimal-move pass) instead of the positional re-walk the StocksGrid tables above measure — so this is the sensitive macro signal for keyed-diff work the positional cells can never reach. Same interleaved paired-Δ 95% CI as the headline table.
| Metric | main (baseline) |
This PR | Δ (95% CI) | Status |
|---|---|---|---|---|
| Renders/sec ↑ | 23.11 | 23.15 | -0.3% 95% CI [-2.8, +2.1] | ≈ within noise |
| Avg Reconcile (ms) ↓ | 14.0 | 13.9 | -0.2% 95% CI [-2.0, +1.6] | ≈ within noise |
| Avg Diff (ms) ↓ | 13.8 | 13.7 | -0.5% 95% CI [-2.3, +1.3] | ≈ within noise |
| Avg Memory (MB) ↓ | 169.7 | 169.8 | -0.2% 95% CI [-0.8, +0.4] | ≈ within noise |
Allocation (keyed-list) — lower is better
| Metric | main (baseline) |
This PR | Δ (95% CI) | Status |
|---|---|---|---|---|
| Alloc bytes/render ↓ | 312927 | 307416 | -1.7% 95% CI [-2.0, -1.5] | ✅ improvement |
| Gen0 GC / 1k renders ↓ | 17.13 | 16.91 | -1.2% 95% CI [-5.6, +3.2] | ≈ within noise |
Reconciler micro-benchmarks (PerfBench.ControlModel)
Production --variant Reactor control-model path, ns-resolution and WinUI-undiluted (spec-047 M1–M13) — ↓ lower is better. Status tracks allocated bytes/op, the authoritative signal here; it is deterministic for structurally-fixed benches, while dispatcher / background-thread benches carry a small process-to-process offset, so a bench is flagged only when its 95% CI clears a ±3% minimum-effect band (real structural alloc changes are several percent to many-x). ns/op is shown for context but is not auto-flagged (its paired CI is rep-interleaved but the flag remains dormant pending a real-CI identical-binary band calibration). Δ is the mean paired change with a 95% CI.
| Bench | main ns/op |
Δ ns (95% CI) | main B/op |
Δ alloc (95% CI) | Status |
|---|---|---|---|---|---|
M1 Mount_Leaf_NoCallback |
147036.4 | -1.2% 95% CI [-2.2, -0.2] | 1140.9 | 0.0% 95% CI [0.0, 0.0] | ≈ within noise |
M2 Mount_Leaf_OneCallback |
104137.5 | +1.0% 95% CI [-3.6, +5.5] | 3383.3 | 0.0% 95% CI [0.0, 0.0] | ≈ within noise |
M3 Mount_Leaf_ThreeCallbacks |
204283.1 | +0.7% 95% CI [-2.5, +4.0] | 8460.3 | +0.3% 95% CI [-1.1, +1.7] | ≈ within noise |
M4 Dispatch_Switch_Cold |
98706.6 | -0.5% 95% CI [-2.7, +1.8] | 1767.8 | 0.0% 95% CI [0.0, 0.0] | ≈ within noise |
M5 Dispatch_Switch_Warm |
98522.6 | +1.9% 95% CI [-3.2, +7.1] | 1766.0 | +0.4% 95% CI [-1.0, +1.9] | ≈ within noise |
M6 Dispatch_ExternalType |
89757.6 | +1.0% 95% CI [-3.1, +5.0] | 987.6 | +0.7% 95% CI [-1.9, +3.4] | ≈ within noise |
M7 Update_NoChange |
55017.6 | +5.8% 95% CI [-4.2, +15.8] | 452.1 | +0.7% 95% CI [-7.1, +8.4] | ≈ within noise |
M8 Update_OneLeafChanged |
41080.2 | 0.0% 95% CI [-1.2, +1.2] | 536.0 | 0.0% 95% CI [0.0, 0.0] | ≈ within noise |
M9 Update_AllChanged |
2787244.5 | -1.1% 95% CI [-2.1, -0.2] | 184278.1 | 0.0% 95% CI [0.0, 0.0] | ≈ within noise |
M10 EventHandlerState_Alloc |
82689.7 | -1.4% 95% CI [-2.9, +0.1] | 3095.2 | 0.0% 95% CI [0.0, +0.1] | ≈ within noise |
M11 ModifierEHS_Frequency |
44718.8 | +0.6% 95% CI [-0.1, +1.3] | 638.9 | 0.0% 95% CI [0.0, 0.0] | ≈ within noise |
M12 Pool_Rent_HotPath |
116404.1 | -1.0% 95% CI [-1.8, -0.3] | 1099.9 | 0.0% 95% CI [0.0, 0.0] | ≈ within noise |
M13 Setters_Suppression_Scope |
90.8 | +5.5% 95% CI [-0.5, +11.4] | 26.7 | 0.0% 95% CI [0.0, 0.0] | ≈ within noise |
M14 Dsl_Rebuild_Cascade |
1476794.6 | -15.4% 95% CI [-16.5, -14.3] | 2231828.9 | 0.0% 95% CI [0.0, 0.0] | ≈ within noise |
C207 ChangeHandler_DpRead_Coalesce |
1137.3 | +0.5% 95% CI [-3.7, +4.7] | 0.6 | 0.0% 95% CI [0.0, 0.0] | ≈ within noise |
OAlloc Optional_Element_Alloc |
221.0 | -2.4% 95% CI [-8.4, +3.6] | 528.0 | 0.0% 95% CI [0.0, 0.0] | ≈ within noise |
OUpdate Optional_Reconciler_Update |
11924.5 | +0.3% 95% CI [-0.7, +1.2] | 2772.3 | 0.0% 95% CI [0.0, 0.0] | ≈ within noise |
Cross-framework reference (same StocksGrid workload)
| Metric | vanilla WinUI3¹ | Rust windows-reactor² |
Reactor (this PR) |
|---|---|---|---|
| Renders/sec ↑ | 3.45 | 4.80 | 2.74 |
| Avg Reconcile (ms) ↓ | n/a | 18.2 | 133.9 |
| Avg Diff (ms) ↓ | n/a | 16.5 | 121.4 |
| Avg Memory (MB) ↓ | 264.2 | 195.5 | 287.1 |
↑ higher is better · ↓ lower is better. Within noise = the 95% confidence interval of the paired Δ includes 0 (no change resolvable at this sample size); ✅ improvement /
Allocation metrics (alloc bytes/render, Gen0 GC) are the sensitive signal for allocation-reduction work, where the mean-ms / memory figures are largely flat. They read n/a for a harness built from a revision that predates them (rebase the PR onto main to populate them).
Reconciler micro-benchmarks run PerfBench.ControlModel --variant Reactor (M1–M13) as a headless loop bracketed by per-thread alloc + GC counters — ns-resolution and free of WinUI render / working-set dilution, so they resolve Core/Reconciler allocation deltas the macro StocksGrid workload cannot. main and PR each link their own src/Reactor build and are rep-interleaved (a fresh alternated process per rep); Δ is the paired 95% CI over per-rep means. The Status column tracks allocated bytes/op (deterministic for identical code); ns/op is informational — its paired CI is now unbiased but the flag stays dormant pending a real-CI identical-binary band calibration.
¹ vanilla WinUI3 = StressPerf.Direct (imperative; no virtual-DOM, so it has no reconcile/diff phase — those cells read n/a). Measured live on this runner.
² Rust = test_reactor_perf from microsoft/windows-rs — a port of this harness (same StocksGrid, same --percent/--duration CLI). Built from source and measured live on this runner.
Absolute numbers are runner-dependent — trust the Δ vs main, not the absolute values. Memory (working set) is the noisiest metric.
Runner: CPU: AMD EPYC 7763 64-Core Processor · 4 logical cores · 16 GB RAM · runner: GitHub Actions 1042999896.
Generated by .github/workflows/perf-compare.yml · PR 5ef7a80 vs main e8572a0 · 2026-06-27T05:26:56Z · run log.
Throwaway re-validation of #666 on origin/main b9ace1e (=#692+M14+#665+#649). DO NOT MERGE; origin #666 pristine.