perf(reconciler): skip redundant element-modifier self-merge (P1)#692
perf(reconciler): skip redundant element-modifier self-merge (P1)#692azchohfi wants to merge 2 commits into
Conversation
45f5916 to
fce3d26
Compare
|
/perf |
There was a problem hiding this comment.
Pull request overview
This PR is a Phase-2 reconciler update-path performance change aimed at reducing per-element allocations and work during Reconciler.Update, especially for large-grid workloads.
Changes:
- Avoids redundant
x.Merge(x)allocations when resolved modifiers already reference the element’s ownElementModifiers. - Adds a new
modifiersEqualcomputation and anApplyModifiersskip fast-path when resolved modifiers are structurally equal (with anOnUpdateActionexception). - Adds a new headless test suite to pin the self-merge allocation regression and the
OnUpdateActionexception behavior.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| src/Reactor/Core/Reconciler.Update.cs | Adds ReferenceEquals guard for modifier self-merge and introduces the ApplyModifiers fast-path via modifiersEqual / ShouldApplyModifiers. |
| tests/Reactor.Tests/ReconcilerModifierMergeTests.cs | New tests covering self-merge semantics, allocation regression “teeth”, and ShouldApplyModifiers behavior around OnUpdateAction. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
⚡ Reactor perf comparisonWorkload: Regression vs
|
| Metric | main (baseline) |
This PR | Δ (95% CI) | Status |
|---|---|---|---|---|
| Renders/sec ↑ | 3.45 | 3.75 | +7.6% 95% CI [+3.2, +12.0] | ✅ improvement |
| Avg Reconcile (ms) ↓ | 114.9 | 100.4 | -13.0% 95% CI [-15.2, -10.8] | ✅ improvement |
| Avg Diff (ms) ↓ | 102.6 | 89.1 | -13.2% 95% CI [-15.1, -11.2] | ✅ improvement |
| Avg Memory (MB) ↓ | 304.2 | 291.2 | -4.4% 95% CI [-5.2, -3.7] | ✅ improvement |
Low-mutation skip-floor (--percent 0)
At --percent 0 the workload mutates few cells per tick (always at least one), so reconcile/diff isolate the O(n) per-tick child skip-walk floor that higher mutation rates dilute — ChildReconciler re-walks every child each tick even when nothing moved. The closer --percent is to 0, the more this floor is the signal, so a structural-skip optimization shows up cleanly where the headline table above buries it. Δ is the mean paired change with a 95% CI.
| Metric | main (baseline) |
This PR | Δ (95% CI) | Status |
|---|---|---|---|---|
| Renders/sec ↑ | 16.40 | 16.58 | +2.6% 95% CI [-2.8, +8.0] | ≈ within noise |
| Avg Reconcile (ms) ↓ | 29.9 | 32.3 | +7.4% 95% CI [-0.2, +15.0] | ≈ within noise |
| Avg Diff (ms) ↓ | 27.9 | 30.2 | +8.1% 95% CI [+0.3, +15.9] | |
| Avg Memory (MB) ↓ | 269.7 | 269.1 | -0.2% 95% CI [-0.5, +0.2] | ≈ within noise |
Allocation (Reactor) — lower is better
| Metric | main (baseline) |
This PR | Δ (95% CI) | Status |
|---|---|---|---|---|
| Alloc bytes/render ↓ | 9573517 | 5532233 | -42.3% 95% CI [-42.6, -42.0] | ✅ improvement |
| Gen0 GC / 1k renders ↓ | 282.05 | 217.86 | -22.9% 95% CI [-26.6, -19.2] | ✅ improvement |
Reconciler micro-benchmarks (PerfBench.ControlModel)
Production --variant Reactor control-model path, ns-resolution and WinUI-undiluted (spec-047 M1–M13) — ↓ lower is better. Status tracks allocated bytes/op, the authoritative signal here; it is deterministic for structurally-fixed benches, while dispatcher / background-thread benches carry a small process-to-process offset, so a bench is flagged only when its 95% CI clears a ±1% minimum-effect band (real structural alloc changes are several percent to many-x). ns/op is shown for context but is not auto-flagged in v1 (the per-side runs are not yet rep-interleaved, so a process-to-process timing offset can otherwise read as significant). Δ is the mean paired change with a 95% CI.
| Bench | main ns/op |
Δ ns (95% CI) | main B/op |
Δ alloc (95% CI) | Status |
|---|---|---|---|---|---|
M1 Mount_Leaf_NoCallback |
106337.1 | -12.5% 95% CI [-20.7, -4.2] | 1124.4 | 0.0% 95% CI [0.0, 0.0] | ≈ within noise |
M2 Mount_Leaf_OneCallback |
52511.3 | -9.5% 95% CI [-19.4, +0.3] | 3263.8 | 0.0% 95% CI [-0.1, 0.0] | ≈ within noise |
M3 Mount_Leaf_ThreeCallbacks |
148668.4 | -1.3% 95% CI [-8.1, +5.6] | 7999.8 | -0.2% 95% CI [-0.6, +0.2] | ≈ within noise |
M4 Dispatch_Switch_Cold |
74499.4 | -3.8% 95% CI [-12.8, +5.2] | 1843.0 | 0.0% 95% CI [0.0, 0.0] | ≈ within noise |
M5 Dispatch_Switch_Warm |
72742.6 | +1.4% 95% CI [-5.5, +8.3] | 1843.2 | 0.0% 95% CI [0.0, 0.0] | ≈ within noise |
M6 Dispatch_ExternalType |
64322.4 | -5.6% 95% CI [-8.3, -2.9] | 991.2 | 0.0% 95% CI [0.0, 0.0] | ≈ within noise |
M7 Update_NoChange |
64484.2 | -2.6% 95% CI [-3.7, -1.6] | 740.2 | 0.0% 95% CI [0.0, 0.0] | ≈ within noise |
M8 Update_OneLeafChanged |
53966.0 | -3.9% 95% CI [-6.0, -1.9] | 912.0 | 0.0% 95% CI [0.0, 0.0] | ≈ within noise |
M9 Update_AllChanged |
2562809.3 | -2.6% 95% CI [-4.2, -1.0] | 184556.2 | 0.0% 95% CI [0.0, 0.0] | ≈ within noise |
M10 EventHandlerState_Alloc |
47036.8 | -14.3% 95% CI [-22.2, -6.4] | 2975.6 | 0.0% 95% CI [0.0, 0.0] | ≈ within noise |
M11 ModifierEHS_Frequency |
63575.8 | -0.3% 95% CI [-7.7, +7.0] | 1269.5 | 0.0% 95% CI [0.0, 0.0] | ≈ within noise |
M12 Pool_Rent_HotPath |
75798.1 | -5.8% 95% CI [-8.6, -3.1] | 1103.8 | 0.0% 95% CI [0.0, 0.0] | ≈ within noise |
M13 Setters_Suppression_Scope |
160.6 | -0.1% 95% CI [-8.0, +7.7] | 29.3 | 0.0% 95% CI [0.0, 0.0] | ≈ within noise |
C207 ChangeHandler_DpRead_Coalesce |
1748.2 | -15.9% 95% CI [-16.8, -15.1] | 1.2 | 0.0% 95% CI [0.0, 0.0] | ≈ within noise |
OAlloc Optional_Element_Alloc |
220.0 | -7.1% 95% CI [-12.1, -2.2] | 528.0 | 0.0% 95% CI [0.0, 0.0] | ≈ within noise |
OUpdate Optional_Reconciler_Update |
12174.5 | -21.0% 95% CI [-31.0, -11.1] | 5813.6 | -53.7% 95% CI [-53.9, -53.5] | ✅ improvement |
Cross-framework reference (same StocksGrid workload)
| Metric | vanilla WinUI3¹ | Rust windows-reactor² |
Reactor (this PR) |
|---|---|---|---|
| Renders/sec ↑ | 4.30 | 6.34 | 3.75 |
| Avg Reconcile (ms) ↓ | n/a | 18.7 | 100.4 |
| Avg Diff (ms) ↓ | n/a | 16.3 | 89.1 |
| Avg Memory (MB) ↓ | 262.6 | 193.7 | 291.2 |
↑ higher is better · ↓ lower is better. Within noise = the 95% confidence interval of the paired Δ includes 0 (no change resolvable at this sample size); ✅ improvement /
Allocation metrics (alloc bytes/render, Gen0 GC) are the sensitive signal for allocation-reduction work, where the mean-ms / memory figures are largely flat. They read n/a for a harness built from a revision that predates them (rebase the PR onto main to populate them).
Reconciler micro-benchmarks run PerfBench.ControlModel --variant Reactor (M1–M13) as a headless loop bracketed by per-thread alloc + GC counters — ns-resolution and free of WinUI render / working-set dilution, so they resolve Core/Reconciler allocation deltas the macro StocksGrid workload cannot. main and PR each link their own src/Reactor build; Δ is the paired 95% CI over per-rep means. The Status column tracks allocated bytes/op (deterministic for identical code); ns/op is informational — it is not auto-flagged until the per-side runs are rep-interleaved (a documented fast-follow), because a process-to-process timing offset can otherwise make the paired ns CI exclude 0 for an unchanged diff.
¹ vanilla WinUI3 = StressPerf.Direct (imperative; no virtual-DOM, so it has no reconcile/diff phase — those cells read n/a). Measured live on this runner.
² Rust = test_reactor_perf from microsoft/windows-rs — a port of this harness (same StocksGrid, same --percent/--duration CLI). Built from source and measured live on this runner.
Absolute numbers are runner-dependent — trust the Δ vs main, not the absolute values. Memory (working set) is the noisiest metric.
Runner: CPU: AMD EPYC 9V74 80-Core Processor · 4 logical cores · 16 GB RAM · runner: GitHub Actions 1042870008.
Generated by .github/workflows/perf-compare.yml · PR 157ff02 vs main 240c31b · 2026-06-26T14:08:22Z · run log.
fce3d26 to
7fb4f2e
Compare
Reconciler.Update resolves an element's modifiers by accumulating any ModifiedElement wrapper layers, then merging the final inner element's own Modifiers. For the common case of an element that carries modifiers directly (no wrapper) -- every cell in a large keyed grid -- the accumulator is still referentially the element's own ElementModifiers, so that final merge was a self-merge x.Merge(x): it allocated a fresh, value-identical ElementModifiers plus its non-null Layout/Visual bucket sub-records (a Layout+Visual cell is parent + 2 buckets = 3 records, on each of the old and new sides => ~6) per changed cell, every render. On the StocksGrid workload (500 cells, ~50% mutation) that is ~3 KB/changed cell (~7 MB/render of pure garbage). Guard the final merge with !ReferenceEquals(accumulator, element.Modifiers): only merge when a wrapper layer actually contributed a distinct instance. When nothing wrapped the element, keep its own Modifiers reference as-is. Semantically identical (Merge(x,x) is value-equal to x); purely removes the allocation. ApplyModifiers behavior is unchanged. Scope note: this PR is P1 only. The originally-paired P2 (an ApplyModifiers fast-path that skips the post-update pass when modifiers compare equal) was dropped after review found Element.ModifiersEqual does not compare every field ApplyModifiers writes (RequestedTheme, Scale/Rotation/Translation/CenterPoint, inline-flow margin/padding/border, OnUnmountAction/OnUpdateAction hooks), so a structurally-equal compare can coexist with a changed transform/theme and the skip would leave the control stale. P2 will return as its own PR that first makes ModifiersEqual complete w.r.t. ApplyModifiers' guarded writes. Tests (tests/Reactor.Tests/ReconcilerModifierMergeTests.cs): - Merge(x,x) is value-equal to x but a distinct instance (the invariant the guard relies on). - A real wrapper+inner merge still combines correctly (inner wins, base fills gaps) -- the guard only skips the self case, never a real merge. - Revert->fail teeth: 50k Update calls on a direct-modifier leaf allocate ~0 B/call with the guard; reverting it allocates ~1.95 KB/call (cap 64 B). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
7fb4f2e to
ad4de25
Compare
|
/perf |
… preservation (P1 pr-review) Folds two test-coverage findings from the pr-review of #692 (both multi-model-confirmed), strengthening the modifier-merge teeth without touching production code: - Update_WrapperLayer_StillMergesInnerModifiers: proves the !ReferenceEquals guard skips ONLY the self-merge - when a ModifiedElement wrapper contributes a distinct modifier instance, Reconciler.Update''s fall-through still performs the real merge. Differential-allocation teeth verified to bite: simulating an over-broad guard makes this fail while the existing self-merge tooth still passes (unique coverage). - Merge_Preserves_Lifecycle_Callbacks: pins that ElementModifiers.Merge preserves OnMount/OnUnmount/OnUpdateAction (other wins, base fills gaps), guarding the same callback-drop class that bit a sibling change. Full Reactor.Tests green (9707 passed / 64 skipped / 0 failed). Test-only; no src/Reactor change, so the P1 perf delta on ad4de25 is unaffected. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
/perf Acceptance-demo run: end-to-end validation of the complete merged harness (alloc metric + ns micro-suite + low-mutation skip-floor + keyed-list leg) on real CI-built exes. #692 is the self-merge-guard alloc fix, so the StocksGrid allocation headline is the story; keyed/skip-floor legs are expected within-noise (this PR doesn't target those paths). Baseline = current main (41e41d7, production-identical — all four harness merges were 0-src). — PERFVAL harness session |
|
/perf Re-fire post-#700-merge (budget-fit |
What
Reconciler.Updateresolves an element's modifiers by accumulating anyModifiedElementwrapper layers, then merging the final inner element's ownModifiers. For the common case of an element that carries modifiers directly (no wrapper) — every cell in a large keyed grid — the accumulator is still referentially the element's ownElementModifiers, so that final merge was a self-mergex.Merge(x): it allocated a fresh, value-identicalElementModifiersplus itsLayout/Visual/Textsub-records (~6 records) per side, per changed cell, every render.On the StocksGrid workload (500 cells, ~50% mutation) that is ~3 KB/changed cell (~7 MB/render of pure garbage) — the single largest allocation lever found in the Phase-1 reconciler profile (hotspot H1).
Change
Guard the final merge with
!ReferenceEquals(accumulator, element.Modifiers): only merge when a wrapper layer actually contributed a distinct instance. When nothing wrapped the element, keep its ownModifiersreference as-is.Merge(x, x)is value-equal tox.ApplyModifiersbehavior is unchanged (it runs exactly as onmain).Scope note (P1 only)
This PR is P1 only. The originally-paired P2 — an
ApplyModifiersfast-path that skipped the post-update pass when modifiers compared equal — was dropped after review.Element.ModifiersEqualdoes not compare every fieldApplyModifierswrites (RequestedTheme,Scale/Rotation/Translation/CenterPoint, inline-flow margin/padding/border, and theOnUnmountAction/OnUpdateActionside-effect hooks), so a structurally-equal compare can coexist with a changed transform/theme and the skip would leave the control stale. P2 will return as its own follow-up PR that first makesModifiersEqualcomplete w.r.t.ApplyModifiers' guarded writes.Tests
tests/Reactor.Tests/ReconcilerModifierMergeTests.cs:Merge(x, x)is value-equal toxbut a distinct instance (what makes skipping the self-merge safe).Updatecalls on a direct-modifier leaf allocate ~0 B/call with the guard; reverting it allocates ~1.95 KB/call (cap 64 B).Full
dotnet test tests/Reactor.Testsgreen (9705 passed / 0 failed / 64 skipped). Core libdotnet build src/Reactor/Reactor.csproj -c ReleaseAOT-clean (0 warnings / 0 errors).