Skip to content

perf(reconciler): skip redundant element-modifier self-merge (P1)#692

Draft
azchohfi wants to merge 2 commits into
mainfrom
azchohfi-reconciler-perf-profile
Draft

perf(reconciler): skip redundant element-modifier self-merge (P1)#692
azchohfi wants to merge 2 commits into
mainfrom
azchohfi-reconciler-perf-profile

Conversation

@azchohfi

@azchohfi azchohfi commented Jun 26, 2026

Copy link
Copy Markdown
Collaborator

What

Reconciler.Update resolves an element's modifiers by accumulating any ModifiedElement wrapper layers, then merging the final inner element's own Modifiers. For the common case of an element that carries modifiers directly (no wrapper) — every cell in a large keyed grid — the accumulator is still referentially the element's own ElementModifiers, so that final merge was a self-merge x.Merge(x): it allocated a fresh, value-identical ElementModifiers plus its Layout/Visual/Text sub-records (~6 records) per side, per changed cell, every render.

On the StocksGrid workload (500 cells, ~50% mutation) that is ~3 KB/changed cell (~7 MB/render of pure garbage) — the single largest allocation lever found in the Phase-1 reconciler profile (hotspot H1).

Change

Guard the final merge with !ReferenceEquals(accumulator, element.Modifiers): only merge when a wrapper layer actually contributed a distinct instance. When nothing wrapped the element, keep its own Modifiers reference as-is.

  • Semantically identical: Merge(x, x) is value-equal to x.
  • Purely removes the allocation. ApplyModifiers behavior is unchanged (it runs exactly as on main).

Scope note (P1 only)

This PR is P1 only. The originally-paired P2 — an ApplyModifiers fast-path that skipped the post-update pass when modifiers compared equal — was dropped after review. Element.ModifiersEqual does not compare every field ApplyModifiers writes (RequestedTheme, Scale/Rotation/Translation/CenterPoint, inline-flow margin/padding/border, and the OnUnmountAction/OnUpdateAction side-effect hooks), so a structurally-equal compare can coexist with a changed transform/theme and the skip would leave the control stale. P2 will return as its own follow-up PR that first makes ModifiersEqual complete w.r.t. ApplyModifiers' guarded writes.

Tests

tests/Reactor.Tests/ReconcilerModifierMergeTests.cs:

  • InvariantMerge(x, x) is value-equal to x but a distinct instance (what makes skipping the self-merge safe).
  • Correctness — a real wrapper+inner merge still combines correctly (inner wins, base fills gaps); the guard only skips the self case, never a real merge.
  • Revert→fail teeth — 50k Update calls on a direct-modifier leaf allocate ~0 B/call with the guard; reverting it allocates ~1.95 KB/call (cap 64 B).

Full dotnet test tests/Reactor.Tests green (9705 passed / 0 failed / 64 skipped). Core lib dotnet build src/Reactor/Reactor.csproj -c Release AOT-clean (0 warnings / 0 errors).

Draft — held per the perf merge-gate (no merge until the /perf harness can measure the allocation delta + explicit GO).

@azchohfi azchohfi force-pushed the azchohfi-reconciler-perf-profile branch from 45f5916 to fce3d26 Compare June 26, 2026 06:29
@azchohfi azchohfi requested a review from Copilot June 26, 2026 06:42
@azchohfi

Copy link
Copy Markdown
Collaborator Author

/perf

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR is a Phase-2 reconciler update-path performance change aimed at reducing per-element allocations and work during Reconciler.Update, especially for large-grid workloads.

Changes:

  • Avoids redundant x.Merge(x) allocations when resolved modifiers already reference the element’s own ElementModifiers.
  • Adds a new modifiersEqual computation and an ApplyModifiers skip fast-path when resolved modifiers are structurally equal (with an OnUpdateAction exception).
  • Adds a new headless test suite to pin the self-merge allocation regression and the OnUpdateAction exception behavior.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File Description
src/Reactor/Core/Reconciler.Update.cs Adds ReferenceEquals guard for modifier self-merge and introduces the ApplyModifiers fast-path via modifiersEqual / ShouldApplyModifiers.
tests/Reactor.Tests/ReconcilerModifierMergeTests.cs New tests covering self-merge semantics, allocation regression “teeth”, and ShouldApplyModifiers behavior around OnUpdateAction.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/Reactor/Core/Reconciler.Update.cs Outdated
@github-actions

github-actions Bot commented Jun 26, 2026

Copy link
Copy Markdown

⚡ Reactor perf comparison

Workload: StressPerf.ReactorOptimized StocksGrid · --percent 50 --duration 10 · x64 Release · median of 12 paired runs (2 warmup dropped); Δ is the mean change with a 95% CI · PR head and main built and run interleaved on the same runner.

Regression vs main baseline

Metric main (baseline) This PR Δ (95% CI) Status
Renders/sec ↑ 3.45 3.75 +7.6% 95% CI [+3.2, +12.0] ✅ improvement
Avg Reconcile (ms) ↓ 114.9 100.4 -13.0% 95% CI [-15.2, -10.8] ✅ improvement
Avg Diff (ms) ↓ 102.6 89.1 -13.2% 95% CI [-15.1, -11.2] ✅ improvement
Avg Memory (MB) ↓ 304.2 291.2 -4.4% 95% CI [-5.2, -3.7] ✅ improvement

Low-mutation skip-floor (--percent 0)

At --percent 0 the workload mutates few cells per tick (always at least one), so reconcile/diff isolate the O(n) per-tick child skip-walk floor that higher mutation rates dilute — ChildReconciler re-walks every child each tick even when nothing moved. The closer --percent is to 0, the more this floor is the signal, so a structural-skip optimization shows up cleanly where the headline table above buries it. Δ is the mean paired change with a 95% CI.

Metric main (baseline) This PR Δ (95% CI) Status
Renders/sec ↑ 16.40 16.58 +2.6% 95% CI [-2.8, +8.0] ≈ within noise
Avg Reconcile (ms) ↓ 29.9 32.3 +7.4% 95% CI [-0.2, +15.0] ≈ within noise
Avg Diff (ms) ↓ 27.9 30.2 +8.1% 95% CI [+0.3, +15.9] ⚠️ regression
Avg Memory (MB) ↓ 269.7 269.1 -0.2% 95% CI [-0.5, +0.2] ≈ within noise

Allocation (Reactor) — lower is better

Metric main (baseline) This PR Δ (95% CI) Status
Alloc bytes/render ↓ 9573517 5532233 -42.3% 95% CI [-42.6, -42.0] ✅ improvement
Gen0 GC / 1k renders ↓ 282.05 217.86 -22.9% 95% CI [-26.6, -19.2] ✅ improvement

Reconciler micro-benchmarks (PerfBench.ControlModel)

Production --variant Reactor control-model path, ns-resolution and WinUI-undiluted (spec-047 M1–M13) — ↓ lower is better. Status tracks allocated bytes/op, the authoritative signal here; it is deterministic for structurally-fixed benches, while dispatcher / background-thread benches carry a small process-to-process offset, so a bench is flagged only when its 95% CI clears a ±1% minimum-effect band (real structural alloc changes are several percent to many-x). ns/op is shown for context but is not auto-flagged in v1 (the per-side runs are not yet rep-interleaved, so a process-to-process timing offset can otherwise read as significant). Δ is the mean paired change with a 95% CI.

Bench main ns/op Δ ns (95% CI) main B/op Δ alloc (95% CI) Status
M1 Mount_Leaf_NoCallback 106337.1 -12.5% 95% CI [-20.7, -4.2] 1124.4 0.0% 95% CI [0.0, 0.0] ≈ within noise
M2 Mount_Leaf_OneCallback 52511.3 -9.5% 95% CI [-19.4, +0.3] 3263.8 0.0% 95% CI [-0.1, 0.0] ≈ within noise
M3 Mount_Leaf_ThreeCallbacks 148668.4 -1.3% 95% CI [-8.1, +5.6] 7999.8 -0.2% 95% CI [-0.6, +0.2] ≈ within noise
M4 Dispatch_Switch_Cold 74499.4 -3.8% 95% CI [-12.8, +5.2] 1843.0 0.0% 95% CI [0.0, 0.0] ≈ within noise
M5 Dispatch_Switch_Warm 72742.6 +1.4% 95% CI [-5.5, +8.3] 1843.2 0.0% 95% CI [0.0, 0.0] ≈ within noise
M6 Dispatch_ExternalType 64322.4 -5.6% 95% CI [-8.3, -2.9] 991.2 0.0% 95% CI [0.0, 0.0] ≈ within noise
M7 Update_NoChange 64484.2 -2.6% 95% CI [-3.7, -1.6] 740.2 0.0% 95% CI [0.0, 0.0] ≈ within noise
M8 Update_OneLeafChanged 53966.0 -3.9% 95% CI [-6.0, -1.9] 912.0 0.0% 95% CI [0.0, 0.0] ≈ within noise
M9 Update_AllChanged 2562809.3 -2.6% 95% CI [-4.2, -1.0] 184556.2 0.0% 95% CI [0.0, 0.0] ≈ within noise
M10 EventHandlerState_Alloc 47036.8 -14.3% 95% CI [-22.2, -6.4] 2975.6 0.0% 95% CI [0.0, 0.0] ≈ within noise
M11 ModifierEHS_Frequency 63575.8 -0.3% 95% CI [-7.7, +7.0] 1269.5 0.0% 95% CI [0.0, 0.0] ≈ within noise
M12 Pool_Rent_HotPath 75798.1 -5.8% 95% CI [-8.6, -3.1] 1103.8 0.0% 95% CI [0.0, 0.0] ≈ within noise
M13 Setters_Suppression_Scope 160.6 -0.1% 95% CI [-8.0, +7.7] 29.3 0.0% 95% CI [0.0, 0.0] ≈ within noise
C207 ChangeHandler_DpRead_Coalesce 1748.2 -15.9% 95% CI [-16.8, -15.1] 1.2 0.0% 95% CI [0.0, 0.0] ≈ within noise
OAlloc Optional_Element_Alloc 220.0 -7.1% 95% CI [-12.1, -2.2] 528.0 0.0% 95% CI [0.0, 0.0] ≈ within noise
OUpdate Optional_Reconciler_Update 12174.5 -21.0% 95% CI [-31.0, -11.1] 5813.6 -53.7% 95% CI [-53.9, -53.5] ✅ improvement

Cross-framework reference (same StocksGrid workload)

Metric vanilla WinUI3¹ Rust windows-reactor² Reactor (this PR)
Renders/sec ↑ 4.30 6.34 3.75
Avg Reconcile (ms) ↓ n/a 18.7 100.4
Avg Diff (ms) ↓ n/a 16.3 89.1
Avg Memory (MB) ↓ 262.6 193.7 291.2

↑ higher is better · ↓ lower is better. Within noise = the 95% confidence interval of the paired Δ includes 0 (no change resolvable at this sample size); ✅ improvement / ⚠️ regression require the CI to exclude 0.
Allocation metrics (alloc bytes/render, Gen0 GC) are the sensitive signal for allocation-reduction work, where the mean-ms / memory figures are largely flat. They read n/a for a harness built from a revision that predates them (rebase the PR onto main to populate them).
Reconciler micro-benchmarks run PerfBench.ControlModel --variant Reactor (M1–M13) as a headless loop bracketed by per-thread alloc + GC counters — ns-resolution and free of WinUI render / working-set dilution, so they resolve Core/Reconciler allocation deltas the macro StocksGrid workload cannot. main and PR each link their own src/Reactor build; Δ is the paired 95% CI over per-rep means. The Status column tracks allocated bytes/op (deterministic for identical code); ns/op is informational — it is not auto-flagged until the per-side runs are rep-interleaved (a documented fast-follow), because a process-to-process timing offset can otherwise make the paired ns CI exclude 0 for an unchanged diff.
¹ vanilla WinUI3 = StressPerf.Direct (imperative; no virtual-DOM, so it has no reconcile/diff phase — those cells read n/a). Measured live on this runner.
² Rust = test_reactor_perf from microsoft/windows-rs — a port of this harness (same StocksGrid, same --percent/--duration CLI). Built from source and measured live on this runner.
Absolute numbers are runner-dependent — trust the Δ vs main, not the absolute values. Memory (working set) is the noisiest metric.
Runner: CPU: AMD EPYC 9V74 80-Core Processor · 4 logical cores · 16 GB RAM · runner: GitHub Actions 1042870008.
Generated by .github/workflows/perf-compare.yml · PR 157ff02 vs main 240c31b · 2026-06-26T14:08:22Z · run log.

@azchohfi azchohfi force-pushed the azchohfi-reconciler-perf-profile branch from fce3d26 to 7fb4f2e Compare June 26, 2026 07:04
@azchohfi azchohfi changed the title perf(reconciler): eliminate redundant modifier self-merge + ApplyModifiers fast-path (P1+P2) perf(reconciler): skip redundant element-modifier self-merge (P1) Jun 26, 2026
@azchohfi azchohfi requested a review from Copilot June 26, 2026 07:05

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

Comment thread src/Reactor/Core/Reconciler.Update.cs Outdated
Reconciler.Update resolves an element's modifiers by accumulating any
ModifiedElement wrapper layers, then merging the final inner element's own
Modifiers. For the common case of an element that carries modifiers directly
(no wrapper) -- every cell in a large keyed grid -- the accumulator is still
referentially the element's own ElementModifiers, so that final merge was a
self-merge x.Merge(x): it allocated a fresh, value-identical ElementModifiers
plus its non-null Layout/Visual bucket sub-records (a Layout+Visual cell is
parent + 2 buckets = 3 records, on each of the old and new sides => ~6) per
changed cell, every render. On the StocksGrid workload (500 cells, ~50%
mutation) that is ~3 KB/changed cell (~7 MB/render of pure garbage).

Guard the final merge with !ReferenceEquals(accumulator, element.Modifiers):
only merge when a wrapper layer actually contributed a distinct instance.
When nothing wrapped the element, keep its own Modifiers reference as-is.
Semantically identical (Merge(x,x) is value-equal to x); purely removes the
allocation. ApplyModifiers behavior is unchanged.

Scope note: this PR is P1 only. The originally-paired P2 (an ApplyModifiers
fast-path that skips the post-update pass when modifiers compare equal) was
dropped after review found Element.ModifiersEqual does not compare every field
ApplyModifiers writes (RequestedTheme, Scale/Rotation/Translation/CenterPoint,
inline-flow margin/padding/border, OnUnmountAction/OnUpdateAction hooks), so a
structurally-equal compare can coexist with a changed transform/theme and the
skip would leave the control stale. P2 will return as its own PR that first
makes ModifiersEqual complete w.r.t. ApplyModifiers' guarded writes.

Tests (tests/Reactor.Tests/ReconcilerModifierMergeTests.cs):
- Merge(x,x) is value-equal to x but a distinct instance (the invariant the
  guard relies on).
- A real wrapper+inner merge still combines correctly (inner wins, base fills
  gaps) -- the guard only skips the self case, never a real merge.
- Revert->fail teeth: 50k Update calls on a direct-modifier leaf allocate ~0
  B/call with the guard; reverting it allocates ~1.95 KB/call (cap 64 B).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@azchohfi azchohfi force-pushed the azchohfi-reconciler-perf-profile branch from 7fb4f2e to ad4de25 Compare June 26, 2026 07:12
@azchohfi azchohfi requested a review from Copilot June 26, 2026 07:12

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated no new comments.

@azchohfi

Copy link
Copy Markdown
Collaborator Author

/perf

… preservation (P1 pr-review)

Folds two test-coverage findings from the pr-review of #692 (both
multi-model-confirmed), strengthening the modifier-merge teeth without
touching production code:

- Update_WrapperLayer_StillMergesInnerModifiers: proves the !ReferenceEquals
  guard skips ONLY the self-merge - when a ModifiedElement wrapper contributes
  a distinct modifier instance, Reconciler.Update''s fall-through still performs
  the real merge. Differential-allocation teeth verified to bite: simulating an
  over-broad guard makes this fail while the existing self-merge tooth still
  passes (unique coverage).
- Merge_Preserves_Lifecycle_Callbacks: pins that ElementModifiers.Merge
  preserves OnMount/OnUnmount/OnUpdateAction (other wins, base fills gaps),
  guarding the same callback-drop class that bit a sibling change.

Full Reactor.Tests green (9707 passed / 64 skipped / 0 failed). Test-only;
no src/Reactor change, so the P1 perf delta on ad4de25 is unaffected.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated no new comments.

@azchohfi

Copy link
Copy Markdown
Collaborator Author

/perf

Acceptance-demo run: end-to-end validation of the complete merged harness (alloc metric + ns micro-suite + low-mutation skip-floor + keyed-list leg) on real CI-built exes. #692 is the self-merge-guard alloc fix, so the StocksGrid allocation headline is the story; keyed/skip-floor legs are expected within-noise (this PR doesn't target those paths). Baseline = current main (41e41d7, production-identical — all four harness merges were 0-src). — PERFVAL harness session

@azchohfi

Copy link
Copy Markdown
Collaborator Author

/perf

Re-fire post-#700-merge (budget-fit 240c31b5). The 12:07Z run (28235303559) validated the −40.2% alloc flagship but ran on PRE-#700 main, so the micro section was absent (13-bench suite timed out at 420 s, completing only M1–M4). This run validates #700's budget-fit on real CI exes: the micro section should now render. Macro alloc should re-confirm ~−40%. Expected-absent: the keyed-list leg (this PR's tree at 157ff020 predates #694 StressPerf.KeyedList, so the PR-side keyed build can't find the project — lineage, not a regression). — PERFVAL harness session

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants