perf(reconciler): skip redundant element-modifier self-merge (P1) by azchohfi · Pull Request #692 · microsoft/microsoft-ui-reactor

azchohfi · 2026-06-26T06:00:04Z

What

Reconciler.Update resolves an element's modifiers by accumulating any ModifiedElement wrapper layers, then merging the final inner element's own Modifiers. In the common case of an element that carries its modifiers directly (no wrapper), which is every cell in a large keyed grid, the accumulator is still referentially the element's own ElementModifiers, so that final merge was a self-merge x.Merge(x). Each self-merge allocated a fresh, value-identical ElementModifiers plus its non-null Layout/Visual/Text sub-records: roughly 3 records on each of the old and new sides (~6 in total) per changed cell, every render.

On the StocksGrid workload (500 cells, ~50% mutation) that is ~3 KB/changed cell (~7 MB/render of pure garbage) — the single largest allocation lever found in the Phase-1 reconciler profile (hotspot H1).

Change

Guard the final merge with !ReferenceEquals(accumulator, element.Modifiers): only merge when a wrapper layer actually contributed a distinct instance. When nothing wrapped the element, keep its own Modifiers reference as-is.

Semantically identical: Merge(x, x) is value-equal to x.
Purely removes the allocation. ApplyModifiers behavior is unchanged (it runs exactly as on main).
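
As an illustration of the guard described above, here is a minimal Python model, not the actual C# in Reconciler.Update.cs. The `Modifiers` class, its two fields, and `resolve_modifiers` are hypothetical stand-ins; Python's `is` plays the role of `ReferenceEquals`, and the way the accumulator is seeded is an assumption made for the sketch.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Modifiers:
    """Hypothetical stand-in for ElementModifiers; the real fields differ."""
    margin: Optional[int] = None
    opacity: Optional[float] = None

    def merge(self, other: "Modifiers") -> "Modifiers":
        # Always allocates a new record: `other` wins, `self` fills gaps.
        return Modifiers(
            margin=other.margin if other.margin is not None else self.margin,
            opacity=other.opacity if other.opacity is not None else self.opacity,
        )


def resolve_modifiers(wrapper_layers: Sequence[Modifiers], own: Modifiers) -> Modifiers:
    # Assumption: with no wrapper layers the accumulator is the element's
    # own modifiers instance, which is the case the guard targets.
    acc = own
    for i, layer in enumerate(wrapper_layers):
        acc = layer if i == 0 else acc.merge(layer)
    # The guard: merge only when a wrapper contributed a distinct instance.
    if acc is not own:
        acc = acc.merge(own)
    return acc
```

With no wrapper layers the function returns the element's own instance untouched; with a wrapper it still performs the real merge, so the inner element's values win and the wrapper fills the gaps.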

Scope note (P1 only)

This PR is P1 only. The originally-paired P2 — an ApplyModifiers fast-path that skipped the post-update pass when modifiers compared equal — was dropped after review. Element.ModifiersEqual does not compare every field ApplyModifiers writes (RequestedTheme, Scale/Rotation/Translation/CenterPoint, inline-flow margin/padding/border, and the OnUnmountAction/OnUpdateAction side-effect hooks), so a structurally-equal compare can coexist with a changed transform/theme and the skip would leave the control stale. P2 will return as its own follow-up PR that first makes ModifiersEqual complete w.r.t. ApplyModifiers' guarded writes.
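
The failure mode that got P2 dropped can be modeled in a few lines. This is a hedged Python sketch, not the Reactor code: `Mods`, `Control`, and `update` are hypothetical names, and `compare=False` stands in for a field that the equality check does not look at even though the apply step writes it.

```python
from dataclasses import dataclass, field


@dataclass
class Mods:
    margin: int = 0
    # Excluded from ==, mimicking a field that ModifiersEqual does not compare.
    rotation: float = field(default=0.0, compare=False)


class Control:
    def __init__(self) -> None:
        self.margin = 0
        self.rotation = 0.0


def apply_modifiers(control: Control, mods: Mods) -> None:
    # The apply step writes every field, including the uncompared one.
    control.margin = mods.margin
    control.rotation = mods.rotation


def update(control: Control, old: Mods, new: Mods, use_fast_path: bool) -> None:
    if use_fast_path and old == new:
        return  # skip: looks equal, but rotation may have changed
    apply_modifiers(control, new)
```

Because the two records compare equal while `rotation` differs, the fast path leaves the control with the old rotation. Making the equality check cover every written field, as the follow-up plans, removes that gap.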

Tests

tests/Reactor.Tests/ReconcilerModifierMergeTests.cs:

- Invariant: Merge(x, x) is value-equal to x but a distinct instance (what makes skipping the self-merge safe).
- Correctness: a real wrapper+inner merge still combines correctly (inner wins, base fills gaps); the guard only skips the self case, never a real merge.
- Revert→fail teeth: 50k Update calls on a direct-modifier leaf allocate ~0 B/call with the guard; reverting it allocates ~1.95 KB/call, well above the test's 64 B/call cap.
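
The shape of the revert→fail test can be sketched as follows. This is a simplified Python model under stated assumptions: it counts object constructions rather than bytes (the real test caps bytes per call), and `CountingModifiers`, `update_guarded`, `update_unguarded`, and `allocations_per_call` are hypothetical names.

```python
class CountingModifiers:
    """Hypothetical stand-in for ElementModifiers that counts constructions."""
    allocations = 0

    def __init__(self, margin=None, opacity=None):
        CountingModifiers.allocations += 1
        self.margin = margin
        self.opacity = opacity

    def merge(self, other):
        # Always constructs a new instance, like a record merge would.
        return CountingModifiers(
            other.margin if other.margin is not None else self.margin,
            other.opacity if other.opacity is not None else self.opacity,
        )


def update_unguarded(own):
    acc = own               # no wrapper: accumulator is the element's own instance
    return acc.merge(own)   # self-merge allocates on every call


def update_guarded(own):
    acc = own
    return acc if acc is own else acc.merge(own)


def allocations_per_call(update, calls=50_000):
    own = CountingModifiers(margin=4)
    before = CountingModifiers.allocations
    for _ in range(calls):
        update(own)
    return (CountingModifiers.allocations - before) / calls
```

A budget assertion on `allocations_per_call(update_guarded)` passes at zero, and the same assertion fails as soon as the guard is removed, which is what gives the test its teeth.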

Full dotnet test tests/Reactor.Tests green (9705 passed / 0 failed / 64 skipped). Core lib dotnet build src/Reactor/Reactor.csproj -c Release AOT-clean (0 warnings / 0 errors).

Draft — held per the perf merge-gate (no merge until the /perf harness can measure the allocation delta + explicit GO).

azchohfi · 2026-06-26T06:49:46Z

/perf

Copilot

Pull request overview

This PR is a Phase-2 reconciler update-path performance change aimed at reducing per-element allocations and work during Reconciler.Update, especially for large-grid workloads.

Changes:

Avoids redundant x.Merge(x) allocations when resolved modifiers already reference the element’s own ElementModifiers.
Adds a new modifiersEqual computation and an ApplyModifiers skip fast-path when resolved modifiers are structurally equal (with an OnUpdateAction exception).
Adds a new headless test suite to pin the self-merge allocation regression and the OnUpdateAction exception behavior.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File	Description
src/Reactor/Core/Reconciler.Update.cs	Adds ReferenceEquals guard for modifier self-merge and introduces the ApplyModifiers fast-path via `modifiersEqual` / `ShouldApplyModifiers`.
tests/Reactor.Tests/ReconcilerModifierMergeTests.cs	New tests covering self-merge semantics, allocation regression “teeth”, and `ShouldApplyModifiers` behavior around `OnUpdateAction`.


github-actions · 2026-06-26T07:02:16Z

⚡ Reactor perf comparison

Workload: StressPerf.ReactorOptimized StocksGrid · --percent 50 --duration 10 · x64 Release · median of 12 paired runs (2 warmup dropped); Δ is the mean change with a 95% CI · PR head and main built and run interleaved on the same runner.

Regression vs `main` baseline

| Metric | `main` (baseline) | This PR | Δ (95% CI) | Status |
| --- | --- | --- | --- | --- |
| Renders/sec ↑ | 2.38 | 2.59 | +6.7% (95% CI [-0.9, +14.3]) | ≈ within noise |
| Avg Reconcile (ms) ↓ | 149.6 | 137.7 | -8.6% (95% CI [-12.5, -4.7]) | ✅ improvement |
| Avg Diff (ms) ↓ | 137.0 | 125.3 | -9.6% (95% CI [-13.7, -5.6]) | ✅ improvement |
| Avg Memory (MB) ↓ | 293.8 | 288.6 | -1.6% (95% CI [-2.7, -0.5]) | ✅ improvement |

Low-mutation skip-floor (`--percent 0`)

At --percent 0 the workload mutates few cells per tick (always at least one), so reconcile/diff isolate the O(n) per-tick child skip-walk floor that higher mutation rates dilute — ChildReconciler re-walks every child each tick even when nothing moved. The closer --percent is to 0, the more this floor is the signal, so a structural-skip optimization shows up cleanly where the headline table above buries it. Δ is the mean paired change with a 95% CI.

| Metric | `main` (baseline) | This PR | Δ (95% CI) | Status |
| --- | --- | --- | --- | --- |
| Renders/sec ↑ | 15.99 | 16.02 | +3.5% (95% CI [-3.6, +10.6]) | ≈ within noise |
| Avg Reconcile (ms) ↓ | 36.2 | 37.0 | +2.2% (95% CI [-7.5, +11.9]) | ≈ within noise |
| Avg Diff (ms) ↓ | 34.0 | 34.8 | +2.4% (95% CI [-7.8, +12.5]) | ≈ within noise |
| Avg Memory (MB) ↓ | 266.8 | 267.7 | +0.1% (95% CI [-0.3, +0.6]) | ≈ within noise |

Allocation (Reactor) — lower is better

| Metric | `main` (baseline) | This PR | Δ (95% CI) | Status |
| --- | --- | --- | --- | --- |
| Alloc bytes/render ↓ | 9603489 | 5739023 | -40.2% (95% CI [-41.0, -39.5]) | ✅ improvement |
| Gen0 GC / 1k renders ↓ | 291.67 | 277.93 | -4.3% (95% CI [-12.8, +4.3]) | ≈ within noise |

Reconciler micro-benchmarks (`PerfBench.ControlModel`)

Production --variant Reactor control-model path, ns-resolution and WinUI-undiluted (spec-047 M1–M13) — ↓ lower is better. Status tracks allocated bytes/op, the authoritative signal here; it is deterministic for structurally-fixed benches, while dispatcher / background-thread benches carry a small process-to-process offset, so a bench is flagged only when its 95% CI clears a ±3% minimum-effect band (real structural alloc changes are several percent to many-x). ns/op is shown for context but is not auto-flagged (its paired CI is rep-interleaved but the flag remains dormant pending a real-CI identical-binary band calibration). Δ is the mean paired change with a 95% CI.

| Bench | `main` ns/op | Δ ns (95% CI) | `main` B/op | Δ alloc (95% CI) | Status |
| --- | --- | --- | --- | --- | --- |
| `M1` Mount_Leaf_NoCallback | 153923.4 | -1.2% (95% CI [-7.6, +5.3]) | 1140.9 | 0.0% (95% CI [0.0, 0.0]) | ≈ within noise |
| `M2` Mount_Leaf_OneCallback | 112027.3 | -2.7% (95% CI [-8.9, +3.5]) | 3383.3 | 0.0% (95% CI [0.0, 0.0]) | ≈ within noise |
| `M3` Mount_Leaf_ThreeCallbacks | 234735.3 | +2.0% (95% CI [-4.2, +8.1]) | 8429.9 | -0.5% (95% CI [-2.8, +1.9]) | ≈ within noise |
| `M4` Dispatch_Switch_Cold | 107320.8 | +3.0% (95% CI [-3.1, +9.1]) | 1767.8 | 0.0% (95% CI [0.0, 0.0]) | ≈ within noise |
| `M5` Dispatch_Switch_Warm | 109536.2 | +1.9% (95% CI [-5.5, +9.3]) | 1766.0 | 0.0% (95% CI [-1.2, +1.2]) | ≈ within noise |
| `M6` Dispatch_ExternalType | 91878.5 | +0.2% (95% CI [-3.5, +3.8]) | 987.6 | +0.1% (95% CI [-2.1, +2.2]) | ≈ within noise |
| `M7` Update_NoChange | 56791.0 | -0.5% (95% CI [-5.3, +4.3]) | 452.1 | +0.7% (95% CI [-7.1, +8.4]) | ≈ within noise |
| `M8` Update_OneLeafChanged | 42189.1 | -1.6% (95% CI [-4.6, +1.4]) | 536.0 | 0.0% (95% CI [0.0, 0.0]) | ≈ within noise |
| `M9` Update_AllChanged | 2924667.5 | +3.8% (95% CI [-4.9, +12.6]) | 184278.1 | 0.0% (95% CI [0.0, 0.0]) | ≈ within noise |
| `M10` EventHandlerState_Alloc | 88743.6 | -1.4% (95% CI [-3.5, +0.6]) | 3095.2 | 0.0% (95% CI [0.0, 0.0]) | ≈ within noise |
| `M11` ModifierEHS_Frequency | 46891.4 | -2.0% (95% CI [-8.9, +5.0]) | 638.9 | 0.0% (95% CI [0.0, 0.0]) | ≈ within noise |
| `M12` Pool_Rent_HotPath | 119522.6 | +0.1% (95% CI [-4.6, +4.9]) | 1099.9 | 0.0% (95% CI [0.0, 0.0]) | ≈ within noise |
| `M13` Setters_Suppression_Scope | 150.0 | -15.6% (95% CI [-24.7, -6.4]) | 26.7 | 0.0% (95% CI [0.0, 0.0]) | ≈ within noise |
| `C207` ChangeHandler_DpRead_Coalesce | 1390.3 | -0.2% (95% CI [-6.8, +6.4]) | 0.6 | 0.0% (95% CI [0.0, 0.0]) | ≈ within noise |
| `OAlloc` Optional_Element_Alloc | 215.7 | +2.8% (95% CI [-14.9, +20.6]) | 528.0 | 0.0% (95% CI [0.0, 0.0]) | ≈ within noise |
| `OUpdate` Optional_Reconciler_Update | 17256.2 | -29.6% (95% CI [-31.4, -27.8]) | 5810.8 | -53.9% (95% CI [-53.9, -53.9]) | ✅ improvement |

Cross-framework reference (same StocksGrid workload)

| Metric | vanilla WinUI3¹ | Rust `windows-reactor`² | Reactor (this PR) |
| --- | --- | --- | --- |
| Renders/sec ↑ | 3.18 | 4.75 | 2.59 |
| Avg Reconcile (ms) ↓ | n/a | 19.4 | 137.7 |
| Avg Diff (ms) ↓ | n/a | 17.7 | 125.3 |
| Avg Memory (MB) ↓ | 264.2 | 196.7 | 288.6 |

↑ higher is better · ↓ lower is better. Within noise = the 95% confidence interval of the paired Δ includes 0 (no change resolvable at this sample size); ✅ improvement / ⚠️ regression require the CI to exclude 0.
Allocation metrics (alloc bytes/render, Gen0 GC) are the sensitive signal for allocation-reduction work, where the mean-ms / memory figures are largely flat. They read n/a for a harness built from a revision that predates them (rebase the PR onto main to populate them).
Reconciler micro-benchmarks run PerfBench.ControlModel --variant Reactor (M1–M13) as a headless loop bracketed by per-thread alloc + GC counters, ns-resolution and free of WinUI render / working-set dilution, so they resolve Core/Reconciler allocation deltas the macro StocksGrid workload cannot. main and PR each link their own src/Reactor build and are rep-interleaved (a fresh alternated process per rep); Δ is the paired 95% CI over per-rep means. The Status column tracks allocated bytes/op (deterministic for identical code); ns/op is informational: its paired CI is now unbiased but the flag stays dormant pending a real-CI identical-binary band calibration.
¹ vanilla WinUI3 = StressPerf.Direct (imperative; no virtual-DOM, so it has no reconcile/diff phase and those cells read n/a). Measured live on this runner.
² Rust = test_reactor_perf from microsoft/windows-rs, a port of this harness (same StocksGrid, same --percent/--duration CLI). Built from source and measured live on this runner.
Absolute numbers are runner-dependent: trust the Δ vs main, not the absolute values. Memory (working set) is the noisiest metric.
Runner: CPU: AMD EPYC 7763 64-Core Processor · 4 logical cores · 16 GB RAM · runner: GitHub Actions 1042925502.
Generated by .github/workflows/perf-compare.yml · PR 157ff02 vs main 52baebb · 2026-06-26T19:26:31Z · run log.
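
The Status rule stated in the notes above (within noise when the paired 95% CI includes 0, with the micro-benchmarks additionally requiring the CI to clear a ±3% band) can be sketched as below. This is an illustrative Python model, not the harness's code: `paired_delta_ci` and `classify` are hypothetical names, and the use of a Student-t interval with a caller-supplied critical value is an assumption, since the report does not say how the CI is computed.

```python
import math
import statistics


def paired_delta_ci(deltas_pct, t_crit):
    """Mean paired change and its CI; t_crit is supplied by the caller
    (e.g. 2.776 for 5 pairs at 95%), which is an assumption of this sketch."""
    mean = statistics.fmean(deltas_pct)
    sem = statistics.stdev(deltas_pct) / math.sqrt(len(deltas_pct))
    half = t_crit * sem
    return mean, mean - half, mean + half


def classify(ci_lo, ci_hi, lower_is_better=True, band=0.0):
    """Flag only when the whole CI clears the minimum-effect band (0 = exclude 0)."""
    if ci_hi < -band:
        return "improvement" if lower_is_better else "regression"
    if ci_lo > band:
        return "regression" if lower_is_better else "improvement"
    return "within noise"
```

Applied to the reported intervals, `classify(-41.0, -39.5)` for alloc bytes/render yields an improvement, while `classify(-0.9, 14.3, lower_is_better=False)` for renders/sec stays within noise because the interval straddles 0.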

Copilot

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

Reconciler.Update resolves an element's modifiers by accumulating any ModifiedElement wrapper layers, then merging the final inner element's own Modifiers. For the common case of an element that carries modifiers directly (no wrapper) -- every cell in a large keyed grid -- the accumulator is still referentially the element's own ElementModifiers, so that final merge was a self-merge x.Merge(x): it allocated a fresh, value-identical ElementModifiers plus its non-null Layout/Visual bucket sub-records (a Layout+Visual cell is parent + 2 buckets = 3 records, on each of the old and new sides => ~6) per changed cell, every render. On the StocksGrid workload (500 cells, ~50% mutation) that is ~3 KB/changed cell (~7 MB/render of pure garbage).

Guard the final merge with !ReferenceEquals(accumulator, element.Modifiers): only merge when a wrapper layer actually contributed a distinct instance. When nothing wrapped the element, keep its own Modifiers reference as-is. Semantically identical (Merge(x,x) is value-equal to x); purely removes the allocation. ApplyModifiers behavior is unchanged.

Scope note: this PR is P1 only. The originally-paired P2 (an ApplyModifiers fast-path that skips the post-update pass when modifiers compare equal) was dropped after review found Element.ModifiersEqual does not compare every field ApplyModifiers writes (RequestedTheme, Scale/Rotation/Translation/CenterPoint, inline-flow margin/padding/border, OnUnmountAction/OnUpdateAction hooks), so a structurally-equal compare can coexist with a changed transform/theme and the skip would leave the control stale. P2 will return as its own PR that first makes ModifiersEqual complete w.r.t. ApplyModifiers' guarded writes.

Tests (tests/Reactor.Tests/ReconcilerModifierMergeTests.cs):

- Merge(x,x) is value-equal to x but a distinct instance (the invariant the guard relies on).
- A real wrapper+inner merge still combines correctly (inner wins, base fills gaps) -- the guard only skips the self case, never a real merge.
- Revert->fail teeth: 50k Update calls on a direct-modifier leaf allocate ~0 B/call with the guard; reverting it allocates ~1.95 KB/call (cap 64 B).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated no new comments.

azchohfi · 2026-06-26T07:19:32Z

/perf

… preservation (P1 pr-review)

Folds two test-coverage findings from the pr-review of #692 (both multi-model-confirmed), strengthening the modifier-merge teeth without touching production code:

- Update_WrapperLayer_StillMergesInnerModifiers: proves the !ReferenceEquals guard skips ONLY the self-merge - when a ModifiedElement wrapper contributes a distinct modifier instance, Reconciler.Update's fall-through still performs the real merge. Differential-allocation teeth verified to bite: simulating an over-broad guard makes this fail while the existing self-merge tooth still passes (unique coverage).
- Merge_Preserves_Lifecycle_Callbacks: pins that ElementModifiers.Merge preserves OnMount/OnUnmount/OnUpdateAction (other wins, base fills gaps), guarding the same callback-drop class that bit a sibling change.

Full Reactor.Tests green (9707 passed / 64 skipped / 0 failed). Test-only; no src/Reactor change, so the P1 perf delta on ad4de25 is unaffected.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated no new comments.

azchohfi · 2026-06-26T11:31:16Z

/perf

Acceptance-demo run: end-to-end validation of the complete merged harness (alloc metric + ns micro-suite + low-mutation skip-floor + keyed-list leg) on real CI-built exes. #692 is the self-merge-guard alloc fix, so the StocksGrid allocation headline is the story; keyed/skip-floor legs are expected within-noise (this PR doesn't target those paths). Baseline = current main (41e41d7, production-identical — all four harness merges were 0-src). — PERFVAL harness session

azchohfi · 2026-06-26T13:47:09Z

/perf

Re-fire post-#700-merge (budget-fit 240c31b5). The 12:07Z run (28235303559) validated the −40.2% alloc flagship but ran on PRE-#700 main, so the micro section was absent (13-bench suite timed out at 420 s, completing only M1–M4). This run validates #700's budget-fit on real CI exes: the micro section should now render. Macro alloc should re-confirm ~−40%. Expected-absent: the keyed-list leg (this PR's tree at 157ff020 predates #694 StressPerf.KeyedList, so the PR-side keyed build can't find the project — lineage, not a regression). — PERFVAL harness session

azchohfi · 2026-06-26T18:55:01Z

/perf

Copilot

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated no new comments.

… P4) (#699)

* perf(reconciler): structural-skip untouched child ranges (positional, P4)

Make `ChildReconciler.ReconcilePositional` O(changed) instead of O(count) when a memoizing producer (`UseMemoCellsByIndex`) reuses untouched cells reference-equal. Targets the positional skip-walk FLOOR (the per-render O(count) cost of visiting every cell to confirm it can be skipped), which dominates low-mutation renders of large keyed grids (e.g. StocksGrid).

Mechanism (CWT side-channel hint, mirrors #681's _dirtyAncestorPath bridge):

• Producer: the `UseMemoCellsByIndex` reuse branch publishes a `ChildDiffHint` (ChangedIndices + ThemeSensitiveCount) keyed by reference on the fresh-per-render Element[]. No Element-record widening; AOT-safe (ConditionalWeakTable, no reflection). The theme count is carried forward incrementally so steady-state reuse stays O(changed); a one-time O(count) scan runs only on the first reuse after a full rebuild and as a defensive recompute.
• Consumer: `ReconcilePositional` engages a fast path that updates ONLY the hinted changed indices and skips the rest, iff ALL hold:
  1. old/new element counts match,
  2. the live child collection equals that count (no in-flight anim inflated it),
  3. no animation ambient,
  4. a hint is present for THIS array (a CWT hit also proves Filter returned the same reference, so no null/EmptyElement shifted the index space),
  5. no cell is theme-sensitive (`!AnyThemeSensitive`),
  6. the container is not on #681's dirty-ancestor path.

Correctness:

• Untouched indices are reference-equal BY CONSTRUCTION (the hook reuses prevChildren[i] for unchanged i and rebuilds only changedIndices). The changed and full-walk paths share a single `UpdateCommonChild` helper, so both honour identical skip / update / type-mismatch semantics.
• The theme gate is the load-bearing safety property: the ONLY work the full walk does for an untouched cell that a structural skip would drop is re-resolving `ApplyThemeBindings` / `ApplyResourceOverrides` ThemeRefs against the effective theme (which a parent RequestedTheme toggle can change WITHOUT touching the element tree). Gating on the whole-array `AnyThemeSensitive` flag is provably safe and sidesteps the subtle dirty-path reasoning that bit P2.

Tests:

• Headless (Reactor.Tests): producer hint correctness incl. incremental theme-count carry + caller-mutation snapshot (UseMemoCellsTests); hint registry + IsThemeSensitive (ChildDiffHintsTests); consumer differential vs full walk incl. the gate teeth `ThemeSensitive_Hint_Forces_Full_Walk` (revert the gate → fails), count-mismatch, defensive OOB, empty-changed (ChildReconcilerStructuralSkipTests).
• Live selftests (Reactor.AppTests.Host): LifecycleParity (OnUpdateAction fires for a changed index, never for untouched ref-equal, i.e. == full walk); ThemeRangeParity (themed ref-equal range under a RequestedTheme toggle renders + re-themes, no cell dropped). Per the empirical note below, the authoritative gate teeth is the headless visited-index assertion, not a live color delta.

Empirical theme note: a LIVE color-delta teeth for the theme gate is impossible. WinUI auto-re-resolves a `{ThemeResource}` Style setter on any effective-theme change even when Reactor structurally skips the cell (verified: gate reverted → cells skipped, ApplyThemeBindings not re-run, yet brushes still went Light→Dark). The one snapshot a skip truly leaves stale (`ApplyResourceOverrides`' concrete ThemeRef.Resolve into fe.Resources) does not reliably re-resolve in the reconcile harness. The headless `ThemeSensitive_Hint_Forces_Full_Walk` is therefore the gate's load-bearing teeth; the live fixture is the end-to-end parity companion.
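
The consumer-side gating described in the mechanism above can be modeled roughly as follows. This is a hedged Python sketch, not `ChildReconciler.ReconcilePositional`: `ChildDiffHint` here is a plain dataclass rather than a ConditionalWeakTable entry, the gate flags are passed in as hypothetical parameters, and handling of added or removed tail children is omitted.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence, Tuple


@dataclass(frozen=True)
class ChildDiffHint:
    changed_indices: Tuple[int, ...]
    theme_sensitive_count: int


def reconcile_positional(
    old: Sequence[object],
    new: Sequence[object],
    hint: Optional[ChildDiffHint],
    update_child: Callable[[int, object, object], None],
    live_count: int,
    animating: bool = False,
    on_dirty_path: bool = False,
) -> list:
    """Returns the indices actually visited."""
    common = min(len(old), len(new))
    fast = (
        hint is not None
        and len(old) == len(new)              # gate 1: counts match
        and live_count == len(new)            # gate 2: live collection agrees
        and not animating                     # gate 3: no animation ambient
        and hint.theme_sensitive_count == 0   # gate 5: no theme-sensitive cell
        and not on_dirty_path                 # gate 6: not on dirty-ancestor path
    )
    indices = hint.changed_indices if fast else range(common)
    visited = []
    for i in indices:
        if 0 <= i < common:  # defensively ignore out-of-range hint indices
            update_child(i, old[i], new[i])
            visited.append(i)
    return visited
```

When every gate holds, only the hinted indices are visited; if any gate fails, the function falls back to the full walk, which is always correct.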
Measurement: the win shows under a low-mutation skip-floor metric (PERFVAL's `--percent 0`); on the default 50%-mutation StocksGrid it is within noise.

File-disjoint from the perf fleet (#692/#695 own Reconciler.Update.cs; this touches ChildReconciler.cs / ChildDiffHints.cs / UseMemoCells.cs + a parentControl thread-through in Reconciler.cs / V1HandlerAdapter.cs).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* perf(reconciler): fold pr-review findings into PR-C structural skip

Address the internal pr-review skill + GitHub Copilot findings on the positional structural-skip fast path. No behavior change for the StocksGrid target workload (all new gates pass on it); each fold tightens correctness or documents an invariant.

- Hot-reload safety (C1): add `!ForceFullRenderActive` gate so a hot-reload force pass never structurally skips an untouched wrapper cell (the dirty path is empty during a pure force pass, so the dirty-path gate alone did not cover it). Falls back to the full walk, which honours ForceRenderThroughWrapper per cell.
- Array-identity guard (S1): the hint now carries a WeakReference to the exact previous-render array its ChangedIndices were diffed against; the fast path engages only when the reconciler's old array IS that array. A cheap, self-documenting sufficient condition for the per-index ref-equality invariant; any defensive copy upstream safely falls back to the full walk. Weak on purpose -- a strong ref would chain every historical array through the reference-keyed CWT and leak.
- Duplicate-index hardening (dedupe): snapshot + sort/compact the caller's changedIndices before the theme tally / builder / publish. A duplicated themed->plain index could otherwise under-count the incremental theme-sensitive tally and wrongly publish AnyThemeSensitive=false, and would rebuild + re-update the same cell N times.
- Dirty-path gate (T1): documented as conservative defense-in-depth. Proven by experiment that it is behaviorally redundant given the count/CWT/array-id gates (the full walk skips a ref-equal self-triggered cell identically via CanSkipUpdate), retained as cheap insurance; costs nothing on the target workload (cell panel is a descendant, not an ancestor, of the self-triggered grid component).
- ResourceOverrides conservatism (C3): documented why the ThemeRef-backed ResourceOverrides arm of IsThemeSensitive is intentionally conservative.

Tests: weak-ref round-trip + stale-old-array teeth (gate 8) + chained theme-count carry (T3) + duplicate-index theme-count/build-once (T4). Full Reactor.Tests green (9731); StructuralSkip selftests green; core lib Release AOT-clean.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* perf(reconciler): address Copilot review on PR-C structural skip

Fold the three GitHub Copilot review findings on the positional structural-skip fast path. All are hardening; no behavior change on the StocksGrid target.

- Null cells (findings 1+2): a cell builder may legitimately return null (ChildReconciler.Filter drops nulls downstream), but PR-C's theme tally now inspects prev/built cells via ChildDiffHints.IsThemeSensitive, which dereferenced element.ThemeBindings and would NRE on a null. Widen the predicate to accept Element? and treat null as non-theme-sensitive (a null has no bindings to re-resolve). Fixes all three call sites in UseMemoCellsByIndex (the O(count) CountThemeSensitive scan + both incremental tally reads) at the single chokepoint.
- DebugElementsSkipped diagnostic (finding 3): the fast path adjusted the skipped-element counter by `common - changed.Length`, but the loop defensively ignores out-of-range hint indices, so the raw hint length over-counts visited work and the diagnostic could skew (or, with enough out-of-range indices, go negative). Track indices ACTUALLY visited and base the adjustment on that, making the counter match the full walk exactly.
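
The duplicate-index under-count mentioned in the dedupe bullet above is easy to reproduce in a small model. This Python sketch is an assumption-laden illustration, not the `UseMemoCellsByIndex` code: `carry_theme_count` and the boolean per-cell flags are hypothetical, and the incremental tally is modeled as "subtract the old cell's flag, add the new cell's flag" per changed index.

```python
def carry_theme_count(prev_count, prev_themed, new_themed, changed_indices, dedupe=True):
    """Incrementally carry a theme-sensitive tally across a render.

    prev_themed / new_themed: per-index booleans (hypothetical model of
    IsThemeSensitive on the previous and rebuilt cells).
    """
    # Sort + compact the caller's indices so each cell is tallied once.
    indices = sorted(set(changed_indices)) if dedupe else list(changed_indices)
    count = prev_count
    for i in indices:
        count += int(new_themed[i]) - int(prev_themed[i])
    return count
```

With cells 0 and 1 themed before the render and cell 0 rebuilt as plain, passing `[0, 0]` without dedupe subtracts cell 0 twice and reports a tally of 0 even though cell 1 is still themed; with dedupe the tally is 1 and the theme gate keeps the full walk.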
Tests: null-cell predicate guard + producer null-cell theme-scan teeth; the out-of-range consumer test now asserts the skipped-element count equals the full-walk total (4), which the old `common - changed.Length` undercounted. Full Reactor.Tests green (9733); core lib Release AOT-clean.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fold pr-review skill findings into PR-C structural skip

Ran the internal pr-review skill on the PR-C HEAD (7 dimensions + a gpt-5.4 multi-model cross-check, a different model family). Fold the actionable findings:

- H1 (test-coverage; multi-model CONFIRMED load-bearing): add a hot-reload gate teeth selftest. StructuralSkip_HotReloadWrapperReRender puts a WRAPPER cell (Component) inside a UseMemoCellsByIndex range whose body a simulated hot-reload edit changes, then drives a real force pass. The fast path's !ForceFullRenderActive gate must defer to the full walk (which honours ForceRenderThroughWrapper per cell) so the wrapper re-renders its edited body. Teeth verified: reverting the gate fails WrapperReRenders + OldBodyGone (the structural skip swallows the edit).
- H2 (test-coverage; partially-confirmed): add a headless differential test that mirrors the real producer's reference-equal reuse at untouched indices (not fresh copies) and asserts the fast-path output == full-walk output (identical skip accounting, no structural mutation, visited set a subset).
- M1 (security + correctness; multi-model: real but not a ship-blocker, no cheap complete defense) + M3 (docs + api): document the returned array's immutability / no-mutation contract, the changedIndices dedupe contract, and the theme-sensitive fallback in the UseMemoCellsByIndex XML doc; hand-sync the generated reference MD.

Dispositions recorded (no code change):

- H3 / gate 6 (!IsOnDirtyAncestorPath): multi-model DISPUTED the test-coverage finding and independently confirmed the gate is behaviorally redundant given the count/CWT/array-id gates (a ref-equal untouched cell is skipped identically by the full walk via Element.CanSkipUpdate before dirty-path logic is consulted). No behavioral teeth is constructible; kept as documented cheap defense-in-depth.
- M2: ThemeRangeParity already documents itself in-code as a smoke/parity check, not the gate teeth; the authoritative !AnyThemeSensitive teeth is the headless ChildReconcilerStructuralSkipTests.ThemeSensitive_Hint_Forces_Full_Walk.
- L1: the ResourceOverrides arm of IsThemeSensitive is intentionally conservative (already documented) per the verified theme crux.

Gates: core lib Release AOT 0W/0E; Reactor.Tests 9734 pass / 0 fail; StructuralSkip selftests 3 fixtures / 14 checks green.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* test: pin structural-skip per-cell read elision as an allocation budget

The existing ChildReconcilerStructuralSkipTests assert the fast path's VISIT COUNT (which child indices are read) but nothing pins the resulting allocation cut, so the measured StocksGrid allocation win (#699) could be silently reverted with every behavioural test still green.

Add Structural_Skip_Pins_PerCell_Read_Elision_As_Allocation_Budget: a MeasuringChildCollection charges a fixed managed allocation per Get(i), modeling the per-cell COM read / marshaling the skip elides for untouched reference-equal cells (the real cost is native and unmeasurable headless). Fast path (hint published) allocates O(changed); full walk (no hint) allocates O(count). Asserts the mechanism (5 vs 500 reads/iter) and an 8x GC-bytes budget. Has teeth: disabling the fast-path gate makes the hinted path walk every cell, collapsing fast onto full and failing the test.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* harden ChildDiffHint.AnyThemeSensitive to fail-safe (!= 0 not > 0)

Copilot review on #699 flagged that AnyThemeSensitive derives from ThemeSensitiveCount > 0, so a hypothetical negative count would read as NOT theme-sensitive and could allow the structural-skip fast path to skip a theme-sensitive subtree (a missed-update risk).

The only producer (UseMemoCellsByIndex) already clamps its incremental tally to a >= 0 floor before publishing (UseMemoCells.cs:299-300) and CountThemeSensitive only counts upward, so a negative is unreachable and > 0 is correct today. But this is the SAFETY gate for a correctness-sensitive skip, so harden the type to be fail-safe regardless: test != 0 rather than > 0.

Behavior is byte-identical for every value the producer can emit (all >= 0); the only difference is that an anomalous negative now BLOCKS the skip (forces the always-correct full walk) instead of silently allowing it, which is the correct fail direction for a correctness gate. Provably perf-neutral: the StocksGrid workload publishes count == 0 every render, where both > 0 and != 0 yield false identically, so the fast path engages unchanged.

Adds a fail-safe teeth test that goes red if the guard is reverted to > 0.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: azchohfi <azchohfi@users.noreply.github.com>
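
The fail-safe direction argued for in the last commit above can be checked with a tiny model. This Python sketch uses hypothetical function names and simply compares the two predicates; it is not the `ChildDiffHint` type itself.

```python
def any_theme_sensitive_gt(count: int) -> bool:
    # Original form: a negative count reads as "not theme-sensitive".
    return count > 0


def any_theme_sensitive_ne(count: int) -> bool:
    # Hardened form: any non-zero count, including an anomalous negative,
    # blocks the structural skip and forces the full walk.
    return count != 0


def fast_path_allowed(count: int, predicate) -> bool:
    # The skip may engage only when no cell is flagged theme-sensitive.
    return not predicate(count)
```

For every count a producer can legitimately emit (zero or positive) the two predicates agree, so the fast path engages identically; they differ only for a negative count, where the hardened form refuses the skip.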

azchohfi force-pushed the azchohfi-reconciler-perf-profile branch from 45f5916 to fce3d26 Compare June 26, 2026 06:29

azchohfi requested a review from Copilot June 26, 2026 06:42

Copilot started reviewing on behalf of azchohfi June 26, 2026 06:42 View session

Copilot AI reviewed Jun 26, 2026

Comment thread src/Reactor/Core/Reconciler.Update.cs Outdated

azchohfi force-pushed the azchohfi-reconciler-perf-profile branch from fce3d26 to 7fb4f2e Compare June 26, 2026 07:04

azchohfi changed the title ~~perf(reconciler): eliminate redundant modifier self-merge + ApplyModifiers fast-path (P1+P2)~~ perf(reconciler): skip redundant element-modifier self-merge (P1) Jun 26, 2026

azchohfi requested a review from Copilot June 26, 2026 07:05

Copilot started reviewing on behalf of azchohfi June 26, 2026 07:05 View session

Copilot AI reviewed Jun 26, 2026

Comment thread src/Reactor/Core/Reconciler.Update.cs Outdated

azchohfi force-pushed the azchohfi-reconciler-perf-profile branch from 7fb4f2e to ad4de25 Compare June 26, 2026 07:12

azchohfi requested a review from Copilot June 26, 2026 07:12

Copilot started reviewing on behalf of azchohfi June 26, 2026 07:13 View session

Copilot AI reviewed Jun 26, 2026

azchohfi mentioned this pull request Jun 26, 2026

perf(reconciler): skip redundant automation-name write when value already matches (P3) #695

Closed

azchohfi requested a review from Copilot June 26, 2026 09:09

Copilot started reviewing on behalf of azchohfi June 26, 2026 09:09 View session

Copilot AI reviewed Jun 26, 2026

azchohfi mentioned this pull request Jun 26, 2026

perf(reconciler): structural-skip untouched child ranges (positional, P4) #699

Merged

azchohfi mentioned this pull request Jun 26, 2026

perf(ci): fit the reconciler micro-suite inside its per-side budget #700

Merged

This was referenced Jun 26, 2026

perf(ci): rep-level micro interleaving + dormant ns flag mechanism #703

Merged

perf(ci): make micro-suite incompleteness loud + recalibrate alloc band #704

Merged

azchohfi marked this pull request as ready for review June 26, 2026 22:36

azchohfi requested a review from codemonkeychris as a code owner June 26, 2026 22:36

azchohfi requested a review from Copilot June 26, 2026 22:36

Copilot started reviewing on behalf of azchohfi June 26, 2026 22:36 View session

Copilot AI reviewed Jun 26, 2026

azchohfi merged commit c497c94 into main Jun 26, 2026
20 checks passed

azchohfi deleted the azchohfi-reconciler-perf-profile branch June 26, 2026 23:00

azchohfi mentioned this pull request Jun 27, 2026

perf: cache hook delegates + drop per-render hook allocations #668

Closed
