Skip to content

DO-NOT-MERGE: official #665-vs-M14 re-measure on post-#692 main#719

Closed
azchohfi wants to merge 8 commits into
mainfrom
temp-665-post692
Closed

DO-NOT-MERGE: official #665-vs-M14 re-measure on post-#692 main#719
azchohfi wants to merge 8 commits into
mainfrom
temp-665-post692

Conversation

@azchohfi

Copy link
Copy Markdown
Collaborator

DO NOT MERGE. Throwaway measurement PR only.

Purpose: official /perf re-measure of #665 against the M14 Dsl_Rebuild_Cascade micro, with the baseline now INCLUDING #692 (flagship reconciler self-merge guard, merged to main as c497c94) + M14 (#711). This is #665's 8 commits rebased cleanly onto current origin/main. Build 0/0, Reactor.Tests 9754 pass. Origin #665 stays pristine; this branch is deleted after the run.

azchohfi and others added 8 commits June 26, 2026 17:58
Restores the steady-state ShallowEquals -> CanSkipUpdate fast path for
interactive/selectable grid cells (it was defeated for any cell carrying an
event handler or a fluent .Set chain) and removes the dominant per-cell,
per-frame allocations in the Element/DSL layer.

FLAGSHIP-1 (Element.cs): replace the "any handler present => never skip" rule
in ModifiersEqual with ModifierCallbacksEqual, a per-slot ReferenceEquals over
all 27 callback/gesture/drag-drop slots. Reference-stable handlers now skip;
capturing closures still force Update. Per-slot reference equality (not a pure
presence bitmask) is the safe predicate because modifier handlers dispatch
through ModifierEventHandlerState.Current*, which is refreshed only on the
non-skip Update path -- presence-only would strand a stale closure on a
skipped cell. Also closes a latent hole where 21 of the slots were ignored.

FLAGSHIP-2 (Element.cs): add SettersEqual<T> (length + element-wise
ReferenceEquals) and route all 51 ShallowEquals arms through it instead of
ReferenceEquals(Setters). The .Set helpers append to a fresh array each render
([.. Setters, configure]); element-wise compare lets an unchanged chain of
non-capturing lambdas / method groups (compiler-cached statics) skip again.

_emptyLayout/_emptyVisual sentinels; compare LayoutModifiers by exact record
equality and VisualModifiers field-by-field (brushes via BrushesEqual). Also
fixes latent skip bugs: RequestedTheme/logical-inset and
Scale/Rotation/Translation/CenterPoint were previously not compared.

SingleAttachedDictionary for the common one-attached-value case (one .Grid per
cell), avoiding a full Dictionary<Type,object> alloc per cell per frame.

Color via GetBrush(Color), replacing one new SolidColorBrush (a thread-affine
WinRT DependencyObject) per Parse call. Single shared brush cache.

Star-track loop; InterspersedGrid exact-sized arrays; FilterChildren two-pass
count+fill; shared static s_oneStar cross-axis track. #160 already satisfied
(ShallowEquals is type-switch-first).

Behavior preserved exactly (same render output, same events fire). Validated:
9693 unit tests pass; Modifier/Reconciler/Skip/Event/DataGrid/Flex selftests
0 failures; core lib Release x64 clean (0 AOT/trim warnings-as-errors). New
xUnit coverage in PerfDiffSkipPathTests.cs and DeclarativeModifierTests.cs.

Closes #664

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…LAGSHIP-2 safety revert

Expands the diff/memory PR into the ElementExtensions fluent-modifier layer and
makes the FLAGSHIP-2 setter comparison safe-conservative.

FLAGSHIP-2 (safety revert): SettersEqual now returns true only for the same array
instance (memoized element) or both-empty. Element-wise delegate-identity compare
was unsafe: a .Set(...) is an apply-time imperative write whose effect can read
mutable external state, so identity-equality does NOT imply the write is
unnecessary. Setters keep the element on the Update path whenever any are present.
LazyVStack/HStack scroll/repeater setter compares routed through SettersEqual
(both-empty now compares equal). FLAGSHIP-1 handler ref-equality is unaffected
(handlers dispatch later via ModifierEventHandlerState.Current*).

#165/#157 — ModifyLayout/ModifyVisual bucket-merge entry points. ~36 pure-layout
and pure-visual fluent modifiers (Margin/Padding/Width/Height/Min*/Max*/Align/
Opacity/Scale/Rotation/CenterPoint/CornerRadius/Background/Foreground/WithBorder/
Translation) merge their delta straight into the Layout/Visual slot instead of
allocating a throwaway parent ElementModifiers (and its bucket shim) per chained
call. Semantics are identical to Modify(el, new ElementModifiers { <field> }):
proven against ElementModifiers.Merge and asserted by *_Chain_Equals_HandBuilt tests.

#168 (brush cache wiring) — public BrushHelper.Parse(string) still returns a fresh,
caller-owned brush (unchanged vs main). New internal ParseShared(string) returns a
shared per-thread per-ARGB brush; Background/Foreground/WithBorder(string) hot paths
route through it, so thousands of identical cells reuse one brush instead of
allocating a WinRT DependencyObject per cell per render. Pure allocation win — the
diff compares brushes structurally (BrushesEqual = color+opacity), so behavior is
unchanged.

#174 (ApplyStyle cache) — cache the OnMount delegate per style name so an unchanged
.ApplyStyle("X") chain reuses one delegate instead of a fresh capturing closure +
display class per render. Memory-only: OnMountAction is excluded from ModifiersEqual
(runs at mount only), so this is not a skip-path change.

Out of scope / deferred: #167 ThemeBindings (Reconciler ripple); #158/#161-164.

Validation: core lib Release x64 0 warnings/0 errors; Reactor.Tests 9704 pass/0 fail/
64 skip (PerfDiffSkipPathTests 31/31); selftests Modifier/Skip/Event/Brush/DataGrid/
Flex 0 failures.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ernal; strengthen Dsl tests

Addresses pr-review findings (multi-model cross-check + panel) on the draft:

- correctness (HIGH, found by multi-model gpt-5.4 cross-check): UniformGrid wrote
  the positioned `.Grid(...)` cells back into the array returned by FilterChildren.
  After #173, FilterChildren's no-null fast path returns the caller's `items`
  array ALIASED (no copy), so an explicit `UniformGrid(orientation, myArray)` call
  corrupted the caller's array with Grid-positioned wrappers — a regression vs main
  (where FilterChildren always copied). UniformGrid now positions into its own
  Element[] and never mutates `filtered`. Other 16 FilterChildren consumers only
  pass the result as read-only Children, and InterspersedGrid already builds its
  own array, so this was the sole offender. Allocation-neutral on the hot (no-null)
  path vs main. + regression test asserting the source array is untouched.

- api-ergonomics / packaging / alternative-solution / security (converged): make
  BrushHelper.GetBrush(Color) internal. It was a NEW public method (main never
  exposed it) returning a shared, mutable SolidColorBrush from a [ThreadStatic]
  cache — a public-surface + shared-mutable-brush footgun. Only ParseShared (same
  assembly) calls it; confirmed no external references repo-wide. Docs updated so
  the public entry point is Parse(string) (fresh, caller-owned).

- test-coverage: #170 ForEach tests now assert rendered Content/index sequence and
  empty-input behavior (not just counts); added headless-safe BrushHelper.ParseColor
  coverage for the color path behind the shared-brush modifiers.

Validation: core lib Release x64 0/0; Reactor.Tests 9711 pass/0 fail/64 skip
(PerfDiffSkipPathTests 38/38); selftests Grid/Reconciler/Modifier/Skip/Event/Brush/
DataGrid/Flex 0 failures.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…h sharing

Review-pass fixes on PR #665 (pr-review skill + gpt-5.4 multi-model cross-check):

H1 (correctness, HIGH) - ModifiersEqual let an element with a stable .OnTapped
take the skip path without comparing OnUnmountAction. The skip path bypasses
ApplyModifiers, which is what re-registers the latest teardown in the
reconciler's _onUnmountActions on every Update; a changed .OnUnmount closure was
therefore stranded and the stale first-render teardown would fire at unmount.
Fixed by comparing OnUnmountAction by ReferenceEquals (mirrors the handler/.Ref
slots). Zero grid-workload cost (cells have null teardown both renders). The
plain-leaf no-handler case was a pre-existing latent bug on main; this also
closes it (strictly more correct). Regression test added.

H2 (multi-model, HIGH) - reverted #168 shared-brush sharing. main deliberately
creates a fresh SolidColorBrush per call because a brush is a thread-affine
DependencyObject with mutable Color/Opacity and cannot be safely shared across
controls; an imperative .Set(tb => mutate tb.Background) would otherwise leak to
every peer cell and poison the cache. Removed ParseShared/GetBrush/_brushCache;
Background/Foreground/WithBorder(string) now use the fresh-brush Parse. Kept the
safe ParseColor string->Color cache (pre-existing on main) and the #165/#157
ModifyVisual bucket-merge alloc win.

Test-coverage (MEDIUM) - added a 27-slot ModifierCallbacksEqual sweep proving
every event/gesture/drag-drop handler slot participates by reference (guards
against a forgotten or mis-paired slot).

Validation: src/Reactor Release x64 0 warn / 0 err (AOT-as-errors gate);
tests/Reactor.Tests 9713 pass / 0 fail / 64 skip.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…s header

Copilot review (comment on 783030a) flagged the test file's header comment
still claimed a '#168 brush-cache' validated by the selftest/AppTests tiers.
After the shared-brush revert, BrushHelper caches only the parsed Color and
Parse returns a fresh brush per call, so there is no shared brush instance to
validate on a real thread. Reworded the header to match. Comment-only change;
9713 pass / 0 fail unchanged.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Locks the realistic DataGrid cell shape (reference-stable OnPointerPressed +

OnTapped, all other routed slots null) at the ShallowEquals/element level: the

cell stays skip-eligible across renders, but declines the skip the instant any

secondary routed slot (OnDoubleTapped/OnGotFocus/OnLostFocus) changes identity,

so that slot's ModifierEventHandlerState.Current* is refreshed instead of

dispatching a stale closure. Complements the per-slot 27-way sweep, which already

proves ModifierCallbacksEqual compares the full Current*-dispatched set.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
ModifierCallbacksEqual added 6 gesture/drag slots (Pan/Pinch/Rotate/
LongPress/DragSource/DropTarget) that the historical diff never compared.
Those dispatch through separate per-element gesture/drag state, not
ModifierEventHandlerState.Current*, so comparing them forced Update where
the framework previously skipped — re-arming an in-flight gesture
mid-interaction. For a held long-press this re-registered the handler
between its Began and Ended phases, so the released callback fired against
a refreshed closure and double-dispatched (E2E GestureTests
.Interactive_OnLongPress_FiresAfterHold: count 2 vs expected 1).

Narrow the predicate to the 21 routed-input handlers (the actual
grid-cell skip lever), matching main's gesture behavior exactly and
preserving observable behavior. Grid cells use only routed handlers, so
the perf win is unaffected.

Tests: replace the 27-slot sweep with a 21-slot sweep + a
behavior-preservation guard proving gesture/drag closures that change
identity stay skip-eligible (so a future contributor can't silently
re-add them and reintroduce the double-fire).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot review on a6adfdd flagged that AttachedEqual's foreach iterates the
IReadOnlyDictionary interface, so SingleAttachedDictionary's yield-based
enumerator allocates an IEnumerator per diff — partially defeating #155's
per-cell allocation cut for the common one-.Grid-per-cell case.

Special-case SingleAttachedDictionary on either side and compare the lone slot
via TryGetValue (Count is already known equal), so the hot path is allocation
free. Behavior-preserving: same equality result the foreach produced.

Tests: AttachedEqual_SingleEntry_FastPath_Matches_Semantics (incl. asymmetric
b-is-Single branch) + AttachedEqual_SingleEntry_DoesNotAllocate (10k calls,
asserts < 2KB via GC.GetAllocatedBytesForCurrentThread).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@azchohfi

Copy link
Copy Markdown
Collaborator Author

/perf

Comment on lines +1958 to +1962
foreach (var gc in group.Children)
{
if (gc is not null and not EmptyElement)
count++;
}
@github-actions

Copy link
Copy Markdown

⚡ Reactor perf comparison

Workload: StressPerf.ReactorOptimized StocksGrid · --percent 50 --duration 10 · x64 Release · median of 12 paired runs (2 warmup dropped); Δ is the mean change with a 95% CI · PR head and main built and run interleaved on the same runner.

Regression vs main baseline

Metric main (baseline) This PR Δ (95% CI) Status
Renders/sec ↑ 2.55 2.47 -1.9% 95% CI [-8.2, +4.4] ≈ within noise
Avg Reconcile (ms) ↓ 140.9 141.3 +1.4% 95% CI [-2.5, +5.4] ≈ within noise
Avg Diff (ms) ↓ 124.5 127.6 +3.0% 95% CI [-1.1, +7.2] ≈ within noise
Avg Memory (MB) ↓ 288.5 285.7 -0.4% 95% CI [-1.3, +0.6] ≈ within noise

Low-mutation skip-floor (--percent 0)

At --percent 0 the workload mutates few cells per tick (always at least one), so reconcile/diff isolate the O(n) per-tick child skip-walk floor that higher mutation rates dilute — ChildReconciler re-walks every child each tick even when nothing moved. The closer --percent is to 0, the more this floor is the signal, so a structural-skip optimization shows up cleanly where the headline table above buries it. Δ is the mean paired change with a 95% CI.

Metric main (baseline) This PR Δ (95% CI) Status
Renders/sec ↑ 15.81 15.65 -2.1% 95% CI [-9.5, +5.4] ≈ within noise
Avg Reconcile (ms) ↓ 40.3 38.7 +2.7% 95% CI [-8.1, +13.4] ≈ within noise
Avg Diff (ms) ↓ 37.8 36.2 +2.8% 95% CI [-8.2, +13.8] ≈ within noise
Avg Memory (MB) ↓ 266.8 266.5 -0.1% 95% CI [-0.8, +0.5] ≈ within noise

Allocation (Reactor) — lower is better

Metric main (baseline) This PR Δ (95% CI) Status
Alloc bytes/render ↓ 5766814 5836460 +0.7% 95% CI [-0.8, +2.1] ≈ within noise
Gen0 GC / 1k renders ↓ 285.71 262.93 -10.8% 95% CI [-22.1, +0.4] ≈ within noise

Keyed-list workload (StressPerf.KeyedList, --percent 50)

A separate macro workload: a ~500-row stably keyed list whose rows are reordered / inserted / removed each tick. Because every child carries a key, the child reconciler takes its keyed arm (ReconcileKeyedReconcileKeyedMiddle, the LIS-based minimal-move pass) instead of the positional re-walk the StocksGrid tables above measure — so this is the sensitive macro signal for keyed-diff work the positional cells can never reach. Same interleaved paired-Δ 95% CI as the headline table.

Metric main (baseline) This PR Δ (95% CI) Status
Renders/sec ↑ 20.03 20.12 +1.0% 95% CI [-2.0, +4.0] ≈ within noise
Avg Reconcile (ms) ↓ 16.6 16.5 -1.9% 95% CI [-4.1, +0.2] ≈ within noise
Avg Diff (ms) ↓ 16.5 16.3 -1.8% 95% CI [-4.0, +0.3] ≈ within noise
Avg Memory (MB) ↓ 168.9 168.9 0.0% 95% CI [-0.9, +0.9] ≈ within noise

Allocation (keyed-list) — lower is better

Metric main (baseline) This PR Δ (95% CI) Status
Alloc bytes/render ↓ 314713 313059 -0.6% 95% CI [-0.9, -0.2] ✅ improvement
Gen0 GC / 1k renders ↓ 15.19 15.12 +0.7% 95% CI [-11.1, +12.4] ≈ within noise

Reconciler micro-benchmarks (PerfBench.ControlModel)

Production --variant Reactor control-model path, ns-resolution and WinUI-undiluted (spec-047 M1–M13) — ↓ lower is better. Status tracks allocated bytes/op, the authoritative signal here; it is deterministic for structurally-fixed benches, while dispatcher / background-thread benches carry a small process-to-process offset, so a bench is flagged only when its 95% CI clears a ±3% minimum-effect band (real structural alloc changes are several percent to many-x). ns/op is shown for context but is not auto-flagged (its paired CI is rep-interleaved but the flag remains dormant pending a real-CI identical-binary band calibration). Δ is the mean paired change with a 95% CI.

Bench main ns/op Δ ns (95% CI) main B/op Δ alloc (95% CI) Status
M1 Mount_Leaf_NoCallback 152070.4 -1.1% 95% CI [-4.3, +2.2] 1140.9 0.0% 95% CI [0.0, 0.0] ≈ within noise
M2 Mount_Leaf_OneCallback 111143.4 +2.1% 95% CI [-2.6, +6.8] 3383.3 0.0% 95% CI [0.0, 0.0] ≈ within noise
M3 Mount_Leaf_ThreeCallbacks 234028.1 -0.3% 95% CI [-2.8, +2.2] 8485.6 -1.6% 95% CI [-3.2, -0.1] ≈ within noise
M4 Dispatch_Switch_Cold 115800.3 -0.8% 95% CI [-2.4, +0.8] 1767.8 0.0% 95% CI [0.0, 0.0] ≈ within noise
M5 Dispatch_Switch_Warm 119592.2 -0.5% 95% CI [-6.9, +6.0] 1766.0 +0.8% 95% CI [-1.3, +2.8] ≈ within noise
M6 Dispatch_ExternalType 91811.4 +1.8% 95% CI [-2.8, +6.4] 987.6 +1.5% 95% CI [-2.2, +5.2] ≈ within noise
M7 Update_NoChange 56826.5 -2.0% 95% CI [-2.8, -1.1] 452.1 -2.0% 95% CI [-12.6, +8.6] ≈ within noise
M8 Update_OneLeafChanged 42156.9 +0.2% 95% CI [-0.7, +1.2] 536.0 0.0% 95% CI [0.0, 0.0] ≈ within noise
M9 Update_AllChanged 2920899.4 -1.4% 95% CI [-3.6, +0.7] 184278.1 0.0% 95% CI [0.0, 0.0] ≈ within noise
M10 EventHandlerState_Alloc 91718.8 +2.3% 95% CI [+0.1, +4.4] 3095.2 0.0% 95% CI [-0.1, +0.1] ≈ within noise
M11 ModifierEHS_Frequency 47625.3 +0.5% 95% CI [-0.5, +1.6] 638.9 0.0% 95% CI [0.0, 0.0] ≈ within noise
M12 Pool_Rent_HotPath 118637.6 +0.1% 95% CI [-1.5, +1.7] 1099.9 0.0% 95% CI [0.0, 0.0] ≈ within noise
M13 Setters_Suppression_Scope 160.1 +3.0% 95% CI [-6.3, +12.3] 26.7 0.0% 95% CI [0.0, 0.0] ≈ within noise
M14 Dsl_Rebuild_Cascade 1899232.5 -13.2% 95% CI [-14.6, -11.9] 2915828.9 -23.5% 95% CI [-23.5, -23.5] ✅ improvement
C207 ChangeHandler_DpRead_Coalesce 1433.9 +1.4% 95% CI [-2.5, +5.3] 0.6 0.0% 95% CI [0.0, 0.0] ≈ within noise
OAlloc Optional_Element_Alloc 222.5 -2.6% 95% CI [-8.2, +2.9] 528.0 0.0% 95% CI [0.0, 0.0] ≈ within noise
OUpdate Optional_Reconciler_Update 12252.4 +4.0% 95% CI [+2.1, +6.0] 2676.3 +3.6% 95% CI [+3.6, +3.6] ⚠️ regression

Cross-framework reference (same StocksGrid workload)

Metric vanilla WinUI3¹ Rust windows-reactor² Reactor (this PR)
Renders/sec ↑ 3.21 4.75 2.47
Avg Reconcile (ms) ↓ n/a 20.0 141.3
Avg Diff (ms) ↓ n/a 17.7 127.6
Avg Memory (MB) ↓ 264.0 197.3 285.7

↑ higher is better · ↓ lower is better. Within noise = the 95% confidence interval of the paired Δ includes 0 (no change resolvable at this sample size); ✅ improvement / ⚠️ regression require the CI to exclude 0.
Allocation metrics (alloc bytes/render, Gen0 GC) are the sensitive signal for allocation-reduction work, where the mean-ms / memory figures are largely flat. They read n/a for a harness built from a revision that predates them (rebase the PR onto main to populate them).
Reconciler micro-benchmarks run PerfBench.ControlModel --variant Reactor (M1–M13) as a headless loop bracketed by per-thread alloc + GC counters — ns-resolution and free of WinUI render / working-set dilution, so they resolve Core/Reconciler allocation deltas the macro StocksGrid workload cannot. main and PR each link their own src/Reactor build and are rep-interleaved (a fresh alternated process per rep); Δ is the paired 95% CI over per-rep means. The Status column tracks allocated bytes/op (deterministic for identical code); ns/op is informational — its paired CI is now unbiased but the flag stays dormant pending a real-CI identical-binary band calibration.
¹ vanilla WinUI3 = StressPerf.Direct (imperative; no virtual-DOM, so it has no reconcile/diff phase — those cells read n/a). Measured live on this runner.
² Rust = test_reactor_perf from microsoft/windows-rs — a port of this harness (same StocksGrid, same --percent/--duration CLI). Built from source and measured live on this runner.
Absolute numbers are runner-dependent — trust the Δ vs main, not the absolute values. Memory (working set) is the noisiest metric.
Runner: CPU: AMD EPYC 7763 64-Core Processor · 4 logical cores · 16 GB RAM · runner: GitHub Actions 1042987674.
Generated by .github/workflows/perf-compare.yml · PR bd16d0e vs main 023d879 · 2026-06-27T01:43:17Z · run log.

@azchohfi azchohfi closed this Jun 27, 2026
@azchohfi azchohfi deleted the temp-665-post692 branch June 27, 2026 01:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant