Skip to content

DO-NOT-MERGE: #665-vs-M14 re-measure (new head e746f996, post-#692)#722

Closed
azchohfi wants to merge 9 commits into
mainfrom
temp-665-newhead
Closed

DO-NOT-MERGE: #665-vs-M14 re-measure (new head e746f996, post-#692)#722
azchohfi wants to merge 9 commits into
mainfrom
temp-665-newhead

Conversation

@azchohfi

Copy link
Copy Markdown
Collaborator

DO NOT MERGE. Throwaway measurement PR only.

Purpose: official /perf re-measure of #665 at its NEW head (e746f99 — adds the 2 cache-polish fixes: BrushHelper OrdinalIgnoreCase color cache + bounded ElementExtensions style-applier cache) against the M14 Dsl_Rebuild_Cascade micro, baseline = current origin/main (includes #692 flagship + M14). #665's 9 commits rebased cleanly onto main. Build 0/0, Reactor.Tests 9756 pass. Origin #665 stays pristine; this branch is deleted after the run.

azchohfi and others added 9 commits June 26, 2026 19:40
Restores the steady-state ShallowEquals -> CanSkipUpdate fast path for
interactive/selectable grid cells (it was defeated for any cell carrying an
event handler or a fluent .Set chain) and removes the dominant per-cell,
per-frame allocations in the Element/DSL layer.

FLAGSHIP-1 (Element.cs): replace the "any handler present => never skip" rule
in ModifiersEqual with ModifierCallbacksEqual, a per-slot ReferenceEquals over
all 27 callback/gesture/drag-drop slots. Reference-stable handlers now skip;
capturing closures still force Update. Per-slot reference equality (not a pure
presence bitmask) is the safe predicate because modifier handlers dispatch
through ModifierEventHandlerState.Current*, which is refreshed only on the
non-skip Update path -- presence-only would strand a stale closure on a
skipped cell. Also closes a latent hole where 21 of the slots were ignored.

FLAGSHIP-2 (Element.cs): add SettersEqual<T> (length + element-wise
ReferenceEquals) and route all 51 ShallowEquals arms through it instead of
ReferenceEquals(Setters). The .Set helpers append to a fresh array each render
([.. Setters, configure]); element-wise compare lets an unchanged chain of
non-capturing lambdas / method groups (compiler-cached statics) skip again.

_emptyLayout/_emptyVisual sentinels; compare LayoutModifiers by exact record
equality and VisualModifiers field-by-field (brushes via BrushesEqual). Also
fixes latent skip bugs: RequestedTheme/logical-inset and
Scale/Rotation/Translation/CenterPoint were previously not compared.

SingleAttachedDictionary for the common one-attached-value case (one .Grid per
cell), avoiding a full Dictionary<Type,object> alloc per cell per frame.

Color via GetBrush(Color), replacing one new SolidColorBrush (a thread-affine
WinRT DependencyObject) per Parse call. Single shared brush cache.

Star-track loop; InterspersedGrid exact-sized arrays; FilterChildren two-pass
count+fill; shared static s_oneStar cross-axis track. #160 already satisfied
(ShallowEquals is type-switch-first).

Behavior preserved exactly (same render output, same events fire). Validated:
9693 unit tests pass; Modifier/Reconciler/Skip/Event/DataGrid/Flex selftests
0 failures; core lib Release x64 clean (0 AOT/trim warnings-as-errors). New
xUnit coverage in PerfDiffSkipPathTests.cs and DeclarativeModifierTests.cs.

Closes #664

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…LAGSHIP-2 safety revert

Expands the diff/memory PR into the ElementExtensions fluent-modifier layer and
makes the FLAGSHIP-2 setter comparison safe-conservative.

FLAGSHIP-2 (safety revert): SettersEqual now returns true only for the same array
instance (memoized element) or both-empty. Element-wise delegate-identity compare
was unsafe: a .Set(...) is an apply-time imperative write whose effect can read
mutable external state, so identity-equality does NOT imply the write is
unnecessary. Setters keep the element on the Update path whenever any are present.
LazyVStack/HStack scroll/repeater setter compares routed through SettersEqual
(both-empty now compares equal). FLAGSHIP-1 handler ref-equality is unaffected
(handlers dispatch later via ModifierEventHandlerState.Current*).

#165/#157 — ModifyLayout/ModifyVisual bucket-merge entry points. ~36 pure-layout
and pure-visual fluent modifiers (Margin/Padding/Width/Height/Min*/Max*/Align/
Opacity/Scale/Rotation/CenterPoint/CornerRadius/Background/Foreground/WithBorder/
Translation) merge their delta straight into the Layout/Visual slot instead of
allocating a throwaway parent ElementModifiers (and its bucket shim) per chained
call. Semantics are identical to Modify(el, new ElementModifiers { <field> }):
proven against ElementModifiers.Merge and asserted by *_Chain_Equals_HandBuilt tests.

#168 (brush cache wiring) — public BrushHelper.Parse(string) still returns a fresh,
caller-owned brush (unchanged vs main). New internal ParseShared(string) returns a
shared per-thread per-ARGB brush; Background/Foreground/WithBorder(string) hot paths
route through it, so thousands of identical cells reuse one brush instead of
allocating a WinRT DependencyObject per cell per render. Pure allocation win — the
diff compares brushes structurally (BrushesEqual = color+opacity), so behavior is
unchanged.

#174 (ApplyStyle cache) — cache the OnMount delegate per style name so an unchanged
.ApplyStyle("X") chain reuses one delegate instead of a fresh capturing closure +
display class per render. Memory-only: OnMountAction is excluded from ModifiersEqual
(runs at mount only), so this is not a skip-path change.

Out of scope / deferred: #167 ThemeBindings (Reconciler ripple); #158/#161-164.

Validation: core lib Release x64 0 warnings/0 errors; Reactor.Tests 9704 pass/0 fail/
64 skip (PerfDiffSkipPathTests 31/31); selftests Modifier/Skip/Event/Brush/DataGrid/
Flex 0 failures.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ernal; strengthen Dsl tests

Addresses pr-review findings (multi-model cross-check + panel) on the draft:

- correctness (HIGH, found by multi-model gpt-5.4 cross-check): UniformGrid wrote
  the positioned `.Grid(...)` cells back into the array returned by FilterChildren.
  After #173, FilterChildren's no-null fast path returns the caller's `items`
  array ALIASED (no copy), so an explicit `UniformGrid(orientation, myArray)` call
  corrupted the caller's array with Grid-positioned wrappers — a regression vs main
  (where FilterChildren always copied). UniformGrid now positions into its own
  Element[] and never mutates `filtered`. Other 16 FilterChildren consumers only
  pass the result as read-only Children, and InterspersedGrid already builds its
  own array, so this was the sole offender. Allocation-neutral on the hot (no-null)
  path vs main. + regression test asserting the source array is untouched.

- api-ergonomics / packaging / alternative-solution / security (converged): make
  BrushHelper.GetBrush(Color) internal. It was a NEW public method (main never
  exposed it) returning a shared, mutable SolidColorBrush from a [ThreadStatic]
  cache — a public-surface + shared-mutable-brush footgun. Only ParseShared (same
  assembly) calls it; confirmed no external references repo-wide. Docs updated so
  the public entry point is Parse(string) (fresh, caller-owned).

- test-coverage: #170 ForEach tests now assert rendered Content/index sequence and
  empty-input behavior (not just counts); added headless-safe BrushHelper.ParseColor
  coverage for the color path behind the shared-brush modifiers.

Validation: core lib Release x64 0/0; Reactor.Tests 9711 pass/0 fail/64 skip
(PerfDiffSkipPathTests 38/38); selftests Grid/Reconciler/Modifier/Skip/Event/Brush/
DataGrid/Flex 0 failures.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…h sharing

Review-pass fixes on PR #665 (pr-review skill + gpt-5.4 multi-model cross-check):

H1 (correctness, HIGH) - ModifiersEqual let an element with a stable .OnTapped
take the skip path without comparing OnUnmountAction. The skip path bypasses
ApplyModifiers, which is what re-registers the latest teardown in the
reconciler's _onUnmountActions on every Update; a changed .OnUnmount closure was
therefore stranded and the stale first-render teardown would fire at unmount.
Fixed by comparing OnUnmountAction by ReferenceEquals (mirrors the handler/.Ref
slots). Zero grid-workload cost (cells have null teardown both renders). The
plain-leaf no-handler case was a pre-existing latent bug on main; this also
closes it (strictly more correct). Regression test added.

H2 (multi-model, HIGH) - reverted #168 shared-brush sharing. main deliberately
creates a fresh SolidColorBrush per call because a brush is a thread-affine
DependencyObject with mutable Color/Opacity and cannot be safely shared across
controls; an imperative .Set(tb => mutate tb.Background) would otherwise leak to
every peer cell and poison the cache. Removed ParseShared/GetBrush/_brushCache;
Background/Foreground/WithBorder(string) now use the fresh-brush Parse. Kept the
safe ParseColor string->Color cache (pre-existing on main) and the #165/#157
ModifyVisual bucket-merge alloc win.

Test-coverage (MEDIUM) - added a 27-slot ModifierCallbacksEqual sweep proving
every event/gesture/drag-drop handler slot participates by reference (guards
against a forgotten or mis-paired slot).

Validation: src/Reactor Release x64 0 warn / 0 err (AOT-as-errors gate);
tests/Reactor.Tests 9713 pass / 0 fail / 64 skip.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…s header

Copilot review (comment on 783030a) flagged the test file's header comment
still claimed a '#168 brush-cache' validated by the selftest/AppTests tiers.
After the shared-brush revert, BrushHelper caches only the parsed Color and
Parse returns a fresh brush per call, so there is no shared brush instance to
validate on a real thread. Reworded the header to match. Comment-only change;
9713 pass / 0 fail unchanged.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Locks the realistic DataGrid cell shape (reference-stable OnPointerPressed +

OnTapped, all other routed slots null) at the ShallowEquals/element level: the

cell stays skip-eligible across renders, but declines the skip the instant any

secondary routed slot (OnDoubleTapped/OnGotFocus/OnLostFocus) changes identity,

so that slot's ModifierEventHandlerState.Current* is refreshed instead of

dispatching a stale closure. Complements the per-slot 27-way sweep, which already

proves ModifierCallbacksEqual compares the full Current*-dispatched set.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
ModifierCallbacksEqual added 6 gesture/drag slots (Pan/Pinch/Rotate/
LongPress/DragSource/DropTarget) that the historical diff never compared.
Those dispatch through separate per-element gesture/drag state, not
ModifierEventHandlerState.Current*, so comparing them forced Update where
the framework previously skipped — re-arming an in-flight gesture
mid-interaction. For a held long-press this re-registered the handler
between its Began and Ended phases, so the released callback fired against
a refreshed closure and double-dispatched (E2E GestureTests
.Interactive_OnLongPress_FiresAfterHold: count 2 vs expected 1).

Narrow the predicate to the 21 routed-input handlers (the actual
grid-cell skip lever), matching main's gesture behavior exactly and
preserving observable behavior. Grid cells use only routed handlers, so
the perf win is unaffected.

Tests: replace the 27-slot sweep with a 21-slot sweep + a
behavior-preservation guard proving gesture/drag closures that change
identity stay skip-eligible (so a future contributor can't silently
re-add them and reintroduce the double-fire).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot review on a6adfdd flagged that AttachedEqual's foreach iterates the
IReadOnlyDictionary interface, so SingleAttachedDictionary's yield-based
enumerator allocates an IEnumerator per diff — partially defeating #155's
per-cell allocation cut for the common one-.Grid-per-cell case.

Special-case SingleAttachedDictionary on either side and compare the lone slot
via TryGetValue (Count is already known equal), so the hot path is allocation
free. Behavior-preserving: same equality result the foreach produced.

Tests: AttachedEqual_SingleEntry_FastPath_Matches_Semantics (incl. asymmetric
b-is-Single branch) + AttachedEqual_SingleEntry_DoesNotAllocate (10k calls,
asserts < 2KB via GC.GetAllocatedBytesForCurrentThread).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Two scoped cache-polish fixes from the dual review (Copilot + pr-review skill):

FIX A (Copilot, BrushHelper.cs:16): construct _colorCache with
StringComparer.OrdinalIgnoreCase so colors differing only in casing
("Red"/"red", "#FF0000"/"#ff0000") dedupe to one entry instead of creating
duplicates + re-running the factory per distinct casing. Safe: ParseColor
lowercases named colors before the switch and ParseHex is case-insensitive,
so case-folded keying yields identical results. ParseColor switch unchanged.

FIX B (pr-review, ElementExtensions.cs StyleApplier): bound _styleApplierCache
with a 256-entry cap as defense-in-depth against a pathological data-driven
caller passing unbounded distinct style names. A lock-free TryGetValue
fast-path preserves the #174 zero-alloc steady state (Count is only touched
on a miss); past the cap it falls back to the pre-#174 per-call delegate, so
correctness is unchanged and the cache stays bounded.

Test (BrushHelperTests): ParseColor_IsCaseInsensitive (value equality across
casing) + ParseColor_Cache_Dedupes_Across_Casing (allocation teeth: 500
distinct casings of one name collapse to a single entry => ~0 alloc).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@azchohfi

Copy link
Copy Markdown
Collaborator Author

/perf

Comment on lines +1958 to +1962
foreach (var gc in group.Children)
{
if (gc is not null and not EmptyElement)
count++;
}
@github-actions

Copy link
Copy Markdown

⚡ Reactor perf comparison

Workload: StressPerf.ReactorOptimized StocksGrid · --percent 50 --duration 10 · x64 Release · median of 12 paired runs (2 warmup dropped); Δ is the mean change with a 95% CI · PR head and main built and run interleaved on the same runner.

Regression vs main baseline

Metric main (baseline) This PR Δ (95% CI) Status
Renders/sec ↑ 2.68 2.64 -1.7% 95% CI [-6.7, +3.2] ≈ within noise
Avg Reconcile (ms) ↓ 136.9 137.1 +3.0% 95% CI [-2.5, +8.5] ≈ within noise
Avg Diff (ms) ↓ 123.7 124.3 +4.9% 95% CI [-2.1, +11.9] ≈ within noise
Avg Memory (MB) ↓ 289.5 287.4 -0.7% 95% CI [-1.5, 0.0] ≈ within noise

Low-mutation skip-floor (--percent 0)

At --percent 0 the workload mutates few cells per tick (always at least one), so reconcile/diff isolate the O(n) per-tick child skip-walk floor that higher mutation rates dilute — ChildReconciler re-walks every child each tick even when nothing moved. The closer --percent is to 0, the more this floor is the signal, so a structural-skip optimization shows up cleanly where the headline table above buries it. Δ is the mean paired change with a 95% CI.

Metric main (baseline) This PR Δ (95% CI) Status
Renders/sec ↑ 16.15 15.76 -0.9% 95% CI [-6.4, +4.6] ≈ within noise
Avg Reconcile (ms) ↓ 35.9 34.5 -0.5% 95% CI [-8.0, +7.0] ≈ within noise
Avg Diff (ms) ↓ 33.7 32.9 +0.1% 95% CI [-8.2, +8.4] ≈ within noise
Avg Memory (MB) ↓ 267.4 267.0 0.0% 95% CI [-0.4, +0.4] ≈ within noise

Allocation (Reactor) — lower is better

Metric main (baseline) This PR Δ (95% CI) Status
Alloc bytes/render ↓ 5718117 5781622 +1.1% 95% CI [-0.4, +2.5] ≈ within noise
Gen0 GC / 1k renders ↓ 275.86 231.80 -12.8% 95% CI [-23.6, -2.0] ✅ improvement

Keyed-list workload (StressPerf.KeyedList, --percent 50)

A separate macro workload: a ~500-row stably keyed list whose rows are reordered / inserted / removed each tick. Because every child carries a key, the child reconciler takes its keyed arm (ReconcileKeyedReconcileKeyedMiddle, the LIS-based minimal-move pass) instead of the positional re-walk the StocksGrid tables above measure — so this is the sensitive macro signal for keyed-diff work the positional cells can never reach. Same interleaved paired-Δ 95% CI as the headline table.

Metric main (baseline) This PR Δ (95% CI) Status
Renders/sec ↑ 20.28 21.01 +3.3% 95% CI [+0.4, +6.2] ✅ improvement
Avg Reconcile (ms) ↓ 16.1 15.7 -1.4% 95% CI [-3.0, +0.1] ≈ within noise
Avg Diff (ms) ↓ 15.9 15.5 -1.4% 95% CI [-2.9, +0.1] ≈ within noise
Avg Memory (MB) ↓ 169.1 169.1 +0.3% 95% CI [-0.5, +1.1] ≈ within noise

Allocation (keyed-list) — lower is better

Metric main (baseline) This PR Δ (95% CI) Status
Alloc bytes/render ↓ 315013 313581 -0.4% 95% CI [-0.8, -0.1] ✅ improvement
Gen0 GC / 1k renders ↓ 16.70 18.65 +12.7% 95% CI [+4.1, +21.2] ⚠️ regression

Reconciler micro-benchmarks (PerfBench.ControlModel)

Production --variant Reactor control-model path, ns-resolution and WinUI-undiluted (spec-047 M1–M13) — ↓ lower is better. Status tracks allocated bytes/op, the authoritative signal here; it is deterministic for structurally-fixed benches, while dispatcher / background-thread benches carry a small process-to-process offset, so a bench is flagged only when its 95% CI clears a ±3% minimum-effect band (real structural alloc changes are several percent to many-x). ns/op is shown for context but is not auto-flagged (its paired CI is rep-interleaved but the flag remains dormant pending a real-CI identical-binary band calibration). Δ is the mean paired change with a 95% CI.

Bench main ns/op Δ ns (95% CI) main B/op Δ alloc (95% CI) Status
M1 Mount_Leaf_NoCallback 149215.5 +0.5% 95% CI [-0.7, +1.8] 1140.9 0.0% 95% CI [0.0, 0.0] ≈ within noise
M2 Mount_Leaf_OneCallback 107890.3 -0.4% 95% CI [-2.6, +1.8] 3383.3 0.0% 95% CI [0.0, 0.0] ≈ within noise
M3 Mount_Leaf_ThreeCallbacks 216795.0 +1.0% 95% CI [-2.4, +4.4] 8385.9 +0.6% 95% CI [-1.8, +3.0] ≈ within noise
M4 Dispatch_Switch_Cold 104859.4 +3.1% 95% CI [-3.4, +9.6] 1767.8 0.0% 95% CI [0.0, 0.0] ≈ within noise
M5 Dispatch_Switch_Warm 106867.9 +4.7% 95% CI [-3.8, +13.1] 1766.0 -0.7% 95% CI [-1.8, +0.4] ≈ within noise
M6 Dispatch_ExternalType 90495.4 -1.8% 95% CI [-6.2, +2.6] 987.6 -1.3% 95% CI [-3.2, +0.6] ≈ within noise
M7 Update_NoChange 56453.0 -2.8% 95% CI [-4.1, -1.4] 452.1 +1.8% 95% CI [-2.2, +5.9] ≈ within noise
M8 Update_OneLeafChanged 41747.3 -0.5% 95% CI [-1.2, +0.3] 536.0 0.0% 95% CI [0.0, 0.0] ≈ within noise
M9 Update_AllChanged 2858939.0 0.0% 95% CI [-1.0, +1.0] 184278.1 0.0% 95% CI [0.0, 0.0] ≈ within noise
M10 EventHandlerState_Alloc 87070.4 -1.3% 95% CI [-2.9, +0.4] 3095.2 0.0% 95% CI [0.0, +0.1] ≈ within noise
M11 ModifierEHS_Frequency 46262.7 +0.4% 95% CI [-1.7, +2.6] 638.9 0.0% 95% CI [0.0, 0.0] ≈ within noise
M12 Pool_Rent_HotPath 117711.9 +0.5% 95% CI [-0.7, +1.7] 1099.9 0.0% 95% CI [0.0, 0.0] ≈ within noise
M13 Setters_Suppression_Scope 115.0 -4.2% 95% CI [-17.0, +8.5] 26.7 0.0% 95% CI [0.0, 0.0] ≈ within noise
M14 Dsl_Rebuild_Cascade 1781671.1 -13.5% 95% CI [-14.8, -12.2] 2915828.9 -23.5% 95% CI [-23.5, -23.5] ✅ improvement
C207 ChangeHandler_DpRead_Coalesce 1271.5 -1.9% 95% CI [-6.6, +2.8] 0.6 0.0% 95% CI [0.0, 0.0] ≈ within noise
OAlloc Optional_Element_Alloc 212.6 -0.4% 95% CI [-7.2, +6.5] 528.0 0.0% 95% CI [0.0, 0.0] ≈ within noise
OUpdate Optional_Reconciler_Update 11924.2 +2.7% 95% CI [0.0, +5.3] 2676.3 +3.6% 95% CI [+3.6, +3.6] ⚠️ regression

Cross-framework reference (same StocksGrid workload)

Metric vanilla WinUI3¹ Rust windows-reactor² Reactor (this PR)
Renders/sec ↑ 3.18 4.71 2.64
Avg Reconcile (ms) ↓ n/a 19.9 137.1
Avg Diff (ms) ↓ n/a 17.2 124.3
Avg Memory (MB) ↓ 264.1 198.4 287.4

↑ higher is better · ↓ lower is better. Within noise = the 95% confidence interval of the paired Δ includes 0 (no change resolvable at this sample size); ✅ improvement / ⚠️ regression require the CI to exclude 0.
Allocation metrics (alloc bytes/render, Gen0 GC) are the sensitive signal for allocation-reduction work, where the mean-ms / memory figures are largely flat. They read n/a for a harness built from a revision that predates them (rebase the PR onto main to populate them).
Reconciler micro-benchmarks run PerfBench.ControlModel --variant Reactor (M1–M13) as a headless loop bracketed by per-thread alloc + GC counters — ns-resolution and free of WinUI render / working-set dilution, so they resolve Core/Reconciler allocation deltas the macro StocksGrid workload cannot. main and PR each link their own src/Reactor build and are rep-interleaved (a fresh alternated process per rep); Δ is the paired 95% CI over per-rep means. The Status column tracks allocated bytes/op (deterministic for identical code); ns/op is informational — its paired CI is now unbiased but the flag stays dormant pending a real-CI identical-binary band calibration.
¹ vanilla WinUI3 = StressPerf.Direct (imperative; no virtual-DOM, so it has no reconcile/diff phase — those cells read n/a). Measured live on this runner.
² Rust = test_reactor_perf from microsoft/windows-rs — a port of this harness (same StocksGrid, same --percent/--duration CLI). Built from source and measured live on this runner.
Absolute numbers are runner-dependent — trust the Δ vs main, not the absolute values. Memory (working set) is the noisiest metric.
Runner: CPU: AMD EPYC 7763 64-Core Processor · 4 logical cores · 16 GB RAM · runner: GitHub Actions 1042993351.
Generated by .github/workflows/perf-compare.yml · PR 22952b2 vs main 023d879 · 2026-06-27T03:23:36Z · run log.

@azchohfi azchohfi closed this Jun 27, 2026
@azchohfi azchohfi deleted the temp-665-newhead branch June 27, 2026 03:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant