DO-NOT-MERGE: #665-vs-M14 re-measure (new head e746f996, post-#692)#722
DO-NOT-MERGE: #665-vs-M14 re-measure (new head e746f996, post-#692)#722azchohfi wants to merge 9 commits into
Conversation
Restores the steady-state ShallowEquals -> CanSkipUpdate fast path for interactive/selectable grid cells (it was defeated for any cell carrying an event handler or a fluent .Set chain) and removes the dominant per-cell, per-frame allocations in the Element/DSL layer. FLAGSHIP-1 (Element.cs): replace the "any handler present => never skip" rule in ModifiersEqual with ModifierCallbacksEqual, a per-slot ReferenceEquals over all 27 callback/gesture/drag-drop slots. Reference-stable handlers now skip; capturing closures still force Update. Per-slot reference equality (not a pure presence bitmask) is the safe predicate because modifier handlers dispatch through ModifierEventHandlerState.Current*, which is refreshed only on the non-skip Update path -- presence-only would strand a stale closure on a skipped cell. Also closes a latent hole where 21 of the slots were ignored. FLAGSHIP-2 (Element.cs): add SettersEqual<T> (length + element-wise ReferenceEquals) and route all 51 ShallowEquals arms through it instead of ReferenceEquals(Setters). The .Set helpers append to a fresh array each render ([.. Setters, configure]); element-wise compare lets an unchanged chain of non-capturing lambdas / method groups (compiler-cached statics) skip again. _emptyLayout/_emptyVisual sentinels; compare LayoutModifiers by exact record equality and VisualModifiers field-by-field (brushes via BrushesEqual). Also fixes latent skip bugs: RequestedTheme/logical-inset and Scale/Rotation/Translation/CenterPoint were previously not compared. SingleAttachedDictionary for the common one-attached-value case (one .Grid per cell), avoiding a full Dictionary<Type,object> alloc per cell per frame. Color via GetBrush(Color), replacing one new SolidColorBrush (a thread-affine WinRT DependencyObject) per Parse call. Single shared brush cache. Star-track loop; InterspersedGrid exact-sized arrays; FilterChildren two-pass count+fill; shared static s_oneStar cross-axis track. #160 already satisfied (ShallowEquals is type-switch-first). Behavior preserved exactly (same render output, same events fire). Validated: 9693 unit tests pass; Modifier/Reconciler/Skip/Event/DataGrid/Flex selftests 0 failures; core lib Release x64 clean (0 AOT/trim warnings-as-errors). New xUnit coverage in PerfDiffSkipPathTests.cs and DeclarativeModifierTests.cs. Closes #664 Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…LAGSHIP-2 safety revert Expands the diff/memory PR into the ElementExtensions fluent-modifier layer and makes the FLAGSHIP-2 setter comparison safe-conservative. FLAGSHIP-2 (safety revert): SettersEqual now returns true only for the same array instance (memoized element) or both-empty. Element-wise delegate-identity compare was unsafe: a .Set(...) is an apply-time imperative write whose effect can read mutable external state, so identity-equality does NOT imply the write is unnecessary. Setters keep the element on the Update path whenever any are present. LazyVStack/HStack scroll/repeater setter compares routed through SettersEqual (both-empty now compares equal). FLAGSHIP-1 handler ref-equality is unaffected (handlers dispatch later via ModifierEventHandlerState.Current*). #165/#157 — ModifyLayout/ModifyVisual bucket-merge entry points. ~36 pure-layout and pure-visual fluent modifiers (Margin/Padding/Width/Height/Min*/Max*/Align/ Opacity/Scale/Rotation/CenterPoint/CornerRadius/Background/Foreground/WithBorder/ Translation) merge their delta straight into the Layout/Visual slot instead of allocating a throwaway parent ElementModifiers (and its bucket shim) per chained call. Semantics are identical to Modify(el, new ElementModifiers { <field> }): proven against ElementModifiers.Merge and asserted by *_Chain_Equals_HandBuilt tests. #168 (brush cache wiring) — public BrushHelper.Parse(string) still returns a fresh, caller-owned brush (unchanged vs main). New internal ParseShared(string) returns a shared per-thread per-ARGB brush; Background/Foreground/WithBorder(string) hot paths route through it, so thousands of identical cells reuse one brush instead of allocating a WinRT DependencyObject per cell per render. Pure allocation win — the diff compares brushes structurally (BrushesEqual = color+opacity), so behavior is unchanged. #174 (ApplyStyle cache) — cache the OnMount delegate per style name so an unchanged .ApplyStyle("X") chain reuses one delegate instead of a fresh capturing closure + display class per render. Memory-only: OnMountAction is excluded from ModifiersEqual (runs at mount only), so this is not a skip-path change. Out of scope / deferred: #167 ThemeBindings (Reconciler ripple); #158/#161-164. Validation: core lib Release x64 0 warnings/0 errors; Reactor.Tests 9704 pass/0 fail/ 64 skip (PerfDiffSkipPathTests 31/31); selftests Modifier/Skip/Event/Brush/DataGrid/ Flex 0 failures. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ernal; strengthen Dsl tests Addresses pr-review findings (multi-model cross-check + panel) on the draft: - correctness (HIGH, found by multi-model gpt-5.4 cross-check): UniformGrid wrote the positioned `.Grid(...)` cells back into the array returned by FilterChildren. After #173, FilterChildren's no-null fast path returns the caller's `items` array ALIASED (no copy), so an explicit `UniformGrid(orientation, myArray)` call corrupted the caller's array with Grid-positioned wrappers — a regression vs main (where FilterChildren always copied). UniformGrid now positions into its own Element[] and never mutates `filtered`. Other 16 FilterChildren consumers only pass the result as read-only Children, and InterspersedGrid already builds its own array, so this was the sole offender. Allocation-neutral on the hot (no-null) path vs main. + regression test asserting the source array is untouched. - api-ergonomics / packaging / alternative-solution / security (converged): make BrushHelper.GetBrush(Color) internal. It was a NEW public method (main never exposed it) returning a shared, mutable SolidColorBrush from a [ThreadStatic] cache — a public-surface + shared-mutable-brush footgun. Only ParseShared (same assembly) calls it; confirmed no external references repo-wide. Docs updated so the public entry point is Parse(string) (fresh, caller-owned). - test-coverage: #170 ForEach tests now assert rendered Content/index sequence and empty-input behavior (not just counts); added headless-safe BrushHelper.ParseColor coverage for the color path behind the shared-brush modifiers. Validation: core lib Release x64 0/0; Reactor.Tests 9711 pass/0 fail/64 skip (PerfDiffSkipPathTests 38/38); selftests Grid/Reconciler/Modifier/Skip/Event/Brush/ DataGrid/Flex 0 failures. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…h sharing Review-pass fixes on PR #665 (pr-review skill + gpt-5.4 multi-model cross-check): H1 (correctness, HIGH) - ModifiersEqual let an element with a stable .OnTapped take the skip path without comparing OnUnmountAction. The skip path bypasses ApplyModifiers, which is what re-registers the latest teardown in the reconciler's _onUnmountActions on every Update; a changed .OnUnmount closure was therefore stranded and the stale first-render teardown would fire at unmount. Fixed by comparing OnUnmountAction by ReferenceEquals (mirrors the handler/.Ref slots). Zero grid-workload cost (cells have null teardown both renders). The plain-leaf no-handler case was a pre-existing latent bug on main; this also closes it (strictly more correct). Regression test added. H2 (multi-model, HIGH) - reverted #168 shared-brush sharing. main deliberately creates a fresh SolidColorBrush per call because a brush is a thread-affine DependencyObject with mutable Color/Opacity and cannot be safely shared across controls; an imperative .Set(tb => mutate tb.Background) would otherwise leak to every peer cell and poison the cache. Removed ParseShared/GetBrush/_brushCache; Background/Foreground/WithBorder(string) now use the fresh-brush Parse. Kept the safe ParseColor string->Color cache (pre-existing on main) and the #165/#157 ModifyVisual bucket-merge alloc win. Test-coverage (MEDIUM) - added a 27-slot ModifierCallbacksEqual sweep proving every event/gesture/drag-drop handler slot participates by reference (guards against a forgotten or mis-paired slot). Validation: src/Reactor Release x64 0 warn / 0 err (AOT-as-errors gate); tests/Reactor.Tests 9713 pass / 0 fail / 64 skip. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…s header Copilot review (comment on 783030a) flagged the test file's header comment still claimed a '#168 brush-cache' validated by the selftest/AppTests tiers. After the shared-brush revert, BrushHelper caches only the parsed Color and Parse returns a fresh brush per call, so there is no shared brush instance to validate on a real thread. Reworded the header to match. Comment-only change; 9713 pass / 0 fail unchanged. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Locks the realistic DataGrid cell shape (reference-stable OnPointerPressed + OnTapped, all other routed slots null) at the ShallowEquals/element level: the cell stays skip-eligible across renders, but declines the skip the instant any secondary routed slot (OnDoubleTapped/OnGotFocus/OnLostFocus) changes identity, so that slot's ModifierEventHandlerState.Current* is refreshed instead of dispatching a stale closure. Complements the per-slot 27-way sweep, which already proves ModifierCallbacksEqual compares the full Current*-dispatched set. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
ModifierCallbacksEqual added 6 gesture/drag slots (Pan/Pinch/Rotate/ LongPress/DragSource/DropTarget) that the historical diff never compared. Those dispatch through separate per-element gesture/drag state, not ModifierEventHandlerState.Current*, so comparing them forced Update where the framework previously skipped — re-arming an in-flight gesture mid-interaction. For a held long-press this re-registered the handler between its Began and Ended phases, so the released callback fired against a refreshed closure and double-dispatched (E2E GestureTests .Interactive_OnLongPress_FiresAfterHold: count 2 vs expected 1). Narrow the predicate to the 21 routed-input handlers (the actual grid-cell skip lever), matching main's gesture behavior exactly and preserving observable behavior. Grid cells use only routed handlers, so the perf win is unaffected. Tests: replace the 27-slot sweep with a 21-slot sweep + a behavior-preservation guard proving gesture/drag closures that change identity stay skip-eligible (so a future contributor can't silently re-add them and reintroduce the double-fire). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot review on a6adfdd flagged that AttachedEqual's foreach iterates the IReadOnlyDictionary interface, so SingleAttachedDictionary's yield-based enumerator allocates an IEnumerator per diff — partially defeating #155's per-cell allocation cut for the common one-.Grid-per-cell case. Special-case SingleAttachedDictionary on either side and compare the lone slot via TryGetValue (Count is already known equal), so the hot path is allocation free. Behavior-preserving: same equality result the foreach produced. Tests: AttachedEqual_SingleEntry_FastPath_Matches_Semantics (incl. asymmetric b-is-Single branch) + AttachedEqual_SingleEntry_DoesNotAllocate (10k calls, asserts < 2KB via GC.GetAllocatedBytesForCurrentThread). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Two scoped cache-polish fixes from the dual review (Copilot + pr-review skill):
FIX A (Copilot, BrushHelper.cs:16): construct _colorCache with
StringComparer.OrdinalIgnoreCase so colors differing only in casing
("Red"/"red", "#FF0000"/"#ff0000") dedupe to one entry instead of creating
duplicates + re-running the factory per distinct casing. Safe: ParseColor
lowercases named colors before the switch and ParseHex is case-insensitive,
so case-folded keying yields identical results. ParseColor switch unchanged.
FIX B (pr-review, ElementExtensions.cs StyleApplier): bound _styleApplierCache
with a 256-entry cap as defense-in-depth against a pathological data-driven
caller passing unbounded distinct style names. A lock-free TryGetValue
fast-path preserves the #174 zero-alloc steady state (Count is only touched
on a miss); past the cap it falls back to the pre-#174 per-call delegate, so
correctness is unchanged and the cache stays bounded.
Test (BrushHelperTests): ParseColor_IsCaseInsensitive (value equality across
casing) + ParseColor_Cache_Dedupes_Across_Casing (allocation teeth: 500
distinct casings of one name collapse to a single entry => ~0 alloc).
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
/perf |
| foreach (var gc in group.Children) | ||
| { | ||
| if (gc is not null and not EmptyElement) | ||
| count++; | ||
| } |
⚡ Reactor perf comparisonWorkload: Regression vs
|
| Metric | main (baseline) |
This PR | Δ (95% CI) | Status |
|---|---|---|---|---|
| Renders/sec ↑ | 2.68 | 2.64 | -1.7% 95% CI [-6.7, +3.2] | ≈ within noise |
| Avg Reconcile (ms) ↓ | 136.9 | 137.1 | +3.0% 95% CI [-2.5, +8.5] | ≈ within noise |
| Avg Diff (ms) ↓ | 123.7 | 124.3 | +4.9% 95% CI [-2.1, +11.9] | ≈ within noise |
| Avg Memory (MB) ↓ | 289.5 | 287.4 | -0.7% 95% CI [-1.5, 0.0] | ≈ within noise |
Low-mutation skip-floor (--percent 0)
At --percent 0 the workload mutates few cells per tick (always at least one), so reconcile/diff isolate the O(n) per-tick child skip-walk floor that higher mutation rates dilute — ChildReconciler re-walks every child each tick even when nothing moved. The closer --percent is to 0, the more this floor is the signal, so a structural-skip optimization shows up cleanly where the headline table above buries it. Δ is the mean paired change with a 95% CI.
| Metric | main (baseline) |
This PR | Δ (95% CI) | Status |
|---|---|---|---|---|
| Renders/sec ↑ | 16.15 | 15.76 | -0.9% 95% CI [-6.4, +4.6] | ≈ within noise |
| Avg Reconcile (ms) ↓ | 35.9 | 34.5 | -0.5% 95% CI [-8.0, +7.0] | ≈ within noise |
| Avg Diff (ms) ↓ | 33.7 | 32.9 | +0.1% 95% CI [-8.2, +8.4] | ≈ within noise |
| Avg Memory (MB) ↓ | 267.4 | 267.0 | 0.0% 95% CI [-0.4, +0.4] | ≈ within noise |
Allocation (Reactor) — lower is better
| Metric | main (baseline) |
This PR | Δ (95% CI) | Status |
|---|---|---|---|---|
| Alloc bytes/render ↓ | 5718117 | 5781622 | +1.1% 95% CI [-0.4, +2.5] | ≈ within noise |
| Gen0 GC / 1k renders ↓ | 275.86 | 231.80 | -12.8% 95% CI [-23.6, -2.0] | ✅ improvement |
Keyed-list workload (StressPerf.KeyedList, --percent 50)
A separate macro workload: a ~500-row stably keyed list whose rows are reordered / inserted / removed each tick. Because every child carries a key, the child reconciler takes its keyed arm (ReconcileKeyed → ReconcileKeyedMiddle, the LIS-based minimal-move pass) instead of the positional re-walk the StocksGrid tables above measure — so this is the sensitive macro signal for keyed-diff work the positional cells can never reach. Same interleaved paired-Δ 95% CI as the headline table.
| Metric | main (baseline) |
This PR | Δ (95% CI) | Status |
|---|---|---|---|---|
| Renders/sec ↑ | 20.28 | 21.01 | +3.3% 95% CI [+0.4, +6.2] | ✅ improvement |
| Avg Reconcile (ms) ↓ | 16.1 | 15.7 | -1.4% 95% CI [-3.0, +0.1] | ≈ within noise |
| Avg Diff (ms) ↓ | 15.9 | 15.5 | -1.4% 95% CI [-2.9, +0.1] | ≈ within noise |
| Avg Memory (MB) ↓ | 169.1 | 169.1 | +0.3% 95% CI [-0.5, +1.1] | ≈ within noise |
Allocation (keyed-list) — lower is better
| Metric | main (baseline) |
This PR | Δ (95% CI) | Status |
|---|---|---|---|---|
| Alloc bytes/render ↓ | 315013 | 313581 | -0.4% 95% CI [-0.8, -0.1] | ✅ improvement |
| Gen0 GC / 1k renders ↓ | 16.70 | 18.65 | +12.7% 95% CI [+4.1, +21.2] |
Reconciler micro-benchmarks (PerfBench.ControlModel)
Production --variant Reactor control-model path, ns-resolution and WinUI-undiluted (spec-047 M1–M13) — ↓ lower is better. Status tracks allocated bytes/op, the authoritative signal here; it is deterministic for structurally-fixed benches, while dispatcher / background-thread benches carry a small process-to-process offset, so a bench is flagged only when its 95% CI clears a ±3% minimum-effect band (real structural alloc changes are several percent to many-x). ns/op is shown for context but is not auto-flagged (its paired CI is rep-interleaved but the flag remains dormant pending a real-CI identical-binary band calibration). Δ is the mean paired change with a 95% CI.
| Bench | main ns/op |
Δ ns (95% CI) | main B/op |
Δ alloc (95% CI) | Status |
|---|---|---|---|---|---|
M1 Mount_Leaf_NoCallback |
149215.5 | +0.5% 95% CI [-0.7, +1.8] | 1140.9 | 0.0% 95% CI [0.0, 0.0] | ≈ within noise |
M2 Mount_Leaf_OneCallback |
107890.3 | -0.4% 95% CI [-2.6, +1.8] | 3383.3 | 0.0% 95% CI [0.0, 0.0] | ≈ within noise |
M3 Mount_Leaf_ThreeCallbacks |
216795.0 | +1.0% 95% CI [-2.4, +4.4] | 8385.9 | +0.6% 95% CI [-1.8, +3.0] | ≈ within noise |
M4 Dispatch_Switch_Cold |
104859.4 | +3.1% 95% CI [-3.4, +9.6] | 1767.8 | 0.0% 95% CI [0.0, 0.0] | ≈ within noise |
M5 Dispatch_Switch_Warm |
106867.9 | +4.7% 95% CI [-3.8, +13.1] | 1766.0 | -0.7% 95% CI [-1.8, +0.4] | ≈ within noise |
M6 Dispatch_ExternalType |
90495.4 | -1.8% 95% CI [-6.2, +2.6] | 987.6 | -1.3% 95% CI [-3.2, +0.6] | ≈ within noise |
M7 Update_NoChange |
56453.0 | -2.8% 95% CI [-4.1, -1.4] | 452.1 | +1.8% 95% CI [-2.2, +5.9] | ≈ within noise |
M8 Update_OneLeafChanged |
41747.3 | -0.5% 95% CI [-1.2, +0.3] | 536.0 | 0.0% 95% CI [0.0, 0.0] | ≈ within noise |
M9 Update_AllChanged |
2858939.0 | 0.0% 95% CI [-1.0, +1.0] | 184278.1 | 0.0% 95% CI [0.0, 0.0] | ≈ within noise |
M10 EventHandlerState_Alloc |
87070.4 | -1.3% 95% CI [-2.9, +0.4] | 3095.2 | 0.0% 95% CI [0.0, +0.1] | ≈ within noise |
M11 ModifierEHS_Frequency |
46262.7 | +0.4% 95% CI [-1.7, +2.6] | 638.9 | 0.0% 95% CI [0.0, 0.0] | ≈ within noise |
M12 Pool_Rent_HotPath |
117711.9 | +0.5% 95% CI [-0.7, +1.7] | 1099.9 | 0.0% 95% CI [0.0, 0.0] | ≈ within noise |
M13 Setters_Suppression_Scope |
115.0 | -4.2% 95% CI [-17.0, +8.5] | 26.7 | 0.0% 95% CI [0.0, 0.0] | ≈ within noise |
M14 Dsl_Rebuild_Cascade |
1781671.1 | -13.5% 95% CI [-14.8, -12.2] | 2915828.9 | -23.5% 95% CI [-23.5, -23.5] | ✅ improvement |
C207 ChangeHandler_DpRead_Coalesce |
1271.5 | -1.9% 95% CI [-6.6, +2.8] | 0.6 | 0.0% 95% CI [0.0, 0.0] | ≈ within noise |
OAlloc Optional_Element_Alloc |
212.6 | -0.4% 95% CI [-7.2, +6.5] | 528.0 | 0.0% 95% CI [0.0, 0.0] | ≈ within noise |
OUpdate Optional_Reconciler_Update |
11924.2 | +2.7% 95% CI [0.0, +5.3] | 2676.3 | +3.6% 95% CI [+3.6, +3.6] |
Cross-framework reference (same StocksGrid workload)
| Metric | vanilla WinUI3¹ | Rust windows-reactor² |
Reactor (this PR) |
|---|---|---|---|
| Renders/sec ↑ | 3.18 | 4.71 | 2.64 |
| Avg Reconcile (ms) ↓ | n/a | 19.9 | 137.1 |
| Avg Diff (ms) ↓ | n/a | 17.2 | 124.3 |
| Avg Memory (MB) ↓ | 264.1 | 198.4 | 287.4 |
↑ higher is better · ↓ lower is better. Within noise = the 95% confidence interval of the paired Δ includes 0 (no change resolvable at this sample size); ✅ improvement /
Allocation metrics (alloc bytes/render, Gen0 GC) are the sensitive signal for allocation-reduction work, where the mean-ms / memory figures are largely flat. They read n/a for a harness built from a revision that predates them (rebase the PR onto main to populate them).
Reconciler micro-benchmarks run PerfBench.ControlModel --variant Reactor (M1–M13) as a headless loop bracketed by per-thread alloc + GC counters — ns-resolution and free of WinUI render / working-set dilution, so they resolve Core/Reconciler allocation deltas the macro StocksGrid workload cannot. main and PR each link their own src/Reactor build and are rep-interleaved (a fresh alternated process per rep); Δ is the paired 95% CI over per-rep means. The Status column tracks allocated bytes/op (deterministic for identical code); ns/op is informational — its paired CI is now unbiased but the flag stays dormant pending a real-CI identical-binary band calibration.
¹ vanilla WinUI3 = StressPerf.Direct (imperative; no virtual-DOM, so it has no reconcile/diff phase — those cells read n/a). Measured live on this runner.
² Rust = test_reactor_perf from microsoft/windows-rs — a port of this harness (same StocksGrid, same --percent/--duration CLI). Built from source and measured live on this runner.
Absolute numbers are runner-dependent — trust the Δ vs main, not the absolute values. Memory (working set) is the noisiest metric.
Runner: CPU: AMD EPYC 7763 64-Core Processor · 4 logical cores · 16 GB RAM · runner: GitHub Actions 1042993351.
Generated by .github/workflows/perf-compare.yml · PR 22952b2 vs main 023d879 · 2026-06-27T03:23:36Z · run log.
DO NOT MERGE. Throwaway measurement PR only.
Purpose: official /perf re-measure of #665 at its NEW head (e746f99 — adds the 2 cache-polish fixes: BrushHelper OrdinalIgnoreCase color cache + bounded ElementExtensions style-applier cache) against the M14 Dsl_Rebuild_Cascade micro, baseline = current origin/main (includes #692 flagship + M14). #665's 9 commits rebased cleanly onto main. Build 0/0, Reactor.Tests 9756 pass. Origin #665 stays pristine; this branch is deleted after the run.