DO-NOT-MERGE: official #665-vs-M14 re-measure on post-#692 main#719
DO-NOT-MERGE: official #665-vs-M14 re-measure on post-#692 main#719azchohfi wants to merge 8 commits into
Conversation
Restores the steady-state ShallowEquals -> CanSkipUpdate fast path for interactive/selectable grid cells (it was defeated for any cell carrying an event handler or a fluent .Set chain) and removes the dominant per-cell, per-frame allocations in the Element/DSL layer. FLAGSHIP-1 (Element.cs): replace the "any handler present => never skip" rule in ModifiersEqual with ModifierCallbacksEqual, a per-slot ReferenceEquals over all 27 callback/gesture/drag-drop slots. Reference-stable handlers now skip; capturing closures still force Update. Per-slot reference equality (not a pure presence bitmask) is the safe predicate because modifier handlers dispatch through ModifierEventHandlerState.Current*, which is refreshed only on the non-skip Update path -- presence-only would strand a stale closure on a skipped cell. Also closes a latent hole where 21 of the slots were ignored. FLAGSHIP-2 (Element.cs): add SettersEqual<T> (length + element-wise ReferenceEquals) and route all 51 ShallowEquals arms through it instead of ReferenceEquals(Setters). The .Set helpers append to a fresh array each render ([.. Setters, configure]); element-wise compare lets an unchanged chain of non-capturing lambdas / method groups (compiler-cached statics) skip again. _emptyLayout/_emptyVisual sentinels; compare LayoutModifiers by exact record equality and VisualModifiers field-by-field (brushes via BrushesEqual). Also fixes latent skip bugs: RequestedTheme/logical-inset and Scale/Rotation/Translation/CenterPoint were previously not compared. SingleAttachedDictionary for the common one-attached-value case (one .Grid per cell), avoiding a full Dictionary<Type,object> alloc per cell per frame. Color via GetBrush(Color), replacing one new SolidColorBrush (a thread-affine WinRT DependencyObject) per Parse call. Single shared brush cache. Star-track loop; InterspersedGrid exact-sized arrays; FilterChildren two-pass count+fill; shared static s_oneStar cross-axis track. #160 already satisfied (ShallowEquals is type-switch-first). Behavior preserved exactly (same render output, same events fire). Validated: 9693 unit tests pass; Modifier/Reconciler/Skip/Event/DataGrid/Flex selftests 0 failures; core lib Release x64 clean (0 AOT/trim warnings-as-errors). New xUnit coverage in PerfDiffSkipPathTests.cs and DeclarativeModifierTests.cs. Closes #664 Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…LAGSHIP-2 safety revert Expands the diff/memory PR into the ElementExtensions fluent-modifier layer and makes the FLAGSHIP-2 setter comparison safe-conservative. FLAGSHIP-2 (safety revert): SettersEqual now returns true only for the same array instance (memoized element) or both-empty. Element-wise delegate-identity compare was unsafe: a .Set(...) is an apply-time imperative write whose effect can read mutable external state, so identity-equality does NOT imply the write is unnecessary. Setters keep the element on the Update path whenever any are present. LazyVStack/HStack scroll/repeater setter compares routed through SettersEqual (both-empty now compares equal). FLAGSHIP-1 handler ref-equality is unaffected (handlers dispatch later via ModifierEventHandlerState.Current*). #165/#157 — ModifyLayout/ModifyVisual bucket-merge entry points. ~36 pure-layout and pure-visual fluent modifiers (Margin/Padding/Width/Height/Min*/Max*/Align/ Opacity/Scale/Rotation/CenterPoint/CornerRadius/Background/Foreground/WithBorder/ Translation) merge their delta straight into the Layout/Visual slot instead of allocating a throwaway parent ElementModifiers (and its bucket shim) per chained call. Semantics are identical to Modify(el, new ElementModifiers { <field> }): proven against ElementModifiers.Merge and asserted by *_Chain_Equals_HandBuilt tests. #168 (brush cache wiring) — public BrushHelper.Parse(string) still returns a fresh, caller-owned brush (unchanged vs main). New internal ParseShared(string) returns a shared per-thread per-ARGB brush; Background/Foreground/WithBorder(string) hot paths route through it, so thousands of identical cells reuse one brush instead of allocating a WinRT DependencyObject per cell per render. Pure allocation win — the diff compares brushes structurally (BrushesEqual = color+opacity), so behavior is unchanged. #174 (ApplyStyle cache) — cache the OnMount delegate per style name so an unchanged .ApplyStyle("X") chain reuses one delegate instead of a fresh capturing closure + display class per render. Memory-only: OnMountAction is excluded from ModifiersEqual (runs at mount only), so this is not a skip-path change. Out of scope / deferred: #167 ThemeBindings (Reconciler ripple); #158/#161-164. Validation: core lib Release x64 0 warnings/0 errors; Reactor.Tests 9704 pass/0 fail/ 64 skip (PerfDiffSkipPathTests 31/31); selftests Modifier/Skip/Event/Brush/DataGrid/ Flex 0 failures. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ernal; strengthen Dsl tests Addresses pr-review findings (multi-model cross-check + panel) on the draft: - correctness (HIGH, found by multi-model gpt-5.4 cross-check): UniformGrid wrote the positioned `.Grid(...)` cells back into the array returned by FilterChildren. After #173, FilterChildren's no-null fast path returns the caller's `items` array ALIASED (no copy), so an explicit `UniformGrid(orientation, myArray)` call corrupted the caller's array with Grid-positioned wrappers — a regression vs main (where FilterChildren always copied). UniformGrid now positions into its own Element[] and never mutates `filtered`. Other 16 FilterChildren consumers only pass the result as read-only Children, and InterspersedGrid already builds its own array, so this was the sole offender. Allocation-neutral on the hot (no-null) path vs main. + regression test asserting the source array is untouched. - api-ergonomics / packaging / alternative-solution / security (converged): make BrushHelper.GetBrush(Color) internal. It was a NEW public method (main never exposed it) returning a shared, mutable SolidColorBrush from a [ThreadStatic] cache — a public-surface + shared-mutable-brush footgun. Only ParseShared (same assembly) calls it; confirmed no external references repo-wide. Docs updated so the public entry point is Parse(string) (fresh, caller-owned). - test-coverage: #170 ForEach tests now assert rendered Content/index sequence and empty-input behavior (not just counts); added headless-safe BrushHelper.ParseColor coverage for the color path behind the shared-brush modifiers. Validation: core lib Release x64 0/0; Reactor.Tests 9711 pass/0 fail/64 skip (PerfDiffSkipPathTests 38/38); selftests Grid/Reconciler/Modifier/Skip/Event/Brush/ DataGrid/Flex 0 failures. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…h sharing Review-pass fixes on PR #665 (pr-review skill + gpt-5.4 multi-model cross-check): H1 (correctness, HIGH) - ModifiersEqual let an element with a stable .OnTapped take the skip path without comparing OnUnmountAction. The skip path bypasses ApplyModifiers, which is what re-registers the latest teardown in the reconciler's _onUnmountActions on every Update; a changed .OnUnmount closure was therefore stranded and the stale first-render teardown would fire at unmount. Fixed by comparing OnUnmountAction by ReferenceEquals (mirrors the handler/.Ref slots). Zero grid-workload cost (cells have null teardown both renders). The plain-leaf no-handler case was a pre-existing latent bug on main; this also closes it (strictly more correct). Regression test added. H2 (multi-model, HIGH) - reverted #168 shared-brush sharing. main deliberately creates a fresh SolidColorBrush per call because a brush is a thread-affine DependencyObject with mutable Color/Opacity and cannot be safely shared across controls; an imperative .Set(tb => mutate tb.Background) would otherwise leak to every peer cell and poison the cache. Removed ParseShared/GetBrush/_brushCache; Background/Foreground/WithBorder(string) now use the fresh-brush Parse. Kept the safe ParseColor string->Color cache (pre-existing on main) and the #165/#157 ModifyVisual bucket-merge alloc win. Test-coverage (MEDIUM) - added a 27-slot ModifierCallbacksEqual sweep proving every event/gesture/drag-drop handler slot participates by reference (guards against a forgotten or mis-paired slot). Validation: src/Reactor Release x64 0 warn / 0 err (AOT-as-errors gate); tests/Reactor.Tests 9713 pass / 0 fail / 64 skip. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…s header Copilot review (comment on 783030a) flagged the test file's header comment still claimed a '#168 brush-cache' validated by the selftest/AppTests tiers. After the shared-brush revert, BrushHelper caches only the parsed Color and Parse returns a fresh brush per call, so there is no shared brush instance to validate on a real thread. Reworded the header to match. Comment-only change; 9713 pass / 0 fail unchanged. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Locks the realistic DataGrid cell shape (reference-stable OnPointerPressed + OnTapped, all other routed slots null) at the ShallowEquals/element level: the cell stays skip-eligible across renders, but declines the skip the instant any secondary routed slot (OnDoubleTapped/OnGotFocus/OnLostFocus) changes identity, so that slot's ModifierEventHandlerState.Current* is refreshed instead of dispatching a stale closure. Complements the per-slot 27-way sweep, which already proves ModifierCallbacksEqual compares the full Current*-dispatched set. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
ModifierCallbacksEqual added 6 gesture/drag slots (Pan/Pinch/Rotate/ LongPress/DragSource/DropTarget) that the historical diff never compared. Those dispatch through separate per-element gesture/drag state, not ModifierEventHandlerState.Current*, so comparing them forced Update where the framework previously skipped — re-arming an in-flight gesture mid-interaction. For a held long-press this re-registered the handler between its Began and Ended phases, so the released callback fired against a refreshed closure and double-dispatched (E2E GestureTests .Interactive_OnLongPress_FiresAfterHold: count 2 vs expected 1). Narrow the predicate to the 21 routed-input handlers (the actual grid-cell skip lever), matching main's gesture behavior exactly and preserving observable behavior. Grid cells use only routed handlers, so the perf win is unaffected. Tests: replace the 27-slot sweep with a 21-slot sweep + a behavior-preservation guard proving gesture/drag closures that change identity stay skip-eligible (so a future contributor can't silently re-add them and reintroduce the double-fire). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot review on a6adfdd flagged that AttachedEqual's foreach iterates the IReadOnlyDictionary interface, so SingleAttachedDictionary's yield-based enumerator allocates an IEnumerator per diff — partially defeating #155's per-cell allocation cut for the common one-.Grid-per-cell case. Special-case SingleAttachedDictionary on either side and compare the lone slot via TryGetValue (Count is already known equal), so the hot path is allocation free. Behavior-preserving: same equality result the foreach produced. Tests: AttachedEqual_SingleEntry_FastPath_Matches_Semantics (incl. asymmetric b-is-Single branch) + AttachedEqual_SingleEntry_DoesNotAllocate (10k calls, asserts < 2KB via GC.GetAllocatedBytesForCurrentThread). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
/perf |
| foreach (var gc in group.Children) | ||
| { | ||
| if (gc is not null and not EmptyElement) | ||
| count++; | ||
| } |
⚡ Reactor perf comparisonWorkload: Regression vs
|
| Metric | main (baseline) |
This PR | Δ (95% CI) | Status |
|---|---|---|---|---|
| Renders/sec ↑ | 2.55 | 2.47 | -1.9% 95% CI [-8.2, +4.4] | ≈ within noise |
| Avg Reconcile (ms) ↓ | 140.9 | 141.3 | +1.4% 95% CI [-2.5, +5.4] | ≈ within noise |
| Avg Diff (ms) ↓ | 124.5 | 127.6 | +3.0% 95% CI [-1.1, +7.2] | ≈ within noise |
| Avg Memory (MB) ↓ | 288.5 | 285.7 | -0.4% 95% CI [-1.3, +0.6] | ≈ within noise |
Low-mutation skip-floor (--percent 0)
At --percent 0 the workload mutates few cells per tick (always at least one), so reconcile/diff isolate the O(n) per-tick child skip-walk floor that higher mutation rates dilute — ChildReconciler re-walks every child each tick even when nothing moved. The closer --percent is to 0, the more this floor is the signal, so a structural-skip optimization shows up cleanly where the headline table above buries it. Δ is the mean paired change with a 95% CI.
| Metric | main (baseline) |
This PR | Δ (95% CI) | Status |
|---|---|---|---|---|
| Renders/sec ↑ | 15.81 | 15.65 | -2.1% 95% CI [-9.5, +5.4] | ≈ within noise |
| Avg Reconcile (ms) ↓ | 40.3 | 38.7 | +2.7% 95% CI [-8.1, +13.4] | ≈ within noise |
| Avg Diff (ms) ↓ | 37.8 | 36.2 | +2.8% 95% CI [-8.2, +13.8] | ≈ within noise |
| Avg Memory (MB) ↓ | 266.8 | 266.5 | -0.1% 95% CI [-0.8, +0.5] | ≈ within noise |
Allocation (Reactor) — lower is better
| Metric | main (baseline) |
This PR | Δ (95% CI) | Status |
|---|---|---|---|---|
| Alloc bytes/render ↓ | 5766814 | 5836460 | +0.7% 95% CI [-0.8, +2.1] | ≈ within noise |
| Gen0 GC / 1k renders ↓ | 285.71 | 262.93 | -10.8% 95% CI [-22.1, +0.4] | ≈ within noise |
Keyed-list workload (StressPerf.KeyedList, --percent 50)
A separate macro workload: a ~500-row stably keyed list whose rows are reordered / inserted / removed each tick. Because every child carries a key, the child reconciler takes its keyed arm (ReconcileKeyed → ReconcileKeyedMiddle, the LIS-based minimal-move pass) instead of the positional re-walk the StocksGrid tables above measure — so this is the sensitive macro signal for keyed-diff work the positional cells can never reach. Same interleaved paired-Δ 95% CI as the headline table.
| Metric | main (baseline) |
This PR | Δ (95% CI) | Status |
|---|---|---|---|---|
| Renders/sec ↑ | 20.03 | 20.12 | +1.0% 95% CI [-2.0, +4.0] | ≈ within noise |
| Avg Reconcile (ms) ↓ | 16.6 | 16.5 | -1.9% 95% CI [-4.1, +0.2] | ≈ within noise |
| Avg Diff (ms) ↓ | 16.5 | 16.3 | -1.8% 95% CI [-4.0, +0.3] | ≈ within noise |
| Avg Memory (MB) ↓ | 168.9 | 168.9 | 0.0% 95% CI [-0.9, +0.9] | ≈ within noise |
Allocation (keyed-list) — lower is better
| Metric | main (baseline) |
This PR | Δ (95% CI) | Status |
|---|---|---|---|---|
| Alloc bytes/render ↓ | 314713 | 313059 | -0.6% 95% CI [-0.9, -0.2] | ✅ improvement |
| Gen0 GC / 1k renders ↓ | 15.19 | 15.12 | +0.7% 95% CI [-11.1, +12.4] | ≈ within noise |
Reconciler micro-benchmarks (PerfBench.ControlModel)
Production --variant Reactor control-model path, ns-resolution and WinUI-undiluted (spec-047 M1–M13) — ↓ lower is better. Status tracks allocated bytes/op, the authoritative signal here; it is deterministic for structurally-fixed benches, while dispatcher / background-thread benches carry a small process-to-process offset, so a bench is flagged only when its 95% CI clears a ±3% minimum-effect band (real structural alloc changes are several percent to many-x). ns/op is shown for context but is not auto-flagged (its paired CI is rep-interleaved but the flag remains dormant pending a real-CI identical-binary band calibration). Δ is the mean paired change with a 95% CI.
| Bench | main ns/op |
Δ ns (95% CI) | main B/op |
Δ alloc (95% CI) | Status |
|---|---|---|---|---|---|
M1 Mount_Leaf_NoCallback |
152070.4 | -1.1% 95% CI [-4.3, +2.2] | 1140.9 | 0.0% 95% CI [0.0, 0.0] | ≈ within noise |
M2 Mount_Leaf_OneCallback |
111143.4 | +2.1% 95% CI [-2.6, +6.8] | 3383.3 | 0.0% 95% CI [0.0, 0.0] | ≈ within noise |
M3 Mount_Leaf_ThreeCallbacks |
234028.1 | -0.3% 95% CI [-2.8, +2.2] | 8485.6 | -1.6% 95% CI [-3.2, -0.1] | ≈ within noise |
M4 Dispatch_Switch_Cold |
115800.3 | -0.8% 95% CI [-2.4, +0.8] | 1767.8 | 0.0% 95% CI [0.0, 0.0] | ≈ within noise |
M5 Dispatch_Switch_Warm |
119592.2 | -0.5% 95% CI [-6.9, +6.0] | 1766.0 | +0.8% 95% CI [-1.3, +2.8] | ≈ within noise |
M6 Dispatch_ExternalType |
91811.4 | +1.8% 95% CI [-2.8, +6.4] | 987.6 | +1.5% 95% CI [-2.2, +5.2] | ≈ within noise |
M7 Update_NoChange |
56826.5 | -2.0% 95% CI [-2.8, -1.1] | 452.1 | -2.0% 95% CI [-12.6, +8.6] | ≈ within noise |
M8 Update_OneLeafChanged |
42156.9 | +0.2% 95% CI [-0.7, +1.2] | 536.0 | 0.0% 95% CI [0.0, 0.0] | ≈ within noise |
M9 Update_AllChanged |
2920899.4 | -1.4% 95% CI [-3.6, +0.7] | 184278.1 | 0.0% 95% CI [0.0, 0.0] | ≈ within noise |
M10 EventHandlerState_Alloc |
91718.8 | +2.3% 95% CI [+0.1, +4.4] | 3095.2 | 0.0% 95% CI [-0.1, +0.1] | ≈ within noise |
M11 ModifierEHS_Frequency |
47625.3 | +0.5% 95% CI [-0.5, +1.6] | 638.9 | 0.0% 95% CI [0.0, 0.0] | ≈ within noise |
M12 Pool_Rent_HotPath |
118637.6 | +0.1% 95% CI [-1.5, +1.7] | 1099.9 | 0.0% 95% CI [0.0, 0.0] | ≈ within noise |
M13 Setters_Suppression_Scope |
160.1 | +3.0% 95% CI [-6.3, +12.3] | 26.7 | 0.0% 95% CI [0.0, 0.0] | ≈ within noise |
M14 Dsl_Rebuild_Cascade |
1899232.5 | -13.2% 95% CI [-14.6, -11.9] | 2915828.9 | -23.5% 95% CI [-23.5, -23.5] | ✅ improvement |
C207 ChangeHandler_DpRead_Coalesce |
1433.9 | +1.4% 95% CI [-2.5, +5.3] | 0.6 | 0.0% 95% CI [0.0, 0.0] | ≈ within noise |
OAlloc Optional_Element_Alloc |
222.5 | -2.6% 95% CI [-8.2, +2.9] | 528.0 | 0.0% 95% CI [0.0, 0.0] | ≈ within noise |
OUpdate Optional_Reconciler_Update |
12252.4 | +4.0% 95% CI [+2.1, +6.0] | 2676.3 | +3.6% 95% CI [+3.6, +3.6] |
Cross-framework reference (same StocksGrid workload)
| Metric | vanilla WinUI3¹ | Rust windows-reactor² |
Reactor (this PR) |
|---|---|---|---|
| Renders/sec ↑ | 3.21 | 4.75 | 2.47 |
| Avg Reconcile (ms) ↓ | n/a | 20.0 | 141.3 |
| Avg Diff (ms) ↓ | n/a | 17.7 | 127.6 |
| Avg Memory (MB) ↓ | 264.0 | 197.3 | 285.7 |
↑ higher is better · ↓ lower is better. Within noise = the 95% confidence interval of the paired Δ includes 0 (no change resolvable at this sample size); ✅ improvement /
Allocation metrics (alloc bytes/render, Gen0 GC) are the sensitive signal for allocation-reduction work, where the mean-ms / memory figures are largely flat. They read n/a for a harness built from a revision that predates them (rebase the PR onto main to populate them).
Reconciler micro-benchmarks run PerfBench.ControlModel --variant Reactor (M1–M13) as a headless loop bracketed by per-thread alloc + GC counters — ns-resolution and free of WinUI render / working-set dilution, so they resolve Core/Reconciler allocation deltas the macro StocksGrid workload cannot. main and PR each link their own src/Reactor build and are rep-interleaved (a fresh alternated process per rep); Δ is the paired 95% CI over per-rep means. The Status column tracks allocated bytes/op (deterministic for identical code); ns/op is informational — its paired CI is now unbiased but the flag stays dormant pending a real-CI identical-binary band calibration.
¹ vanilla WinUI3 = StressPerf.Direct (imperative; no virtual-DOM, so it has no reconcile/diff phase — those cells read n/a). Measured live on this runner.
² Rust = test_reactor_perf from microsoft/windows-rs — a port of this harness (same StocksGrid, same --percent/--duration CLI). Built from source and measured live on this runner.
Absolute numbers are runner-dependent — trust the Δ vs main, not the absolute values. Memory (working set) is the noisiest metric.
Runner: CPU: AMD EPYC 7763 64-Core Processor · 4 logical cores · 16 GB RAM · runner: GitHub Actions 1042987674.
Generated by .github/workflows/perf-compare.yml · PR bd16d0e vs main 023d879 · 2026-06-27T01:43:17Z · run log.
DO NOT MERGE. Throwaway measurement PR only.
Purpose: official /perf re-measure of #665 against the M14 Dsl_Rebuild_Cascade micro, with the baseline now INCLUDING #692 (flagship reconciler self-merge guard, merged to main as c497c94) + M14 (#711). This is #665's 8 commits rebased cleanly onto current origin/main. Build 0/0, Reactor.Tests 9754 pass. Origin #665 stays pristine; this branch is deleted after the run.