Skip to content

DO-NOT-MERGE: re-validate #666 on post-flagship main#727

Closed
azchohfi wants to merge 8 commits into
mainfrom
sweep-666
Closed

DO-NOT-MERGE: re-validate #666 on post-flagship main#727
azchohfi wants to merge 8 commits into
mainfrom
sweep-666

Conversation

@azchohfi

Copy link
Copy Markdown
Collaborator

Throwaway re-validation of #666 on origin/main b9ace1e (=#692+M14+#665+#649). DO NOT MERGE; origin #666 pristine.

azchohfi and others added 8 commits June 26, 2026 21:45
Reduces redundant WinRT/COM-RCW boundary crossings and per-element
allocations on the diff/reconcile/patch hot path (data-grid stress
workload, StocksGrid). Targets the dominant Reconcile gap.

Reconciler.cs / Update.cs / Mount.cs:
- #16 UnmountRecursive: read ReactorAttached.State once (was 5x GetValue).
- #17 UnmountAndCollect: read element tag once (was 5x GetElementTag).
- #14 Update: coalesce repeated `is FrameworkElement` RCW type-tests to one.
- #15 Cache the component rerender delegate on ComponentNode; recreate
  only when the requestRerender identity changes.
- #18 UpdateDefaultAutomationName: ref/ordinal short-circuit before the
  AutomationProperties.GetName COM read.
- #19 ApplyModifiers accepts null = clear-all (no fresh ElementModifiers).
- #20 PopulateDirtyAncestorPath iterates a tracked self-triggered set
  instead of the whole _componentNodes dict.
- #21 UnmountAndPool reuses a borrowed scratch list (reentrant calls fall
  back to a fresh list).
- #22 RefreshRealizedContainers reuses a cleared field list.
- #23 ApplyStyleToElement ref-equal + ActualTheme guard; ApplyThemeBindings
  gated on the ShallowEquals fast path.
- #24 BuildCacheKey: thread-static StringBuilder + pair buffer, single pass.
- #25 HasConsumedContextChanged compares via Context<T>/EqualityComparer<T>
  (no boxing of value-type context).
- #26 ApplyResourceOverrides: no ToHashSet/ToArray, ref-equal fast path.
- #27 WireReferenceListEdge: static loop, no per-edge List/closures.
- #28 ApplyStaggerDelays: hoisted static animation-key array.
- #29 PushContextDisposable: single-pair push overload, pooled PopOnDispose.

Reconciler.KeyedItemsBinding.cs:
- #39 ref-equality cache; only realloc + SetItemViewSource on change.

Reconciler.Mount.cs:
- #40 LayoutFootgunDetector gated by GridElement pre-filter.

AccessibilityScanner.cs (debug-only scan path):
- #62 compute GetChildren once in Walk, pass into Push.
- #63 cache sibling-text array in ScanContext, invalidate on Push.
- #64 BuildContext single-pass with pooled scratch (no LINQ chains).
- #65 IsLikelyEmoji explicit span loop (no LINQ).
- #66 CheckTabIndexGaps sorts a rented buffer (no OrderBy/ToList).

Behavior preserved exactly: coalesced COM reads observe the same values
at the same times; unmount/cleanup ordering unchanged; scanner emits
identical findings. AOT/trim clean (no new reflection).

Adds tests/Reactor.Tests/ReconcilerInteropPerfTests.cs (11 tests:
rerender-delegate caching, context-change compare, cache-key reuse,
single-pair context push, scanner output parity).

Closes #654

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Review-pass fixes for PR #666 (issue #654), all behavior-preserving and
within the assigned reconcile-path files:

- Context #25 (H1): CurrentValueEquals now reproduces the old object.Equals
  semantics EXACTLY — reference types use the virtual object.Equals (so an
  IEquatable<T> type without an object.Equals override keeps reference-equality
  change detection instead of collapsing distinct instances and skipping a
  required context rerender), value types keep the no-box EqualityComparer<T>
  path but guard the unbox with `is T` so a same-slot context-type swap returns
  false (as before) rather than throwing InvalidCastException.
- #26 ApplyResourceOverrides (CO2): removal is driven by `managed` vs new
  overrides, not an old-vs-new ref comparison; the shallow-skip fast path calls
  this with old===new yet still must reconcile away stale managed keys.
- #20 self-triggered set (CO3): add an Unmounted tombstone so a cached rerender
  closure firing after teardown can't resurrect the node into _selfTriggeredNodes
  (which the dirty-path pass would never clear again).
- #16/#17 unmount tag re-read (CO4): the V1/type unmount dispatch re-reads the
  attached tag AFTER the user .OnUnmount/RunCleanups callbacks (which may detach
  state via the public APIs), matching the pre-coalesce per-site reads; pre-callback
  branches still share the single top-of-method read.
- #23 ClearThemeBindings (AS1): clear the companion effective-theme marker DP
  alongside the style marker so the pair stays consistent across pool/reuse.

Tests (tests/Reactor.Tests/ReconcilerInteropPerfTests.cs):
- #15 rerender-delegate caching: identity reuse vs source-identity recreate;
  invocation marks SelfTriggered and bubbles to the source; post-unmount tombstone
  refuses a late self-trigger (#20).
- #25 (H1) reference-IEquatable keeps reference-equality; value-type type mismatch
  returns false without throwing.
- Scanner output parity pinned to exact literal messages (#62-#66).

GetOrCreateComponentRerender made internal (InternalsVisibleTo Reactor.Tests) as a
test seam. Validation: dotnet build src/Reactor -c Release 0/0; tests 9687 passed.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Guards the boxed-Nullable<T> edge raised in Copilot review: a non-null int?
boxes to a boxed underlying int, and the value-type `is T` guard (T = int?)
still matches it, so an unchanged nullable context value is not spuriously
treated as changed. Proves the H1 hybrid CurrentValueEquals preserves the
old object.Equals nullable semantics against production code.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Adds Assert.NotNull(wrapped) before invoking the cached rerender delegate in
the two CachedRerender_* tests. Asserts the GetOrCreateComponentRerender
contract (never returns null) and resolves a CodeQL "useless assignment"
false positive (its dataflow did not model the subsequent delegate
invocation as a read). Behavior-neutral test hardening.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Review-pass fixes from the pr-review skill (correctness + alternative-solution +
test-coverage dimensions) and the multi-model cross-check, all within the existing
reconcile interop scope. No behavior change:

- Context<T>.CurrentValueEquals (CO-H1, high): the value-type arm routed through
  EqualityComparer<T>.Default, which dispatches to IEquatable<T> and diverges from
  the original object.Equals for structs whose IEquatable<T>.Equals disagrees with
  their object.Equals override. Unify both arms to current.Equals(lastBoxed) — for
  two non-null operands that is exactly object.Equals — and keep the reference-type
  ReferenceEquals short-circuit so same-instance / non-reflexive cases stay exact.
  Still avoids boxing the freshly-read value for primitives/Nullable<T>. Drops the
  now-unused System.Collections.Generic using.
- BuildCacheKey (AS-L1): release the thread-static scratch pair array when an outlier
  element grows it past a small cap, so it is not pinned for the thread's lifetime.
- RefreshRealizedContainers (AS-L2): only retain the realized-scratch list when its
  capacity stays modest; an oversized one-off refresh is collected, not pinned.
- Tests: assert scanner emission ORDER (the parity test sorts first), plus value-type
  IEquatable-vs-object.Equals divergence and the reference-equality fast-path.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Addresses a Copilot review comment on #666: a cached rerender closure (perf #15)
handed to a long-lived or cross-thread subscriber can outlive unmount and keep its
ComponentNode alive. Drop ComponentNode.Control in ClearSelfTriggered(unmounting:true)
so a leaked closure can't also pin the unmounted visual subtree. The dirty-path pass
(perf #20) is the only reader of Control and never visits an unmounted node, so this is
observably a no-op — it only releases the subtree for GC, reinforcing the #20 tombstone.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…eview)

Addresses three Copilot re-review comments on the interop-perf PR, all the
same scratch-hygiene pattern already applied to t_cacheKeyPairs (#24) and
_realizedScratch (#22): a one-off outlier input can leave a thread-static /
instance scratch buffer holding an oversized backing array (or stale element
refs) for the rest of the thread/reconciler lifetime. Behavior-neutral — the
computed outputs are unchanged, so existing parity tests cover them.

- AccessibilityScanner.BuildContext: wrap the t_buildCtxChildren populate/use
  in try/finally so a mid-population exception can't leave element refs pinned
  on the thread-static list until the next call. Success-path timing of the
  clear is unchanged.
- BuildCacheKey: mirror the pairs-array retain-cap for the thread-static
  StringBuilder (t_cacheKeySb) — release it after a one-off unusually large key
  instead of pinning the grown backing buffer.
- UnmountAndPool: mirror the _realizedScratch capacity cap for
  _unmountPoolScratch — only retain the scratch list when it's still modestly
  sized, so a one-off large-subtree unmount doesn't pin an oversized array.

Validation: dotnet build src/Reactor -c Release -> 0 warnings / 0 errors;
dotnet test tests/Reactor.Tests -> 9691 passed / 0 failed / 64 skipped.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…heck)

A multi-model (gpt-5.4) cross-check of the four reconcile/scan scratch retain-caps
flagged the BuildCacheKey site: at the degenerate combined-cap case (an element with
>64 theme bindings AND a >1024-char key rendered every frame), discarding the
thread-static buffers reallocates a KeyValuePair[] (2x the size of the pre-#24
baseline's string[]) plus a fresh StringBuilder — i.e. WORSE than origin/main, which
the perf PR must never be. The size-cap's only job was bounding memory retention, but
Array.Clear(pairs) / sb.Clear() already drop the referenced strings/ThemeRefs, so
nothing is pinned regardless. Removing the caps makes BuildCacheKey unconditional-
retain, which is strictly cheaper than baseline in every case (the dominant
StringBuilder backing buffer is reused, not realloc'd) and provably never worse.

The other two caps are unchanged — the cross-check CONFIRMED them: _realizedScratch
and _unmountPoolScratch discard to EXACTLY the origin/main allocation shape
(List<UIElement> / List<FrameworkElement>), so they cannot be worse than baseline, and
both sit well above the 50-row StocksGrid steady-state high-water (~64 <= 256) so they
never trim-and-regrow on the stress workload.

Behavior-neutral: BuildCacheKey return values are unchanged (only buffer-retention
differs), so existing parity tests cover it.

Validation: dotnet build src/Reactor -c Release -> 0 warnings / 0 errors;
dotnet test tests/Reactor.Tests -> 9691 passed / 0 failed / 64 skipped.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@azchohfi

Copy link
Copy Markdown
Collaborator Author

/perf


// Same instance recorded and current: object.Equals(x, x) == true via the
// ReferenceEquals short-circuit even though x.Equals(x) == false. Match it exactly.
Assert.Equal(object.Equals(x, x), NonReflexiveCtx.CurrentValueEquals(scope, x));
Comment on lines +740 to +746
foreach (var c in GetChildren(el))
{
if (c is null or EmptyElement) continue;
children.Add(c);
if (childText is null && c is TextBlockElement tb)
childText = tb.Content;
}
@github-actions

Copy link
Copy Markdown

⚡ Reactor perf comparison

Workload: StressPerf.ReactorOptimized StocksGrid · --percent 50 --duration 10 · x64 Release · median of 12 paired runs (2 warmup dropped); Δ is the mean change with a 95% CI · PR head and main built and run interleaved on the same runner.

Regression vs main baseline

Metric main (baseline) This PR Δ (95% CI) Status
Renders/sec ↑ 2.62 2.74 +5.3% 95% CI [-8.7, +19.2] ≈ within noise
Avg Reconcile (ms) ↓ 136.5 133.9 -1.7% 95% CI [-11.8, +8.3] ≈ within noise
Avg Diff (ms) ↓ 124.2 121.4 -1.3% 95% CI [-11.5, +8.9] ≈ within noise
Avg Memory (MB) ↓ 284.7 287.1 +0.7% 95% CI [-0.4, +1.9] ≈ within noise

Low-mutation skip-floor (--percent 0)

At --percent 0 the workload mutates few cells per tick (always at least one), so reconcile/diff isolate the O(n) per-tick child skip-walk floor that higher mutation rates dilute — ChildReconciler re-walks every child each tick even when nothing moved. The closer --percent is to 0, the more this floor is the signal, so a structural-skip optimization shows up cleanly where the headline table above buries it. Δ is the mean paired change with a 95% CI.

Metric main (baseline) This PR Δ (95% CI) Status
Renders/sec ↑ 16.46 16.20 -1.1% 95% CI [-4.6, +2.3] ≈ within noise
Avg Reconcile (ms) ↓ 32.4 31.4 -2.9% 95% CI [-9.5, +3.7] ≈ within noise
Avg Diff (ms) ↓ 30.5 29.6 -3.0% 95% CI [-9.8, +3.7] ≈ within noise
Avg Memory (MB) ↓ 268.7 268.7 0.0% 95% CI [-0.4, +0.3] ≈ within noise

Allocation (Reactor) — lower is better

Metric main (baseline) This PR Δ (95% CI) Status
Alloc bytes/render ↓ 5758965 5722942 -0.4% 95% CI [-2.2, +1.4] ≈ within noise
Gen0 GC / 1k renders ↓ 226.50 227.78 +0.2% 95% CI [-11.9, +12.3] ≈ within noise

Keyed-list workload (StressPerf.KeyedList, --percent 50)

A separate macro workload: a ~500-row stably keyed list whose rows are reordered / inserted / removed each tick. Because every child carries a key, the child reconciler takes its keyed arm (ReconcileKeyedReconcileKeyedMiddle, the LIS-based minimal-move pass) instead of the positional re-walk the StocksGrid tables above measure — so this is the sensitive macro signal for keyed-diff work the positional cells can never reach. Same interleaved paired-Δ 95% CI as the headline table.

Metric main (baseline) This PR Δ (95% CI) Status
Renders/sec ↑ 23.11 23.15 -0.3% 95% CI [-2.8, +2.1] ≈ within noise
Avg Reconcile (ms) ↓ 14.0 13.9 -0.2% 95% CI [-2.0, +1.6] ≈ within noise
Avg Diff (ms) ↓ 13.8 13.7 -0.5% 95% CI [-2.3, +1.3] ≈ within noise
Avg Memory (MB) ↓ 169.7 169.8 -0.2% 95% CI [-0.8, +0.4] ≈ within noise

Allocation (keyed-list) — lower is better

Metric main (baseline) This PR Δ (95% CI) Status
Alloc bytes/render ↓ 312927 307416 -1.7% 95% CI [-2.0, -1.5] ✅ improvement
Gen0 GC / 1k renders ↓ 17.13 16.91 -1.2% 95% CI [-5.6, +3.2] ≈ within noise

Reconciler micro-benchmarks (PerfBench.ControlModel)

Production --variant Reactor control-model path, ns-resolution and WinUI-undiluted (spec-047 M1–M13) — ↓ lower is better. Status tracks allocated bytes/op, the authoritative signal here; it is deterministic for structurally-fixed benches, while dispatcher / background-thread benches carry a small process-to-process offset, so a bench is flagged only when its 95% CI clears a ±3% minimum-effect band (real structural alloc changes are several percent to many-x). ns/op is shown for context but is not auto-flagged (its paired CI is rep-interleaved but the flag remains dormant pending a real-CI identical-binary band calibration). Δ is the mean paired change with a 95% CI.

Bench main ns/op Δ ns (95% CI) main B/op Δ alloc (95% CI) Status
M1 Mount_Leaf_NoCallback 147036.4 -1.2% 95% CI [-2.2, -0.2] 1140.9 0.0% 95% CI [0.0, 0.0] ≈ within noise
M2 Mount_Leaf_OneCallback 104137.5 +1.0% 95% CI [-3.6, +5.5] 3383.3 0.0% 95% CI [0.0, 0.0] ≈ within noise
M3 Mount_Leaf_ThreeCallbacks 204283.1 +0.7% 95% CI [-2.5, +4.0] 8460.3 +0.3% 95% CI [-1.1, +1.7] ≈ within noise
M4 Dispatch_Switch_Cold 98706.6 -0.5% 95% CI [-2.7, +1.8] 1767.8 0.0% 95% CI [0.0, 0.0] ≈ within noise
M5 Dispatch_Switch_Warm 98522.6 +1.9% 95% CI [-3.2, +7.1] 1766.0 +0.4% 95% CI [-1.0, +1.9] ≈ within noise
M6 Dispatch_ExternalType 89757.6 +1.0% 95% CI [-3.1, +5.0] 987.6 +0.7% 95% CI [-1.9, +3.4] ≈ within noise
M7 Update_NoChange 55017.6 +5.8% 95% CI [-4.2, +15.8] 452.1 +0.7% 95% CI [-7.1, +8.4] ≈ within noise
M8 Update_OneLeafChanged 41080.2 0.0% 95% CI [-1.2, +1.2] 536.0 0.0% 95% CI [0.0, 0.0] ≈ within noise
M9 Update_AllChanged 2787244.5 -1.1% 95% CI [-2.1, -0.2] 184278.1 0.0% 95% CI [0.0, 0.0] ≈ within noise
M10 EventHandlerState_Alloc 82689.7 -1.4% 95% CI [-2.9, +0.1] 3095.2 0.0% 95% CI [0.0, +0.1] ≈ within noise
M11 ModifierEHS_Frequency 44718.8 +0.6% 95% CI [-0.1, +1.3] 638.9 0.0% 95% CI [0.0, 0.0] ≈ within noise
M12 Pool_Rent_HotPath 116404.1 -1.0% 95% CI [-1.8, -0.3] 1099.9 0.0% 95% CI [0.0, 0.0] ≈ within noise
M13 Setters_Suppression_Scope 90.8 +5.5% 95% CI [-0.5, +11.4] 26.7 0.0% 95% CI [0.0, 0.0] ≈ within noise
M14 Dsl_Rebuild_Cascade 1476794.6 -15.4% 95% CI [-16.5, -14.3] 2231828.9 0.0% 95% CI [0.0, 0.0] ≈ within noise
C207 ChangeHandler_DpRead_Coalesce 1137.3 +0.5% 95% CI [-3.7, +4.7] 0.6 0.0% 95% CI [0.0, 0.0] ≈ within noise
OAlloc Optional_Element_Alloc 221.0 -2.4% 95% CI [-8.4, +3.6] 528.0 0.0% 95% CI [0.0, 0.0] ≈ within noise
OUpdate Optional_Reconciler_Update 11924.5 +0.3% 95% CI [-0.7, +1.2] 2772.3 0.0% 95% CI [0.0, 0.0] ≈ within noise

Cross-framework reference (same StocksGrid workload)

Metric vanilla WinUI3¹ Rust windows-reactor² Reactor (this PR)
Renders/sec ↑ 3.45 4.80 2.74
Avg Reconcile (ms) ↓ n/a 18.2 133.9
Avg Diff (ms) ↓ n/a 16.5 121.4
Avg Memory (MB) ↓ 264.2 195.5 287.1

↑ higher is better · ↓ lower is better. Within noise = the 95% confidence interval of the paired Δ includes 0 (no change resolvable at this sample size); ✅ improvement / ⚠️ regression require the CI to exclude 0.
Allocation metrics (alloc bytes/render, Gen0 GC) are the sensitive signal for allocation-reduction work, where the mean-ms / memory figures are largely flat. They read n/a for a harness built from a revision that predates them (rebase the PR onto main to populate them).
Reconciler micro-benchmarks run PerfBench.ControlModel --variant Reactor (M1–M13) as a headless loop bracketed by per-thread alloc + GC counters — ns-resolution and free of WinUI render / working-set dilution, so they resolve Core/Reconciler allocation deltas the macro StocksGrid workload cannot. main and PR each link their own src/Reactor build and are rep-interleaved (a fresh alternated process per rep); Δ is the paired 95% CI over per-rep means. The Status column tracks allocated bytes/op (deterministic for identical code); ns/op is informational — its paired CI is now unbiased but the flag stays dormant pending a real-CI identical-binary band calibration.
¹ vanilla WinUI3 = StressPerf.Direct (imperative; no virtual-DOM, so it has no reconcile/diff phase — those cells read n/a). Measured live on this runner.
² Rust = test_reactor_perf from microsoft/windows-rs — a port of this harness (same StocksGrid, same --percent/--duration CLI). Built from source and measured live on this runner.
Absolute numbers are runner-dependent — trust the Δ vs main, not the absolute values. Memory (working set) is the noisiest metric.
Runner: CPU: AMD EPYC 7763 64-Core Processor · 4 logical cores · 16 GB RAM · runner: GitHub Actions 1042999896.
Generated by .github/workflows/perf-compare.yml · PR 5ef7a80 vs main e8572a0 · 2026-06-27T05:26:56Z · run log.

@azchohfi azchohfi closed this Jun 27, 2026
@azchohfi azchohfi deleted the sweep-666 branch June 27, 2026 05:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant