DO-NOT-MERGE: #657 revert for keyed-regression confirm #734
/perf
⚡ Reactor perf comparison
Workload: Regression vs
| Metric | main (baseline) | This PR | Δ (95% CI) | Status |
|---|---|---|---|---|
| Renders/sec ↑ | 2.66 | 2.63 | -1.5% 95% CI [-7.3, +4.2] | ≈ within noise |
| Avg Reconcile (ms) ↓ | 121.7 | 122.5 | +2.6% 95% CI [-1.9, +7.1] | ≈ within noise |
| Avg Diff (ms) ↓ | 112.1 | 111.8 | +2.4% 95% CI [-2.3, +7.2] | ≈ within noise |
| Avg Memory (MB) ↓ | 283.8 | 283.8 | -0.2% 95% CI [-1.2, +0.8] | ≈ within noise |
Low-mutation skip-floor (--percent 0)
At --percent 0 the workload mutates few cells per tick (always at least one), so reconcile/diff isolate the O(n) per-tick child skip-walk floor that higher mutation rates dilute — ChildReconciler re-walks every child each tick even when nothing moved. The closer --percent is to 0, the more this floor is the signal, so a structural-skip optimization shows up cleanly where the headline table above buries it. Δ is the mean paired change with a 95% CI.
| Metric | main (baseline) | This PR | Δ (95% CI) | Status |
|---|---|---|---|---|
| Renders/sec ↑ | 16.59 | 16.48 | -1.2% 95% CI [-8.5, +6.0] | ≈ within noise |
| Avg Reconcile (ms) ↓ | 37.1 | 35.0 | -1.8% 95% CI [-7.7, +4.1] | ≈ within noise |
| Avg Diff (ms) ↓ | 35.0 | 33.0 | -2.1% 95% CI [-8.2, +4.0] | ≈ within noise |
| Avg Memory (MB) ↓ | 266.0 | 265.4 | -0.2% 95% CI [-0.5, +0.1] | ≈ within noise |
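The tables report Δ as the mean paired change with a 95% CI, and call a row "within noise" when that interval includes 0. A minimal sketch of one way to compute such a figure is below. It assumes per-rep paired samples and a normal-approximation interval; the harness's actual estimator is not shown in this report and may differ (for example, Student's t). `paired_delta_ci` and `within_noise` are hypothetical names.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def paired_delta_ci(baseline, candidate, confidence=0.95):
    """Mean paired percent change with a normal-approximation CI.

    Assumption: baseline[i] and candidate[i] come from the same
    interleaved rep, so each pair shares runner conditions.
    """
    if len(baseline) != len(candidate) or len(baseline) < 2:
        raise ValueError("need at least two paired reps of equal length")
    # Percent change computed pair by pair, not from the column means.
    deltas = [(c - b) / b * 100.0 for b, c in zip(baseline, candidate)]
    centre = mean(deltas)
    # Standard error of the mean of the per-pair changes.
    sem = stdev(deltas) / sqrt(len(deltas))
    # Two-sided critical value (about 1.96 for 95%).
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    return centre, centre - z * sem, centre + z * sem


def within_noise(ci_low, ci_high):
    """True when the interval includes 0, i.e. no change is resolvable."""
    return ci_low <= 0.0 <= ci_high
```

Because Δ averages per-pair changes, it need not equal the percent difference between the two column means, which may be why a few rows show a Δ that differs slightly from what the printed means alone suggest.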
Allocation (Reactor) — lower is better
| Metric | main (baseline) | This PR | Δ (95% CI) | Status |
|---|---|---|---|---|
| Alloc bytes/render ↓ | 4848013 | 4884956 | +1.4% 95% CI [+0.2, +2.6] | |
| Gen0 GC / 1k renders ↓ | 192.31 | 200.00 | +8.1% 95% CI [-3.9, +20.1] | ≈ within noise |
Keyed-list workload (StressPerf.KeyedList, --percent 50)
A separate macro workload: a ~500-row stably keyed list whose rows are reordered / inserted / removed each tick. Because every child carries a key, the child reconciler takes its keyed arm (ReconcileKeyed → ReconcileKeyedMiddle, the LIS-based minimal-move pass) instead of the positional re-walk the StocksGrid tables above measure — so this is the sensitive macro signal for keyed-diff work the positional cells can never reach. Same interleaved paired-Δ 95% CI as the headline table.
| Metric | main (baseline) | This PR | Δ (95% CI) | Status |
|---|---|---|---|---|
| Renders/sec ↑ | 16.34 | 18.73 | +16.1% 95% CI [+12.3, +20.0] | ✅ improvement |
| Avg Reconcile (ms) ↓ | 20.6 | 17.6 | -15.6% 95% CI [-17.9, -13.4] | ✅ improvement |
| Avg Diff (ms) ↓ | 20.4 | 17.4 | -15.6% 95% CI [-17.8, -13.3] | ✅ improvement |
| Avg Memory (MB) ↓ | 164.2 | 167.9 | +1.7% 95% CI [+1.0, +2.4] | |
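To illustrate what an LIS-based minimal-move pass does in general, here is a small sketch. It is not Reactor's `ReconcileKeyedMiddle`: the function names are hypothetical, keys are assumed unique, and the sketch only classifies keys as removed, inserted, or moved. The idea is that surviving rows whose old positions already form a longest increasing subsequence can stay put, so only the remaining survivors need a move.

```python
from bisect import bisect_left


def longest_increasing_subsequence(seq):
    """Return indices into seq of one longest strictly increasing subsequence."""
    tails = []      # tails[k]: index in seq ending the best run of length k + 1
    tail_vals = []  # values at those indices, kept sorted for bisect
    prev = [-1] * len(seq)
    for i, value in enumerate(seq):
        k = bisect_left(tail_vals, value)
        if k == len(tails):
            tails.append(i)
            tail_vals.append(value)
        else:
            tails[k] = i
            tail_vals[k] = value
        prev[i] = tails[k - 1] if k > 0 else -1
    # Walk the predecessor links back from the end of the longest run.
    out = []
    i = tails[-1] if tails else -1
    while i != -1:
        out.append(i)
        i = prev[i]
    out.reverse()
    return out


def plan_keyed_moves(old_keys, new_keys):
    """Classify keys as removed, inserted, or moved (unique keys assumed)."""
    old_index = {key: i for i, key in enumerate(old_keys)}
    new_set = set(new_keys)
    removed = [k for k in old_keys if k not in new_set]
    inserted = [k for k in new_keys if k not in old_index]
    survivors = [k for k in new_keys if k in old_index]
    # Old positions in new order; an increasing run is already in place.
    positions = [old_index[k] for k in survivors]
    stay = {survivors[i] for i in longest_increasing_subsequence(positions)}
    moved = [k for k in survivors if k not in stay]
    return removed, inserted, moved
```

For example, going from `a b c d e` to `e a b x d` removes `c`, inserts `x`, and moves only `e`, since `a`, `b`, and `d` keep their relative order.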
Allocation (keyed-list) — lower is better
| Metric | main (baseline) | This PR | Δ (95% CI) | Status |
|---|---|---|---|---|
| Alloc bytes/render ↓ | 216279 | 314216 | +45.2% 95% CI [+44.4, +46.0] | |
| Gen0 GC / 1k renders ↓ | 11.83 | 15.67 | +36.1% 95% CI [+27.7, +44.5] | |
Reconciler micro-benchmarks (PerfBench.ControlModel)
Production --variant Reactor control-model path, ns-resolution and WinUI-undiluted (spec-047 M1–M13) — ↓ lower is better. Status tracks allocated bytes/op, the authoritative signal here; it is deterministic for structurally-fixed benches, while dispatcher / background-thread benches carry a small process-to-process offset, so a bench is flagged only when its 95% CI clears a ±3% minimum-effect band (real structural alloc changes are several percent to many-x). ns/op is shown for context but is not auto-flagged (its paired CI is rep-interleaved but the flag remains dormant pending a real-CI identical-binary band calibration). Δ is the mean paired change with a 95% CI.
| Bench | main ns/op | Δ ns (95% CI) | main B/op | Δ alloc (95% CI) | Status |
|---|---|---|---|---|---|
| M1 Mount_Leaf_NoCallback | 149999.2 | +0.6% 95% CI [-2.3, +3.6] | 1140.9 | 0.0% 95% CI [0.0, 0.0] | ≈ within noise |
| M2 Mount_Leaf_OneCallback | 109267.4 | -1.0% 95% CI [-5.7, +3.7] | 3383.3 | 0.0% 95% CI [0.0, 0.0] | ≈ within noise |
| M3 Mount_Leaf_ThreeCallbacks | 225433.3 | -1.8% 95% CI [-5.8, +2.1] | 8395.4 | +1.6% 95% CI [+0.2, +2.9] | ≈ within noise |
| M4 Dispatch_Switch_Cold | 112542.6 | -3.6% 95% CI [-8.3, +1.0] | 1767.8 | 0.0% 95% CI [0.0, 0.0] | ≈ within noise |
| M5 Dispatch_Switch_Warm | 111827.4 | +1.1% 95% CI [-7.6, +9.7] | 1805.9 | -1.4% 95% CI [-3.6, +0.8] | ≈ within noise |
| M6 Dispatch_ExternalType | 91199.2 | +0.6% 95% CI [-0.5, +1.6] | 1028.6 | -2.4% 95% CI [-6.4, +1.5] | ≈ within noise |
| M7 Update_NoChange | 55403.2 | +0.2% 95% CI [-0.5, +0.8] | 370.1 | +8.4% 95% CI [-3.1, +19.8] | ≈ within noise |
| M8 Update_OneLeafChanged | 42066.4 | +0.8% 95% CI [-2.0, +3.7] | 536.0 | 0.0% 95% CI [0.0, 0.0] | ≈ within noise |
| M9 Update_AllChanged | 2884322.0 | +0.1% 95% CI [-1.2, +1.4] | 184278.1 | 0.0% 95% CI [0.0, 0.0] | ≈ within noise |
| M10 EventHandlerState_Alloc | 86233.9 | -0.1% 95% CI [-2.6, +2.4] | 3095.2 | 0.0% 95% CI [0.0, +0.1] | ≈ within noise |
| M11 ModifierEHS_Frequency | 45952.8 | +1.3% 95% CI [-0.5, +3.2] | 638.9 | 0.0% 95% CI [0.0, 0.0] | ≈ within noise |
| M12 Pool_Rent_HotPath | 117647.9 | +1.6% 95% CI [+0.1, +3.1] | 1099.9 | 0.0% 95% CI [0.0, 0.0] | ≈ within noise |
| M13 Setters_Suppression_Scope | 107.1 | +28.9% 95% CI [+5.0, +52.8] | 26.7 | 0.0% 95% CI [0.0, 0.0] | ≈ within noise |
| M14 Dsl_Rebuild_Cascade | 1580037.0 | +0.7% 95% CI [-1.7, +3.1] | 2231828.9 | 0.0% 95% CI [0.0, 0.0] | ≈ within noise |
| C207 ChangeHandler_DpRead_Coalesce | 1262.9 | +6.4% 95% CI [-9.9, +22.8] | 0.6 | 0.0% 95% CI [0.0, 0.0] | ≈ within noise |
| OAlloc Optional_Element_Alloc | 214.5 | +4.4% 95% CI [-2.6, +11.5] | 528.0 | 0.0% 95% CI [0.0, 0.0] | ≈ within noise |
| OUpdate Optional_Reconciler_Update | 12558.9 | -0.8% 95% CI [-3.1, +1.5] | 2772.3 | 0.0% 95% CI [0.0, 0.0] | ≈ within noise |
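The micro-benchmark Status column flags a bench only when the 95% CI of its allocation Δ clears a ±3% minimum-effect band. One plausible reading of that rule is sketched below, assuming "clears" means the whole interval lies outside the band on one side. `clears_band` is a hypothetical name, not the workflow's own function.

```python
def clears_band(ci_low, ci_high, band=3.0):
    """True when the whole CI sits beyond the +/- band (values in percent).

    Assumption: an interval that merely excludes 0 is not enough; its
    nearer edge must also exceed the minimum-effect threshold.
    """
    return ci_low > band or ci_high < -band


# M3's alloc CI [+0.2, +2.9] excludes 0 but stays inside the band.
print(clears_band(0.2, 2.9))    # False
# M7's alloc CI [-3.1, +19.8] is wide but still straddles 0.
print(clears_band(-3.1, 19.8))  # False
```

Under this reading, both rows stay "within noise", which matches the Status values shown for M3 and M7.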
Cross-framework reference (same StocksGrid workload)
| Metric | vanilla WinUI3¹ | Rust windows-reactor² | Reactor (this PR) |
|---|---|---|---|
| Renders/sec ↑ | 3.06 | 4.65 | 2.63 |
| Avg Reconcile (ms) ↓ | n/a | 19.7 | 122.5 |
| Avg Diff (ms) ↓ | n/a | 18.3 | 111.8 |
| Avg Memory (MB) ↓ | 263.3 | 197.8 | 283.8 |
↑ higher is better · ↓ lower is better. Within noise = the 95% confidence interval of the paired Δ includes 0 (no change resolvable at this sample size); ✅ improvement /
Allocation metrics (alloc bytes/render, Gen0 GC) are the sensitive signal for allocation-reduction work, where the mean-ms / memory figures are largely flat. They read n/a for a harness built from a revision that predates them (rebase the PR onto main to populate them).
Reconciler micro-benchmarks run PerfBench.ControlModel --variant Reactor (M1–M13) as a headless loop bracketed by per-thread alloc + GC counters — ns-resolution and free of WinUI render / working-set dilution, so they resolve Core/Reconciler allocation deltas the macro StocksGrid workload cannot. main and PR each link their own src/Reactor build and are rep-interleaved (a fresh alternated process per rep); Δ is the paired 95% CI over per-rep means. The Status column tracks allocated bytes/op (deterministic for identical code); ns/op is informational — its paired CI is now unbiased but the flag stays dormant pending a real-CI identical-binary band calibration.
¹ vanilla WinUI3 = StressPerf.Direct (imperative; no virtual-DOM, so it has no reconcile/diff phase — those cells read n/a). Measured live on this runner.
² Rust = test_reactor_perf from microsoft/windows-rs — a port of this harness (same StocksGrid, same --percent/--duration CLI). Built from source and measured live on this runner.
Absolute numbers are runner-dependent — trust the Δ vs main, not the absolute values. Memory (working set) is the noisiest metric.
Runner: CPU: AMD EPYC 7763 64-Core Processor · 4 logical cores · 16 GB RAM · runner: GitHub Actions 1043025925.
Generated by .github/workflows/perf-compare.yml · PR 5274c5a vs main 0002f19 · 2026-06-27T16:46:48Z · run log.
Measurement-only: reverts #657 (628fb7f) off current main to confirm its keyed-list reconcile/diff regression as an inverse signal. Do not merge; will be torn down.