Conversation
YogaNode property setters (FlexDirection/Justify/Align/Width/Height/ Min/Max/margins/padding/border/position/gap/...) called MarkDirtyAndPropagate unconditionally. FlexPanel re-applies the same container and child style every MeasureOverride, so the root and every cell were re-dirtied each frame and the Yoga layout cache never hit. Add 'if (current == value) return;' guards to every setter so unchanged values do not dirty the tree, mirroring upstream Yoga's updateStyle. Route FlexPanel.ApplyAttachedProperties direct node.Style writes (FlexGrow/FlexShrink/FlexBasis/AlignSelf/PositionType/Position) through the guarded setters so a genuinely changed flex item still relayouts. Add unit tests asserting same-value assignment does not dirty, a real change does, and that re-applying identical style keeps the layout cache hot (child measure function not re-invoked). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
) YogaStyle allocated 8 separate YogaValue[] arrays per instance (Margin/Position/Padding/Border x9, Gap x3, Dimensions/Min/Max x2). For a ~200-node tree that is ~1600 GC objects, a dominant Yoga memory regression vs the C++ original which uses inline members. Replace the arrays with fixed-size InlineArray structs (EdgeValues, GutterValues, DimensionValues) embedded directly in the YogaStyle heap object. Exposed as ref-returning properties so existing indexer reads and writes (style.Margin[i], style.Position[i] = v) are unchanged. Edge-resolution helpers take ReadOnlySpan<YogaValue>. Buffers are explicitly seeded to YogaValue.Undefined / Auto in a new constructor because default(YogaValue) is (0, Undefined) not the (NaN, Undefined) sentinel — keeping == and Resolve() byte-identical. InlineArray indexing is compiler-generated and AOT/trim-safe. All 704 Yoga-filtered tests pass unchanged. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
LayoutResults allocated 7 float[] arrays (dimensions/measured/raw x2, position/margin/border/padding x4) plus a CachedMeasurement[8] per node. For a ~200-node tree that is ~1600 GC objects, a dominant part of the Yoga memory gap vs the C++ original which uses inline members. Replace them with fixed-size InlineArray structs (Float2, Float4, CachedMeasurementArray) embedded directly in the heap object. The public Get/Set accessors are byte-for-byte identical; CachedMeasurements is a ref-returning property so in-place element mutation (CachedMeasurements[idx].AvailableWidth = ...) is unchanged. Cached-measurement slots are still seeded via new CachedMeasurement() in the constructor/Reset because the struct's -1 field initializers do not run for default-initialized inline-array elements. The NaN dimension seeds are preserved explicitly. InlineArray indexing is AOT/trim-safe. All 704 Yoga-filtered tests pass unchanged. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
ApplyAttachedProperties read ~11 attached DPs per child per MeasureOverride via GetValue, each boxing a double/enum. Attached DPs only change through the property system (OnChildPropertyChanged), so snapshot the values in a per-child AttachedProps cache and invalidate on that callback and on add/remove. Width/ Height/Margin/Visibility are still re-read each pass (they change without our callback). No layout-numeric change; 704 Yoga fixtures pass. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
CalculateLayoutImpl and ComputeFlexBasisForChildren each allocated a new List<YogaNode> per container per frame (2N allocs). Rent from the existing FlexLineHelper thread-static pool and return at the single method exit. Both methods have a single exit and recurse into fresh rented lists, so the LIFO pool stays balanced. 704 Yoga fixtures pass. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
GetLayoutChildren was a yield iterator that allocated a state-machine enumerator on every call; it is used on hot per-frame paths (baseline, trailing positions, wrap-reverse, pixel rounding). Return a value-type LayoutChildren enumerable whose struct enumerator walks _children by index with zero heap allocations in the common (no Display.Contents) case, falling back to the allocating iterator only for a Contents subtree. Also iterate RoundLayoutResultsToPixelGrid children by index instead of foreach over the IReadOnlyList Children property (which boxed List<T>.Enumerator per node). 704 Yoga fixtures pass. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
IsBaselineLayout(node) was recomputed inside JustifyMainAxis for every flex line, plus again at STEP 8 — each call walking all layout children. Its result is invariant within a single layout pass (depends only on the node's FlexDirection/AlignItems and children's AlignSelf/PositionType), so compute it once in CalculateLayoutImpl and thread it into JustifyMainAxis / STEP 8. No cross-frame cache, so no dirty-invalidation hazard. 704 Yoga fixtures pass. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
CalculateFlexLine allocated a new sealed FlexLine per flex line per frame. Add a thread-static FlexLine pool; rent in CalculateFlexLine and return at the per-line loop tail in CalculateLayoutImpl. The ItemsInFlow list travels with the pooled FlexLine (cleared on rent) instead of being drawn from the node-list pool. A FlexLine never escapes its loop iteration and recursion rents distinct lines, so the LIFO pool stays balanced. 704 Yoga fixtures pass. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The #138 setter equality guards stopped FlexPanel from re-dirtying every child each MeasureOverride. That re-dirtying used to reset Layout.ComputedFlexBasis to NaN (via MarkDirtyAndPropagate), forcing a fresh flex-basis computation each pass. ComputeFlexBasisForChild only recomputed a cached basis when it was NaN (or under the off-by-default WebFlexBasis feature). With the guards, a flex child first measured in a min-content probe (undefined main axis, basis measured from CONTENT) then reused that stale content-basis in the definite-width pass where an explicit flex-basis:0 must apply, breaking flex-grow distribution in nested FlexPanels (Flex selftests FlexNested_RowInCol_ContentWidth / FlexNested_Deep_L2LeftWidth). Fix: invalidate the resolved-basis cache when ComputedFlexBasisGeneration differs from the current generation, regardless of WebFlexBasis. Each top-level CalculateLayout bumps the generation, so this reproduces the old always-dirty basis freshness without re-dirtying the root (the frame-level layout cache still hits). All 704 Yoga fixtures unchanged; Flex selftest 0 failures. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
ReturnFlexLine pushed the FlexLine to the thread-static pool without clearing its ItemsInFlow list, so each pooled slot pinned the previous flex line's YogaNode references (and the UI subtrees they reach) between layout passes until the slot was next rented. For a PR whose goal is cutting per-node memory, that is a regression. Move the reset (ItemsInFlow.Clear + scalar fields) into ReturnFlexLine before pushing, matching ReturnList's clear-before-pool contract, and simplify RentFlexLine to pop-or-new. The only path into the pool is ReturnFlexLine, so a popped line is always clean at rent time. Found by the pr-review skill (correctness, high; multi-model confirmed). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Adds a pure-YogaNode unit test reproducing the nested flex-grow regression the #138 setter guards exposed: a flex-basis:0 grow item measured at content during an undefined-main-axis pass must not leak that content size into a later definite-width pass. Verified to FAIL (217 vs 200) without the per-generation basis invalidation in ComputeFlexBasisForChild and pass with it. The 704 single-generation fixtures never exercised this multi-generation path; the Flex selftest that originally caught it is outside the Yoga-only edit scope. Found by the pr-review skill (test-coverage, medium). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…en enumerator (#145) Copilot review flagged that the inner IEnumerator<YogaNode> used to flatten a Display.Contents subtree was dropped without Dispose() at natural completion, retaining the iterator's captured subtree references until the next GC. Dispose _inner at both completion points and implement IDisposable on the struct enumerator so a foreach that breaks out early mid-subtree (e.g. baseline detection in AlgorithmUtils) also releases it via the finally-Dispose. foreach calls Dispose on a struct enumerator that implements IDisposable as a non-boxing constrained call, so the common no-inner path stays allocation-free and AOT-safe. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The #147 attached-DP read cache is invalidated via OnChildPropertyChanged (while the child is parented) and the SyncYogaTree removal sweep. Neither fires when an attached DP changes while the child is detached from its panel and the child is removed+re-added before the next measure: OnChildPropertyChanged cannot reach the owning panel to drop the entry, and the removal sweep skips a child that is present again. ApplyAttachedProperties then trusted a stale snapshot. Flag such elements in a static weak-keyed ConditionalWeakTable from the detached branch of OnChildPropertyChanged; ApplyAttachedProperties consults it only on a cache hit and forces a re-read (clearing the marker). Steady-state cost is one empty-CWT probe per cached child per sync, preserving the #147 win. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add three Yoga unit tests surfaced by the pre-GO review gate: - Setter_RealChange_MarksDirty now also covers aspectRatio (#138 real-change arm). - Setter_AspectRatio_DegenerateNormalizesToAuto locks in the 0/Infinity->NaN normalization: re-applying a degenerate ratio is a no-op (NaN==NaN guard, so the layout cache is not defeated), while a genuine ratio change still dirties. - FlexLineHelper_RentAndReturnFlexLine_ReusesAndResets asserts the #144 pooled FlexLine is reused AND fully reset on return (no stale scalars or pinned YogaNode refs leak across layout passes). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The setter-guard and flex-basis-generation test sections (added earlier on this branch) restarted the file's section numbering at 6/7 after section 12. Continue the sequence as 13/14 so the section headings stay unique and navigable. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
/perf |
| { | ||
| // Degenerate aspect ratios act as auto | ||
| _style.AspectRatio = (value == 0 || float.IsInfinity(value)) ? float.NaN : value; | ||
| float normalized = (value == 0 || float.IsInfinity(value)) ? float.NaN : value; |
⚡ Reactor perf comparisonWorkload: Regression vs
|
| Metric | main (baseline) |
This PR | Δ (95% CI) | Status |
|---|---|---|---|---|
| Renders/sec ↑ | 3.65 | 3.67 | +1.2% 95% CI [-2.6, +5.0] | ≈ within noise |
| Avg Reconcile (ms) ↓ | 101.4 | 101.3 | -0.8% 95% CI [-3.8, +2.2] | ≈ within noise |
| Avg Diff (ms) ↓ | 90.4 | 90.4 | -1.3% 95% CI [-4.2, +1.6] | ≈ within noise |
| Avg Memory (MB) ↓ | 289.4 | 289.7 | -0.1% 95% CI [-0.7, +0.6] | ≈ within noise |
Low-mutation skip-floor (--percent 0)
At --percent 0 the workload mutates few cells per tick (always at least one), so reconcile/diff isolate the O(n) per-tick child skip-walk floor that higher mutation rates dilute — ChildReconciler re-walks every child each tick even when nothing moved. The closer --percent is to 0, the more this floor is the signal, so a structural-skip optimization shows up cleanly where the headline table above buries it. Δ is the mean paired change with a 95% CI.
| Metric | main (baseline) |
This PR | Δ (95% CI) | Status |
|---|---|---|---|---|
| Renders/sec ↑ | 16.55 | 16.15 | -1.1% 95% CI [-6.3, +4.1] | ≈ within noise |
| Avg Reconcile (ms) ↓ | 31.8 | 29.9 | -3.8% 95% CI [-10.1, +2.6] | ≈ within noise |
| Avg Diff (ms) ↓ | 29.8 | 28.1 | -3.9% 95% CI [-10.6, +2.9] | ≈ within noise |
| Avg Memory (MB) ↓ | 268.8 | 269.4 | +0.2% 95% CI [-0.1, +0.5] | ≈ within noise |
Allocation (Reactor) — lower is better
| Metric | main (baseline) |
This PR | Δ (95% CI) | Status |
|---|---|---|---|---|
| Alloc bytes/render ↓ | 5699870 | 5713848 | -0.1% 95% CI [-0.6, +0.4] | ≈ within noise |
| Gen0 GC / 1k renders ↓ | 223.61 | 223.61 | +0.1% 95% CI [-3.8, +3.9] | ≈ within noise |
Keyed-list workload (StressPerf.KeyedList, --percent 50)
A separate macro workload: a ~500-row stably keyed list whose rows are reordered / inserted / removed each tick. Because every child carries a key, the child reconciler takes its keyed arm (ReconcileKeyed → ReconcileKeyedMiddle, the LIS-based minimal-move pass) instead of the positional re-walk the StocksGrid tables above measure — so this is the sensitive macro signal for keyed-diff work the positional cells can never reach. Same interleaved paired-Δ 95% CI as the headline table.
| Metric | main (baseline) |
This PR | Δ (95% CI) | Status |
|---|---|---|---|---|
| Renders/sec ↑ | 15.77 | 15.70 | +0.9% 95% CI [-1.1, +2.9] | ≈ within noise |
| Avg Reconcile (ms) ↓ | 9.2 | 9.3 | +1.0% 95% CI [-1.4, +3.5] | ≈ within noise |
| Avg Diff (ms) ↓ | 9.0 | 9.1 | +0.8% 95% CI [-1.4, +3.1] | ≈ within noise |
| Avg Memory (MB) ↓ | 166.1 | 165.9 | -0.1% 95% CI [-0.3, +0.1] | ≈ within noise |
Allocation (keyed-list) — lower is better
| Metric | main (baseline) |
This PR | Δ (95% CI) | Status |
|---|---|---|---|---|
| Alloc bytes/render ↓ | 314092 | 314406 | +0.1% 95% CI [-0.2, +0.4] | ≈ within noise |
| Gen0 GC / 1k renders ↓ | 18.63 | 18.87 | +3.4% 95% CI [-9.5, +16.3] | ≈ within noise |
Reconciler micro-benchmarks (PerfBench.ControlModel)
Production --variant Reactor control-model path, ns-resolution and WinUI-undiluted (spec-047 M1–M13) — ↓ lower is better. Status tracks allocated bytes/op, the authoritative signal here; it is deterministic for structurally-fixed benches, while dispatcher / background-thread benches carry a small process-to-process offset, so a bench is flagged only when its 95% CI clears a ±3% minimum-effect band (real structural alloc changes are several percent to many-x). ns/op is shown for context but is not auto-flagged (its paired CI is rep-interleaved but the flag remains dormant pending a real-CI identical-binary band calibration). Δ is the mean paired change with a 95% CI.
| Bench | main ns/op |
Δ ns (95% CI) | main B/op |
Δ alloc (95% CI) | Status |
|---|---|---|---|---|---|
M1 Mount_Leaf_NoCallback |
93101.9 | +2.3% 95% CI [-0.4, +5.0] | 1140.9 | 0.0% 95% CI [0.0, 0.0] | ≈ within noise |
M2 Mount_Leaf_OneCallback |
67702.3 | 0.0% 95% CI [-1.5, +1.5] | 3383.3 | 0.0% 95% CI [0.0, 0.0] | ≈ within noise |
M3 Mount_Leaf_ThreeCallbacks |
154486.9 | -0.2% 95% CI [-3.7, +3.2] | 8760.5 | +0.1% 95% CI [-0.2, +0.3] | ≈ within noise |
M4 Dispatch_Switch_Cold |
70896.7 | -2.0% 95% CI [-4.2, +0.1] | 1767.8 | 0.0% 95% CI [0.0, 0.0] | ≈ within noise |
M5 Dispatch_Switch_Warm |
69693.1 | -3.1% 95% CI [-6.5, +0.3] | 1845.7 | +0.8% 95% CI [-0.4, +1.9] | ≈ within noise |
M6 Dispatch_ExternalType |
61927.7 | 0.0% 95% CI [-0.5, +0.5] | 1069.6 | +0.7% 95% CI [-1.9, +3.4] | ≈ within noise |
M7 Update_NoChange |
37803.0 | +0.1% 95% CI [-0.2, +0.4] | 370.1 | -1.2% 95% CI [-7.6, +5.3] | ≈ within noise |
M8 Update_OneLeafChanged |
26948.0 | -0.1% 95% CI [-1.0, +0.9] | 536.0 | 0.0% 95% CI [0.0, 0.0] | ≈ within noise |
M9 Update_AllChanged |
2495399.8 | +0.5% 95% CI [-0.4, +1.4] | 184278.1 | 0.0% 95% CI [0.0, 0.0] | ≈ within noise |
M10 EventHandlerState_Alloc |
47602.1 | +0.7% 95% CI [-1.5, +2.9] | 3101.8 | +0.2% 95% CI [-0.6, +1.1] | ≈ within noise |
M11 ModifierEHS_Frequency |
30971.6 | +1.9% 95% CI [+0.5, +3.4] | 640.5 | -0.1% 95% CI [-0.5, +0.4] | ≈ within noise |
M12 Pool_Rent_HotPath |
74600.6 | -1.5% 95% CI [-3.3, +0.2] | 1099.9 | 0.0% 95% CI [0.0, 0.0] | ≈ within noise |
M13 Setters_Suppression_Scope |
93.8 | +8.9% 95% CI [-1.1, +18.9] | 26.7 | 0.0% 95% CI [0.0, 0.0] | ≈ within noise |
M14 Dsl_Rebuild_Cascade |
1279208.4 | +0.5% 95% CI [-1.6, +2.7] | 2231828.9 | 0.0% 95% CI [0.0, 0.0] | ≈ within noise |
C207 ChangeHandler_DpRead_Coalesce |
898.7 | -0.1% 95% CI [-6.1, +6.0] | 0.6 | 0.0% 95% CI [0.0, 0.0] | ≈ within noise |
OAlloc Optional_Element_Alloc |
206.2 | +2.3% 95% CI [-3.3, +7.9] | 528.0 | 0.0% 95% CI [0.0, 0.0] | ≈ within noise |
OUpdate Optional_Reconciler_Update |
9522.6 | +0.1% 95% CI [-1.0, +1.1] | 2772.3 | 0.0% 95% CI [0.0, 0.0] | ≈ within noise |
Cross-framework reference (same StocksGrid workload)
| Metric | vanilla WinUI3¹ | Rust windows-reactor² |
Reactor (this PR) |
|---|---|---|---|
| Renders/sec ↑ | 4.58 | 6.34 | 3.67 |
| Avg Reconcile (ms) ↓ | n/a | 18.9 | 101.3 |
| Avg Diff (ms) ↓ | n/a | 17.1 | 90.4 |
| Avg Memory (MB) ↓ | 262.8 | 193.0 | 289.7 |
↑ higher is better · ↓ lower is better. Within noise = the 95% confidence interval of the paired Δ includes 0 (no change resolvable at this sample size); ✅ improvement /
Allocation metrics (alloc bytes/render, Gen0 GC) are the sensitive signal for allocation-reduction work, where the mean-ms / memory figures are largely flat. They read n/a for a harness built from a revision that predates them (rebase the PR onto main to populate them).
Reconciler micro-benchmarks run PerfBench.ControlModel --variant Reactor (M1–M13) as a headless loop bracketed by per-thread alloc + GC counters — ns-resolution and free of WinUI render / working-set dilution, so they resolve Core/Reconciler allocation deltas the macro StocksGrid workload cannot. main and PR each link their own src/Reactor build and are rep-interleaved (a fresh alternated process per rep); Δ is the paired 95% CI over per-rep means. The Status column tracks allocated bytes/op (deterministic for identical code); ns/op is informational — its paired CI is now unbiased but the flag stays dormant pending a real-CI identical-binary band calibration.
¹ vanilla WinUI3 = StressPerf.Direct (imperative; no virtual-DOM, so it has no reconcile/diff phase — those cells read n/a). Measured live on this runner.
² Rust = test_reactor_perf from microsoft/windows-rs — a port of this harness (same StocksGrid, same --percent/--duration CLI). Built from source and measured live on this runner.
Absolute numbers are runner-dependent — trust the Δ vs main, not the absolute values. Memory (working set) is the noisiest metric.
Runner: CPU: AMD EPYC 9V74 80-Core Processor · 4 logical cores · 16 GB RAM · runner: GitHub Actions 1043000781.
Generated by .github/workflows/perf-compare.yml · PR e4194f4 vs main e8572a0 · 2026-06-27T05:35:15Z · run log.
Throwaway re-validation of #670 on origin/main b9ace1e (=#692+M14+#665+#649). DO NOT MERGE; origin #670 pristine.