perf: Yoga layout-cache guards + inline per-node arrays #670
YogaNode property setters (FlexDirection/Justify/Align/Width/Height/Min/Max/margins/padding/border/position/gap/...) called MarkDirtyAndPropagate unconditionally. FlexPanel re-applies the same container and child style every MeasureOverride, so the root and every cell were re-dirtied each frame and the Yoga layout cache never hit. Add 'if (current == value) return;' guards to every setter so unchanged values do not dirty the tree, mirroring upstream Yoga's updateStyle. Route FlexPanel.ApplyAttachedProperties direct node.Style writes (FlexGrow/FlexShrink/FlexBasis/AlignSelf/PositionType/Position) through the guarded setters so a genuinely changed flex item still relayouts. Add unit tests asserting same-value assignment does not dirty, a real change does, and that re-applying identical style keeps the layout cache hot (child measure function not re-invoked). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
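The guard pattern described in this commit can be sketched as follows. This is a minimal illustration, not the PR's code: `SketchNode`, `ClearDirty`, and the two properties shown are hypothetical stand-ins for `YogaNode`, and the real setters compare `YogaValue`s rather than raw floats.

```csharp
#nullable enable
using System;

// Hypothetical stand-in for YogaNode; only the equality guard is the point here.
public enum FlexDirection { Column, Row }

public sealed class SketchNode
{
    private FlexDirection _flexDirection;
    private float _width = float.NaN; // NaN plays the role of "undefined"

    public bool IsDirty { get; private set; }
    public SketchNode? Parent { get; set; }

    public FlexDirection FlexDirection
    {
        get => _flexDirection;
        set
        {
            if (_flexDirection == value) return; // unchanged: leave the layout cache valid
            _flexDirection = value;
            MarkDirtyAndPropagate();
        }
    }

    public float Width
    {
        get => _width;
        set
        {
            // float.Equals treats NaN as equal to NaN (unlike ==), so re-applying
            // an undefined width is also a no-op.
            if (_width.Equals(value)) return;
            _width = value;
            MarkDirtyAndPropagate();
        }
    }

    public void ClearDirty() => IsDirty = false;

    // Walks up the parent chain, stopping at the first ancestor that is already dirty.
    private void MarkDirtyAndPropagate()
    {
        for (SketchNode? n = this; n is not null && !n.IsDirty; n = n.Parent)
            n.IsDirty = true;
    }
}
```

With this shape, a panel that re-applies identical style on every measure pass leaves `IsDirty` false on the whole chain, which is the precondition for a frame-level cache hit.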
YogaStyle allocated 8 separate YogaValue[] arrays per instance (Margin/Position/Padding/Border x9, Gap x3, Dimensions/Min/Max x2). For a ~200-node tree that is ~1600 GC objects, a dominant Yoga memory regression vs the C++ original which uses inline members. Replace the arrays with fixed-size InlineArray structs (EdgeValues, GutterValues, DimensionValues) embedded directly in the YogaStyle heap object. Exposed as ref-returning properties so existing indexer reads and writes (style.Margin[i], style.Position[i] = v) are unchanged. Edge-resolution helpers take ReadOnlySpan<YogaValue>. Buffers are explicitly seeded to YogaValue.Undefined / Auto in a new constructor because default(YogaValue) is (0, Undefined), not the (NaN, Undefined) sentinel; seeding keeps == and Resolve() byte-identical. InlineArray indexing is compiler-generated and AOT/trim-safe. All 704 Yoga-filtered tests pass unchanged. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
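A rough sketch of the inline-buffer approach, assuming .NET 8 / C# 12. `SketchValue`, `SketchUnit`, and `SketchStyle` are hypothetical stand-ins for `YogaValue` and `YogaStyle`; only the `[InlineArray]` attribute, the ref-returning property, and the explicit seeding are taken from the description above.

```csharp
using System;
using System.Runtime.CompilerServices;

public enum SketchUnit { Undefined, Point, Percent, Auto }

// Hypothetical stand-in for YogaValue. Note default(SketchValue) is (0, Undefined).
public readonly record struct SketchValue(float Value, SketchUnit Unit)
{
    public static readonly SketchValue Undefined = new(float.NaN, SketchUnit.Undefined);
}

// Nine values stored inline in the owning object: no separate array allocation.
[InlineArray(9)]
public struct EdgeValues
{
    private SketchValue _element0;
}

public sealed class SketchStyle
{
    private EdgeValues _margin;

    public SketchStyle()
    {
        // Seed explicitly: a zero-initialized slot would hold (0, Undefined),
        // not the (NaN, Undefined) sentinel.
        for (int i = 0; i < 9; i++)
            _margin[i] = SketchValue.Undefined;
    }

    // Ref-returning property, so style.Margin[i] reads and writes hit the inline storage.
    public ref EdgeValues Margin => ref _margin;
}
```

Because the property returns a reference rather than a copy, a call site such as `style.Margin[3] = value` keeps working without source changes.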
LayoutResults allocated 7 float[] arrays (dimensions/measured/raw x2, position/margin/border/padding x4) plus a CachedMeasurement[8] per node. For a ~200-node tree that is ~1600 GC objects, a dominant part of the Yoga memory gap vs the C++ original which uses inline members. Replace them with fixed-size InlineArray structs (Float2, Float4, CachedMeasurementArray) embedded directly in the heap object. The public Get/Set accessors are byte-for-byte identical; CachedMeasurements is a ref-returning property so in-place element mutation (CachedMeasurements[idx].AvailableWidth = ...) is unchanged. Cached-measurement slots are still seeded via new CachedMeasurement() in the constructor/Reset because the struct's -1 field initializers do not run for default-initialized inline-array elements. The NaN dimension seeds are preserved explicitly. InlineArray indexing is AOT/trim-safe. All 704 Yoga-filtered tests pass unchanged. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
ApplyAttachedProperties read ~11 attached DPs per child per MeasureOverride via GetValue, each boxing a double/enum. Attached DPs only change through the property system (OnChildPropertyChanged), so snapshot the values in a per-child AttachedProps cache and invalidate on that callback and on add/remove. Width/Height/Margin/Visibility are still re-read each pass (they change without our callback). No layout-numeric change; 704 Yoga fixtures pass. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
CalculateLayoutImpl and ComputeFlexBasisForChildren each allocated a new List<YogaNode> per container per frame (2N allocs). Rent from the existing FlexLineHelper thread-static pool and return at the single method exit. Both methods have a single exit and recurse into fresh rented lists, so the LIFO pool stays balanced. 704 Yoga fixtures pass. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
GetLayoutChildren was a yield iterator that allocated a state-machine enumerator on every call; it is used on hot per-frame paths (baseline, trailing positions, wrap-reverse, pixel rounding). Return a value-type LayoutChildren enumerable whose struct enumerator walks _children by index with zero heap allocations in the common (no Display.Contents) case, falling back to the allocating iterator only for a Contents subtree. Also iterate RoundLayoutResultsToPixelGrid children by index instead of foreach over the IReadOnlyList Children property (which boxed List<T>.Enumerator per node). 704 Yoga fixtures pass. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
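The allocation-free path can be illustrated with a pattern-based struct enumerator. This sketch covers only the common index-walking case and omits the Display.Contents fallback; `IndexedChildren<T>` is a hypothetical name, not the PR's `LayoutChildren` type.

```csharp
using System.Collections.Generic;

// Value-type enumerable: foreach binds to GetEnumerator() by pattern, so no
// interface dispatch and no heap-allocated iterator state machine is involved.
public readonly struct IndexedChildren<T>
{
    private readonly List<T> _children;

    public IndexedChildren(List<T> children) => _children = children;

    public Enumerator GetEnumerator() => new Enumerator(_children);

    public struct Enumerator
    {
        private readonly List<T> _children;
        private int _index;

        internal Enumerator(List<T> children)
        {
            _children = children;
            _index = -1; // positioned before the first element
        }

        public T Current => _children[_index];

        public bool MoveNext() => ++_index < _children.Count;
    }
}
```

A `foreach (var child in new IndexedChildren<T>(list))` loop over this type compiles to direct calls on the struct, whereas iterating the same list through `IReadOnlyList<T>` boxes `List<T>.Enumerator`.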
IsBaselineLayout(node) was recomputed inside JustifyMainAxis for every flex line, plus again at STEP 8 — each call walking all layout children. Its result is invariant within a single layout pass (depends only on the node's FlexDirection/AlignItems and children's AlignSelf/PositionType), so compute it once in CalculateLayoutImpl and thread it into JustifyMainAxis / STEP 8. No cross-frame cache, so no dirty-invalidation hazard. 704 Yoga fixtures pass. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
CalculateFlexLine allocated a new sealed FlexLine per flex line per frame. Add a thread-static FlexLine pool; rent in CalculateFlexLine and return at the per-line loop tail in CalculateLayoutImpl. The ItemsInFlow list travels with the pooled FlexLine (cleared on rent) instead of being drawn from the node-list pool. A FlexLine never escapes its loop iteration and recursion rents distinct lines, so the LIFO pool stays balanced. 704 Yoga fixtures pass. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The #138 setter equality guards stopped FlexPanel from re-dirtying every child each MeasureOverride. That re-dirtying used to reset Layout.ComputedFlexBasis to NaN (via MarkDirtyAndPropagate), forcing a fresh flex-basis computation each pass. ComputeFlexBasisForChild only recomputed a cached basis when it was NaN (or under the off-by-default WebFlexBasis feature). With the guards, a flex child first measured in a min-content probe (undefined main axis, basis measured from CONTENT) then reused that stale content-basis in the definite-width pass where an explicit flex-basis:0 must apply, breaking flex-grow distribution in nested FlexPanels (Flex selftests FlexNested_RowInCol_ContentWidth / FlexNested_Deep_L2LeftWidth). Fix: invalidate the resolved-basis cache when ComputedFlexBasisGeneration differs from the current generation, regardless of WebFlexBasis. Each top-level CalculateLayout bumps the generation, so this reproduces the old always-dirty basis freshness without re-dirtying the root (the frame-level layout cache still hits). All 704 Yoga fixtures unchanged; Flex selftest 0 failures. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
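The generation check can be sketched in isolation. `BasisCache`, `BeginLayoutPass`, and `GetOrCompute` are hypothetical names chosen for illustration; the real logic lives in `ComputeFlexBasisForChild` and involves more conditions than shown here.

```csharp
using System;

// Hypothetical sketch of a per-node cached flex basis stamped with a layout generation.
public sealed class BasisCache
{
    // Bumped once per top-level layout pass.
    public static uint CurrentGeneration { get; private set; }

    public static void BeginLayoutPass() => CurrentGeneration++;

    private float _computedFlexBasis = float.NaN;
    private uint _computedGeneration;

    public float GetOrCompute(Func<float> compute)
    {
        // A basis resolved in an earlier pass is stale even if the node was never
        // re-dirtied, so drop it when the generation has moved on.
        if (_computedGeneration != CurrentGeneration)
            _computedFlexBasis = float.NaN;

        if (float.IsNaN(_computedFlexBasis))
        {
            _computedFlexBasis = compute();
            _computedGeneration = CurrentGeneration;
        }
        return _computedFlexBasis;
    }
}
```

Within one pass the cached value is reused; across passes it is recomputed, which gives the freshness the old always-dirty behaviour provided without marking any node dirty.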
ReturnFlexLine pushed the FlexLine to the thread-static pool without clearing its ItemsInFlow list, so each pooled slot pinned the previous flex line's YogaNode references (and the UI subtrees they reach) between layout passes until the slot was next rented. For a PR whose goal is cutting per-node memory, that is a regression. Move the reset (ItemsInFlow.Clear + scalar fields) into ReturnFlexLine before pushing, matching ReturnList's clear-before-pool contract, and simplify RentFlexLine to pop-or-new. The only path into the pool is ReturnFlexLine, so a popped line is always clean at rent time. Found by the pr-review skill (correctness, high; multi-model confirmed). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
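The clear-before-pool contract looks roughly like this. `PooledLine` and `LinePool` are hypothetical simplifications of `FlexLine` and `FlexLineHelper`, and `MainDim` stands in for whatever scalar fields the real type carries.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Hypothetical simplification of a pooled flex line.
public sealed class PooledLine
{
    public readonly List<object> ItemsInFlow = new();
    public float MainDim;
}

public static class LinePool
{
    [ThreadStatic] private static Stack<PooledLine>? t_pool;

    // Pop-or-new: anything in the pool was reset when it was returned.
    public static PooledLine Rent()
    {
        var pool = t_pool;
        return pool is { Count: > 0 } ? pool.Pop() : new PooledLine();
    }

    // Reset before pushing so an idle slot never pins node references between passes.
    public static void Return(PooledLine line)
    {
        line.ItemsInFlow.Clear();
        line.MainDim = 0f;
        (t_pool ??= new Stack<PooledLine>()).Push(line);
    }
}
```

Resetting on return rather than on rent costs the same work per cycle, but the references are released as soon as the line goes idle instead of when it is next reused.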
Adds a pure-YogaNode unit test reproducing the nested flex-grow regression the #138 setter guards exposed: a flex-basis:0 grow item measured at content during an undefined-main-axis pass must not leak that content size into a later definite-width pass. Verified to FAIL (217 vs 200) without the per-generation basis invalidation in ComputeFlexBasisForChild and pass with it. The 704 single-generation fixtures never exercised this multi-generation path; the Flex selftest that originally caught it is outside the Yoga-only edit scope. Found by the pr-review skill (test-coverage, medium). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Pull request overview
This PR targets the Yoga layout engine (src/Reactor/Yoga/) to improve frame-to-frame layout-cache hit rates (by preventing no-op style re-dirtying) and reduce per-node memory/GC pressure (by replacing per-node arrays with [InlineArray] inline buffers), with additional hot-path allocation reductions via pooling and iteration changes.
Changes:
- Add equality guards to YogaNode style setters so re-applying identical style does not mark nodes dirty, enabling effective layout caching; add unit tests covering no-op sets vs real changes.
- Replace multiple per-node heap arrays in YogaStyle and LayoutResults with inline [InlineArray] buffers while keeping accessor semantics consistent.
- Reduce per-layout allocations by pooling child lists / FlexLine, and by switching hot traversals away from allocating enumerators.
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| tests/Reactor.Tests/YogaEdgeCaseTests.cs | Adds tests asserting no-op setter writes don’t dirty nodes and that layout cache stays hot across repeated layouts. |
| src/Reactor/Yoga/YogaStyle.cs | Replaces edge/gutter/dimension YogaValue[] arrays with inline buffers and updates edge computation helpers. |
| src/Reactor/Yoga/YogaNode.cs | Adds equality-guarded setters and introduces an allocation-free GetLayoutChildren() enumerator path. |
| src/Reactor/Yoga/YogaAlgorithm.cs | Uses pooled lists for layout child collection and computes baseline-layout flag once per layout pass; fixes flex-basis cache invalidation across generations. |
| src/Reactor/Yoga/LayoutResults.cs | Replaces multiple float arrays + cached-measurement array with inline buffers; updates reset/initialization accordingly. |
| src/Reactor/Yoga/FlexPanel.cs | Adds per-child attached-DP caching and routes updates through YogaNode setters to avoid bypassing dirty tracking. |
| src/Reactor/Yoga/AlgorithmUtils.cs | Avoids per-node enumerator boxing in pixel-rounding traversal and adds FlexLine pooling support. |
…en enumerator (#145) Copilot review flagged that the inner IEnumerator<YogaNode> used to flatten a Display.Contents subtree was dropped without Dispose() at natural completion, retaining the iterator's captured subtree references until the next GC. Dispose _inner at both completion points and implement IDisposable on the struct enumerator so a foreach that breaks out early mid-subtree (e.g. baseline detection in AlgorithmUtils) also releases it via the finally-Dispose. foreach calls Dispose on a struct enumerator that implements IDisposable as a non-boxing constrained call, so the common no-inner path stays allocation-free and AOT-safe. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The #147 attached-DP read cache is invalidated via OnChildPropertyChanged (while the child is parented) and the SyncYogaTree removal sweep. Neither fires when an attached DP changes while the child is detached from its panel and the child is removed+re-added before the next measure: OnChildPropertyChanged cannot reach the owning panel to drop the entry, and the removal sweep skips a child that is present again. ApplyAttachedProperties then trusted a stale snapshot. Flag such elements in a static weak-keyed ConditionalWeakTable from the detached branch of OnChildPropertyChanged; ApplyAttachedProperties consults it only on a cache hit and forces a re-read (clearing the marker). Steady-state cost is one empty-CWT probe per cached child per sync, preserving the #147 win. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
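One way to picture the weak-keyed dirty flag, as a sketch only: `DetachedDirtyFlags` and its two methods are hypothetical names, and the real code keys on UI elements rather than plain objects. `ConditionalWeakTable` holds its keys weakly, so a flagged element that is later discarded can still be collected.

```csharp
using System.Runtime.CompilerServices;

// Hypothetical helper: remembers elements whose attached properties changed while
// they were detached, without keeping those elements alive.
public static class DetachedDirtyFlags
{
    private static readonly ConditionalWeakTable<object, object> s_dirty = new();
    private static readonly object s_marker = new();

    // Called from the detached branch of the property-changed callback.
    public static void MarkChangedWhileDetached(object child) =>
        s_dirty.AddOrUpdate(child, s_marker);

    // Called on a cache hit: returns true (and clears the flag) if the cached
    // snapshot must be re-read.
    public static bool ConsumeIfMarked(object child) => s_dirty.Remove(child);
}
```

In the steady state no element is flagged, so the cache-hit path pays only a lookup in an empty table.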
Add three Yoga unit tests surfaced by the pre-GO review gate:
- Setter_RealChange_MarksDirty now also covers aspectRatio (#138 real-change arm).
- Setter_AspectRatio_DegenerateNormalizesToAuto locks in the 0/Infinity->NaN normalization: re-applying a degenerate ratio is a no-op (NaN==NaN guard, so the layout cache is not defeated), while a genuine ratio change still dirties.
- FlexLineHelper_RentAndReturnFlexLine_ReusesAndResets asserts the #144 pooled FlexLine is reused AND fully reset on return (no stale scalars or pinned YogaNode refs leak across layout passes).
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The setter-guard and flex-basis-generation test sections (added earlier on this branch) restarted the file's section numbering at 6/7 after section 12. Continue the sequence as 13/14 so the section headings stay unique and navigable. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
/perf
⚡ Reactor perf comparison
Workload: Regression vs
| Metric | main (baseline) | This PR | Δ (95% CI) | Status |
|---|---|---|---|---|
| Renders/sec ↑ | 2.62 | 2.48 | -3.6% 95% CI [-8.0, +0.9] | ≈ within noise |
| Avg Reconcile (ms) ↓ | 135.9 | 145.4 | +4.4% 95% CI [-0.2, +8.9] | ≈ within noise |
| Avg Diff (ms) ↓ | 125.3 | 134.8 | +4.8% 95% CI [-0.1, +9.6] | ≈ within noise |
| Avg Memory (MB) ↓ | 295.3 | 289.1 | -1.6% 95% CI [-2.8, -0.5] | ✅ improvement |
Allocation (Reactor) — lower is better
| Metric | main (baseline) | This PR | Δ (95% CI) | Status |
|---|---|---|---|---|
| Alloc bytes/render ↓ | 9542343 | n/a | — | — |
| Gen0 GC / 1k renders ↓ | 269.23 | n/a | — | — |
Cross-framework reference (same StocksGrid workload)
| Metric | vanilla WinUI3¹ | Rust windows-reactor² | Reactor (this PR) |
|---|---|---|---|
| Renders/sec ↑ | 3.24 | 4.95 | 2.48 |
| Avg Reconcile (ms) ↓ | n/a | 18.8 | 145.4 |
| Avg Diff (ms) ↓ | n/a | 17.0 | 134.8 |
| Avg Memory (MB) ↓ | 264.5 | 197.4 | 289.1 |
↑ higher is better · ↓ lower is better. Within noise = the 95% confidence interval of the paired Δ includes 0 (no change resolvable at this sample size); ✅ improvement /
Allocation metrics (alloc bytes/render, Gen0 GC) are the sensitive signal for allocation-reduction work, where the mean-ms / memory figures are largely flat. They read n/a for a harness built from a revision that predates them (rebase the PR onto main to populate them).
¹ vanilla WinUI3 = StressPerf.Direct (imperative; no virtual-DOM, so it has no reconcile/diff phase — those cells read n/a). Measured live on this runner.
² Rust = test_reactor_perf from microsoft/windows-rs — a port of this harness (same StocksGrid, same --percent/--duration CLI). Built from source and measured live on this runner.
Absolute numbers are runner-dependent — trust the Δ vs main, not the absolute values. Memory (working set) is the noisiest metric.
Runner: CPU: AMD EPYC 7763 64-Core Processor · 4 logical cores · 16 GB RAM · runner: GitHub Actions 1042818410.
Generated by .github/workflows/perf-compare.yml · PR 1323999 vs main 66d38dc · 2026-06-26T06:57:45Z · run log.
Closes #658
Summary
Targets the Yoga layout engine (src/Reactor/Yoga/ only), which is implicated in both the Renders/sec gap (layout cache never hit → full re-layout every frame) and the ~1.5x Memory gap (per-node arrays where the C++ original uses inline members) on the StocksGrid data-grid stress workload.

Fixes
Flagship, renders (layout cache):
- YogaNode setter equality guards (if (current == value) return;) on FlexDirection/Justify/Align/Width/Height/Min/margins/… so SetRootConstraints + ApplyAttachedProperties re-setting unchanged values each frame no longer re-dirties the root. This is what lets the frame-level layout cache actually hit for stable cells. Unit tests assert no-op sets don't dirty and real changes do (incl. the AspectRatio degenerate-0/Infinity→auto normalization).

Flagship, memory:
- LayoutResults: 7 separate float[] + CachedMeasurement[] per node → inline Float2/Float4/cached-measurement storage via [InlineArray] (Unsafe.Add indexing, AOT-safe). Public accessors unchanged.
- YogaStyle: 8 separate YogaValue[] arrays per instance → one inline fixed-layout edge struct via [InlineArray] + ref indexer.

Supporting allocations / boxing:
- FlexPanel caches last-seen attached-DP values per child; pushes only on change (removes 9N boxing GetValue/SetValue per frame). The cache is invalidated when a DP changes while the child is parented (OnChildPropertyChanged), by the SyncYogaTree removal sweep, and, for a DP changed while the child is detached and re-added before the next measure, via a static weak-keyed dirty flag consulted on cache hit.
- YogaAlgorithm rents the layoutChildren/children lists from a pool and threads them down (removes 2N list allocs/frame).
- FlexLine pooled (no per-line heap alloc per frame); reset-on-return is unit-tested so no stale scalars or pinned YogaNode refs leak across passes.
- … yield enumerator / boxed List.Enumerator in hot paths). The Display.Contents fallback enumerator is disposed (struct Enumerator : IDisposable).
- IsBaselineLayout computed once per layout pass instead of per flex line.

Regression fix (caught by Flex selftest):
- … MeasureOverride; that re-dirty used to reset Layout.ComputedFlexBasis = NaN. ComputeFlexBasisForChild only recomputed a cached basis when it was NaN (or under the off-by-default WebFlexBasis flag), so a child first measured in a min-content probe (undefined main axis → basis from content) reused that stale content-basis in the later definite-width pass where flex-basis:0 must apply, breaking flex-grow distribution in nested FlexPanels. Fix: invalidate the resolved-basis cache when ComputedFlexBasisGeneration differs from the current generation, regardless of WebFlexBasis. Each top-level CalculateLayout bumps the generation, so this restores the old always-fresh basis without re-dirtying the root (the layout cache still hits).

Deferred (intentionally, for correctness)
- ComputeMinContent re-measure: load-bearing, because content changes don't otherwise dirty the node. Removing it would drop genuine relayouts.
- LayoutResults.Reset: dead code (no callers); the GC concern is superseded by #142 making CachedMeasurement a struct.

Validation
- … --filter ~Yoga): 708 passed, 0 failed, 46 skipped. These encode Yoga's reference behavior; all preserved.
- tests/Reactor.Tests: 9684 passed, 0 failed, 64 skipped.
- … --self-test --filter Flex): 0 failures, including the two nested flex-grow fixtures the regression fix restores.
- dotnet build src/Reactor/Reactor.csproj -c Release: core Reactor.dll builds with 0 warnings / 0 errors (the core lib treats AOT/trim warnings as errors, so the InlineArray/Unsafe/ConditionalWeakTable usage is confirmed AOT-safe).

Layout correctness is paramount: no fixture numeric results changed.