test: flaky LT_OnMountUnmountBalanced selftest — unmount count asserted before async OnUnmount flush completes #685

@azchohfi

Summary

Intermittent selftest failure: LT_OnMountUnmountBalanced (asserts LT_Unmounts_Exactly_30 + LT_NoLeak_MountsEqualUnmounts) in tests/Reactor.AppTests.Host/SelfTest/Fixtures/LifecycleTortureFixtures.cs. It is a generic lifecycle-torture fixture that counts mounts and unmounts of a TextBlock across rapid toggles.

Root cause (test-side race, not product code)

LT_Mounts_Exactly_30 passes (mounts == 30), but the OnUnmount callbacks from the final drive!(0) clear batch have not all flushed when the unmount count is asserted, so unmounts < 30 and the assertion fails. The test checks the exactly-conserved unmount count immediately after the clear batch rather than after a deterministic quiesce of the async OnUnmount dispatch.
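The race described above can be modelled in a few lines. This is a language-neutral sketch in Python with asyncio, not the Reactor fixture or its API: `snapshot_after_clear`, `on_unmount`, and the counters are hypothetical names, and it assumes mounts are counted synchronously while each OnUnmount callback completes on a later event-loop turn. In this simplified model the early read is always 0; in the real fixture it would be some value below 30, depending on timing.

```python
import asyncio

TOGGLES = 30  # matches the count the fixture asserts


async def snapshot_after_clear() -> tuple[int, int, int]:
    """Return (mounts, unmounts read immediately, unmounts read after a drain)."""
    mounts = 0
    unmounts = 0
    pending = []

    async def on_unmount() -> None:
        # Hypothetical callback: it finishes on a later loop turn, like an
        # asynchronously dispatched OnUnmount.
        nonlocal unmounts
        await asyncio.sleep(0)
        unmounts += 1

    for _ in range(TOGGLES):
        mounts += 1                                        # counted synchronously
        pending.append(asyncio.create_task(on_unmount()))  # unmount is only queued

    early = unmounts                # what an immediate assertion would read
    await asyncio.gather(*pending)  # let every queued callback run
    return mounts, early, unmounts


mounts, early, late = asyncio.run(snapshot_after_clear())
print(mounts, early, late)  # prints: 30 0 30
```

The mount count is already exact when the early read happens, which is consistent with LT_Mounts_Exactly_30 passing while the unmount assertions fail.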

Evidence

  • Observed on the CI run for PR #669 (perf: cache per-render arrays/LINQ in DataGrid), commit 9f640ba0, Selftests job run 28200113984: 1/1209 failed (only LT_OnMountUnmountBalanced).
  • gh run rerun 28200113984 --failed: Selftests PASSED (7m15s) on the same commit with zero code change, which confirms the failure is non-deterministic.
  • Not caused by #669: that branch touches only 3 DataGrid files (no AppTests.Host changes; only static readonly immutable arrays + pure methods). Every DataGrid fixture passed, and the failing fixture doesn't use DataGrid. The fixture was last modified by unrelated commit 222a299a.

Impact

False-red on the Selftests CI job that can hit any PR intermittently. Relevant right now because the perf-improvement fleet is about to go through an individual per-PR gh pr ready + CI + /perf GO cycle, and a flake-induced red Selftests run during GO must not be mistaken for a real regression.

Suggested fix direction

Make the unmount-count assertion await a deterministic OnUnmount flush before asserting the conserved count, instead of asserting immediately after the clear batch. That means draining or quiescing the dispatcher so that all OnUnmount callbacks from the final drive!(0) clear batch have run. Alternatively, add an explicit sync point that guarantees all OnUnmount callbacks have completed.
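A minimal sketch of the "drain, then assert" shape, again as a Python asyncio model rather than the actual C# fixture. `UnmountTracker`, `queue_unmount`, `drain`, and `run_fixture` are hypothetical names, and the 5-second timeout is an arbitrary assumption. The real fix would use whatever flush or quiesce primitive the test host's dispatcher provides.

```python
import asyncio


class UnmountTracker:
    """Hypothetical stand-in for the fixture's unmount counter plus a drain point."""

    def __init__(self) -> None:
        self.unmounts = 0
        self._pending: list[asyncio.Task] = []

    def queue_unmount(self) -> None:
        # Models OnUnmount being dispatched asynchronously instead of run inline.
        self._pending.append(asyncio.create_task(self._on_unmount()))

    async def _on_unmount(self) -> None:
        await asyncio.sleep(0)  # completes on a later loop turn
        self.unmounts += 1

    async def drain(self, timeout: float = 5.0) -> None:
        # Explicit sync point: wait for every queued callback, and fail with a
        # timeout error instead of hanging if one never completes.
        pending, self._pending = self._pending, []
        await asyncio.wait_for(asyncio.gather(*pending), timeout)


async def run_fixture(toggles: int = 30) -> int:
    tracker = UnmountTracker()
    for _ in range(toggles):
        tracker.queue_unmount()
    await tracker.drain()  # quiesce first; only then read the conserved count
    return tracker.unmounts


unmounts = asyncio.run(run_fixture())
assert unmounts == 30
```

Bounding the drain with a timeout keeps a genuinely lost OnUnmount visible as a test failure rather than turning the flake into a hang.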

Scope note

Not part of the /perf self-contained harness build fix (#677), which is narrowly the StressPerf harness build, not selftest stability. Tracking this separately so it can be fixed independently / post-fleet.

Refs #669. Flagged by the Datagrid-allocs perf session.

Labels: bug, test
