perf(ci): startup / first-frame metric for /perf (PR6) by azchohfi · Pull Request #706 · microsoft/microsoft-ui-reactor

azchohfi · 2026-06-26T17:33:37Z

What

Adds a startup / first-frame section to the /perf comment — the from-scratch mount cost that every steady-state table excludes by construction (they window-average over benchmark ticks, after the mount). This is exactly where a #696-class mount/reconcile regression lives, and nothing in the harness measured it. Closes the user's originating "first frame rendered" question.

Class-B harness PR — 0 src/Reactor changes (everything under tests/stress_perf/**).

The four anchors (captured once per process, all "lower is better", in ms)

| Field | What it is | Why |
| --- | --- | --- |
| `firstReconcileDurationMs` | Reactor-isolated first-render reconcile-phase duration (the first `OnRenderComplete`'s `reconcile` arg) | The diagnostic. It is the same phase that steady-state Avg Diff averages, so it is directly comparable, but it is undiluted by AOT/window/XAML bootstrap, so a 2× mount regression shows at full size instead of as a single-digit-% bootstrap blip. This is the field that makes the metric catch the user's class of bug. |
| `entryToFirstFrameMs` | managed entry → first composed frame | The human-facing "first frame rendered" headline. |
| `entryToFirstReconcileMs` | managed entry → first reconcile | Guaranteed-monotonic companion. |
| `windowOpenToFirstReconcileMs` | window `Activated` → first reconcile | n/a-guarded secondary: `Activated`-vs-mount ordering is non-deterministic across launches, so it emits JSON `null` (rendered as `n/a`) rather than a garbage negative that would poison the paired CI. |
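To make the relationship between the four fields concrete, here is a minimal, language-neutral sketch in Python (the harness itself is C#). The function name `startup_fields`, its parameters, and the exact form of the guard are assumptions for illustration; the PR text only states that the window-open field emits JSON `null` instead of a negative value.

```python
import json
from typing import Optional


def startup_fields(entry_ms: float,
                   first_reconcile_ms: float,
                   first_frame_ms: float,
                   reconcile_duration_ms: float,
                   window_open_ms: Optional[float]) -> dict:
    """Derive the four startup fields from raw monotonic timestamps (all ms).

    Hypothetical helper: the names and the guard condition are illustrative,
    not taken from PerfTracker.cs.
    """
    window_to_reconcile = None
    # Guard (assumed form): if Activated fired after the first reconcile, the
    # span would be negative, so leave it as None (serialized as JSON null).
    if window_open_ms is not None and window_open_ms <= first_reconcile_ms:
        window_to_reconcile = first_reconcile_ms - window_open_ms
    return {
        "firstReconcileDurationMs": reconcile_duration_ms,
        "entryToFirstFrameMs": first_frame_ms - entry_ms,
        "entryToFirstReconcileMs": first_reconcile_ms - entry_ms,
        "windowOpenToFirstReconcileMs": window_to_reconcile,
    }


# Example launch where Activated (130 ms) arrives after the first reconcile
# (120 ms): the window-open field is None and serializes as null.
print(json.dumps(startup_fields(0, 120, 150, 40, 130)))
```

Emitting `None`/`null` keeps an out-of-order launch from contributing a negative sample to the paired comparison; the consumer can render it as `n/a`.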

How it's wired (0 src/Reactor)

- `StressPerf.Shared/StartupTiming.cs` (new): process-static monotonic entry/window-open anchors (`Stopwatch.GetTimestamp()`), idempotent first-wins, AOT/trim-safe.
- `StressPerf.Shared/PerfTracker.cs`: `RecordFirstRenderIfUnset` (one-shot, called before the benchmark gate so it sees the full mount); first-frame capture gated so T_firstFrame ≥ T_firstReconcile; 4 nullable accessors; JSON emission (`null` for an un-captured anchor).
- `StressPerf.ReactorOptimized/Program.cs`: `MarkEntry()` as the first statement of `Main`; subscribe `PrimaryWindow.Activated` → `MarkWindowOpen()`; `RecordFirstRenderIfUnset(...)` as the first line of `OnRenderComplete`.
- `ci/PerfLib.ps1`: parse the 4 optional fields (a head that predates them → n/a, exactly like the alloc fields), aggregate them as headline keys (reuses the paired-95%-CI machinery), `Format-PerfStartupSection`, a dormant flag + provisional band, and an info glyph.
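The "idempotent first-wins" anchor behaviour described for StartupTiming.cs can be sketched as follows. This is a Python illustration under assumptions, not the C# implementation: the class name `StartupAnchors`, the injectable `clock`, and the instance-based (rather than process-static) design are hypothetical choices made so the sketch is easy to test, and it is not thread-safe.

```python
import time
from typing import Callable, Optional


class StartupAnchors:
    """First-wins timing anchors on a monotonic clock (illustrative only)."""

    def __init__(self, clock: Callable[[], int] = time.perf_counter_ns):
        self._clock = clock
        self.entry_ns: Optional[int] = None
        self.window_open_ns: Optional[int] = None

    def mark_entry(self) -> None:
        # Idempotent: only the first call records a timestamp.
        if self.entry_ns is None:
            self.entry_ns = self._clock()

    def mark_window_open(self) -> None:
        # Same first-wins rule, so repeated Activated events are ignored.
        if self.window_open_ns is None:
            self.window_open_ns = self._clock()

    def ms_since_entry(self, now_ns: Optional[int] = None) -> Optional[float]:
        # Returns None when the entry anchor was never captured, mirroring
        # the "null for an un-captured anchor" JSON behaviour.
        if self.entry_ns is None:
            return None
        now = self._clock() if now_ns is None else now_ns
        return (now - self.entry_ns) / 1_000_000
```

Calling `mark_entry()` a second time leaves the first timestamp in place, which is what lets the anchor sit at the top of the entry point without worrying about later re-entry.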

The section piggybacks on the existing headline per-rep launches (one sample per process), so it costs zero extra CI time, and the cold first launch is warmup-dropped automatically because it rides the same per-rep metrics object that Run-PerfBenchmark.ps1's interleave loop already discards. Run-PerfBenchmark.ps1 needs no change.
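A rough sketch of what "warmup-drop, then paired 95% CI" means for a one-sample-per-launch metric follows. The real logic lives in PowerShell (ci/PerfLib.ps1) and its exact formula is not shown in this PR text, so everything here is an assumption: the function name `paired_delta_ci`, the single dropped warmup rep, and the normal approximation (a small-sample implementation would more likely use a Student t quantile).

```python
import math
import statistics
from typing import Optional, Sequence, Tuple


def paired_delta_ci(main: Sequence[Optional[float]],
                    pr: Sequence[Optional[float]],
                    warmup_reps: int = 1
                    ) -> Optional[Tuple[float, float, float]]:
    """Mean paired delta (pr - main) with an approximate 95% CI.

    Hypothetical sketch: drops the first `warmup_reps` interleaved reps,
    skips pairs where either side is missing, and returns None (render as
    n/a) when fewer than two usable pairs remain.
    """
    pairs = [(m, p) for m, p in list(zip(main, pr))[warmup_reps:]
             if m is not None and p is not None]
    if len(pairs) < 2:
        return None
    diffs = [p - m for m, p in pairs]
    mean = statistics.fmean(diffs)
    # Normal approximation to the 95% quantile (about 1.96); an assumption.
    z = statistics.NormalDist().inv_cdf(0.975)
    half_width = z * statistics.stdev(diffs) / math.sqrt(len(diffs))
    return mean, mean - half_width, mean + half_width


# The cold first launch (900 vs 950) is dropped before pairing.
print(paired_delta_ci([900, 100, 100, 100, 100], [950, 110, 112, 108, 110]))
```

When the baseline side has no startup values at all, every pair is skipped and the function returns `None`, which corresponds to a section that shows no paired Δ.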

⚠️ Flags for the reviewer / coordinator

- The run that introduces this section renders the PR-side shape only, by design. perf-compare.yml builds the baseline exe from a separate main worktree and overlays the PR via ProjectReference (not Program.cs/PerfTracker.cs). So on this PR's /perf run the main side predates the startup fields and emits null, which yields no paired Δ (the section renders a visible n/a note: "populates on the next /perf after this lands"). The paired Δ and the identical-binary band calibration only populate on the first run after merge. The gate-clean run here can prove the PR-side shape (fields present, monotonic, firstReconcile >> AvgDiff), not the paired delta.
- firstReconcileDurationMs is directly comparable to steady-state Avg Diff (same reconcile/diff-patch phase). Expected shape: firstReconcileDurationMs >> AvgDiffMs (mount creates every control; steady ticks patch a few).
- Informational-first (dormant flag). $StartupAutoFlag ships $false: Δ + CI are shown, but no row auto-flags better-or-worse until a real-CI identical-binary ~0-false-flag band calibration has been done (same discipline as the micro ns flag; measurement-only, never changes what merges). Startup is the noisiest axis (one sample/launch + bootstrap process-to-process variance), so the provisional band is ≥ the micro band.
- Carry-for-user (please confirm): the headline entryToFirstFrameMs (managed entry → first composed frame) should match whatever the user's dashboard labels "first frame rendered". Flagging so the harness number is directly comparable to what they originally raised.
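The dormant-flag behaviour can be pictured with a small sketch. It is a Python illustration, not the PowerShell in ci/PerfLib.ps1: the function name `startup_verdict`, the status strings, the armed rule (CI excludes zero and the effect clears the band), and the 10% default band are all hypothetical placeholders, since the PR text gives neither the rule nor the band value.

```python
from typing import Optional


def startup_verdict(delta_pct: Optional[float],
                    ci_low_pct: Optional[float],
                    ci_high_pct: Optional[float],
                    auto_flag: bool = False,
                    min_effect_pct: float = 10.0) -> str:
    """Return a status label for one startup row (illustrative only).

    auto_flag=False models the dormant state: the delta and CI are still
    displayed, but the row is never marked better or worse.
    """
    if delta_pct is None or ci_low_pct is None or ci_high_pct is None:
        return "n/a"          # no paired delta available for this row
    if not auto_flag:
        return "info"         # dormant: informational glyph only
    ci_excludes_zero = ci_low_pct > 0 or ci_high_pct < 0
    if ci_excludes_zero and abs(delta_pct) >= min_effect_pct:
        # All startup fields are "lower is better".
        return "worse" if delta_pct > 0 else "better"
    return "neutral"
```

With the flag dormant, a row showing a +25% delta still reports `info`; only an armed flag would turn the same row into `worse`.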

Tests

+39 PerfLib.Tests assertions: parse (incl. JSON-null window-open + pre-metric head), aggregate, render (informational/dormant + armed verdict), §11 lineage both directions, monotonic shape, cold-first-launch warmup-drop, comment ordering/footnote, info glyph. PerfLib.Tests 322 + RunPerfBenchmark.Tests 76 green.

Docs: METHODOLOGY.md + ci/README.md startup sections.

🤖 Draft — held for coordinator independent-verify + merge (Class-B coordinator-merge discipline).

Capture the from-scratch mount cost that every steady-state /perf table excludes by construction (their windows baseline on the first benchmark tick, after the mount). This is where a #696-class mount/reconcile regression lives, and nothing else in the harness measured it; this closes the user's originating "first frame rendered" question.

StressPerf.ReactorOptimized now records four startup anchors once per process:

- firstReconcileDurationMs: the Reactor-ISOLATED first-render reconcile-phase duration (first OnRenderComplete's reconcile arg = the same phase steady-state Avg Diff averages), undiluted by AOT/window/XAML bootstrap so a mount regression shows at full size rather than as a bootstrap-diluted blip.
- entryToFirstFrameMs: managed entry -> first composed frame ("first frame rendered", the human-facing headline).
- entryToFirstReconcileMs: managed entry -> first reconcile (guaranteed monotonic).
- windowOpenToFirstReconcileMs: n/a-guarded secondary (Activated-vs-mount ordering is non-deterministic; emits JSON null -> n/a rather than a negative number).

Implementation (Class-B, 0 src/Reactor; all under tests/stress_perf/**):

- StressPerf.Shared/StartupTiming.cs (new): process-static monotonic entry/window-open anchors; MarkEntry() at the top of Main, MarkWindowOpen() from PrimaryWindow.Activated. Idempotent (first-wins), AOT/trim-safe.
- StressPerf.Shared/PerfTracker.cs: RecordFirstRenderIfUnset (one-shot, before the benchmark gate so it sees the full mount), first-frame capture in FrameRendered (gated on the first render so T_firstFrame >= T_firstReconcile), 4 nullable accessors, and JSON emission (FN helper emits null for an un-captured anchor).
- StressPerf.ReactorOptimized/Program.cs: MarkEntry() first statement; subscribe Activated; RecordFirstRenderIfUnset as the first line of OnRenderComplete.
- ci/PerfLib.ps1: parse the 4 optional fields (predates-them -> n/a, like the alloc fields), aggregate them as headline keys, a $PerfStartupMetricSpec, a dormant $StartupAutoFlag + provisional $StartupMinEffectPct, an 'info' status glyph, and Format-PerfStartupSection (informational-first while dormant; §11 lineage n/a note).

The section piggybacks on the headline per-rep launches (one sample/process), so the cold first launch is warmup-dropped automatically (it rides the same per-rep metrics object the interleave loop already discards) and it reuses the paired 95% CI machinery at zero extra CI time. Run-PerfBenchmark.ps1 needs no change.

Informational-first: $StartupAutoFlag ships DORMANT. The Δ + CI are shown but no row is auto-flagged better-or-worse until a real-CI identical-binary band calibration (same discipline as the micro ns flag; measurement-only, never changes what merges). Startup is the noisiest axis (one sample/launch + bootstrap process-to-process variance).

Tests: +39 PerfLib.Tests assertions (parse incl. JSON-null window-open + pre-metric head; aggregate; render incl. informational/dormant + armed verdict; §11 lineage both directions; monotonic shape; cold-first-launch warmup-drop; comment ordering/footnote; info glyph). PerfLib.Tests 322 + RunPerfBenchmark.Tests 76 green.

Docs: METHODOLOGY.md + ci/README.md startup sections.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot

Pull request overview

Adds “startup / first-frame” telemetry to the stress-perf /perf comment pipeline so CI can see from-scratch mount cost (first reconcile and first composed frame) that steady-state windows intentionally exclude.

Changes:

- Introduces process-wide startup anchors (MarkEntry, MarkWindowOpen) and threads them through StressPerf.ReactorOptimized to capture first-reconcile + first-frame metrics.
- Extends PerfTracker to record/emit 4 optional startup metrics as JSON (null when unavailable) and updates PerfLib parsing/aggregation/rendering to show a new “Startup / first frame” table.
- Updates methodology/CI docs and expands PerfLib PowerShell tests to cover parsing, lineage/back-compat, warmup-drop behavior, and rendering.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 1 comment.

Show a summary per file

| File | Description |
| --- | --- |
| tests/stress_perf/StressPerf.Shared/StartupTiming.cs | New process-static timing anchors for entry/window-open. |
| tests/stress_perf/StressPerf.Shared/PerfTracker.cs | One-shot capture + JSON emission of first-reconcile/first-frame metrics. |
| tests/stress_perf/StressPerf.ReactorOptimized/Program.cs | Wires entry/window-open and first-render capture into the harness. |
| tests/stress_perf/METHODOLOGY.md | Documents what the startup metrics mean and how they’re captured/consumed. |
| tests/stress_perf/ci/README.md | Documents the new `/perf` startup section and its interpretation. |
| tests/stress_perf/ci/PerfLib.Tests.ps1 | Adds PS tests covering parsing/aggregation/rendering/lineage behaviors. |
| tests/stress_perf/ci/PerfLib.ps1 | Parses/aggregates startup keys and renders the new table with dormant flagging. |


azchohfi · 2026-06-26T17:43:38Z

Closing per repo-owner direction: the first-frame / startup-metric investigation has been cancelled — we are no longer chasing the first-frame bump. Effort is redirected to validating /perf's existing steady-state metrics across the perf-PR fleet so the real perf PRs can be measured meaningfully. Branch is left intact in case first-frame is ever revived.

azchohfi requested a review from Copilot June 26, 2026 17:34

Copilot started reviewing on behalf of azchohfi June 26, 2026 17:36

Copilot AI reviewed Jun 26, 2026


Comment thread tests/stress_perf/StressPerf.Shared/StartupTiming.cs

azchohfi closed this Jun 26, 2026

azchohfi wants to merge 1 commit into main from perf-first-frame-metric

2 participants
