perf(ci): startup / first-frame metric for /perf (PR6) #706
Capture the from-scratch mount cost that every steady-state /perf table excludes by construction (their windows baseline on the first benchmark tick, after the mount). This is where a #696-class mount/reconcile regression lives, and nothing else in the harness measured it. This closes the user's originating "first frame rendered" question.

StressPerf.ReactorOptimized now records four startup anchors once per process:
- firstReconcileDurationMs: the Reactor-ISOLATED first-render reconcile-phase duration (the first OnRenderComplete's reconcile arg, i.e. the same phase that steady-state Avg Diff averages), undiluted by AOT/window/XAML bootstrap so a mount regression shows at full size rather than as a bootstrap-diluted blip.
- entryToFirstFrameMs: managed entry -> first composed frame ("first frame rendered", the human-facing headline).
- entryToFirstReconcileMs: managed entry -> first reconcile (guaranteed monotonic).
- windowOpenToFirstReconcileMs: n/a-guarded secondary (Activated-vs-mount ordering is non-deterministic; emits JSON null -> n/a rather than a negative number).

Implementation (Class-B, 0 src/Reactor; all under tests/stress_perf/**):
- StressPerf.Shared/StartupTiming.cs (new): process-static monotonic entry/window-open anchors; MarkEntry() at the top of Main, MarkWindowOpen() from PrimaryWindow.Activated. Idempotent (first-wins), AOT/trim-safe.
- StressPerf.Shared/PerfTracker.cs: RecordFirstRenderIfUnset (one-shot, before the benchmark gate so it sees the full mount), first-frame capture in FrameRendered (gated on the first render so T_firstFrame >= T_firstReconcile), 4 nullable accessors, and JSON emission (FN helper emits null for an un-captured anchor).
- StressPerf.ReactorOptimized/Program.cs: MarkEntry() first statement; subscribe Activated; RecordFirstRenderIfUnset as the first line of OnRenderComplete.
- ci/PerfLib.ps1: parse the 4 optional fields (predates-them -> n/a, like the alloc fields), aggregate them as headline keys, a $PerfStartupMetricSpec, a dormant $StartupAutoFlag + provisional $StartupMinEffectPct, an 'info' status glyph, and Format-PerfStartupSection (informational-first while dormant; §11 lineage n/a note).

The section piggybacks the headline per-rep launches (one sample/process), so the cold first launch is warmup-dropped automatically (it rides the same per-rep metrics object the interleave loop already discards) and it reuses the paired 95% CI machinery at zero extra CI time. Run-PerfBenchmark.ps1 needs no change.

Informational-first: $StartupAutoFlag ships DORMANT. The Δ + CI are shown but no row is auto-flagged better-or-worse until a real-CI identical-binary band calibration (same discipline as the micro ns flag; measurement-only, never changes what merges). Startup is the noisiest axis (one sample/launch + bootstrap process-to-process variance).

Tests: +39 PerfLib.Tests assertions (parse incl. JSON-null window-open + pre-metric head; aggregate; render incl. informational/dormant + armed verdict; §11 lineage both directions; monotonic shape; cold-first-launch warmup-drop; comment ordering/footnote; info glyph). PerfLib.Tests 322 + RunPerfBenchmark.Tests 76 green.

Docs: METHODOLOGY.md + ci/README.md startup sections.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
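For readers who want the capture logic in miniature, here is a minimal Python sketch of the first-wins anchors and the four derived metrics described in the commit message. It is not the harness code (which is C#). The class and method names are hypothetical stand-ins, `time.perf_counter_ns` stands in for `Stopwatch.GetTimestamp()`, and applying the negative-delta guard to every elapsed value (rather than only to window-open) is an assumption made to keep the sketch short.

```python
import json
import time
from typing import Callable, Optional


class StartupTiming:
    """Hypothetical stand-in for the process-static startup anchors."""

    def __init__(self, clock: Callable[[], int] = time.perf_counter_ns):
        self._clock = clock  # monotonic clock, in nanoseconds
        self.entry_ns: Optional[int] = None
        self.window_open_ns: Optional[int] = None
        self.first_reconcile_ns: Optional[int] = None
        self.first_reconcile_duration_ms: Optional[float] = None
        self.first_frame_ns: Optional[int] = None

    def mark_entry(self) -> None:
        # First-wins: a second call is a no-op, so the anchor is idempotent.
        if self.entry_ns is None:
            self.entry_ns = self._clock()

    def mark_window_open(self) -> None:
        if self.window_open_ns is None:
            self.window_open_ns = self._clock()

    def record_first_render_if_unset(self, reconcile_ms: float) -> None:
        # One-shot: only the very first render (the mount) is recorded.
        if self.first_reconcile_ns is None:
            self.first_reconcile_ns = self._clock()
            self.first_reconcile_duration_ms = reconcile_ms

    def frame_rendered(self) -> None:
        # Gated on the first render so first_frame >= first_reconcile.
        if self.first_reconcile_ns is not None and self.first_frame_ns is None:
            self.first_frame_ns = self._clock()

    @staticmethod
    def _elapsed_ms(start: Optional[int], end: Optional[int]) -> Optional[float]:
        # A missing anchor or a negative delta (e.g. the window activated
        # after the mount) becomes None instead of a misleading number.
        if start is None or end is None or end < start:
            return None
        return (end - start) / 1e6

    def metrics(self) -> dict:
        return {
            "firstReconcileDurationMs": self.first_reconcile_duration_ms,
            "entryToFirstFrameMs": self._elapsed_ms(self.entry_ns, self.first_frame_ns),
            "entryToFirstReconcileMs": self._elapsed_ms(self.entry_ns, self.first_reconcile_ns),
            "windowOpenToFirstReconcileMs": self._elapsed_ms(
                self.window_open_ns, self.first_reconcile_ns
            ),
        }

    def to_json(self) -> str:
        # json.dumps serializes None as JSON null.
        return json.dumps(self.metrics())
```

Injecting the clock keeps the sketch deterministic under test. In a real process the default monotonic clock would be used and each anchor would be marked exactly where the commit message says (entry at the top of `Main`, window-open from `Activated`).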
Pull request overview
Adds “startup / first-frame” telemetry to the stress-perf /perf comment pipeline so CI can see from-scratch mount cost (first reconcile and first composed frame) that steady-state windows intentionally exclude.
Changes:
- Introduces process-wide startup anchors (`MarkEntry`, `MarkWindowOpen`) and threads them through `StressPerf.ReactorOptimized` to capture first-reconcile + first-frame metrics.
- Extends `PerfTracker` to record/emit 4 optional startup metrics as JSON (`null` when unavailable) and updates PerfLib parsing/aggregation/rendering to show a new “Startup / first frame” table.
- Updates methodology/CI docs and expands PerfLib PowerShell tests to cover parsing, lineage/back-compat, warmup-drop behavior, and rendering.
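To make the "optional field" behavior concrete, here is a small Python sketch of the parsing rule summarized above: a field that is missing (a binary that predates the metrics) and a field that is JSON `null` both end up as `n/a`. The real parser is PowerShell in `ci/PerfLib.ps1`. The function names, the `avgDiffMs` key in the usage example, and the `"%.2f ms"` cell format are assumptions for illustration only.

```python
import json
from typing import Optional

# The four startup field names as given in the PR description.
STARTUP_FIELDS = (
    "firstReconcileDurationMs",
    "entryToFirstFrameMs",
    "entryToFirstReconcileMs",
    "windowOpenToFirstReconcileMs",
)


def parse_startup_fields(metrics_json: str) -> dict:
    """Return each startup field as a float, or None when absent or null."""
    raw = json.loads(metrics_json)
    parsed = {}
    for name in STARTUP_FIELDS:
        value = raw.get(name)  # missing key and JSON null both yield None
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, (int, float)) and not isinstance(value, bool):
            parsed[name] = float(value)
        else:
            parsed[name] = None
    return parsed


def render_cell(value: Optional[float]) -> str:
    """Hypothetical table-cell formatter: None renders as n/a."""
    return "n/a" if value is None else f"{value:.2f} ms"
```

For example, `parse_startup_fields('{"avgDiffMs": 1.2}')` yields `None` for all four fields, so a baseline that predates the metrics renders a row of `n/a` cells rather than failing the parse.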
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| tests/stress_perf/StressPerf.Shared/StartupTiming.cs | New process-static timing anchors for entry/window-open. |
| tests/stress_perf/StressPerf.Shared/PerfTracker.cs | One-shot capture + JSON emission of first-reconcile/first-frame metrics. |
| tests/stress_perf/StressPerf.ReactorOptimized/Program.cs | Wires entry/window-open and first-render capture into the harness. |
| tests/stress_perf/METHODOLOGY.md | Documents what the startup metrics mean and how they’re captured/consumed. |
| tests/stress_perf/ci/README.md | Documents the new /perf startup section and its interpretation. |
| tests/stress_perf/ci/PerfLib.Tests.ps1 | Adds PS tests covering parsing/aggregation/rendering/lineage behaviors. |
| tests/stress_perf/ci/PerfLib.ps1 | Parses/aggregates startup keys and renders the new table with dormant flagging. |
Closing per repo-owner direction: the first-frame / startup-metric investigation has been cancelled, and we are no longer chasing the first-frame bump. Effort is redirected to validating /perf's existing steady-state metrics across the perf-PR fleet so the real perf PRs can be measured meaningfully. Branch is left intact in case first-frame is ever revived.
What
Adds a startup / first-frame section to the `/perf` comment: the from-scratch mount cost that every steady-state table excludes by construction (they window-average over benchmark ticks, after the mount). This is exactly where a #696-class mount/reconcile regression lives, and nothing in the harness measured it. Closes the user's originating "first frame rendered" question.

Class-B harness PR: 0 `src/Reactor` changes (everything under `tests/stress_perf/**`).

The four anchors (captured once per process, all "lower is better", ms from managed entry)
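| Metric | What it measures | Notes |
|---|---|---|
| `firstReconcileDurationMs` | Reactor-isolated first-render reconcile-phase duration (the first `OnRenderComplete`'s `reconcile` arg) | Same phase that steady-state Avg Diff averages; undiluted by AOT/window/XAML bootstrap. |
| `entryToFirstFrameMs` | Managed entry → first composed frame | "First frame rendered", the human-facing headline. |
| `entryToFirstReconcileMs` | Managed entry → first reconcile | Guaranteed monotonic. |
| `windowOpenToFirstReconcileMs` | `Activated` → first reconcile | n/a-guarded secondary: `Activated`-vs-mount ordering is non-deterministic across launches, so it emits JSON `null` → renders `n/a` rather than a garbage negative that would poison the paired CI. |

How it's wired (0 src/Reactor)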
- `StressPerf.Shared/StartupTiming.cs` (new): process-static monotonic entry/window-open anchors (`Stopwatch.GetTimestamp()`), idempotent first-wins, AOT/trim-safe.
- `StressPerf.Shared/PerfTracker.cs`: `RecordFirstRenderIfUnset` (one-shot, called before the benchmark gate so it sees the full mount); first-frame capture gated so `T_firstFrame ≥ T_firstReconcile`; 4 nullable accessors; JSON emission (`null` for an un-captured anchor).
- `StressPerf.ReactorOptimized/Program.cs`: `MarkEntry()` as the first statement of `Main`; subscribe `PrimaryWindow.Activated` → `MarkWindowOpen()`; `RecordFirstRenderIfUnset(...)` as the first line of `OnRenderComplete`.
- `ci/PerfLib.ps1`: parse the 4 optional fields (a head that predates them → `n/a`, exactly like the alloc fields), aggregate them as headline keys (reuses the paired-95%-CI machinery), `Format-PerfStartupSection`, a dormant flag + provisional band, and an `info` glyph.

The section piggybacks the existing headline per-rep launches (one sample/process), so it costs zero extra CI time, and the cold first launch is warmup-dropped automatically because it rides the same per-rep metrics object that `Run-PerfBenchmark.ps1`'s interleave loop already discards. `Run-PerfBenchmark.ps1` needs no change.

`perf-compare.yml` builds the baseline exe from a separate `main` worktree and overlays the PR via ProjectReference (not `Program.cs`/`PerfTracker.cs`). So on this PR's `/perf` run the `main` side predates the startup fields → emits `null` → no paired Δ (the section renders a visible n/a note: "populates on the next `/perf` after this lands"). The paired Δ + the identical-binary band calibration only populate on the first run after merge. The gate-clean run here can prove PR-side shape (fields present, monotonic, `firstReconcile >> AvgDiff`), not the paired delta.

`firstReconcileDurationMs` is directly comparable to steady-state Avg Diff (same reconcile/diff-patch phase). Expected shape: `firstReconcileDurationMs >> AvgDiffMs` (mount creates every control; steady ticks patch a few).

`$StartupAutoFlag` ships `$false`: Δ + CI are shown but no row auto-flags better-or-worse until a real-CI identical-binary ~0-false-flag band calibration (same discipline as the micro ns flag; measurement-only, never changes what merges). Startup is the noisiest axis (one sample/launch + bootstrap process-to-process variance), so provisional band ≥ the micro band.

`entryToFirstFrameMs` (managed entry → first composed frame) should match whatever the user's dashboard labels "first frame rendered". Flagging so the harness number is directly comparable to what they originally raised.

Tests

+39 `PerfLib.Tests` assertions: parse (incl. JSON-`null` window-open + pre-metric head), aggregate, render (informational/dormant + armed verdict), §11 lineage both directions, monotonic shape, cold-first-launch warmup-drop, comment ordering/footnote, info glyph. PerfLib.Tests 322 + RunPerfBenchmark.Tests 76 green.

Docs: `METHODOLOGY.md` + `ci/README.md` startup sections.

🤖 Draft: held for coordinator independent-verify + merge (Class-B coordinator-merge discipline).
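As a rough illustration of the paired Δ, warmup-drop, and dormant-flag behavior described in this PR, here is a Python sketch of a paired 95% confidence interval over per-rep startup samples. It is only a sketch under stated assumptions, not the `ci/PerfLib.ps1` implementation. The function names, the Student-t interval, the single warmup rep, the `min_effect_pct=10.0` default, and the better/worse/same rule are all hypothetical, since the PR does not state the provisional band value or the exact verdict logic.

```python
import math
import statistics
from typing import Optional, Sequence, Tuple

# Two-sided 95% Student-t critical values, indexed by degrees of freedom.
T_95 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571,
        6: 2.447, 7: 2.365, 8: 2.306, 9: 2.262, 10: 2.228}


def paired_delta_ci(
    base: Sequence[Optional[float]],
    head: Sequence[Optional[float]],
    warmup: int = 1,
) -> Optional[Tuple[float, float, float]]:
    """Return (mean delta, CI low, CI high) for head - base, or None."""
    # Drop the cold first launch(es), then keep only reps where both
    # sides captured the metric (None means n/a on that side).
    pairs = [
        (b, h)
        for b, h in zip(base[warmup:], head[warmup:])
        if b is not None and h is not None
    ]
    if len(pairs) < 2:
        return None  # not enough paired samples for an interval
    diffs = [h - b for b, h in pairs]
    mean = statistics.fmean(diffs)
    sem = statistics.stdev(diffs) / math.sqrt(len(diffs))
    # Beyond the table, reuse the df=10 value: slightly conservative
    # because it gives a wider interval than the exact critical value.
    t = T_95[min(len(diffs) - 1, 10)]
    return mean, mean - t * sem, mean + t * sem


def verdict(
    ci: Optional[Tuple[float, float, float]],
    base_mean: float,
    auto_flag: bool = False,       # dormant by default, like $StartupAutoFlag
    min_effect_pct: float = 10.0,  # placeholder band, not the real value
) -> str:
    if ci is None:
        return "n/a"
    if not auto_flag:
        return "info"  # delta and CI are shown, but no row is flagged
    _, low, high = ci
    threshold = base_mean * min_effect_pct / 100.0
    if low > threshold:
        return "worse"   # lower is better, so a positive delta regresses
    if high < -threshold:
        return "better"
    return "same"
```

With the flag dormant every row with data reports `info`, which mirrors the informational-first rollout. Arming the flag only changes the label and never the computed Δ or interval.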