perf(ci): startup / first-frame metric for /perf (PR6)#706

Closed
azchohfi wants to merge 1 commit into main from perf-first-frame-metric

@azchohfi

Collaborator

What

Adds a startup / first-frame section to the /perf comment — the from-scratch mount cost that every steady-state table excludes by construction (they window-average over benchmark ticks, after the mount). This is exactly where a #696-class mount/reconcile regression lives, and nothing in the harness measured it. Closes the user's originating "first frame rendered" question.

Class-B harness PR — 0 src/Reactor changes (everything under tests/stress_perf/**).

The four anchors (captured once per process, all "lower is better", ms from managed entry)

| Field | What it is | Why |
| --- | --- | --- |
| firstReconcileDurationMs | Reactor-isolated first-render reconcile-phase duration (the first OnRenderComplete's reconcile arg) | The diagnostic. Same phase that steady-state Avg Diff averages, so it's directly comparable, but undiluted by AOT/window/XAML bootstrap, so a 2× mount regression shows at full size instead of as a single-digit-% bootstrap blip. This is the field that makes the metric catch the user's class of bug. |
| entryToFirstFrameMs | managed entry → first composed frame | The human-facing "first frame rendered" headline. |
| entryToFirstReconcileMs | managed entry → first reconcile | Guaranteed-monotonic companion. |
| windowOpenToFirstReconcileMs | window Activated → first reconcile | n/a-guarded secondary. Activated-vs-mount ordering is non-deterministic across launches, so it emits JSON null (rendered as n/a) rather than a garbage negative that would poison the paired CI. |
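
To make the n/a guard concrete, here is a minimal language-neutral sketch in Python (the harness itself is C#). The function name `startup_fields`, its parameters, and the use of millisecond floats are assumptions for illustration only; the four output keys are the ones listed in the table. Any interval whose start anchor is missing, or whose end precedes its start, becomes `None`, which serializes to JSON `null`.

```python
import json
from typing import Optional


def startup_fields(
    entry_ms: float,
    first_reconcile_ms: float,
    first_reconcile_duration_ms: float,
    first_frame_ms: Optional[float] = None,
    window_open_ms: Optional[float] = None,
) -> dict:
    """Hypothetical helper: turn raw anchors into the four startup fields."""

    def delta(start: Optional[float], end: Optional[float]) -> Optional[float]:
        # Guard: a missing anchor or a negative interval is reported as None
        # (JSON null) instead of a misleading number.
        if start is None or end is None or end < start:
            return None
        return end - start

    return {
        "firstReconcileDurationMs": first_reconcile_duration_ms,
        "entryToFirstFrameMs": delta(entry_ms, first_frame_ms),
        "entryToFirstReconcileMs": delta(entry_ms, first_reconcile_ms),
        "windowOpenToFirstReconcileMs": delta(window_open_ms, first_reconcile_ms),
    }


# Example launch where the window activated *after* the first reconcile.
fields = startup_fields(0.0, 180.0, 42.0, first_frame_ms=210.0, window_open_ms=195.0)
print(json.dumps(fields))
```

In this example launch the window-open anchor arrives late, so only `windowOpenToFirstReconcileMs` is nulled; the other three fields stay populated.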

How it's wired (0 src/Reactor)

  • StressPerf.Shared/StartupTiming.cs (new): process-static monotonic entry/window-open anchors (Stopwatch.GetTimestamp()), idempotent first-wins, AOT/trim-safe.
  • StressPerf.Shared/PerfTracker.cs: RecordFirstRenderIfUnset (one-shot, called before the benchmark gate so it sees the full mount); first-frame capture gated so T_firstFrame ≥ T_firstReconcile; 4 nullable accessors; JSON emission (null for an un-captured anchor).
  • StressPerf.ReactorOptimized/Program.cs: MarkEntry() as the first statement of Main; subscribe PrimaryWindow.Activated → MarkWindowOpen(); RecordFirstRenderIfUnset(...) as the first line of OnRenderComplete.
  • ci/PerfLib.ps1: parse the 4 optional fields (a head that predates them → n/a, exactly like the alloc fields), aggregate them as headline keys (reuses the paired-95%-CI machinery), Format-PerfStartupSection, a dormant flag + provisional band, and an info glyph.
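
The "idempotent first-wins" behaviour of the anchors can be illustrated with a short Python sketch. This is not the C# in StartupTiming.cs: the method names mirror MarkEntry/MarkWindowOpen, `time.perf_counter_ns()` stands in for Stopwatch.GetTimestamp(), and the lock and the `ms_since_entry` helper are assumptions added for illustration.

```python
import threading
import time
from typing import Optional


class StartupTiming:
    """Process-wide anchors; each one is written at most once (first call wins)."""

    _lock = threading.Lock()
    _entry_ns: Optional[int] = None
    _window_open_ns: Optional[int] = None

    @classmethod
    def mark_entry(cls) -> None:
        with cls._lock:
            if cls._entry_ns is None:  # later calls are no-ops
                cls._entry_ns = time.perf_counter_ns()

    @classmethod
    def mark_window_open(cls) -> None:
        with cls._lock:
            if cls._window_open_ns is None:  # Activated can fire repeatedly
                cls._window_open_ns = time.perf_counter_ns()

    @classmethod
    def ms_since_entry(cls, now_ns: int) -> Optional[float]:
        # Hypothetical accessor: None until the entry anchor has been captured.
        if cls._entry_ns is None:
            return None
        return (now_ns - cls._entry_ns) / 1e6


StartupTiming.mark_entry()  # first statement of the program
StartupTiming.mark_entry()  # repeated call leaves the anchor untouched
```

Because a window's Activated event can fire more than once, the first-wins rule is what keeps the window-open anchor tied to the initial activation rather than the most recent one.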

The section piggybacks the existing headline per-rep launches (one sample/process) — so it costs zero extra CI time, and the cold first launch is warmup-dropped automatically because it rides the same per-rep metrics object that Run-PerfBenchmark.ps1's interleave loop already discards. Run-PerfBenchmark.ps1 needs no change.
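
A rough Python sketch of why the warmup drop comes for free: if each rep yields one metrics object and the loop discards the leading rep(s), any startup field stored on that object is discarded with it. The names `drop_warmup` and `startup_samples`, and the warmup count of 1, are assumptions for illustration and are not taken from Run-PerfBenchmark.ps1.

```python
from typing import Dict, List, Optional

Metrics = Dict[str, Optional[float]]


def drop_warmup(reps: List[Metrics], warmup: int = 1) -> List[Metrics]:
    # The cold first launch(es) are discarded as whole metrics objects.
    return reps[warmup:]


def startup_samples(reps: List[Metrics], key: str, warmup: int = 1) -> List[float]:
    # One sample per surviving process; null or missing fields are skipped.
    return [
        rep[key]
        for rep in drop_warmup(reps, warmup)
        if rep.get(key) is not None
    ]


reps = [
    {"entryToFirstFrameMs": 900.0},  # cold first launch, dropped as warmup
    {"entryToFirstFrameMs": 310.0},
    {"entryToFirstFrameMs": None},   # anchor not captured on this launch
    {"entryToFirstFrameMs": 305.0},
    {},                              # a head that predates the field
]
print(startup_samples(reps, "entryToFirstFrameMs"))
```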

⚠️ Flags for the reviewer / coordinator

  1. This introducing run renders PR-side shape only — by design. perf-compare.yml builds the baseline exe from a separate main worktree and overlays the PR via ProjectReference (not Program.cs/PerfTracker.cs). So on this PR's /perf run the main side predates the startup fields → emits null → no paired Δ (the section renders a visible n/a note: "populates on the next /perf after this lands"). The paired Δ + the identical-binary band calibration only populate on the first run after merge. The gate-clean run here can prove PR-side shape (fields present, monotonic, firstReconcile >> AvgDiff), not the paired delta.
  2. firstReconcileDurationMs is directly comparable to steady-state Avg Diff (same reconcile/diff-patch phase). Expected shape: firstReconcileDurationMs >> AvgDiffMs (mount creates every control; steady ticks patch a few).
  3. Informational-first (dormant flag). $StartupAutoFlag ships $false — Δ + CI are shown but no row auto-flags better-or-worse until a real-CI identical-binary ~0-false-flag band calibration (same discipline as the micro ns flag; measurement-only, never changes what merges). Startup is the noisiest axis (one sample/launch + bootstrap process-to-process variance), so provisional band ≥ the micro band.
  4. Carry-for-user (please confirm): the headline entryToFirstFrameMs (managed entry → first composed frame) should match whatever the user's dashboard labels "first frame rendered". Flagging so the harness number is directly comparable to what they originally raised.
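
The dormant-flag behaviour in item 3 can be sketched as follows. This is a simplified Python illustration, not the PerfLib.ps1 logic: `auto_flag` and `min_effect_pct` stand in for $StartupAutoFlag and the provisional band, the verdict strings other than "info" are hypothetical, and the interval uses a normal approximation (z = 1.96), which is an assumption about how the paired 95% CI might be computed.

```python
import math
import statistics
from typing import Sequence, Tuple


def paired_delta_ci(
    base: Sequence[float], head: Sequence[float], z: float = 1.96
) -> Tuple[float, float, float]:
    """Mean paired delta (head - base) with an approximate 95% interval."""
    deltas = [h - b for b, h in zip(base, head)]
    mean = statistics.fmean(deltas)
    half = z * statistics.stdev(deltas) / math.sqrt(len(deltas))
    return mean, mean - half, mean + half


def startup_verdict(
    base: Sequence[float],
    head: Sequence[float],
    auto_flag: bool = False,      # dormant by default
    min_effect_pct: float = 10.0,  # provisional band, illustrative value
) -> str:
    if len(base) < 2 or len(base) != len(head):
        return "n/a"  # e.g. the main side predates the startup fields
    mean, lo, hi = paired_delta_ci(base, head)
    if not auto_flag:
        return "info"  # the delta and CI are shown, but no row is flagged
    pct = 100.0 * mean / statistics.fmean(base)
    if lo > 0 and pct >= min_effect_pct:
        return "worse"  # lower is better, so a positive delta is a regression
    if hi < 0 and -pct >= min_effect_pct:
        return "better"
    return "neutral"


base = [100.0, 102.0, 98.0, 101.0]
head = [120.0, 123.0, 119.0, 121.0]
print(startup_verdict(base, head))                  # dormant
print(startup_verdict(base, head, auto_flag=True))  # armed
```

Keeping the dormant path separate from the armed path means the same delta and interval are computed either way; only the decision to label a row changes once the band has been calibrated.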

Tests

+39 PerfLib.Tests assertions: parse (incl. JSON-null window-open + pre-metric head), aggregate, render (informational/dormant + armed verdict), §11 lineage both directions, monotonic shape, cold-first-launch warmup-drop, comment ordering/footnote, info glyph. PerfLib.Tests 322 + RunPerfBenchmark.Tests 76 green.

Docs: METHODOLOGY.md + ci/README.md startup sections.


🤖 Draft — held for coordinator independent-verify + merge (Class-B coordinator-merge discipline).

Capture the from-scratch mount cost that every steady-state /perf table excludes by
construction (their windows baseline on the first benchmark tick, after the mount).
This is where a #696-class mount/reconcile regression lives, and nothing else in the
harness measured it — closing the user's originating "first frame rendered" question.

StressPerf.ReactorOptimized now records four startup anchors once per process:
  - firstReconcileDurationMs  — the Reactor-ISOLATED first-render reconcile-phase
    duration (first OnRenderComplete's reconcile arg = the same phase steady-state
    Avg Diff averages), undiluted by AOT/window/XAML bootstrap so a mount regression
    shows at full size rather than as a bootstrap-diluted blip.
  - entryToFirstFrameMs       — managed entry -> first composed frame ("first frame
    rendered", the human-facing headline).
  - entryToFirstReconcileMs   — managed entry -> first reconcile (guaranteed monotonic).
  - windowOpenToFirstReconcileMs — n/a-guarded secondary (Activated-vs-mount ordering
    is non-deterministic; emits JSON null -> n/a rather than a negative number).

Implementation (Class-B, 0 src/Reactor — all under tests/stress_perf/**):
  - StressPerf.Shared/StartupTiming.cs (new): process-static monotonic entry/window-open
    anchors; MarkEntry() at the top of Main, MarkWindowOpen() from PrimaryWindow.Activated.
    Idempotent (first-wins), AOT/trim-safe.
  - StressPerf.Shared/PerfTracker.cs: RecordFirstRenderIfUnset (one-shot, before the
    benchmark gate so it sees the full mount), first-frame capture in FrameRendered
    (gated on the first render so T_firstFrame >= T_firstReconcile), 4 nullable accessors,
    and JSON emission (FN helper emits null for an un-captured anchor).
  - StressPerf.ReactorOptimized/Program.cs: MarkEntry() first statement; subscribe
    Activated; RecordFirstRenderIfUnset as the first line of OnRenderComplete.
  - ci/PerfLib.ps1: parse the 4 optional fields (predates-them -> n/a, like the alloc
    fields), aggregate them as headline keys, a $PerfStartupMetricSpec, a dormant
    $StartupAutoFlag + provisional $StartupMinEffectPct, an 'info' status glyph, and
    Format-PerfStartupSection (informational-first while dormant; §11 lineage n/a note).

The section piggybacks the headline per-rep launches (one sample/process), so the cold
first launch is warmup-dropped automatically (it rides the same per-rep metrics object
the interleave loop already discards) and it reuses the paired 95% CI machinery at zero
extra CI time. Run-PerfBenchmark.ps1 needs no change.

Informational-first: $StartupAutoFlag ships DORMANT — the Δ + CI are shown but no row is
auto-flagged better-or-worse until a real-CI identical-binary band calibration (same
discipline as the micro ns flag; measurement-only, never changes what merges). Startup is
the noisiest axis (one sample/launch + bootstrap process-to-process variance).

Tests: +39 PerfLib.Tests assertions (parse incl. JSON-null window-open + pre-metric head;
aggregate; render incl. informational/dormant + armed verdict; §11 lineage both
directions; monotonic shape; cold-first-launch warmup-drop; comment ordering/footnote;
info glyph). PerfLib.Tests 322 + RunPerfBenchmark.Tests 76 green.

Docs: METHODOLOGY.md + ci/README.md startup sections.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot AI left a comment

Contributor

Pull request overview

Adds “startup / first-frame” telemetry to the stress-perf /perf comment pipeline so CI can see from-scratch mount cost (first reconcile and first composed frame) that steady-state windows intentionally exclude.

Changes:

  • Introduces process-wide startup anchors (MarkEntry, MarkWindowOpen) and threads them through StressPerf.ReactorOptimized to capture first-reconcile + first-frame metrics.
  • Extends PerfTracker to record/emit 4 optional startup metrics as JSON (null when unavailable) and updates PerfLib parsing/aggregation/rendering to show a new “Startup / first frame” table.
  • Updates methodology/CI docs and expands PerfLib PowerShell tests to cover parsing, lineage/back-compat, warmup-drop behavior, and rendering.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 1 comment.

Show a summary per file
| File | Description |
| --- | --- |
| tests/stress_perf/StressPerf.Shared/StartupTiming.cs | New process-static timing anchors for entry/window-open. |
| tests/stress_perf/StressPerf.Shared/PerfTracker.cs | One-shot capture + JSON emission of first-reconcile/first-frame metrics. |
| tests/stress_perf/StressPerf.ReactorOptimized/Program.cs | Wires entry/window-open and first-render capture into the harness. |
| tests/stress_perf/METHODOLOGY.md | Documents what the startup metrics mean and how they're captured/consumed. |
| tests/stress_perf/ci/README.md | Documents the new /perf startup section and its interpretation. |
| tests/stress_perf/ci/PerfLib.Tests.ps1 | Adds PS tests covering parsing/aggregation/rendering/lineage behaviors. |
| tests/stress_perf/ci/PerfLib.ps1 | Parses/aggregates startup keys and renders the new table with dormant flagging. |

Comment thread tests/stress_perf/StressPerf.Shared/StartupTiming.cs
@azchohfi

Collaborator Author

Closing per repo-owner direction: the first-frame / startup-metric investigation has been cancelled — we are no longer chasing the first-frame bump. Effort is redirected to validating /perf's existing steady-state metrics across the perf-PR fleet so the real perf PRs can be measured meaningfully. Branch is left intact in case first-frame is ever revived.

@azchohfi azchohfi closed this Jun 26, 2026