feat: add StatVar for numerically stable statistics in audit and logs commands by Copilot · Pull Request #26479 · github/gh-aw

Copilot · 2026-04-15T18:37:54Z

Summary

Adds a pkg/stats.StatVar statistical variable type and uses it throughout the audit and logs commands to compute richer, more statistically correct metrics.

Changes

New: `pkg/stats.StatVar`

Accumulates float64 observations incrementally
Uses Welford's online algorithm for numerically stable mean and variance (avoids catastrophic cancellation with large, tightly-clustered values)
Computes: count, sum, min, max, mean, population variance/stddev, sample variance/stddev (Bessel's correction), and exact median
Memory usage is documented: suitable for tens–hundreds of observations (typical agentic workflow scale)

`pkg/workflow/metrics.go` — `LogMetrics`

Added MedianTimeBetweenTurns time.Duration
Added StdDevTimeBetweenTurns time.Duration (sample stddev)

`pkg/cli/copilot_events_jsonl.go`

Replaced manual TBT accumulation loop with stats.StatVar
Now sets AvgTimeBetweenTurns, MaxTimeBetweenTurns, MedianTimeBetweenTurns, and StdDevTimeBetweenTurns

`pkg/cli/audit_cross_run.go` — `MetricsTrendData`

Added MedianCost, StdDevCost, MedianTokens, StdDevTokens, MedianTurns, StdDevTurns, MedianDurationNs, StdDevDurationNs
buildMetricsTrend() now uses StatVar for all four metric dimensions (cost, tokens, turns, duration), replacing verbose manual min/max/sum loops

`pkg/cli/health_metrics.go`

CalculateWorkflowHealth() now uses StatVar internally for duration, token, and cost averages

Test coverage

pkg/stats/statvar_test.go: 9 test cases covering empty, single, two-observation, odd/even count, known variance (Wikipedia example), numerical stability, median idempotency, and all-identical inputs
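
To show what a numerical stability case might exercise, here is a small standalone comparison of the textbook sum-of-squares formula against a Welford update on large, tightly clustered values. The function names and the input values are illustrative assumptions and are not taken from `statvar_test.go`.

```go
package main

import "fmt"

// naiveSampleVariance uses (sum(x^2) - sum(x)^2/n) / (n-1), which subtracts
// two nearly equal large numbers and can lose most of its precision.
func naiveSampleVariance(xs []float64) float64 {
	var sum, sumSq float64
	for _, x := range xs {
		sum += x
		sumSq += x * x
	}
	n := float64(len(xs))
	return (sumSq - sum*sum/n) / (n - 1)
}

// welfordSampleVariance accumulates squared deviations from the running mean,
// so the intermediate values stay on the scale of the spread, not the offset.
func welfordSampleVariance(xs []float64) float64 {
	var mean, m2 float64
	for i, x := range xs {
		delta := x - mean
		mean += delta / float64(i+1)
		m2 += delta * (x - mean)
	}
	return m2 / float64(len(xs)-1)
}

func main() {
	// The deviations are tiny relative to the 1e9 offset; the true sample
	// variance of these four values is 30.
	xs := []float64{1e9 + 4, 1e9 + 7, 1e9 + 13, 1e9 + 16}
	fmt.Println("naive:  ", naiveSampleVariance(xs))
	fmt.Println("welford:", welfordSampleVariance(xs))
}
```

With inputs like these the naive result can drift far from 30, and may even come out negative, while the Welford form stays accurate.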

… commands

- Add pkg/stats.StatVar using Welford's online algorithm (mean, variance, stddev, median, min, max, count, sum)
- Add MedianTimeBetweenTurns and StdDevTimeBetweenTurns to workflow.LogMetrics
- Use StatVar in copilot_events_jsonl.go for TBT computation
- Add median/stddev fields to MetricsTrendData in audit_cross_run.go
- Use StatVar in buildMetricsTrend() replacing manual accumulation
- Use StatVar in health_metrics.go CalculateWorkflowHealth()

Agent-Logs-Url: https://github.com/github/gh-aw/sessions/30f4a60c-5387-4216-b4a2-56643a7c1967
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

… memory usage

Agent-Logs-Url: https://github.com/github/gh-aw/sessions/30f4a60c-5387-4216-b4a2-56643a7c1967
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

Copilot

Pull request overview

Introduces a new pkg/stats.StatVar utility for numerically stable, incremental descriptive statistics (Welford) and wires it into CLI audit/log health metrics to compute richer aggregates (median/stddev in addition to avg/min/max).

Changes:

Added pkg/stats.StatVar (mean/variance/stddev + exact median) with unit tests.
Extended workflow.LogMetrics with median + sample stddev for time-between-turns and populated these in copilot_events_jsonl parsing.
Refactored cross-run audit and workflow health computations to use StatVar for cost/tokens/turns/duration aggregation.

Show a summary per file

| File | Description |
| --- | --- |
| pkg/stats/statvar.go | New streaming stats accumulator (Welford) + exact median support. |
| pkg/stats/statvar_test.go | Unit tests for `StatVar` behavior and numerical stability. |
| pkg/workflow/metrics.go | Adds median/stddev fields for time-between-turns in `LogMetrics`. |
| pkg/cli/copilot_events_jsonl.go | Uses `StatVar` to compute avg/max/median/stddev TBT from timestamps. |
| pkg/cli/audit_cross_run.go | Uses `StatVar` to compute cross-run avg/median/stddev for key metrics. |
| pkg/cli/health_metrics.go | Uses `StatVar` internally to compute workflow health averages. |

Copilot's findings

Files reviewed: 6/6 changed files
Comments generated: 3

Copilot · 2026-04-15T18:47:57Z

@@ -94,6 +96,9 @@ func CalculateWorkflowHealth(workflowName string, runs []WorkflowRun, threshold
 		totalDuration += run.Duration
 		totalTokens += run.TokenUsage
 		totalCost += run.EstimatedCost
+		durationStats.Add(float64(run.Duration))


totalDuration is accumulated but never used in this function (there is no TotalDuration field on WorkflowHealth). This will fail compilation due to an unused variable. Remove totalDuration and the totalDuration += run.Duration line, or add a corresponding output field if the total duration is intended to be reported.

Copilot · 2026-04-15T18:47:57Z

+		if v < s.min {
+			s.min = v
+		}
+		if v > s.max {
+			s.max = v


The doc comment says NaN and ±Inf "will propagate through all derived statistics", but Min()/Max() may not propagate when NaN is added after initial finite values (comparisons with NaN are false, leaving min/max unchanged). Either update the comment to reflect actual behavior, or explicitly detect NaN/Inf in Add and set min/max accordingly (e.g., if math.IsNaN(v) then force min/max to NaN).

Suggested change

-		if v < s.min {
-			s.min = v
-		}
-		if v > s.max {
-			s.max = v
+		if math.IsNaN(v) {
+			s.min = v
+			s.max = v
+		} else {
+			if v < s.min {
+				s.min = v
+			}
+			if v > s.max {
+				s.max = v
+			}

Copilot · 2026-04-15T18:47:58Z

+// MetricsTrendData contains aggregated cost, token, turn, and duration statistics
+// across multiple runs, with spike detection for anomalous runs.


The MetricsTrendData type comment is duplicated twice back-to-back, which makes the docs noisy and harder to read. Remove one of the duplicate two-line blocks so the doc comment appears only once.

Suggested change

// MetricsTrendData contains aggregated cost, token, turn, and duration statistics
// across multiple runs, with spike detection for anomalous runs.

pelikhan · 2026-04-15T18:51:36Z

@copilot review all comments

… comment, clarify NaN doc

- health_metrics.go: remove unused totalDuration variable
- audit_cross_run.go: remove duplicate MetricsTrendData comment block
- statvar.go: clarify Add() doc to accurately reflect NaN/Min/Max behavior

Agent-Logs-Url: https://github.com/github/gh-aw/sessions/b393d657-9b75-407b-8c0e-e9dcd97e4bbe
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

Copilot · 2026-04-15T19:00:04Z

@copilot review all comments

Addressed all three review comments in 39f35b1:

health_metrics.go — removed unused totalDuration variable and accumulation line
audit_cross_run.go — removed the duplicate MetricsTrendData comment block
statvar.go — updated the Add() doc comment to state accurately that NaN propagates through sum/mean/variance, but that Min/Max may not update correctly for NaN inputs because of IEEE 754 comparison semantics. The comment also notes that in practice all inputs are finite (time.Duration, token counts, costs).

Copilot AI and others added 2 commits April 15, 2026 18:35

docs: address code review feedback - document statistical choices and…

65e09a9


Copilot AI assigned Copilot and pelikhan Apr 15, 2026

Copilot created this pull request from a session on behalf of pelikhan April 15, 2026 18:38

Copilot finished work on behalf of pelikhan April 15, 2026 18:39

Copilot AI requested a review from pelikhan April 15, 2026 18:39

pelikhan marked this pull request as ready for review April 15, 2026 18:43

Copilot AI review requested due to automatic review settings April 15, 2026 18:43

Copilot started reviewing on behalf of pelikhan April 15, 2026 18:44

Copilot AI reviewed Apr 15, 2026

Copilot started work on behalf of pelikhan April 15, 2026 18:51

Copilot finished work on behalf of pelikhan April 15, 2026 19:00

Merge branch 'main' into copilot/update-audit-logs-statistics

1ec639c

pelikhan merged commit 01c6be5 into main Apr 15, 2026
51 checks passed

pelikhan deleted the copilot/update-audit-logs-statistics branch April 15, 2026 19:32

pelikhan merged 4 commits into main from copilot/update-audit-logs-statistics

3 participants
