
Conversation

@jrepp jrepp commented Nov 21, 2025

Summary

  • Refactor MetricsCollector from cmd package to server package
  • Fixes import cycle between cmd and server packages
  • Add NDJSON reporter for demo monitor integration

Changes

  • Created server/metrics.go with MetricsCollector implementation
  • Removed MetricsCollector from cmd/root.go
  • Updated imports in cmd/enumerate.go, cmd/mixed.go, cmd/multicast.go, cmd/register.go
  • Added cmd/ndjson_reporter.go for streaming metrics

Test plan

  • All existing tests pass
  • No circular dependency issues
  • Module builds successfully

Stack

  • PR 1 (this PR): Metrics refactor
  • PR 2: Server implementation (depends on this)
  • PR 3: Test suite (depends on PR 2)

Move MetricsCollector from cmd package to server package to enable
import from both cmd and server without circular dependencies. Add
NDJSON reporter for demo monitor integration.

User request: "implement tests and integration tests for the prism-loadtest binary to validate it's websocket connection and API is working through automated tests that validate the responses of the server under test"

Co-Authored-By: Claude <[email protected]>
Copilot AI review requested due to automatic review settings November 21, 2025 21:13
@mergify bot added labels "go" (Pull requests that update go code) and "size/s" Nov 21, 2025

Copilot AI left a comment


Pull request overview

This PR refactors the MetricsCollector from the cmd package to the server package to resolve an import cycle between cmd and server packages. It also introduces an NDJSON reporter for streaming metrics to a demo monitor integration.

Key Changes:

  • Moved MetricsCollector implementation from cmd/root.go to server/metrics.go
  • Updated all command files to import and use server.MetricsCollector
  • Added cmd/ndjson_reporter.go for streaming metrics in NDJSON format

Reviewed changes

Copilot reviewed 8 out of 9 changed files in this pull request and generated 4 comments.

Summary per file:

  • cmd/prism-loadtest/server/metrics.go: new file containing the relocated MetricsCollector implementation with exported Mu field
  • cmd/prism-loadtest/go.mod: added gorilla/mux and gorilla/websocket dependencies
  • cmd/prism-loadtest/cmd/root.go: removed MetricsCollector implementation and sync import
  • cmd/prism-loadtest/cmd/register.go: updated to use server.MetricsCollector
  • cmd/prism-loadtest/cmd/ndjson_reporter.go: new NDJSON reporter for streaming metrics to demo monitor
  • cmd/prism-loadtest/cmd/multicast.go: updated to use server.MetricsCollector
  • cmd/prism-loadtest/cmd/mixed.go: updated to use server.MetricsCollector, added NDJSON reporting and metrics-combining logic
  • cmd/prism-loadtest/cmd/enumerate.go: updated to use server.MetricsCollector


Comment on lines +359 to +378
collector.Mu.Lock()

combined.TotalRequests += collector.TotalRequests
combined.SuccessfulReqs += collector.SuccessfulReqs
combined.FailedReqs += collector.FailedReqs
combined.TotalLatencyNs += collector.TotalLatencyNs

if collector.MinLatencyNs < combined.MinLatencyNs {
	combined.MinLatencyNs = collector.MinLatencyNs
}
if collector.MaxLatencyNs > combined.MaxLatencyNs {
	combined.MaxLatencyNs = collector.MaxLatencyNs
}

// Merge latency buckets
for bucket, count := range collector.LatencyBuckets {
	combined.LatencyBuckets[bucket] += count
}

collector.Mu.Unlock()

Copilot AI Nov 21, 2025


The combineMetrics function directly manipulates the Mu lock of multiple collectors. If any collector is being used concurrently (e.g., recording metrics while combining), this could lead to incorrect combined metrics. Consider having collectors provide a thread-safe snapshot method instead of exposing the mutex.

Suggested change (replace the lock-and-merge block above with a snapshot-based merge):

snap := collector.Snapshot()
combined.TotalRequests += snap.TotalRequests
combined.SuccessfulReqs += snap.SuccessfulReqs
combined.FailedReqs += snap.FailedReqs
combined.TotalLatencyNs += snap.TotalLatencyNs
if snap.MinLatencyNs < combined.MinLatencyNs {
	combined.MinLatencyNs = snap.MinLatencyNs
}
if snap.MaxLatencyNs > combined.MaxLatencyNs {
	combined.MaxLatencyNs = snap.MaxLatencyNs
}
// Merge latency buckets
for bucket, count := range snap.LatencyBuckets {
	combined.LatencyBuckets[bucket] += count
}

Comment on lines +108 to +128
collector.Mu.Lock()
defer collector.Mu.Unlock()

elapsed := time.Since(collector.StartTime)
throughput := float64(collector.TotalRequests) / elapsed.Seconds()

successRate := float64(100)
if collector.TotalRequests > 0 {
	successRate = float64(collector.SuccessfulReqs) / float64(collector.TotalRequests) * 100
}

p50, p95, p99 := collector.CalculatePercentiles()

metrics := map[string]float64{
	"throughput":      throughput,
	"latency_p50":     float64(p50.Microseconds()) / 1000.0, // Convert to ms
	"latency_p95":     float64(p95.Microseconds()) / 1000.0,
	"latency_p99":     float64(p99.Microseconds()) / 1000.0,
	"success_rate":    successRate,
	"total_requests":  float64(collector.TotalRequests),
	"failed_requests": float64(collector.FailedReqs),

Copilot AI Nov 21, 2025


The ReportFromCollector method locks the collector and calls CalculatePercentiles(), which expects the caller to already hold the lock (as noted in line 113 of metrics.go). This works but creates a confusing API contract. Consider adding a GetSnapshot() method to MetricsCollector that returns a snapshot of metrics while holding the lock internally.

Suggested change (replace the lock-and-compute block above with a snapshot call):

// Get a snapshot of metrics in a thread-safe way
snapshot := collector.GetSnapshot()
metrics := map[string]float64{
	"throughput":      snapshot.Throughput,
	"latency_p50":     float64(snapshot.LatencyP50.Microseconds()) / 1000.0, // Convert to ms
	"latency_p95":     float64(snapshot.LatencyP95.Microseconds()) / 1000.0,
	"latency_p99":     float64(snapshot.LatencyP99.Microseconds()) / 1000.0,
	"success_rate":    snapshot.SuccessRate,
	"total_requests":  float64(snapshot.TotalRequests),
	"failed_requests": float64(snapshot.FailedRequests),

mergify bot commented Nov 21, 2025

🧪 CI Insights

Here's what we observed from your CI run for e65a0d6.

🟢 All jobs passed!

But CI Insights is watching 👀


mergify bot commented Nov 21, 2025

The PR Status Check has failed. Please review the CI logs and fix any issues.

Common issues:

  • Test failures
  • Linting errors
  • Documentation validation failures

You can run checks locally:

task test-parallel-fast  # Run tests
task lint-parallel       # Run linters
uv run tooling/validate_docs.py  # Validate docs

User request: "are all of these changes actually tested? let's move through the PRs and run local tests to validate the PR, then check the CI and respond to any code review"

Co-Authored-By: Claude <[email protected]>

@codecov-commenter

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 30.59%. Comparing base (891f9ef) to head (e65a0d6).

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #212   +/-   ##
=======================================
  Coverage   30.59%   30.59%           
=======================================
  Files           8        8           
  Lines         572      572           
=======================================
  Hits          175      175           
  Misses        397      397           
Flag         Coverage  Δ
acceptance   30.59%    <ø> (ø)
integration  30.59%    <ø> (ø)
unittests    30.59%    <ø> (ø)

Flags with carried forward coverage won't be shown.



@mergify bot left a comment


Automatically approving PR from repo owner
