
Conversation

@jrepp jrepp commented Nov 21, 2025

Summary

  • Refactor MetricsCollector from cmd package to server package
  • Fixes import cycle between cmd and server packages
  • Add NDJSON reporter for demo monitor integration

Changes

  • Created server/metrics.go with MetricsCollector implementation
  • Removed MetricsCollector from cmd/root.go
  • Updated imports in cmd/enumerate.go, cmd/mixed.go, cmd/multicast.go, cmd/register.go
  • Added cmd/ndjson_reporter.go for streaming metrics

Test plan

  • All existing tests pass
  • No circular dependency issues
  • Module builds successfully

Stack

  • PR 1 (this PR): Metrics refactor
  • PR 2: Server implementation (depends on this)
  • PR 3: Test suite (depends on PR 2)

Move MetricsCollector from cmd package to server package to enable
import from both cmd and server without circular dependencies. Add
NDJSON reporter for demo monitor integration.

User request: "implement tests and integration tests for the prism-loadtest binary to validate it's websocket connection and API is working through automated tests that validate the responses of the server under test"

Co-Authored-By: Claude <[email protected]>
Copilot AI review requested due to automatic review settings November 21, 2025 21:13
@mergify bot added labels "go" (Pull requests that update go code) and "size/s" Nov 21, 2025

Copilot AI left a comment


Pull request overview

This PR refactors the MetricsCollector from the cmd package to the server package to resolve an import cycle between cmd and server packages. It also introduces an NDJSON reporter for streaming metrics to a demo monitor integration.

Key Changes:

  • Moved MetricsCollector implementation from cmd/root.go to server/metrics.go
  • Updated all command files to import and use server.MetricsCollector
  • Added cmd/ndjson_reporter.go for streaming metrics in NDJSON format

Reviewed changes

Copilot reviewed 8 out of 9 changed files in this pull request and generated 4 comments.

Summary per file:

  • cmd/prism-loadtest/server/metrics.go: new file containing the relocated MetricsCollector implementation with exported Mu field
  • cmd/prism-loadtest/go.mod: added gorilla/mux and gorilla/websocket dependencies
  • cmd/prism-loadtest/cmd/root.go: removed MetricsCollector implementation and sync import
  • cmd/prism-loadtest/cmd/register.go: updated to use server.MetricsCollector
  • cmd/prism-loadtest/cmd/ndjson_reporter.go: new NDJSON reporter for streaming metrics to demo monitor
  • cmd/prism-loadtest/cmd/multicast.go: updated to use server.MetricsCollector
  • cmd/prism-loadtest/cmd/mixed.go: updated to use server.MetricsCollector, added NDJSON reporting and metrics-combining logic
  • cmd/prism-loadtest/cmd/enumerate.go: updated to use server.MetricsCollector


Comment on lines +359 to +378
collector.Mu.Lock()

combined.TotalRequests += collector.TotalRequests
combined.SuccessfulReqs += collector.SuccessfulReqs
combined.FailedReqs += collector.FailedReqs
combined.TotalLatencyNs += collector.TotalLatencyNs

if collector.MinLatencyNs < combined.MinLatencyNs {
	combined.MinLatencyNs = collector.MinLatencyNs
}
if collector.MaxLatencyNs > combined.MaxLatencyNs {
	combined.MaxLatencyNs = collector.MaxLatencyNs
}

// Merge latency buckets
for bucket, count := range collector.LatencyBuckets {
	combined.LatencyBuckets[bucket] += count
}

collector.Mu.Unlock()

Copilot AI Nov 21, 2025


The combineMetrics function directly manipulates the Mu lock of multiple collectors. If any collector is being used concurrently (e.g., recording metrics while combining), this could lead to incorrect combined metrics. Consider having collectors provide a thread-safe snapshot method instead of exposing the mutex.

Suggested change (replace the lock-and-merge block above with a snapshot-based merge):

snap := collector.Snapshot()
combined.TotalRequests += snap.TotalRequests
combined.SuccessfulReqs += snap.SuccessfulReqs
combined.FailedReqs += snap.FailedReqs
combined.TotalLatencyNs += snap.TotalLatencyNs
if snap.MinLatencyNs < combined.MinLatencyNs {
	combined.MinLatencyNs = snap.MinLatencyNs
}
if snap.MaxLatencyNs > combined.MaxLatencyNs {
	combined.MaxLatencyNs = snap.MaxLatencyNs
}
// Merge latency buckets
for bucket, count := range snap.LatencyBuckets {
	combined.LatencyBuckets[bucket] += count
}

Comment on lines +108 to +128
collector.Mu.Lock()
defer collector.Mu.Unlock()

elapsed := time.Since(collector.StartTime)
throughput := float64(collector.TotalRequests) / elapsed.Seconds()

successRate := float64(100)
if collector.TotalRequests > 0 {
	successRate = float64(collector.SuccessfulReqs) / float64(collector.TotalRequests) * 100
}

p50, p95, p99 := collector.CalculatePercentiles()

metrics := map[string]float64{
	"throughput":      throughput,
	"latency_p50":     float64(p50.Microseconds()) / 1000.0, // Convert to ms
	"latency_p95":     float64(p95.Microseconds()) / 1000.0,
	"latency_p99":     float64(p99.Microseconds()) / 1000.0,
	"success_rate":    successRate,
	"total_requests":  float64(collector.TotalRequests),
	"failed_requests": float64(collector.FailedReqs),

Copilot AI Nov 21, 2025


The ReportFromCollector method locks the collector and calls CalculatePercentiles(), which expects the caller to already hold the lock (as noted in line 113 of metrics.go). This works but creates a confusing API contract. Consider adding a GetSnapshot() method to MetricsCollector that returns a snapshot of metrics while holding the lock internally.

Suggested change (replace the lock-and-compute block above with a snapshot call):

// Get a snapshot of metrics in a thread-safe way
snapshot := collector.GetSnapshot()
metrics := map[string]float64{
	"throughput":      snapshot.Throughput,
	"latency_p50":     float64(snapshot.LatencyP50.Microseconds()) / 1000.0, // Convert to ms
	"latency_p95":     float64(snapshot.LatencyP95.Microseconds()) / 1000.0,
	"latency_p99":     float64(snapshot.LatencyP99.Microseconds()) / 1000.0,
	"success_rate":    snapshot.SuccessRate,
	"total_requests":  float64(snapshot.TotalRequests),
	"failed_requests": float64(snapshot.FailedRequests),

mergify bot commented Nov 21, 2025

🧪 CI Insights

Here's what we observed from your CI run for e65a0d6.

🟢 All jobs passed!

But CI Insights is watching 👀


mergify bot commented Nov 21, 2025

The PR Status Check has failed. Please review the CI logs and fix any issues.

Common issues:

  • Test failures
  • Linting errors
  • Documentation validation failures

You can run checks locally:

task test-parallel-fast  # Run tests
task lint-parallel       # Run linters
uv run tooling/validate_docs.py  # Validate docs

User request: "are all of these changes actually tested? let's move through the PRs and run local tests to validate the PR, then check the CI and respond to any code review"

Co-Authored-By: Claude <[email protected]>

@codecov-commenter

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 30.59%. Comparing base (891f9ef) to head (e65a0d6).

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #212   +/-   ##
=======================================
  Coverage   30.59%   30.59%           
=======================================
  Files           8        8           
  Lines         572      572           
=======================================
  Hits          175      175           
  Misses        397      397           
Flag         Coverage  Δ
acceptance   30.59%    <ø> (ø)
integration  30.59%    <ø> (ø)
unittests    30.59%    <ø> (ø)

Flags with carried forward coverage won't be shown.



@mergify bot left a comment


Automatically approving PR from repo owner
