Refactor: Move MetricsCollector to server package #212
Base: main
Conversation
Move MetricsCollector from cmd package to server package to enable import from both cmd and server without circular dependencies. Add NDJSON reporter for demo monitor integration.

User request: "implement tests and integration tests for the prism-loadtest binary to validate it's websocket connection and API is working through automated tests that validate the responses of the server under test"

Co-Authored-By: Claude <[email protected]>
Pull request overview
This PR refactors the MetricsCollector from the cmd package to the server package to resolve an import cycle between cmd and server packages. It also introduces an NDJSON reporter for streaming metrics to a demo monitor integration.
Key Changes:
- Moved `MetricsCollector` implementation from `cmd/root.go` to `server/metrics.go`
- Updated all command files to import and use `server.MetricsCollector`
- Added `cmd/ndjson_reporter.go` for streaming metrics in NDJSON format
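Since NDJSON is simply one complete JSON object per line, a reporter record could be encoded as in the sketch below. This is a minimal illustration; the `MetricsLine` type and its field names are assumptions, not the actual schema in `cmd/ndjson_reporter.go`.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// MetricsLine is a hypothetical record shape; the real fields live in
// cmd/ndjson_reporter.go and may differ.
type MetricsLine struct {
	Event         string  `json:"event"`
	Throughput    float64 `json:"throughput"`
	SuccessRate   float64 `json:"success_rate"`
	TotalRequests int64   `json:"total_requests"`
}

// encodeMetricsLine renders one NDJSON record: a single JSON object
// followed by a newline, so consumers can stream and parse line by line.
func encodeMetricsLine(m MetricsLine) (string, error) {
	b, err := json.Marshal(m)
	if err != nil {
		return "", err
	}
	return string(b) + "\n", nil
}

func main() {
	line, _ := encodeMetricsLine(MetricsLine{
		Event:         "metrics",
		Throughput:    1200.5,
		SuccessRate:   99.9,
		TotalRequests: 60000,
	})
	fmt.Print(line)
}
```

Because each line is self-delimiting, a demo monitor can consume the stream with a plain line scanner and decode records independently.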
Reviewed changes
Copilot reviewed 8 out of 9 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| cmd/prism-loadtest/server/metrics.go | New file containing the relocated MetricsCollector implementation with exported Mu field |
| cmd/prism-loadtest/go.mod | Added gorilla/mux and gorilla/websocket dependencies |
| cmd/prism-loadtest/cmd/root.go | Removed MetricsCollector implementation and sync import |
| cmd/prism-loadtest/cmd/register.go | Updated to use server.MetricsCollector |
| cmd/prism-loadtest/cmd/ndjson_reporter.go | New NDJSON reporter for streaming metrics to demo monitor |
| cmd/prism-loadtest/cmd/multicast.go | Updated to use server.MetricsCollector |
| cmd/prism-loadtest/cmd/mixed.go | Updated to use server.MetricsCollector, added NDJSON reporting, and metrics combining logic |
| cmd/prism-loadtest/cmd/enumerate.go | Updated to use server.MetricsCollector |
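To illustrate the dependency direction the refactor establishes, here is a single-file sketch that collapses the two packages: the `server` package owns `MetricsCollector` (with the exported `Mu` field noted in the table), and command code only constructs and uses it. Everything beyond `MetricsCollector` and `Mu` is an assumption made for illustration.

```go
package main

import (
	"fmt"
	"sync"
)

// --- roughly what lives in cmd/prism-loadtest/server/metrics.go ---

// MetricsCollector holds request counters; Mu is exported, per the
// file table above. Other fields here are illustrative.
type MetricsCollector struct {
	Mu             sync.Mutex
	TotalRequests  int64
	SuccessfulReqs int64
	FailedReqs     int64
}

// Record tallies one request outcome under the lock.
func (c *MetricsCollector) Record(ok bool) {
	c.Mu.Lock()
	defer c.Mu.Unlock()
	c.TotalRequests++
	if ok {
		c.SuccessfulReqs++
	} else {
		c.FailedReqs++
	}
}

// --- roughly what a command file (e.g. cmd/register.go) would do ---

// runCommand stands in for a load-test command driving the collector.
func runCommand(collector *MetricsCollector) {
	for i := 0; i < 5; i++ {
		collector.Record(i%5 != 0) // pretend one request in five fails
	}
}

func main() {
	collector := &MetricsCollector{}
	runCommand(collector)
	fmt.Println(collector.TotalRequests, collector.FailedReqs)
}
```

With the type in `server`, both `cmd` and `server` code can import it without the cycle that existed when it lived in `cmd/root.go`.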
```go
collector.Mu.Lock()

combined.TotalRequests += collector.TotalRequests
combined.SuccessfulReqs += collector.SuccessfulReqs
combined.FailedReqs += collector.FailedReqs
combined.TotalLatencyNs += collector.TotalLatencyNs

if collector.MinLatencyNs < combined.MinLatencyNs {
	combined.MinLatencyNs = collector.MinLatencyNs
}
if collector.MaxLatencyNs > combined.MaxLatencyNs {
	combined.MaxLatencyNs = collector.MaxLatencyNs
}

// Merge latency buckets
for bucket, count := range collector.LatencyBuckets {
	combined.LatencyBuckets[bucket] += count
}

collector.Mu.Unlock()
```

**Copilot AI** · Nov 21, 2025
The combineMetrics function directly manipulates the Mu lock of multiple collectors. If any collector is being used concurrently (e.g., recording metrics while combining), this could lead to incorrect combined metrics. Consider having collectors provide a thread-safe snapshot method instead of exposing the mutex.
Suggested change:

```diff
-collector.Mu.Lock()
-combined.TotalRequests += collector.TotalRequests
-combined.SuccessfulReqs += collector.SuccessfulReqs
-combined.FailedReqs += collector.FailedReqs
-combined.TotalLatencyNs += collector.TotalLatencyNs
-if collector.MinLatencyNs < combined.MinLatencyNs {
-	combined.MinLatencyNs = collector.MinLatencyNs
-}
-if collector.MaxLatencyNs > combined.MaxLatencyNs {
-	combined.MaxLatencyNs = collector.MaxLatencyNs
-}
-// Merge latency buckets
-for bucket, count := range collector.LatencyBuckets {
-	combined.LatencyBuckets[bucket] += count
-}
-collector.Mu.Unlock()
+snap := collector.Snapshot()
+combined.TotalRequests += snap.TotalRequests
+combined.SuccessfulReqs += snap.SuccessfulReqs
+combined.FailedReqs += snap.FailedReqs
+combined.TotalLatencyNs += snap.TotalLatencyNs
+if snap.MinLatencyNs < combined.MinLatencyNs {
+	combined.MinLatencyNs = snap.MinLatencyNs
+}
+if snap.MaxLatencyNs > combined.MaxLatencyNs {
+	combined.MaxLatencyNs = snap.MaxLatencyNs
+}
+// Merge latency buckets
+for bucket, count := range snap.LatencyBuckets {
+	combined.LatencyBuckets[bucket] += count
+}
```
```go
collector.Mu.Lock()

combined.TotalRequests += collector.TotalRequests
combined.SuccessfulReqs += collector.SuccessfulReqs
combined.FailedReqs += collector.FailedReqs
combined.TotalLatencyNs += collector.TotalLatencyNs

if collector.MinLatencyNs < combined.MinLatencyNs {
	combined.MinLatencyNs = collector.MinLatencyNs
}
if collector.MaxLatencyNs > combined.MaxLatencyNs {
	combined.MaxLatencyNs = collector.MaxLatencyNs
}

// Merge latency buckets
for bucket, count := range collector.LatencyBuckets {
	combined.LatencyBuckets[bucket] += count
}

collector.Mu.Unlock()
```

**Copilot AI** · Nov 21, 2025
The combineMetrics function directly manipulates the Mu lock of multiple collectors. If any collector is being used concurrently (e.g., recording metrics while combining), this could lead to incorrect combined metrics. Consider having collectors provide a thread-safe snapshot method instead of exposing the mutex.
Suggested change:

```diff
-collector.Mu.Lock()
-combined.TotalRequests += collector.TotalRequests
-combined.SuccessfulReqs += collector.SuccessfulReqs
-combined.FailedReqs += collector.FailedReqs
-combined.TotalLatencyNs += collector.TotalLatencyNs
-if collector.MinLatencyNs < combined.MinLatencyNs {
-	combined.MinLatencyNs = collector.MinLatencyNs
-}
-if collector.MaxLatencyNs > combined.MaxLatencyNs {
-	combined.MaxLatencyNs = collector.MaxLatencyNs
-}
-// Merge latency buckets
-for bucket, count := range collector.LatencyBuckets {
-	combined.LatencyBuckets[bucket] += count
-}
-collector.Mu.Unlock()
+snapshot := collector.Snapshot()
+combined.TotalRequests += snapshot.TotalRequests
+combined.SuccessfulReqs += snapshot.SuccessfulReqs
+combined.FailedReqs += snapshot.FailedReqs
+combined.TotalLatencyNs += snapshot.TotalLatencyNs
+if snapshot.MinLatencyNs < combined.MinLatencyNs {
+	combined.MinLatencyNs = snapshot.MinLatencyNs
+}
+if snapshot.MaxLatencyNs > combined.MaxLatencyNs {
+	combined.MaxLatencyNs = snapshot.MaxLatencyNs
+}
+// Merge latency buckets
+for bucket, count := range snapshot.LatencyBuckets {
+	combined.LatencyBuckets[bucket] += count
+}
```
```go
collector.Mu.Lock()
defer collector.Mu.Unlock()

elapsed := time.Since(collector.StartTime)
throughput := float64(collector.TotalRequests) / elapsed.Seconds()

successRate := float64(100)
if collector.TotalRequests > 0 {
	successRate = float64(collector.SuccessfulReqs) / float64(collector.TotalRequests) * 100
}

p50, p95, p99 := collector.CalculatePercentiles()

metrics := map[string]float64{
	"throughput":      throughput,
	"latency_p50":     float64(p50.Microseconds()) / 1000.0, // Convert to ms
	"latency_p95":     float64(p95.Microseconds()) / 1000.0,
	"latency_p99":     float64(p99.Microseconds()) / 1000.0,
	"success_rate":    successRate,
	"total_requests":  float64(collector.TotalRequests),
	"failed_requests": float64(collector.FailedReqs),
```

**Copilot AI** · Nov 21, 2025
The ReportFromCollector method locks the collector and calls CalculatePercentiles(), which expects the caller to already hold the lock (as noted in line 113 of metrics.go). This works but creates a confusing API contract. Consider adding a GetSnapshot() method to MetricsCollector that returns a snapshot of metrics while holding the lock internally.
Suggested change:

```diff
-collector.Mu.Lock()
-defer collector.Mu.Unlock()
-elapsed := time.Since(collector.StartTime)
-throughput := float64(collector.TotalRequests) / elapsed.Seconds()
-successRate := float64(100)
-if collector.TotalRequests > 0 {
-	successRate = float64(collector.SuccessfulReqs) / float64(collector.TotalRequests) * 100
-}
-p50, p95, p99 := collector.CalculatePercentiles()
-metrics := map[string]float64{
-	"throughput":      throughput,
-	"latency_p50":     float64(p50.Microseconds()) / 1000.0, // Convert to ms
-	"latency_p95":     float64(p95.Microseconds()) / 1000.0,
-	"latency_p99":     float64(p99.Microseconds()) / 1000.0,
-	"success_rate":    successRate,
-	"total_requests":  float64(collector.TotalRequests),
-	"failed_requests": float64(collector.FailedReqs),
+// Get a snapshot of metrics in a thread-safe way
+snapshot := collector.GetSnapshot()
+metrics := map[string]float64{
+	"throughput":      snapshot.Throughput,
+	"latency_p50":     float64(snapshot.LatencyP50.Microseconds()) / 1000.0, // Convert to ms
+	"latency_p95":     float64(snapshot.LatencyP95.Microseconds()) / 1000.0,
+	"latency_p99":     float64(snapshot.LatencyP99.Microseconds()) / 1000.0,
+	"success_rate":    snapshot.SuccessRate,
+	"total_requests":  float64(snapshot.TotalRequests),
+	"failed_requests": float64(snapshot.FailedRequests),
```
```go
collector.Mu.Lock()
defer collector.Mu.Unlock()

elapsed := time.Since(collector.StartTime)
throughput := float64(collector.TotalRequests) / elapsed.Seconds()

successRate := float64(100)
if collector.TotalRequests > 0 {
	successRate = float64(collector.SuccessfulReqs) / float64(collector.TotalRequests) * 100
}

p50, p95, p99 := collector.CalculatePercentiles()

metrics := map[string]float64{
	"throughput":      throughput,
	"latency_p50":     float64(p50.Microseconds()) / 1000.0, // Convert to ms
	"latency_p95":     float64(p95.Microseconds()) / 1000.0,
	"latency_p99":     float64(p99.Microseconds()) / 1000.0,
	"success_rate":    successRate,
	"total_requests":  float64(collector.TotalRequests),
	"failed_requests": float64(collector.FailedReqs),
```

**Copilot AI** · Nov 21, 2025
The ReportFromCollector method locks the collector and calls CalculatePercentiles(), which expects the caller to already hold the lock (as noted in line 113 of metrics.go). This works but creates a confusing API contract. Consider adding a GetSnapshot() method to MetricsCollector that returns a snapshot of metrics while holding the lock internally.
Suggested change:

```diff
-collector.Mu.Lock()
-defer collector.Mu.Unlock()
-elapsed := time.Since(collector.StartTime)
-throughput := float64(collector.TotalRequests) / elapsed.Seconds()
-successRate := float64(100)
-if collector.TotalRequests > 0 {
-	successRate = float64(collector.SuccessfulReqs) / float64(collector.TotalRequests) * 100
-}
-p50, p95, p99 := collector.CalculatePercentiles()
-metrics := map[string]float64{
-	"throughput":      throughput,
-	"latency_p50":     float64(p50.Microseconds()) / 1000.0, // Convert to ms
-	"latency_p95":     float64(p95.Microseconds()) / 1000.0,
-	"latency_p99":     float64(p99.Microseconds()) / 1000.0,
-	"success_rate":    successRate,
-	"total_requests":  float64(collector.TotalRequests),
-	"failed_requests": float64(collector.FailedReqs),
+snapshot := collector.GetSnapshot()
+metrics := map[string]float64{
+	"throughput":      snapshot.Throughput,
+	"latency_p50":     float64(snapshot.LatencyP50.Microseconds()) / 1000.0, // Convert to ms
+	"latency_p95":     float64(snapshot.LatencyP95.Microseconds()) / 1000.0,
+	"latency_p99":     float64(snapshot.LatencyP99.Microseconds()) / 1000.0,
+	"success_rate":    snapshot.SuccessRate,
+	"total_requests":  float64(snapshot.TotalRequests),
+	"failed_requests": float64(snapshot.FailedRequests),
```
🧪 CI Insights — 🟢 All jobs passed for CI run e65a0d6.
The PR Status Check has failed. Please review the CI logs and fix any issues.

You can run checks locally:

```shell
task test-parallel-fast            # Run tests
task lint-parallel                 # Run linters
uv run tooling/validate_docs.py    # Validate docs
```
User request: "are all of these changes actually tested? let's move through the PRs and run local tests to validate the PR, then check the CI and respond to any code review" Co-Authored-By: Claude <[email protected]>
Codecov Report — ✅ All modified and coverable lines are covered by tests.

```
@@           Coverage Diff           @@
##             main     #212   +/-   ##
=======================================
  Coverage   30.59%   30.59%
=======================================
  Files           8        8
  Lines         572      572
=======================================
  Hits          175      175
  Misses        397      397
```

Flags with carried forward coverage won't be shown. View the full report in Codecov by Sentry.
Automatically approving PR from repo owner
Summary

Changes
- Added `server/metrics.go` with the `MetricsCollector` implementation
- Removed the implementation from `cmd/root.go`
- Updated `cmd/enumerate.go`, `cmd/mixed.go`, `cmd/multicast.go`, and `cmd/register.go`
- Added `cmd/ndjson_reporter.go` for streaming metrics

Test plan
Stack