@ilana-n ilana-n commented Oct 24, 2025

GPU telemetry endpoint reachability logging

Problem

When the default DCGM endpoints (localhost:9400 and localhost:9401) were reachable, the system logged "2/0 endpoints reachable" instead of "2/2 endpoints reachable".

Root Cause

The _compute_endpoints_for_display method in telemetry_manager.py returned an empty list when _user_explicitly_configured_telemetry was False, causing endpoints_configured to be empty even if endpoints_reachable contained 2 endpoints.

Fix

Updated the endpoint display logic to show reachable default endpoints regardless of configuration method:

  • If reachable defaults exist → always include them in display
  • Properly combine user-provided endpoints with reachable defaults
  • Return an empty list only when no defaults are reachable AND the user did not configure telemetry
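
The rules above can be sketched as a small standalone function. This is a minimal illustration, not the code from telemetry_manager.py: the function name, its parameters, and the example host gpu-node:9400 are hypothetical, and the real _compute_endpoints_for_display method may be structured differently.

```python
def compute_endpoints_for_display(
    user_endpoints: list[str],
    reachable_defaults: list[str],
    user_explicitly_configured: bool,
) -> list[str]:
    """Hypothetical stand-in for the display logic described in this PR."""
    # Nothing to show only when no defaults are reachable AND the user
    # did not configure telemetry.
    if not reachable_defaults and not user_explicitly_configured:
        return []
    # Combine user-provided endpoints with reachable defaults.
    # dict.fromkeys removes duplicates while preserving order.
    return list(dict.fromkeys([*user_endpoints, *reachable_defaults]))


# Scenario from the Problem section: both defaults reachable, no user config.
defaults = ["localhost:9400", "localhost:9401"]
configured = compute_endpoints_for_display([], defaults, False)
summary = f"{len(defaults)}/{len(configured)} endpoints reachable"
print(summary)
```

With the old behavior the configured list would have been empty in this scenario, which is what produced the "2/0" message.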

Testing

  • Added assertions to two tests to verify the corrected log output: test_configure_sends_enabled_status_when_endpoints_reachable and test_configure_no_shutdown_when_no_endpoints_reachable
  • Also fixed some older tests that used print statements instead of proper assertions
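
As a rough illustration of asserting on the log message rather than printing it, the sketch below captures log output with a custom handler. The helper log_reachability, the ListHandler class, and the logger name are assumptions made for this example; the actual tests in the PR are not shown here and may use different fixtures.

```python
import logging


def log_reachability(logger: logging.Logger, reachable: list[str], configured: list[str]) -> None:
    # Hypothetical helper that mirrors the "<reachable>/<configured>" message format.
    logger.info("%d/%d endpoints reachable", len(reachable), len(configured))


class ListHandler(logging.Handler):
    """Collects formatted log messages so a test can assert on them."""

    def __init__(self) -> None:
        super().__init__()
        self.messages: list[str] = []

    def emit(self, record: logging.LogRecord) -> None:
        self.messages.append(record.getMessage())


def capture_reachability_log() -> list[str]:
    logger = logging.getLogger("telemetry_sketch")  # illustrative logger name
    logger.setLevel(logging.INFO)
    logger.propagate = False
    handler = ListHandler()
    logger.addHandler(handler)
    try:
        defaults = ["localhost:9400", "localhost:9401"]
        log_reachability(logger, reachable=defaults, configured=defaults)
    finally:
        logger.removeHandler(handler)
    return handler.messages


messages = capture_reachability_log()
# An assertion fails the test on a regression; a print statement would not.
assert messages == ["2/2 endpoints reachable"]
```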

codecov bot commented Oct 24, 2025

Codecov Report

❌ Patch coverage is 87.50000% with 1 line in your changes missing coverage. Please review.

Files with missing lines | Patch % | Lines
src/aiperf/gpu_telemetry/telemetry_manager.py | 83.33% | 0 Missing and 1 partial ⚠️


@ilana-n ilana-n force-pushed the ilana/fix-telemetry-endpoints-logs branch from a963b9d to ec0cf4e Compare October 24, 2025 20:48

@ajcasagrande ajcasagrande left a comment


LGTM!
