1 change: 1 addition & 0 deletions README.md
@@ -57,6 +57,7 @@ Features
| **[Sequence Distributions](docs/tutorials/sequence-distributions.md)** | Mixed ISL/OSL pairings | Benchmarking mixed use cases |
| **[Goodput](docs/tutorials/goodput.md)** | Throughput of requests meeting user-defined SLOs | SLO validation, capacity planning, runtime/model comparisons |
| **[Request Rate with Max Concurrency](docs/tutorials/request-rate-concurrency.md)** | Dual control of request timing and concurrent connection ceiling (Poisson or constant modes) | Testing API rate/concurrency limits, avoiding thundering herd, realistic client simulation |
| **[GPU Telemetry](docs/tutorials/gpu-telemetry.md)** | Real-time GPU metrics collection via DCGM (power, utilization, memory, temperature, etc.) | Performance optimization, resource monitoring, multi-node telemetry |
| **[Template Endpoint](docs/tutorials/template-endpoint.md)** | Benchmark custom APIs with flexible Jinja2 request templates | Custom API formats, rapid prototyping, non-standard endpoints |

### Working with Benchmark Data
49 changes: 36 additions & 13 deletions docs/tutorials/gpu-telemetry.md
@@ -12,7 +12,7 @@ This guide shows you how to collect GPU metrics (power, utilization, memory, tem
This guide covers two setup paths depending on your inference backend:

### Path 1: Dynamo (Built-in DCGM)
If you're using **Dynamo**, it comes with DCGM pre-configured on port 9401. No additional setup needed! Just use the `--gpu-telemetry` flag to enable console display and optionally add additional DCGM url endpoints.
If you're using **Dynamo**, it comes with DCGM pre-configured on port 9401. No additional setup needed! Just use the `--gpu-telemetry` flag to enable console display and optionally specify additional DCGM exporter URL endpoints. URLs can be specified with or without the `http://` prefix (e.g., `localhost:9400` or `http://localhost:9400`).

### Path 2: Other Inference Servers (Custom DCGM)
If you're using **any other inference backend**, you'll need to set up DCGM separately.
@@ -28,15 +28,28 @@ AIPerf provides GPU telemetry collection with the `--gpu-telemetry` flag. Here's

### How the `--gpu-telemetry` Flag Works

| Usage | Command | What Gets Collected (If Available) | Console Display | CSV/JSON Export |
|-------|---------|---------------------|-----------------|-----------------|
| **No flag** | `aiperf profile --model MODEL ...` | `http://localhost:9400/metrics` + `http://localhost:9401/metrics` | ❌ No | ✅ Yes |
| **Flag only** | `aiperf profile --model MODEL ... --gpu-telemetry` | `http://localhost:9400/metrics` + `http://localhost:9401/metrics` | ✅ Yes | ✅ Yes |
| **Custom URLs** | `aiperf profile --model MODEL ... --gpu-telemetry http://node1:9400/metrics http://node2:9400/metrics` | `http://localhost:9400/metrics` + `http://localhost:9401/metrics` + custom URLs | ✅ Yes | ✅ Yes |
| Usage | Command | What Gets Collected (If Available) | Console Display | Dashboard View | CSV/JSON Export |
|-------|---------|---------------------|-----------------|----------------|-----------------|
| **No flag** | `aiperf profile --model MODEL ...` | `http://localhost:9400/metrics` + `http://localhost:9401/metrics` | ❌ No | ❌ No | ✅ Yes |
| **Flag only** | `aiperf profile --model MODEL ... --gpu-telemetry` | `http://localhost:9400/metrics` + `http://localhost:9401/metrics` | ✅ Yes | ❌ No | ✅ Yes |
| **Dashboard mode** | `aiperf profile --model MODEL ... --gpu-telemetry dashboard` | `http://localhost:9400/metrics` + `http://localhost:9401/metrics` | ✅ Yes | ✅ Yes | ✅ Yes |
| **Custom URLs** | `aiperf profile --model MODEL ... --gpu-telemetry node1:9400 http://node2:9400/metrics` | `http://localhost:9400/metrics` + `http://localhost:9401/metrics` + custom URLs | ✅ Yes | ❌ No | ✅ Yes |
| **Dashboard + URLs** | `aiperf profile --model MODEL ... --gpu-telemetry dashboard localhost:9400` | `http://localhost:9400/metrics` + `http://localhost:9401/metrics` + custom URLs | ✅ Yes | ✅ Yes | ✅ Yes |

> [!IMPORTANT]
> The default endpoints `http://localhost:9400/metrics` and `http://localhost:9401/metrics` are ALWAYS attempted for telemetry collection, regardless of whether the `--gpu-telemetry` flag is used. The flag primarily controls whether metrics are displayed on the console and allows you to specify additional custom DCGM exporter endpoints.

> [!NOTE]
> When specifying custom DCGM exporter URLs, the `http://` prefix is optional. URLs like `localhost:9400` will automatically be treated as `http://localhost:9400`. Both formats work identically.

### Real-Time Dashboard View

Adding `dashboard` to the `--gpu-telemetry` flag enables a live terminal UI (TUI) that displays GPU metrics in real-time during your benchmark runs:

```bash
aiperf profile --model MODEL ... --gpu-telemetry dashboard
```

---

# 1: Using Dynamo
@@ -48,7 +61,7 @@ Dynamo includes DCGM out of the box on port 9401 - no extra setup needed!
```bash
# Set environment variables
export AIPERF_REPO_TAG="main"
export DYNAMO_PREBUILT_IMAGE_TAG="nvcr.io/nvidia/ai-dynamo/vllm-runtime:0.5.0"
export DYNAMO_PREBUILT_IMAGE_TAG="nvcr.io/nvidia/ai-dynamo/vllm-runtime:0.5.1"
export MODEL="Qwen/Qwen3-0.6B"

# Download the Dynamo container
@@ -99,7 +112,7 @@ uv pip install ./aiperf

```bash
# Wait for Dynamo API to be ready (up to 15 minutes)
timeout 900 bash -c 'while [ "$(curl -s -o /dev/null -w "%{http_code}" localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d "{\"model\":\"Qwen/Qwen3-0.6B\",\"messages\":[{\"role\":\"user\",\"content\":\"a\"}],\"max_completion_tokens\":1}")" != "200" ]; do sleep 2; done' || { echo "Dynamo not ready after 15min"; exit 1; }
timeout 900 bash -c 'while [ "$(curl -s -o /dev/null -w "%{http_code}" localhost:8000/v1/chat/completions -H "Content-Type: application/json" -d "{\"model\":\"Qwen/Qwen3-0.6B\",\"messages\":[{\"role\":\"user\",\"content\":\"a\"}],\"max_completion_tokens\":1}")" != "200" ]; do sleep 2; done' || { echo "Dynamo not ready after 15min"; exit 1; }
```
```bash
# Wait for DCGM Exporter to be ready (up to 2 minutes after Dynamo is ready)
@@ -116,7 +129,7 @@ aiperf profile \
--endpoint-type chat \
--endpoint /v1/chat/completions \
--streaming \
--url localhost:8080 \
--url localhost:8000 \
--synthetic-input-tokens-mean 100 \
--synthetic-input-tokens-stddev 0 \
--output-tokens-mean 200 \
@@ -131,6 +144,9 @@ aiperf profile \
--gpu-telemetry
```

> [!TIP]
> The `dashboard` keyword enables a live terminal UI for real-time GPU telemetry visualization. Press `5` to maximize the GPU Telemetry panel during the benchmark run.

---

# 2: Using Other Inference Server
@@ -279,6 +295,12 @@ aiperf profile \
--gpu-telemetry
```

> [!TIP]
> The `dashboard` keyword enables a live terminal UI for real-time GPU telemetry visualization. Press `5` to maximize the GPU Telemetry panel during the benchmark run.

## Multi-Node GPU Telemetry Example

For distributed setups with multiple nodes, you can collect GPU telemetry from all nodes simultaneously:
@@ -287,12 +309,13 @@ For distributed setups with multiple nodes, you can collect GPU telemetry from a
# Example: Collecting telemetry from 3 nodes in a distributed setup
# Note: The default endpoints http://localhost:9400/metrics and http://localhost:9401/metrics
# are always attempted in addition to these custom URLs
# URLs can be specified with or without the http:// prefix
aiperf profile \
--model Qwen/Qwen3-0.6B \
--endpoint-type chat \
--endpoint /v1/chat/completions \
--streaming \
--url localhost:8080 \
--url localhost:8000 \
--synthetic-input-tokens-mean 100 \
--synthetic-input-tokens-stddev 0 \
--output-tokens-mean 200 \
@@ -304,14 +327,14 @@ aiperf profile \
--warmup-request-count 1 \
--conversation-num 8 \
--random-seed 100 \
--gpu-telemetry http://node1:9400/metrics http://node2:9400/metrics http://node3:9400/metrics
--gpu-telemetry node1:9400 node2:9400 http://node3:9400/metrics
```

This will collect GPU metrics from:
- `http://localhost:9400/metrics` (default, always attempted)
- `http://localhost:9401/metrics` (default, always attempted)
- `http://node1:9400/metrics` (custom node 1)
- `http://node2:9400/metrics` (custom node 2)
- `http://node1:9400` (custom node 1, normalized from `node1:9400`)
- `http://node2:9400` (custom node 2, normalized from `node2:9400`)
- `http://node3:9400/metrics` (custom node 3)

All metrics are displayed on the console and saved to the output CSV and JSON files, with GPU indices and hostnames distinguishing metrics from different nodes.
2 changes: 2 additions & 0 deletions mkdocs.yml
@@ -17,6 +17,8 @@ nav:
- Time-based Benchmarking: tutorials/time-based-benchmarking.md
- Sequence Distributions: tutorials/sequence-distributions.md
- Goodput: tutorials/goodput.md
- Request Rate with Max Concurrency: tutorials/request-rate-concurrency.md
- GPU Telemetry: tutorials/gpu-telemetry.md
- Template Endpoint: tutorials/template-endpoint.md
- Reference:
- Architecture: architecture.md
40 changes: 39 additions & 1 deletion src/aiperf/common/config/user_config.py
@@ -19,7 +19,7 @@
from aiperf.common.config.loadgen_config import LoadGeneratorConfig
from aiperf.common.config.output_config import OutputConfig
from aiperf.common.config.tokenizer_config import TokenizerConfig
from aiperf.common.enums import CustomDatasetType
from aiperf.common.enums import CustomDatasetType, GPUTelemetryMode
from aiperf.common.enums.timing_enums import RequestRateMode, TimingMode
from aiperf.common.utils import load_json_str

@@ -224,6 +224,44 @@ def _count_dataset_entries(self) -> int:
),
]

_gpu_telemetry_mode: GPUTelemetryMode = GPUTelemetryMode.SUMMARY
_gpu_telemetry_urls: list[str] = []

@model_validator(mode="after")
def _parse_gpu_telemetry_config(self) -> Self:
"""Parse gpu_telemetry list into mode and URLs."""
if not self.gpu_telemetry:
return self

mode = GPUTelemetryMode.SUMMARY
urls = []

for item in self.gpu_telemetry:
if item in ["dashboard"]:
mode = GPUTelemetryMode.REALTIME_DASHBOARD
elif item.startswith("http") or ":" in item:
normalized_url = item if item.startswith("http") else f"http://{item}"
urls.append(normalized_url)

self._gpu_telemetry_mode = mode
self._gpu_telemetry_urls = urls
return self

@property
def gpu_telemetry_mode(self) -> GPUTelemetryMode:
"""Get the GPU telemetry display mode (parsed from gpu_telemetry list)."""
return self._gpu_telemetry_mode

@gpu_telemetry_mode.setter
def gpu_telemetry_mode(self, value: GPUTelemetryMode) -> None:
"""Set the GPU telemetry display mode."""
self._gpu_telemetry_mode = value

@property
def gpu_telemetry_urls(self) -> list[str]:
"""Get the parsed GPU telemetry DCGM endpoint URLs."""
return self._gpu_telemetry_urls

@model_validator(mode="after")
def _compute_config(self) -> Self:
"""Compute additional configuration.
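The parsing rules implemented by the new `_parse_gpu_telemetry_config` validator can be summarized with a small standalone sketch. Note that `parse_gpu_telemetry` below is a hypothetical helper written only for illustration; it mirrors the validator's logic in this diff and is not part of the codebase.

```python
# Standalone sketch of the parsing rules in _parse_gpu_telemetry_config above.
# `parse_gpu_telemetry` is a hypothetical helper used only for illustration.
from aiperf.common.enums import GPUTelemetryMode


def parse_gpu_telemetry(items: list[str]) -> tuple[GPUTelemetryMode, list[str]]:
    mode = GPUTelemetryMode.SUMMARY
    urls: list[str] = []
    for item in items:
        if item == "dashboard":
            # the literal "dashboard" keyword switches on the live TUI
            mode = GPUTelemetryMode.REALTIME_DASHBOARD
        elif item.startswith("http") or ":" in item:
            # bare host:port values are normalized to http://host:port
            urls.append(item if item.startswith("http") else f"http://{item}")
    return mode, urls


# Equivalent of: --gpu-telemetry dashboard node1:9400 http://node2:9400/metrics
mode, urls = parse_gpu_telemetry(["dashboard", "node1:9400", "http://node2:9400/metrics"])
assert mode is GPUTelemetryMode.REALTIME_DASHBOARD
assert urls == ["http://node1:9400", "http://node2:9400/metrics"]
```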
4 changes: 4 additions & 0 deletions src/aiperf/common/enums/__init__.py
@@ -96,6 +96,9 @@
from aiperf.common.enums.system_enums import (
SystemState,
)
from aiperf.common.enums.telemetry_enums import (
GPUTelemetryMode,
)
from aiperf.common.enums.timing_enums import (
CreditPhase,
RequestRateMode,
@@ -131,6 +134,7 @@
"ExportLevel",
"FrequencyMetricUnit",
"FrequencyMetricUnitInfo",
"GPUTelemetryMode",
"GenericMetricUnit",
"ImageFormat",
"LifecycleState",
1 change: 1 addition & 0 deletions src/aiperf/common/enums/command_enums.py
@@ -14,6 +14,7 @@ class CommandType(CaseInsensitiveStrEnum):
SHUTDOWN = "shutdown"
SHUTDOWN_WORKERS = "shutdown_workers"
SPAWN_WORKERS = "spawn_workers"
START_REALTIME_TELEMETRY = "start_realtime_telemetry"


class CommandResponseStatus(CaseInsensitiveStrEnum):
1 change: 1 addition & 0 deletions src/aiperf/common/enums/message_enums.py
@@ -41,6 +41,7 @@ class MessageType(CaseInsensitiveStrEnum):
PROFILE_PROGRESS = "profile_progress"
PROFILE_RESULTS = "profile_results"
REALTIME_METRICS = "realtime_metrics"
REALTIME_TELEMETRY_METRICS = "realtime_telemetry_metrics"
REGISTRATION = "registration"
SERVICE_ERROR = "service_error"
STATUS = "status"
11 changes: 11 additions & 0 deletions src/aiperf/common/enums/telemetry_enums.py
@@ -0,0 +1,11 @@
# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0

from aiperf.common.enums.base_enums import CaseInsensitiveStrEnum


class GPUTelemetryMode(CaseInsensitiveStrEnum):
"""GPU telemetry display mode."""

SUMMARY = "summary"
REALTIME_DASHBOARD = "realtime_dashboard"
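A quick sanity check of the new enum's values (the case-insensitive lookup behavior is an assumption based on the `CaseInsensitiveStrEnum` base class name):

```python
# Minimal sketch; the two values below come directly from the enum in this diff.
from aiperf.common.enums import GPUTelemetryMode

assert GPUTelemetryMode.SUMMARY.value == "summary"
assert GPUTelemetryMode.REALTIME_DASHBOARD.value == "realtime_dashboard"
# GPUTelemetryMode("SUMMARY") should also resolve to GPUTelemetryMode.SUMMARY,
# assuming CaseInsensitiveStrEnum performs case-insensitive value lookup.
```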
16 changes: 16 additions & 0 deletions src/aiperf/common/hooks.py
@@ -44,6 +44,7 @@ class AIPerfHook(CaseInsensitiveStrEnum):
ON_INIT = "@on_init"
ON_MESSAGE = "@on_message"
ON_REALTIME_METRICS = "@on_realtime_metrics"
ON_REALTIME_TELEMETRY_METRICS = "@on_realtime_telemetry_metrics"
ON_PROFILING_PROGRESS = "@on_profiling_progress"
ON_PULL_MESSAGE = "@on_pull_message"
ON_RECORDS_PROGRESS = "@on_records_progress"
@@ -348,6 +349,21 @@ def _on_realtime_metrics(self, metrics: list[MetricResult]) -> None:
return _hook_decorator(AIPerfHook.ON_REALTIME_METRICS, func)


def on_realtime_telemetry_metrics(func: Callable) -> Callable:
"""Decorator to specify that the function is a hook that should be called when real-time GPU telemetry metrics are received.
See :func:`aiperf.common.hooks._hook_decorator`.

Example:
```python
class MyPlugin(RealtimeTelemetryMetricsMixin):
@on_realtime_telemetry_metrics
def _on_realtime_telemetry_metrics(self, metrics: list[MetricResult]) -> None:
pass
```
"""
return _hook_decorator(AIPerfHook.ON_REALTIME_TELEMETRY_METRICS, func)


def on_pull_message(
*message_types: MessageTypeT | Callable[[SelfT], Iterable[MessageTypeT]],
) -> Callable:
4 changes: 4 additions & 0 deletions src/aiperf/common/messages/__init__.py
@@ -31,6 +31,7 @@
ShutdownCommand,
ShutdownWorkersCommand,
SpawnWorkersCommand,
StartRealtimeTelemetryCommand,
TargetedServiceMessage,
)
from aiperf.common.messages.credit_messages import (
@@ -75,6 +76,7 @@
)
from aiperf.common.messages.telemetry_messages import (
ProcessTelemetryResultMessage,
RealtimeTelemetryMetricsMessage,
TelemetryRecordsMessage,
TelemetryStatusMessage,
)
@@ -127,13 +129,15 @@
"ProfileStartCommand",
"RealtimeMetricsCommand",
"RealtimeMetricsMessage",
"RealtimeTelemetryMetricsMessage",
"RecordsProcessingStatsMessage",
"RegisterServiceCommand",
"RegistrationMessage",
"RequiresRequestNSMixin",
"ShutdownCommand",
"ShutdownWorkersCommand",
"SpawnWorkersCommand",
"StartRealtimeTelemetryCommand",
"StatusMessage",
"TargetedServiceMessage",
"TelemetryRecordsMessage",
11 changes: 11 additions & 0 deletions src/aiperf/common/messages/command_messages.py
@@ -242,6 +242,17 @@ class RealtimeMetricsCommand(CommandMessage):
command: CommandTypeT = CommandType.REALTIME_METRICS


class StartRealtimeTelemetryCommand(CommandMessage):
"""Command to start the realtime telemetry background task in RecordsManager.

This command is sent when the user dynamically enables the telemetry dashboard
by pressing the telemetry option in the UI. This always sets the GPU telemetry
mode to REALTIME_DASHBOARD.
"""

command: CommandTypeT = CommandType.START_REALTIME_TELEMETRY


class SpawnWorkersCommand(CommandMessage):
command: CommandTypeT = CommandType.SPAWN_WORKERS

21 changes: 20 additions & 1 deletion src/aiperf/common/messages/telemetry_messages.py
@@ -5,7 +5,12 @@

from aiperf.common.enums import MessageType
from aiperf.common.messages.service_messages import BaseServiceMessage
from aiperf.common.models import ErrorDetails, ProcessTelemetryResult, TelemetryRecord
from aiperf.common.models import (
ErrorDetails,
MetricResult,
ProcessTelemetryResult,
TelemetryRecord,
)
from aiperf.common.types import MessageTypeT


@@ -19,6 +24,10 @@ class TelemetryRecordsMessage(BaseServiceMessage):
...,
description="The ID of the telemetry data collector that collected the records.",
)
dcgm_url: str = Field(
...,
description="The DCGM endpoint URL that was contacted (e.g., 'http://localhost:9400/metrics')",
)
records: list[TelemetryRecord] = Field(
..., description="The telemetry records collected from GPU monitoring"
)
@@ -62,3 +71,13 @@ class TelemetryStatusMessage(BaseServiceMessage):
default_factory=list,
description="List of DCGM endpoint URLs that were reachable and will provide data",
)


class RealtimeTelemetryMetricsMessage(BaseServiceMessage):
"""Message from the records manager to show real-time GPU telemetry metrics."""

message_type: MessageTypeT = MessageType.REALTIME_TELEMETRY_METRICS

metrics: list[MetricResult] = Field(
..., description="The current real-time GPU telemetry metrics."
)
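For context, a consumer of the new message would receive the current GPU telemetry snapshot via the `metrics` field. The sketch below only uses the fields visible in this diff; the handler name and wiring are illustrative, not the actual dashboard implementation.

```python
# Sketch of a consumer-side handler for RealtimeTelemetryMetricsMessage.
# Only message_type and metrics are taken from this diff; how the handler is
# registered (e.g., via the on_realtime_telemetry_metrics hook) is assumed.
from aiperf.common.messages import RealtimeTelemetryMetricsMessage


def handle_realtime_telemetry(msg: RealtimeTelemetryMetricsMessage) -> None:
    # msg.metrics is a list[MetricResult] holding the latest GPU telemetry values
    for metric in msg.metrics:
        print(metric)  # a real consumer would update the dashboard panel instead
```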
4 changes: 4 additions & 0 deletions src/aiperf/common/mixins/__init__.py
@@ -44,6 +44,9 @@
from aiperf.common.mixins.realtime_metrics_mixin import (
RealtimeMetricsMixin,
)
from aiperf.common.mixins.realtime_telemetry_metrics_mixin import (
RealtimeTelemetryMetricsMixin,
)
from aiperf.common.mixins.reply_client_mixin import (
ReplyClientMixin,
)
@@ -67,6 +70,7 @@
"ProgressTrackerMixin",
"PullClientMixin",
"RealtimeMetricsMixin",
"RealtimeTelemetryMetricsMixin",
"ReplyClientMixin",
"TaskManagerMixin",
"WorkerTrackerMixin",