Skip to content

Conversation

@keivenchang
Copy link
Contributor

@keivenchang keivenchang commented Oct 10, 2025

Overview:

This PR adds passthrough of SGLang and vLLM Prometheus metrics to Dynamo backend workers. Engine metrics are now directly exposed via the Dynamo /metrics endpoint.

Details:

  • Added prometheus_utils.py module for engine metrics text exposition using register_prometheus_exposition_text_callback()
  • Implemented Rust callback mechanism register_prometheus_exposition_text_callback to append raw Prometheus text from engine registries
  • Renamed metrics-related functions to use prometheus_ prefix for consistency (register_prometheus_update_callback, execute_prometheus_update_callbacks)
  • Added documentation for vLLM and SGLang metrics with descriptions matching upstream HELP text
  • Created env_utils.py matching Rust env_is_truthy() in lib/runtime/src/config.rs
  • Updated vLLM and SGLang main.py to use register_engine_metrics_callback() with DYN_ENGINE_METRICS_ENABLED flag
  • Engine metrics appear after initialization; Dynamo runtime metrics available immediately

Where should the reviewer start?

  • components/src/dynamo/common/prometheus_utils.py - New Python utilities for metrics exposition
  • lib/runtime/src/lib.rs and lib/runtime/src/distributed.rs - Rust callback implementation
  • lib/runtime/src/system_status_server.rs - Metrics handler with exposition text callback execution
  • components/src/dynamo/vllm/main.py and components/src/dynamo/sglang/main.py - Integration examples
  • components/src/dynamo/common/prometheus_utils.*.md - Metrics documentation

Related Issues:

DIS-440, DIS-441, #3407

/coderabbit profile chill

Summary by CodeRabbit

  • New Features

    • Prometheus metrics passthrough for SGLang and vLLM, with multiprocess support and optional prefix filtering.
    • Simple environment toggle to enable engine metrics.
    • System status server now aggregates and serves per-hierarchy exposition text.
    • Python API to define and update counters, gauges, histograms, and labeled variants, with scrape-time callbacks.
  • Documentation

    • Added comprehensive Prometheus metric references for SGLang and vLLM.
    • New README covering Python metrics integration and workflows.
  • Examples

    • Added a Python example demonstrating metrics registration and update callbacks.

@keivenchang keivenchang requested review from a team as code owners October 10, 2025 01:04
@github-actions github-actions bot added the feat label Oct 10, 2025
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Oct 10, 2025

Walkthrough

Adds environment and Prometheus utility modules, exposes them via dynamo.common, and integrates Prometheus multiprocess metrics and callback passthrough into SGLang and vLLM endpoints. Introduces Python Prometheus metrics bindings (Rust + type stubs) and example usage. Refactors runtime to support Prometheus update and exposition text callbacks, updates server to aggregate per-hierarchy metrics, and adds comprehensive metrics documentation.

Changes

Cohort / File(s) Summary
Common utilities exposure
components/src/dynamo/common/__init__.py
Expose env_utils and prometheus_utils via imports and __all__; update module docs.
Env utilities
components/src/dynamo/common/env_utils.py
Add env_is_truthy(name: str) -> bool for standardized env var truthiness.
Prometheus utils
components/src/dynamo/common/prometheus_utils.py
Add helpers: enable-check via env, registry exposition text generation with optional prefix filter, and endpoint metrics callback registration with logging and error handling.
SGLang integration
components/src/dynamo/sglang/main.py
Initialize Prometheus multiprocess registry and register endpoint metrics callback for generation and embedding endpoints.
vLLM integration
components/src/dynamo/vllm/main.py
Initialize Prometheus multiprocess metrics, register endpoint metrics callback using global REGISTRY, and log multiprocess dir state.
Python bindings (Rust impl)
lib/bindings/python/rust/prometheus_metrics.rs
Add PyO3 wrappers for Prometheus metric types; add PyRuntimeMetrics factory and callback registration for update and exposition text; module export setup.
Python type stubs
lib/bindings/python/src/dynamo/_prometheus_metrics.pyi
Provide typed stubs for metric wrappers and RuntimeMetrics, including factory and callback methods; define __all__.
Python example
lib/bindings/python/examples/metrics/server_with_callback.py
Example server creating metrics via endpoint API and registering an update callback to mutate metrics on scrape.
Runtime core: types and registry
lib/runtime/src/lib.rs
Replace generic runtime callback with PrometheusUpdateCallback and PrometheusExpositionTextCallback; adjust MetricsRegistryEntry fields, constructors, clone, and execution methods for update and exposition text paths.
Runtime distributed layer
lib/runtime/src/distributed.rs
Switch to Prometheus-specific callbacks for registration and execution; add exposition text callback support; note unresolved merge markers in diff context.
Runtime metrics assembly
lib/runtime/src/metrics.rs
Execute update callbacks before gather; append exposition text to encoded output; update tests to new APIs.
System status server
lib/runtime/src/system_status_server.rs
Execute update callbacks across all hierarchies; aggregate top-level metrics plus per-hierarchy exposition text into single response; improved error handling.
Service comments
lib/runtime/src/service.rs
Update comment to reference execute_prometheus_update_callbacks().
Documentation: SGLang metrics
components/src/dynamo/common/prometheus_utils.sglang.md
Add reference for SGLang-exposed metrics, categories, labels, and update procedures.
Documentation: vLLM metrics
components/src/dynamo/common/prometheus_utils.vllm.md
Add reference for vLLM-exposed metrics and operational guidance.
Python metrics README
lib/bindings/python/examples/metrics/README.md
Document Python metrics registry usage, update methods, vector metrics, and architecture; includes run instructions and sample outputs.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant Env as Env
  participant App as SGLang/vLLM Main
  participant Prom as Prometheus REGISTRY
  participant EP as Endpoint.metrics
  participant RT as Runtime (Rust)

  Env->>App: DYN_ENGINE_METRICS_ENABLED
  App->>Prom: Init multiprocess REGISTRY
  App->>EP: register_engine_metrics_callback(callback)
  note over EP,App: Callback queries runtime for exposition text and/or updates

  EP->>RT: execute_prometheus_update_callbacks()
  RT-->>EP: Ok/Err per callback
  EP->>RT: execute_prometheus_exposition_text_callbacks()
  RT-->>EP: concatenated exposition text (may be empty)

  EP-->>Prom: Exposition text (REGISTRY metrics + appended text)
Loading
sequenceDiagram
  autonumber
  participant Client as HTTP Client
  participant Srv as SystemStatusServer
  participant Dist as DistributedRuntime
  participant Reg as MetricsRegistry
  participant Hier as Hierarchies[*]

  Client->>Srv: GET /metrics
  Srv->>Dist: get_registry()
  Dist-->>Srv: Reg
  Srv->>Reg: get_all_hierarchy_names()
  loop each hierarchy
    Srv->>Dist: execute_prometheus_update_callbacks(hierarchy)
    Srv->>Dist: get_exposition_text(hierarchy)
    Dist-->>Srv: text (possibly empty)
  end
  Srv-->>Client: 200 OK with base metrics + appended per-hierarchy text
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60–90 minutes

Poem

In burrows of bytes I twitch my nose,
New metrics bloom where the registry grows.
SGLang, vLLM, we hop in sync,
With callbacks queued before scrapes blink.
Rust and Python share the trail—
Exposition tails wag without fail.
Prometheus, fetch! Another metric grail. 🐇📈

Pre-merge checks

✅ Passed checks (3 passed)
Check name Status Explanation
Title Check ✅ Passed The pull request title concisely and accurately describes the primary change by indicating that passthrough metrics for both SGLang and vLLM are being added to the Dynamo backend worker, matching the core objective of the changeset and following conventional commit style.
Description Check ✅ Passed The description adheres to the repository template by providing distinct Overview, Details, Where should the reviewer start, and Related Issues sections and clearly explains the changes and relevant files. It outlines the new modules, Rust callback updates, and integration points effectively. The only minor formatting issues are that the Related Issues section lists identifiers without the required action keywords and there is an extraneous “/coderabbit profile chill” line at the end.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
lib/runtime/src/system_status_server.rs (1)

196-223: Resolve merge conflict markers

<<<<<<< Updated upstream/=======/>>>>>>> Stashed changes are still in the checked-in file, so this module won’t compile. Please finish the merge resolution (keep the new callback iteration logic) and drop the markers.

lib/runtime/src/distributed.rs (1)

303-352: Resolve the merge conflict in register_prometheus_update_callback.

There are leftover <<<<<<< Updated upstream / >>>>>>> Stashed changes markers, so the build will fail until the conflict is resolved. Please clean this section up so only the intended implementation remains.

🧹 Nitpick comments (2)
components/src/dynamo/common/prometheus_utils.sglang.md (1)

410-427: Specify the fenced code block language.

Markdown lint flags this fence because it has no language. Please tag it (e.g., text) so tooling renders it correctly.

-```
+```text
 # HELP request_total_slots Total request slots available
 # TYPE request_total_slots gauge
 request_total_slots{dynamo_namespace="ns556",dynamo_component="cp556",dynamo_endpoint="ep556"} 1024
@@
 # TYPE internal_update_count counter
 internal_update_count{dynamo_namespace="ns556",dynamo_component="cp556",dynamo_endpoint="ep556",type="internal"} 1
lib/runtime/src/lib.rs (1)

137-155: Consider final newline for Prometheus format.

The newline handling correctly separates outputs from different callbacks. However, the final concatenated text may not end with a newline if the last callback's output doesn't include one. While Prometheus text format generally tolerates this, it's conventional for text exposition to end with a newline.

Consider appending a final newline if the result is non-empty:

         }
     }
+    if !result.is_empty() && !result.ends_with('\n') {
+        result.push('\n');
+    }
     result
 }

Additionally, the error handling approach (log and continue) is appropriate for metrics gathering where partial failures shouldn't block the entire scrape.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b640f28 and bb9efe2.

📒 Files selected for processing (16)
  • components/src/dynamo/common/__init__.py (1 hunks)
  • components/src/dynamo/common/env_utils.py (1 hunks)
  • components/src/dynamo/common/prometheus_utils.py (1 hunks)
  • components/src/dynamo/common/prometheus_utils.sglang.md (1 hunks)
  • components/src/dynamo/common/prometheus_utils.vllm.md (1 hunks)
  • components/src/dynamo/sglang/main.py (4 hunks)
  • components/src/dynamo/vllm/main.py (6 hunks)
  • lib/bindings/python/examples/metrics/README.md (1 hunks)
  • lib/bindings/python/examples/metrics/server_with_callback.py (1 hunks)
  • lib/bindings/python/rust/prometheus_metrics.rs (1 hunks)
  • lib/bindings/python/src/dynamo/_prometheus_metrics.pyi (1 hunks)
  • lib/runtime/src/distributed.rs (4 hunks)
  • lib/runtime/src/lib.rs (2 hunks)
  • lib/runtime/src/metrics.rs (6 hunks)
  • lib/runtime/src/service.rs (1 hunks)
  • lib/runtime/src/system_status_server.rs (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (10)
components/src/dynamo/common/prometheus_utils.py (3)
components/src/dynamo/common/env_utils.py (1)
  • env_is_truthy (16-48)
lib/bindings/python/rust/prometheus_metrics.rs (2)
  • register_prometheus_exposition_text_callback (681-715)
  • result (697-697)
lib/bindings/python/src/dynamo/_prometheus_metrics.pyi (1)
  • register_prometheus_exposition_text_callback (255-276)
lib/bindings/python/examples/metrics/server_with_callback.py (4)
lib/bindings/python/src/dynamo/_prometheus_metrics.pyi (39)
  • Gauge (51-82)
  • IntCounter (126-142)
  • IntGauge (165-190)
  • IntGaugeVec (192-220)
  • create_intgauge (306-308)
  • create_gauge (286-288)
  • create_intgaugevec (310-312)
  • create_intcounter (298-300)
  • name (14-16)
  • name (32-34)
  • name (53-55)
  • name (86-88)
  • name (116-118)
  • name (128-130)
  • name (146-148)
  • const_labels (17-19)
  • const_labels (35-37)
  • const_labels (56-58)
  • const_labels (89-91)
  • const_labels (119-121)
  • const_labels (131-133)
  • const_labels (149-151)
  • inc (20-22)
  • inc (41-43)
  • inc (65-67)
  • inc (101-103)
  • inc (134-136)
  • inc (155-157)
  • inc (179-181)
  • inc (209-211)
  • get (26-28)
  • get (47-49)
  • get (62-64)
  • get (98-100)
  • get (140-142)
  • get (161-163)
  • get (176-178)
  • get (206-208)
  • register_prometheus_update_callback (231-253)
lib/bindings/python/src/dynamo/_core.pyi (1)
  • DistributedRuntime (33-63)
lib/bindings/python/src/dynamo/runtime/__init__.py (1)
  • dynamo_worker (22-48)
lib/bindings/python/rust/prometheus_metrics.rs (39)
  • create_intgauge (864-880)
  • create_gauge (762-778)
  • create_intgaugevec (884-901)
  • create_intcounter (823-839)
  • name (98-101)
  • name (141-144)
  • name (184-187)
  • name (236-239)
  • name (288-291)
  • name (361-364)
  • name (422-425)
  • name (495-498)
  • name (568-571)
  • const_labels (104-112)
  • const_labels (147-155)
  • const_labels (190-198)
  • const_labels (242-250)
  • const_labels (294-302)
  • const_labels (367-375)
  • const_labels (428-436)
  • const_labels (501-509)
  • const_labels (574-582)
  • inc (115-118)
  • inc (158-161)
  • inc (207-211)
  • inc (259-263)
  • inc (316-319)
  • inc (389-392)
  • inc (458-462)
  • inc (531-535)
  • get (127-129)
  • get (170-172)
  • get (221-224)
  • get (273-276)
  • get (311-313)
  • get (384-386)
  • get (452-455)
  • get (525-528)
  • register_prometheus_update_callback (674-676)
components/src/dynamo/vllm/main.py (3)
lib/bindings/python/rust/lib.rs (3)
  • _core (130-193)
  • component (729-735)
  • endpoint (627-633)
lib/bindings/python/src/dynamo/_core.pyi (4)
  • Endpoint (111-142)
  • get (1275-1276)
  • component (86-90)
  • endpoint (105-109)
components/src/dynamo/common/prometheus_utils.py (1)
  • register_engine_metrics_callback (41-76)
lib/runtime/src/metrics.rs (2)
lib/runtime/src/distributed.rs (2)
  • new (42-185)
  • execute_prometheus_update_callbacks (324-338)
lib/runtime/src/lib.rs (2)
  • new (107-113)
  • execute_prometheus_update_callbacks (129-134)
lib/runtime/src/lib.rs (1)
lib/runtime/src/distributed.rs (2)
  • new (42-185)
  • execute_prometheus_update_callbacks (324-338)
components/src/dynamo/sglang/main.py (3)
lib/bindings/python/rust/lib.rs (1)
  • _core (130-193)
lib/bindings/python/src/dynamo/_core.pyi (1)
  • Endpoint (111-142)
components/src/dynamo/common/prometheus_utils.py (1)
  • register_engine_metrics_callback (41-76)
lib/runtime/src/system_status_server.rs (3)
lib/runtime/src/metrics.rs (3)
  • hierarchy (372-378)
  • prometheus_metrics_fmt (506-544)
  • new (1387-1389)
lib/runtime/src/distributed.rs (2)
  • execute_prometheus_update_callbacks (324-338)
  • new (42-185)
lib/runtime/src/lib.rs (3)
  • execute_prometheus_update_callbacks (129-134)
  • execute_prometheus_exposition_text_callbacks (137-155)
  • new (107-113)
lib/runtime/src/distributed.rs (3)
lib/bindings/python/rust/prometheus_metrics.rs (2)
  • register_prometheus_update_callback (674-676)
  • register_prometheus_exposition_text_callback (681-715)
lib/bindings/python/src/dynamo/_prometheus_metrics.pyi (2)
  • register_prometheus_update_callback (231-253)
  • register_prometheus_exposition_text_callback (255-276)
lib/runtime/src/lib.rs (3)
  • add_prometheus_update_callback (116-118)
  • clone (173-179)
  • execute_prometheus_update_callbacks (129-134)
lib/bindings/python/src/dynamo/_prometheus_metrics.pyi (2)
lib/bindings/python/rust/prometheus_metrics.rs (66)
  • name (98-101)
  • name (141-144)
  • name (184-187)
  • name (236-239)
  • name (288-291)
  • name (361-364)
  • name (422-425)
  • name (495-498)
  • name (568-571)
  • const_labels (104-112)
  • const_labels (147-155)
  • const_labels (190-198)
  • const_labels (242-250)
  • const_labels (294-302)
  • const_labels (367-375)
  • const_labels (428-436)
  • const_labels (501-509)
  • const_labels (574-582)
  • inc (115-118)
  • inc (158-161)
  • inc (207-211)
  • inc (259-263)
  • inc (316-319)
  • inc (389-392)
  • inc (458-462)
  • inc (531-535)
  • inc_by (121-124)
  • inc_by (164-167)
  • inc_by (214-218)
  • inc_by (266-270)
  • inc_by (322-325)
  • get (127-129)
  • get (170-172)
  • get (221-224)
  • get (273-276)
  • get (311-313)
  • get (384-386)
  • get (452-455)
  • get (525-528)
  • variable_labels (201-204)
  • variable_labels (253-256)
  • variable_labels (439-442)
  • variable_labels (512-515)
  • dec (328-331)
  • dec (395-398)
  • dec (465-469)
  • dec (538-542)
  • dec_by (334-337)
  • add (340-343)
  • add (401-404)
  • add (472-476)
  • add (545-549)
  • sub (346-349)
  • sub (407-410)
  • sub (479-483)
  • sub (552-556)
  • observe (585-588)
  • register_prometheus_update_callback (674-676)
  • register_prometheus_exposition_text_callback (681-715)
  • create_counter (722-737)
  • create_countervec (741-758)
  • create_gauge (762-778)
  • create_gaugevec (782-799)
  • create_histogram (803-819)
  • create_intcounter (823-839)
  • create_intgauge (864-880)
lib/runtime/src/metrics.rs (6)
  • create_counter (401-408)
  • create_countervec (411-426)
  • create_gauge (429-436)
  • create_histogram (439-447)
  • create_intcounter (450-457)
  • create_intgauge (478-485)
lib/bindings/python/rust/prometheus_metrics.rs (4)
lib/runtime/src/metrics.rs (10)
  • new (1387-1389)
  • hierarchy (372-378)
  • create_counter (401-408)
  • create_countervec (411-426)
  • create_gauge (429-436)
  • create_histogram (439-447)
  • create_intcounter (450-457)
  • create_intcountervec (460-475)
  • create_intgauge (478-485)
  • create_intgaugevec (488-503)
lib/bindings/python/src/dynamo/_prometheus_metrics.pyi (75)
  • Counter (12-28)
  • IntCounter (126-142)
  • CounterVec (30-49)
  • IntCounterVec (144-163)
  • Gauge (51-82)
  • IntGauge (165-190)
  • GaugeVec (84-112)
  • IntGaugeVec (192-220)
  • Histogram (114-124)
  • name (14-16)
  • name (32-34)
  • name (53-55)
  • name (86-88)
  • name (116-118)
  • name (128-130)
  • name (146-148)
  • name (167-169)
  • name (194-196)
  • const_labels (17-19)
  • const_labels (35-37)
  • const_labels (56-58)
  • const_labels (89-91)
  • const_labels (119-121)
  • const_labels (131-133)
  • const_labels (149-151)
  • const_labels (170-172)
  • const_labels (197-199)
  • inc (20-22)
  • inc (41-43)
  • inc (65-67)
  • inc (101-103)
  • inc (134-136)
  • inc (155-157)
  • inc (179-181)
  • inc (209-211)
  • inc_by (23-25)
  • inc_by (44-46)
  • inc_by (68-70)
  • inc_by (137-139)
  • inc_by (158-160)
  • get (26-28)
  • get (47-49)
  • get (62-64)
  • get (98-100)
  • get (140-142)
  • get (161-163)
  • get (176-178)
  • get (206-208)
  • variable_labels (38-40)
  • variable_labels (92-94)
  • variable_labels (152-154)
  • variable_labels (200-202)
  • dec (71-73)
  • dec (104-106)
  • dec (182-184)
  • dec (212-214)
  • dec_by (74-76)
  • add (77-79)
  • add (107-109)
  • add (185-187)
  • add (215-217)
  • sub (80-82)
  • sub (110-112)
  • sub (188-190)
  • sub (218-220)
  • observe (122-124)
  • register_prometheus_exposition_text_callback (255-276)
  • create_counter (278-280)
  • create_countervec (282-284)
  • create_gauge (286-288)
  • create_histogram (294-296)
  • create_intcounter (298-300)
  • create_intcountervec (302-304)
  • create_intgauge (306-308)
  • create_intgaugevec (310-312)
lib/runtime/src/distributed.rs (2)
  • new (42-185)
  • register_prometheus_exposition_text_callback (341-353)
lib/runtime/src/lib.rs (2)
  • new (107-113)
  • clone (173-179)
🪛 markdownlint-cli2 (0.18.1)
components/src/dynamo/common/prometheus_utils.sglang.md

143-143: Spaces inside emphasis markers

(MD037, no-space-in-emphasis)

lib/bindings/python/examples/metrics/README.md

410-410: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🪛 Ruff (0.13.3)
components/src/dynamo/common/prometheus_utils.py

147-147: Do not catch blind exception: Exception

(BLE001)


148-148: Use logging.exception instead of logging.error

Replace with exception

(TRY400)

🔇 Additional comments (5)
lib/runtime/src/lib.rs (5)

89-93: LGTM! Well-designed type aliases.

The callback type aliases are clearly named and appropriately use Arc for thread-safe sharing. The documentation explains the design rationale well.


99-102: LGTM! Clear separation of callback concerns.

The split into separate callback vectors for updates vs. exposition text improves clarity and maintainability.


110-126: LGTM! Straightforward initialization and registration.

The constructor and callback registration methods are implemented correctly with clear, simple logic.


129-134: LGTM! Appropriate result collection.

Collecting all callback results allows callers to handle errors appropriately without short-circuiting on the first failure. This is a good pattern for metrics updates where partial failures shouldn't block the entire scrape.


176-177: Verify that dropping callbacks on clone is intentional.

The Clone implementation explicitly resets callback vectors to empty, meaning cloned MetricsRegistryEntry instances lose all registered callbacks. While the comment explains this, it could be surprising behavior.

Please confirm:

  1. Is this the intended behavior? (Likely yes, to avoid duplicate callback executions)
  2. Is this behavior documented in the struct-level documentation?
  3. Are there use cases where MetricsRegistryEntry is cloned? If so, do they expect callbacks to be lost?

Consider adding a warning to the struct-level doc comments explaining this clone behavior:

/// Structure to hold Prometheus registries and associated callbacks for a given hierarchy
///
/// # Clone Behavior
/// When cloned, the Prometheus registry is duplicated, but all callbacks are reset to empty.
/// This prevents duplicate callback executions when registries are shared.
pub struct MetricsRegistryEntry {

- Guard SGLang metrics callback setup with is_engine_metrics_callback_enabled()
- Set server_args.enable_metrics=True before engine init when callback enabled
- Move prometheus_client import to TYPE_CHECKING to avoid early import
- Update prometheus_utils docs to simplify example commands

Signed-off-by: Keiven Chang <[email protected]>
- Guard SGLang callback with is_engine_metrics_callback_enabled() in init()
- Set server_args.enable_metrics=True before engine init when callback enabled
- Guard vLLM callback with conditional import and registration
- Only enable metrics passthrough when DYN_ENGINE_METRICS_ENABLED=1

Signed-off-by: Keiven Chang <[email protected]>
Signed-off-by: Keiven Chang <[email protected]>

## Overview

When running vLLM through Dynamo, vLLM engine metrics are automatically passed through and exposed on Dynamo's `/metrics` endpoint (default port 8081). This allows you to access both vLLM engine metrics (prefixed with `vllm:`) and Dynamo runtime metrics (prefixed with `dynamo_*`) from a single worker backend endpoint.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason we went with vllm: and dynamo_ vs the same convention?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ignorance 😂 I didn't know vLLM and SGLang already emit vllm:something and sglang:something but if I did I would have done the same.

I don't mind updating everything, but that'll have to be a whole different PR. @nnshah1 @grahamking what do you guys think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea don't let it block this PR. But if they both do it that way thats what I would prefer.

Copy link
Contributor

@rmccorm4 rmccorm4 Oct 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree not for this PR.

I actually find the vllm:/sglang: strange, because the official prometheus naming convention guide is prefix_ (underscore, not colon) - https://prometheus.io/docs/practices/naming/#metric-names

But if we think in this scenario it makes more sense to look uniform with the frameworks, I wouldn't be opposed.

What does TRTLLM do?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 to alec's point

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TRTLLM is in my next TODO, I will find out.

```bash
# Set environment variables
export DYN_SYSTEM_ENABLED=true
export DYN_SYSTEM_PORT=8081
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe a dumb question, but why do I need to set two flags. Why can't I just set DYN_SYSTEM_PORT= and we assume if its set then enabled=TRUE? If we can infer that feels simpler and more intuitive than pasting two flags everytime.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Said another way, is there a time you would set a port and not set enabled=True?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a dumb question, and you bring up a good point, and there's usually a universal answer to all the questions: "... because of dumb historic reasons" 😂

Early on, we had a default (see lib/runtime/src/config.rs, which defaults to 9090), but wasn't sure whether to turn it on by default or not, so it was always left off (with DYN_SYSTEM_ENABLED=false by default).

Then throughout the examples we asked users to switch it on by turning it on to true. Then at some point, people wanted to keep changing port numbers (k8s yaml, etc) because 9090 conflicted with something else. So we started adding examples to override 9090 to 8081 (because at the time, 8080 was the frontend).

I think now is a GREAT time to clean all this up. @nnshah1 @mohammedabdulwahhab @biswapanda , what do you think? Can we just use the port number to turn this on/off? It would clean up quite a bit. Hopefully it won't be too troublesome in your k8s yml files.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think enabled is legacy to some extent - agree we can use system_port and have a default value ... if it is set to 0 or false we could disable it entirely -

I think seperate PR though -

await component.create_service()

generate_endpoint = component.endpoint(dynamo_args.endpoint)
generate_endpoint: Endpoint = component.endpoint(dynamo_args.endpoint)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit same comment as vLLM, not really in line with rest of code to add type hint here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also - should we be doing from dynamo.runtime import Endpoint?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed. Yes if needed, dynamo.runtime is more appropriate.

Copy link
Contributor

@alec-flowers alec-flowers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed mainly python code and vLLM. Lets a number of comments to be addressed.

Move imports to top level in prometheus.py and vllm/main.py.
Simplify prometheus.md files by removing verbose examples and detailed metric lists.
Remove time estimates and focus on high-level metric categories.
Add PROMETHEUS_MULTIPROC_DIR explanation for vLLM only.

Signed-off-by: Keiven Chang <[email protected]>
Copy link
Contributor

@rmccorm4 rmccorm4 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, I think we removed a good 300+ lines from reviews on this one 🚀

Great work @keivenchang this will be amazing for observability in Dynamo!

@rmccorm4 rmccorm4 enabled auto-merge (squash) October 11, 2025 00:57
@rmccorm4 rmccorm4 merged commit 55e458d into main Oct 11, 2025
28 of 29 checks passed
@rmccorm4 rmccorm4 deleted the keivenchang/add-raw-engine-metrics branch October 11, 2025 00:59
ziqifan617 pushed a commit that referenced this pull request Oct 20, 2025
nv-tusharma pushed a commit that referenced this pull request Oct 20, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants