-
Notifications
You must be signed in to change notification settings - Fork 654
feat: add SGLang and vLLM passthrough metrics on Dynamo backend worker #3539
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
WalkthroughAdds environment and Prometheus utility modules, exposes them via dynamo.common, and integrates Prometheus multiprocess metrics and callback passthrough into SGLang and vLLM endpoints. Introduces Python Prometheus metrics bindings (Rust + type stubs) and example usage. Refactors runtime to support Prometheus update and exposition text callbacks, updates server to aggregate per-hierarchy metrics, and adds comprehensive metrics documentation. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
participant Env as Env
participant App as SGLang/vLLM Main
participant Prom as Prometheus REGISTRY
participant EP as Endpoint.metrics
participant RT as Runtime (Rust)
Env->>App: DYN_ENGINE_METRICS_ENABLED
App->>Prom: Init multiprocess REGISTRY
App->>EP: register_engine_metrics_callback(callback)
note over EP,App: Callback queries runtime for exposition text and/or updates
EP->>RT: execute_prometheus_update_callbacks()
RT-->>EP: Ok/Err per callback
EP->>RT: execute_prometheus_exposition_text_callbacks()
RT-->>EP: concatenated exposition text (may be empty)
EP-->>Prom: Exposition text (REGISTRY metrics + appended text)
sequenceDiagram
autonumber
participant Client as HTTP Client
participant Srv as SystemStatusServer
participant Dist as DistributedRuntime
participant Reg as MetricsRegistry
participant Hier as Hierarchies[*]
Client->>Srv: GET /metrics
Srv->>Dist: get_registry()
Dist-->>Srv: Reg
Srv->>Reg: get_all_hierarchy_names()
loop each hierarchy
Srv->>Dist: execute_prometheus_update_callbacks(hierarchy)
Srv->>Dist: get_exposition_text(hierarchy)
Dist-->>Srv: text (possibly empty)
end
Srv-->>Client: 200 OK with base metrics + appended per-hierarchy text
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60–90 minutes Poem
Pre-merge checks✅ Passed checks (3 passed)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 5
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
lib/runtime/src/system_status_server.rs (1)
196-223: Resolve merge conflict markers
<<<<<<< Updated upstream/=======/>>>>>>> Stashed changesare still in the checked-in file, so this module won’t compile. Please finish the merge resolution (keep the new callback iteration logic) and drop the markers.lib/runtime/src/distributed.rs (1)
303-352: Resolve the merge conflict in register_prometheus_update_callback.There are leftover
<<<<<<< Updated upstream/>>>>>>> Stashed changesmarkers, so the build will fail until the conflict is resolved. Please clean this section up so only the intended implementation remains.
🧹 Nitpick comments (2)
components/src/dynamo/common/prometheus_utils.sglang.md (1)
410-427: Specify the fenced code block language.Markdown lint flags this fence because it has no language. Please tag it (e.g.,
text) so tooling renders it correctly.-``` +```text # HELP request_total_slots Total request slots available # TYPE request_total_slots gauge request_total_slots{dynamo_namespace="ns556",dynamo_component="cp556",dynamo_endpoint="ep556"} 1024 @@ # TYPE internal_update_count counter internal_update_count{dynamo_namespace="ns556",dynamo_component="cp556",dynamo_endpoint="ep556",type="internal"} 1lib/runtime/src/lib.rs (1)
137-155: Consider final newline for Prometheus format.The newline handling correctly separates outputs from different callbacks. However, the final concatenated text may not end with a newline if the last callback's output doesn't include one. While Prometheus text format generally tolerates this, it's conventional for text exposition to end with a newline.
Consider appending a final newline if the result is non-empty:
} } + if !result.is_empty() && !result.ends_with('\n') { + result.push('\n'); + } result }Additionally, the error handling approach (log and continue) is appropriate for metrics gathering where partial failures shouldn't block the entire scrape.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (16)
components/src/dynamo/common/__init__.py(1 hunks)components/src/dynamo/common/env_utils.py(1 hunks)components/src/dynamo/common/prometheus_utils.py(1 hunks)components/src/dynamo/common/prometheus_utils.sglang.md(1 hunks)components/src/dynamo/common/prometheus_utils.vllm.md(1 hunks)components/src/dynamo/sglang/main.py(4 hunks)components/src/dynamo/vllm/main.py(6 hunks)lib/bindings/python/examples/metrics/README.md(1 hunks)lib/bindings/python/examples/metrics/server_with_callback.py(1 hunks)lib/bindings/python/rust/prometheus_metrics.rs(1 hunks)lib/bindings/python/src/dynamo/_prometheus_metrics.pyi(1 hunks)lib/runtime/src/distributed.rs(4 hunks)lib/runtime/src/lib.rs(2 hunks)lib/runtime/src/metrics.rs(6 hunks)lib/runtime/src/service.rs(1 hunks)lib/runtime/src/system_status_server.rs(2 hunks)
🧰 Additional context used
🧬 Code graph analysis (10)
components/src/dynamo/common/prometheus_utils.py (3)
components/src/dynamo/common/env_utils.py (1)
env_is_truthy(16-48)lib/bindings/python/rust/prometheus_metrics.rs (2)
register_prometheus_exposition_text_callback(681-715)result(697-697)lib/bindings/python/src/dynamo/_prometheus_metrics.pyi (1)
register_prometheus_exposition_text_callback(255-276)
lib/bindings/python/examples/metrics/server_with_callback.py (4)
lib/bindings/python/src/dynamo/_prometheus_metrics.pyi (39)
Gauge(51-82)IntCounter(126-142)IntGauge(165-190)IntGaugeVec(192-220)create_intgauge(306-308)create_gauge(286-288)create_intgaugevec(310-312)create_intcounter(298-300)name(14-16)name(32-34)name(53-55)name(86-88)name(116-118)name(128-130)name(146-148)const_labels(17-19)const_labels(35-37)const_labels(56-58)const_labels(89-91)const_labels(119-121)const_labels(131-133)const_labels(149-151)inc(20-22)inc(41-43)inc(65-67)inc(101-103)inc(134-136)inc(155-157)inc(179-181)inc(209-211)get(26-28)get(47-49)get(62-64)get(98-100)get(140-142)get(161-163)get(176-178)get(206-208)register_prometheus_update_callback(231-253)lib/bindings/python/src/dynamo/_core.pyi (1)
DistributedRuntime(33-63)lib/bindings/python/src/dynamo/runtime/__init__.py (1)
dynamo_worker(22-48)lib/bindings/python/rust/prometheus_metrics.rs (39)
create_intgauge(864-880)create_gauge(762-778)create_intgaugevec(884-901)create_intcounter(823-839)name(98-101)name(141-144)name(184-187)name(236-239)name(288-291)name(361-364)name(422-425)name(495-498)name(568-571)const_labels(104-112)const_labels(147-155)const_labels(190-198)const_labels(242-250)const_labels(294-302)const_labels(367-375)const_labels(428-436)const_labels(501-509)const_labels(574-582)inc(115-118)inc(158-161)inc(207-211)inc(259-263)inc(316-319)inc(389-392)inc(458-462)inc(531-535)get(127-129)get(170-172)get(221-224)get(273-276)get(311-313)get(384-386)get(452-455)get(525-528)register_prometheus_update_callback(674-676)
components/src/dynamo/vllm/main.py (3)
lib/bindings/python/rust/lib.rs (3)
_core(130-193)component(729-735)endpoint(627-633)lib/bindings/python/src/dynamo/_core.pyi (4)
Endpoint(111-142)get(1275-1276)component(86-90)endpoint(105-109)components/src/dynamo/common/prometheus_utils.py (1)
register_engine_metrics_callback(41-76)
lib/runtime/src/metrics.rs (2)
lib/runtime/src/distributed.rs (2)
new(42-185)execute_prometheus_update_callbacks(324-338)lib/runtime/src/lib.rs (2)
new(107-113)execute_prometheus_update_callbacks(129-134)
lib/runtime/src/lib.rs (1)
lib/runtime/src/distributed.rs (2)
new(42-185)execute_prometheus_update_callbacks(324-338)
components/src/dynamo/sglang/main.py (3)
lib/bindings/python/rust/lib.rs (1)
_core(130-193)lib/bindings/python/src/dynamo/_core.pyi (1)
Endpoint(111-142)components/src/dynamo/common/prometheus_utils.py (1)
register_engine_metrics_callback(41-76)
lib/runtime/src/system_status_server.rs (3)
lib/runtime/src/metrics.rs (3)
hierarchy(372-378)prometheus_metrics_fmt(506-544)new(1387-1389)lib/runtime/src/distributed.rs (2)
execute_prometheus_update_callbacks(324-338)new(42-185)lib/runtime/src/lib.rs (3)
execute_prometheus_update_callbacks(129-134)execute_prometheus_exposition_text_callbacks(137-155)new(107-113)
lib/runtime/src/distributed.rs (3)
lib/bindings/python/rust/prometheus_metrics.rs (2)
register_prometheus_update_callback(674-676)register_prometheus_exposition_text_callback(681-715)lib/bindings/python/src/dynamo/_prometheus_metrics.pyi (2)
register_prometheus_update_callback(231-253)register_prometheus_exposition_text_callback(255-276)lib/runtime/src/lib.rs (3)
add_prometheus_update_callback(116-118)clone(173-179)execute_prometheus_update_callbacks(129-134)
lib/bindings/python/src/dynamo/_prometheus_metrics.pyi (2)
lib/bindings/python/rust/prometheus_metrics.rs (66)
name(98-101)name(141-144)name(184-187)name(236-239)name(288-291)name(361-364)name(422-425)name(495-498)name(568-571)const_labels(104-112)const_labels(147-155)const_labels(190-198)const_labels(242-250)const_labels(294-302)const_labels(367-375)const_labels(428-436)const_labels(501-509)const_labels(574-582)inc(115-118)inc(158-161)inc(207-211)inc(259-263)inc(316-319)inc(389-392)inc(458-462)inc(531-535)inc_by(121-124)inc_by(164-167)inc_by(214-218)inc_by(266-270)inc_by(322-325)get(127-129)get(170-172)get(221-224)get(273-276)get(311-313)get(384-386)get(452-455)get(525-528)variable_labels(201-204)variable_labels(253-256)variable_labels(439-442)variable_labels(512-515)dec(328-331)dec(395-398)dec(465-469)dec(538-542)dec_by(334-337)add(340-343)add(401-404)add(472-476)add(545-549)sub(346-349)sub(407-410)sub(479-483)sub(552-556)observe(585-588)register_prometheus_update_callback(674-676)register_prometheus_exposition_text_callback(681-715)create_counter(722-737)create_countervec(741-758)create_gauge(762-778)create_gaugevec(782-799)create_histogram(803-819)create_intcounter(823-839)create_intgauge(864-880)lib/runtime/src/metrics.rs (6)
create_counter(401-408)create_countervec(411-426)create_gauge(429-436)create_histogram(439-447)create_intcounter(450-457)create_intgauge(478-485)
lib/bindings/python/rust/prometheus_metrics.rs (4)
lib/runtime/src/metrics.rs (10)
new(1387-1389)hierarchy(372-378)create_counter(401-408)create_countervec(411-426)create_gauge(429-436)create_histogram(439-447)create_intcounter(450-457)create_intcountervec(460-475)create_intgauge(478-485)create_intgaugevec(488-503)lib/bindings/python/src/dynamo/_prometheus_metrics.pyi (75)
Counter(12-28)IntCounter(126-142)CounterVec(30-49)IntCounterVec(144-163)Gauge(51-82)IntGauge(165-190)GaugeVec(84-112)IntGaugeVec(192-220)Histogram(114-124)name(14-16)name(32-34)name(53-55)name(86-88)name(116-118)name(128-130)name(146-148)name(167-169)name(194-196)const_labels(17-19)const_labels(35-37)const_labels(56-58)const_labels(89-91)const_labels(119-121)const_labels(131-133)const_labels(149-151)const_labels(170-172)const_labels(197-199)inc(20-22)inc(41-43)inc(65-67)inc(101-103)inc(134-136)inc(155-157)inc(179-181)inc(209-211)inc_by(23-25)inc_by(44-46)inc_by(68-70)inc_by(137-139)inc_by(158-160)get(26-28)get(47-49)get(62-64)get(98-100)get(140-142)get(161-163)get(176-178)get(206-208)variable_labels(38-40)variable_labels(92-94)variable_labels(152-154)variable_labels(200-202)dec(71-73)dec(104-106)dec(182-184)dec(212-214)dec_by(74-76)add(77-79)add(107-109)add(185-187)add(215-217)sub(80-82)sub(110-112)sub(188-190)sub(218-220)observe(122-124)register_prometheus_exposition_text_callback(255-276)create_counter(278-280)create_countervec(282-284)create_gauge(286-288)create_histogram(294-296)create_intcounter(298-300)create_intcountervec(302-304)create_intgauge(306-308)create_intgaugevec(310-312)lib/runtime/src/distributed.rs (2)
new(42-185)register_prometheus_exposition_text_callback(341-353)lib/runtime/src/lib.rs (2)
new(107-113)clone(173-179)
🪛 markdownlint-cli2 (0.18.1)
components/src/dynamo/common/prometheus_utils.sglang.md
143-143: Spaces inside emphasis markers
(MD037, no-space-in-emphasis)
lib/bindings/python/examples/metrics/README.md
410-410: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
🪛 Ruff (0.13.3)
components/src/dynamo/common/prometheus_utils.py
147-147: Do not catch blind exception: Exception
(BLE001)
148-148: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
🔇 Additional comments (5)
lib/runtime/src/lib.rs (5)
89-93: LGTM! Well-designed type aliases.The callback type aliases are clearly named and appropriately use
Arcfor thread-safe sharing. The documentation explains the design rationale well.
99-102: LGTM! Clear separation of callback concerns.The split into separate callback vectors for updates vs. exposition text improves clarity and maintainability.
110-126: LGTM! Straightforward initialization and registration.The constructor and callback registration methods are implemented correctly with clear, simple logic.
129-134: LGTM! Appropriate result collection.Collecting all callback results allows callers to handle errors appropriately without short-circuiting on the first failure. This is a good pattern for metrics updates where partial failures shouldn't block the entire scrape.
176-177: Verify that dropping callbacks on clone is intentional.The
Cloneimplementation explicitly resets callback vectors to empty, meaning clonedMetricsRegistryEntryinstances lose all registered callbacks. While the comment explains this, it could be surprising behavior.Please confirm:
- Is this the intended behavior? (Likely yes, to avoid duplicate callback executions)
- Is this behavior documented in the struct-level documentation?
- Are there use cases where
MetricsRegistryEntryis cloned? If so, do they expect callbacks to be lost?Consider adding a warning to the struct-level doc comments explaining this clone behavior:
/// Structure to hold Prometheus registries and associated callbacks for a given hierarchy /// /// # Clone Behavior /// When cloned, the Prometheus registry is duplicated, but all callbacks are reset to empty. /// This prevents duplicate callback executions when registries are shared. pub struct MetricsRegistryEntry {
bb9efe2 to
4a0b657
Compare
Signed-off-by: Keiven Chang <[email protected]>
Signed-off-by: Keiven Chang <[email protected]>
- Guard SGLang metrics callback setup with is_engine_metrics_callback_enabled() - Set server_args.enable_metrics=True before engine init when callback enabled - Move prometheus_client import to TYPE_CHECKING to avoid early import - Update prometheus_utils docs to simplify example commands Signed-off-by: Keiven Chang <[email protected]>
- Guard SGLang callback with is_engine_metrics_callback_enabled() in init() - Set server_args.enable_metrics=True before engine init when callback enabled - Guard vLLM callback with conditional import and registration - Only enable metrics passthrough when DYN_ENGINE_METRICS_ENABLED=1 Signed-off-by: Keiven Chang <[email protected]>
Signed-off-by: Keiven Chang <[email protected]>
9977072 to
d899b1c
Compare
|
|
||
| ## Overview | ||
|
|
||
| When running vLLM through Dynamo, vLLM engine metrics are automatically passed through and exposed on Dynamo's `/metrics` endpoint (default port 8081). This allows you to access both vLLM engine metrics (prefixed with `vllm:`) and Dynamo runtime metrics (prefixed with `dynamo_*`) from a single worker backend endpoint. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason we went with vllm: and dynamo_ vs the same convention?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ignorance 😂 I didn't know vLLM and SGLang already emit vllm:something and sglang:something but if I did I would have done the same.
I don't mind updating everything, but that'll have to be a whole different PR. @nnshah1 @grahamking what do you guys think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea don't let it block this PR. But if they both do it that way thats what I would prefer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree not for this PR.
I actually find the vllm:/sglang: strange, because the official prometheus naming convention guide is prefix_ (underscore, not colon) - https://prometheus.io/docs/practices/naming/#metric-names
But if we think in this scenario it makes more sense to look uniform with the frameworks, I wouldn't be opposed.
What does TRTLLM do?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 to alec's point
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TRTLLM is in my next TODO, I will find out.
| ```bash | ||
| # Set environment variables | ||
| export DYN_SYSTEM_ENABLED=true | ||
| export DYN_SYSTEM_PORT=8081 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe a dumb question, but why do I need to set two flags. Why can't I just set DYN_SYSTEM_PORT= and we assume if its set then enabled=TRUE? If we can infer that feels simpler and more intuitive than pasting two flags everytime.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Said another way, is there a time you would set a port and not set enabled=True?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not a dumb question, and you bring up a good point, and there's usually a universal answer to all the questions: "... because of dumb historic reasons" 😂
Early on, we had a default (see lib/runtime/src/config.rs, which defaults to 9090), but wasn't sure whether to turn it on by default or not, so it was always left off (with DYN_SYSTEM_ENABLED=false by default).
Then throughout the examples we asked users to switch it on by turning it on to true. Then at some point, people wanted to keep changing port numbers (k8s yaml, etc) because 9090 conflicted with something else. So we started adding examples to override 9090 to 8081 (because at the time, 8080 was the frontend).
I think now is a GREAT time to clean all this up. @nnshah1 @mohammedabdulwahhab @biswapanda , what do you think? Can we just use the port number to turn this on/off? It would clean up quite a bit. Hopefully it won't be too troublesome in your k8s yml files.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think enabled is legacy to some extent - agree we can use system_port and have a default value ... if it is set to 0 or false we could disable it entirely -
I think seperate PR though -
components/src/dynamo/sglang/main.py
Outdated
| await component.create_service() | ||
|
|
||
| generate_endpoint = component.endpoint(dynamo_args.endpoint) | ||
| generate_endpoint: Endpoint = component.endpoint(dynamo_args.endpoint) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit same comment as vLLM, not really in line with rest of code to add type hint here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also - should we be doing from dynamo.runtime import Endpoint?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed. Yes if needed, dynamo.runtime is more appropriate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed mainly python code and vLLM. Lets a number of comments to be addressed.
Move imports to top level in prometheus.py and vllm/main.py. Simplify prometheus.md files by removing verbose examples and detailed metric lists. Remove time estimates and focus on high-level metric categories. Add PROMETHEUS_MULTIPROC_DIR explanation for vLLM only. Signed-off-by: Keiven Chang <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, I think we removed a good 300+ lines from reviews on this one 🚀
Great work @keivenchang this will be amazing for observability in Dynamo!
#3539) Signed-off-by: Keiven Chang <[email protected]> Co-authored-by: Keiven Chang <[email protected]>
#3539) Signed-off-by: Keiven Chang <[email protected]> Co-authored-by: Keiven Chang <[email protected]>
Overview:
This PR adds passthrough of SGLang and vLLM Prometheus metrics to Dynamo backend workers. Engine metrics are now directly exposed via the Dynamo
/metricsendpoint.Details:
prometheus_utils.pymodule for engine metrics text exposition usingregister_prometheus_exposition_text_callback()register_prometheus_exposition_text_callbackto append raw Prometheus text from engine registriesprometheus_prefix for consistency (register_prometheus_update_callback,execute_prometheus_update_callbacks)env_utils.pymatching Rustenv_is_truthy()inlib/runtime/src/config.rsmain.pyto useregister_engine_metrics_callback()withDYN_ENGINE_METRICS_ENABLEDflagWhere should the reviewer start?
components/src/dynamo/common/prometheus_utils.py- New Python utilities for metrics expositionlib/runtime/src/lib.rsandlib/runtime/src/distributed.rs- Rust callback implementationlib/runtime/src/system_status_server.rs- Metrics handler with exposition text callback executioncomponents/src/dynamo/vllm/main.pyandcomponents/src/dynamo/sglang/main.py- Integration examplescomponents/src/dynamo/common/prometheus_utils.*.md- Metrics documentationRelated Issues:
DIS-440, DIS-441, #3407
/coderabbit profile chill
Summary by CodeRabbit
New Features
Documentation
Examples