I am using Kong API Gateway Enterprise (via Konnect Hybrid mode) and Self-Managed servers.
My goal is to monitor individual targets within an upstream to diagnose performance bottlenecks. However, I have found that Kong has no native way to capture detailed metrics for each specific target within an upstream. In both Datadog and Prometheus I can see aggregated upstream metrics, but I cannot obtain per-target granularity.
Currently, I can access metrics such as kong.http.requests.count, kong.upstream.latency.ms.bucket, kong.upstream.latency.ms.count, and kong.upstream.latency.ms.sum, as well as general latency metrics like kongdd.upstream_latency.avg and kongdd.upstream_latency.max.
However, all these metrics refer to the upstream as a whole, not to each individual target. The only per-target information available in Prometheus is the kong_upstream_target_health metric, which only indicates whether the target is healthy or unhealthy, without any visibility into the number of requests received or the individual response time.
In Kong’s access logs, both upstream_addr and upstream_response_time appear correctly, confirming that Kong knows which target was used and how long it took to respond. However, there is no native way to turn this information into metrics consumable by Datadog or Prometheus. I have attempted multiple approaches: modifying the Datadog plugin to include upstream_addr as a tag, creating a Kong Post-Function plugin to add an X-Upstream-Addr header to the response, and storing the information in kong.ctx.shared from a Pre-Function plugin. In every case, the values were not available in the log phase.
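For reference, my Post-Function attempt looked roughly like the snippet below. This is a sketch rather than my exact config; it assumes a Kong version where serverless functions can be attached to the log phase, and that ngx.var.upstream_addr is readable there:

```lua
-- Sketch of a post-function "log" phase entry (assumes Kong >= 2.x,
-- where pre/post-function plugins accept a config.log array).
local addr = ngx.var.upstream_addr            -- e.g. "10.0.0.5:8080"
local rt   = ngx.var.upstream_response_time   -- seconds, as a string
if addr then
  -- Here I would want to emit a metric tagged with addr, but neither
  -- the Datadog nor the Prometheus plugin picks this value up.
  kong.log.info("target=", addr, " upstream_response_time=", rt)
end
```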
Since Kong already has internal access to upstream_addr and upstream_response_time, why is it so difficult to expose them as metrics? The lack of this granularity makes it challenging to monitor individual targets within an upstream, preventing precise identification of specific instances that may be causing performance issues. Is there a technical limitation preventing this feature from being implemented, or is there a recommended approach to efficiently work around this problem within Kong?
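One direction I am considering as a workaround is reading per-target data out of the serialized log entry, sketched below. This assumes the log serializer exposes a tries array recording each balancer attempt's ip, port, and latency, as described for the logging plugins; I would appreciate confirmation that this is the recommended path:

```lua
-- Sketch for the log phase of a post-function plugin: pull per-target
-- info from kong.log.serialize(). Assumes the serializer's "tries"
-- array carries ip/port/balancer_latency per balancer attempt.
local entry = kong.log.serialize()
for _, try in ipairs(entry.tries or {}) do
  kong.log.notice("target=", try.ip, ":", try.port,
                  " balancer_latency=", tostring(try.balancer_latency))
end
```

From there the values could be shipped to a collector, but that still would not surface them as native Datadog or Prometheus metrics.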