Both summaries and histograms have a _count and a _sum "field". Both are often treated as counters in every respect. However, summaries and histograms both allow negative observations, so _sum can go down even without a counter reset. An actual counter reset in the _sum series therefore has to be detected by checking whether the corresponding _count goes down.
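To illustrate the difference, here is a minimal sketch, assuming each scrape yields a `(count, sum)` pair taken at the same timestamp. The function names `sum_increase` and `naive_sum_increase` are hypothetical and not part of any Prometheus or OpenMetrics API. The naive variant treats _sum like a plain counter (any decrease is a reset); the other detects resets only when _count decreases.

```python
from typing import List, Tuple

# Hypothetical representation: one (count, sum) pair per scrape, same timestamp.
Sample = Tuple[float, float]


def sum_increase(samples: List[Sample]) -> float:
    """Total change of _sum, detecting resets by a decrease in _count."""
    total = 0.0
    for prev, cur in zip(samples, samples[1:]):
        if cur[0] < prev[0]:
            # _count went down: a real reset, so cur's _sum accumulated since restart.
            total += cur[1]
        else:
            # No reset: a drop in _sum just reflects negative observations.
            total += cur[1] - prev[1]
    return total


def naive_sum_increase(samples: List[Sample]) -> float:
    """Treats _sum as a plain counter: any decrease is taken as a reset."""
    total = 0.0
    for prev, cur in zip(samples, samples[1:]):
        total += cur[1] if cur[1] < prev[1] else cur[1] - prev[1]
    return total


# Observations +5, -3, -4 with no restart in between.
samples = [(1, 5.0), (2, 2.0), (3, -2.0)]
print(sum_increase(samples))        # -7.0, the true change of _sum
print(naive_sum_increase(samples))  # 0.0, the negative observations are lost
```

The naive variant can also err in the other direction: if the process restarts and _sum happens to end up higher than before, a reset is missed entirely unless _count is consulted.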
Prometheus has never done that, which is a known but mostly ignored bug. (Negative observations are rare in practice, which makes the impact of the bug, once they do happen, all the more surprising and hard to spot.) The bug is hard to fix because Prometheus has no internal notion that the _count and the _sum of a particular histogram or summary belong together. Fixing it would require a fairly severe layering violation, a new PromQL function (rate_ratio or something similar), or a fundamentally overhauled internal data model.
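As a rough illustration of what treating _count and _sum as a unit could look like, here is a sketch of a function in the spirit of the hypothetical rate_ratio mentioned above. It assumes a simplified flat storage mapping series names to value lists sampled at the same timestamps, ignores label matching, and derives the average observation over the window as increase(_sum) / increase(_count). Nothing here reflects an actual Prometheus implementation; the series name `temp_delta` is made up.

```python
from typing import Dict, List, Optional


def rate_ratio(series: Dict[str, List[float]], base: str) -> Optional[float]:
    """Hypothetical: increase(<base>_sum) / increase(<base>_count) over the window.

    The two series are looked up together by their shared base name, so the
    reset decision made on _count can be applied to _sum as well.
    """
    counts = series[base + "_count"]
    sums = series[base + "_sum"]
    d_count = d_sum = 0.0
    for i in range(1, len(counts)):
        if counts[i] < counts[i - 1]:
            # Reset detected on _count; apply it to both series.
            d_count += counts[i]
            d_sum += sums[i]
        else:
            # No reset: a decreasing _sum is kept as a negative contribution.
            d_count += counts[i] - counts[i - 1]
            d_sum += sums[i] - sums[i - 1]
    return d_sum / d_count if d_count > 0 else None


series = {"temp_delta_count": [1, 2, 3], "temp_delta_sum": [5.0, 2.0, -2.0]}
print(rate_ratio(series, "temp_delta"))  # -3.5, the mean of the observations -3 and -4
```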
However, this Prometheus-specific problem shouldn't leak out to all users of OpenMetrics. I therefore suggest simply specifying in the OpenMetrics spec how counter resets in _sum are to be detected, namely by a decrease in the corresponding _count.