
Clarify that counter resets in _sum are to be detected in _count #143

@beorn7


Both summaries and histograms have a _count and a _sum “field”. They are often both considered counters in every respect, but both summaries and histograms allow negative observations, which may lead to the _sum going down even without a counter reset. Thus, an actual counter reset in the _sum series has to be detected by checking whether the value of _count has gone down.
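To illustrate the difference, here is a minimal Python sketch. It is not Prometheus or OpenMetrics code; the function names and the (count, sum) tuple layout are hypothetical, and it assumes that after a reset both series restart from zero, so the whole post-reset value counts as new increase.

```python
from typing import Iterable, Tuple

Sample = Tuple[float, float]  # (value of _count, value of _sum), in scrape order


def sum_increase(samples: Iterable[Sample]) -> float:
    """Increase of _sum, treating a drop in _count as the reset signal."""
    total = 0.0
    prev = None
    for count, value in samples:
        if prev is not None:
            prev_count, prev_value = prev
            if count < prev_count:
                # _count went down: a real reset. Assumption: the series
                # restarted at zero, so the current value is all new.
                total += value
            else:
                # No reset: the delta may be negative (negative observations).
                total += value - prev_value
        prev = (count, value)
    return total


def naive_sum_increase(samples: Iterable[Sample]) -> float:
    """Increase of _sum, treating any drop in _sum itself as a reset."""
    total = 0.0
    prev_value = None
    for _count, value in samples:
        if prev_value is not None:
            total += value if value < prev_value else value - prev_value
        prev_value = value
    return total


# Scrapes: a negative observation (-6) between the first two samples,
# then a real reset (count drops from 3 to 1), then one more observation.
scrapes = [(2, 10.0), (3, 4.0), (1, 5.0), (2, 8.0)]
print(sum_increase(scrapes))        # 2.0
print(naive_sum_increase(scrapes))  # 8.0
```

In this example the naive variant both reports a reset that never happened (the drop from 10 to 4) and misses the real one (the _sum rose from 4 to 5 across the reset).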

Prometheus has never done that, which is a known but mostly ignored bug. (Negative observations are rare in practice, which makes the impact of the bug, once they do occur, even more surprising and hard to spot.) The bug is hard to fix in Prometheus because Prometheus has no internal notion of the _count and the _sum of a particular histogram or summary belonging together. Fixing it would require a quite severe layering violation, a new PromQL function (rate_ratio or something), or a fundamentally overhauled internal data model.

However, this Prometheus problem shouldn't leak out to all users of OpenMetrics. Thus, I suggest simply specifying in the OpenMetrics spec how counter resets in _sum are to be detected.
