Proposal: Early Compaction of Stale Series from the Head Block #55

codesome · 2025-07-04T00:10:22Z

Code is ready in prometheus/prometheus#16929

proposals/0055-stale-series-compaction.md

Signed-off-by: Ganesh Vernekar <[email protected]>

machine424

Thanks for this.
Some questions/suggestions.
I think we can start with tracking those stale series via a metric #55 (comment).

For the rest of the changes, if it's easy to put together, a PoC would be really helpful to see things more clearly and start gathering meaningful measurements.

codesome · 2025-07-29T03:31:33Z

Just noticed the feedback @machine424, thanks! I will respond to them soon.

In the meantime, I did a PoC on this, and here are the results: prometheus/prometheus#16929 (comment)

I am adding stale series metrics in prometheus/prometheus#16925, which I will finish soon.

codesome · 2025-08-06T23:38:05Z

The stale series tracking part is ready for review at prometheus/prometheus#16925

It is fairly straightforward and should not be blocked on any design decisions (it considers only stale samples for now).
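
As a rough illustration of tracking based on stale samples, here is a minimal Python sketch (it is not the Go code in prometheus/prometheus#16925). It assumes a series counts as stale when its latest sample carries the Prometheus staleness marker, a NaN with the bit pattern 0x7ff0000000000002. `StaleSeriesTracker`, `append`, and `stale_ratio` are hypothetical names chosen for this example, and samples are passed as raw 64-bit patterns so the marker is compared exactly.

```python
import struct

# Bit pattern of the Prometheus staleness marker (a specific NaN payload).
STALE_NAN_BITS = 0x7FF0000000000002


def float_bits(v):
    """Return the IEEE 754 bit pattern of a float as an unsigned 64-bit int."""
    return struct.unpack(">Q", struct.pack(">d", v))[0]


class StaleSeriesTracker:
    """Toy tracker (hypothetical): counts series whose latest sample is a stale marker."""

    def __init__(self):
        self._last_bits = {}  # series key -> bit pattern of the latest sample
        self.stale_count = 0

    def append(self, series, bits):
        was_stale = self._last_bits.get(series) == STALE_NAN_BITS
        is_stale = bits == STALE_NAN_BITS
        if is_stale and not was_stale:
            self.stale_count += 1
        elif was_stale and not is_stale:
            # A fresh sample arrived, so the series is live again.
            self.stale_count -= 1
        self._last_bits[series] = bits

    def stale_ratio(self):
        total = len(self._last_bits)
        return self.stale_count / total if total else 0.0


tracker = StaleSeriesTracker()
tracker.append('foo{pod="bar1"}', float_bits(1.0))
tracker.append('foo{pod="bar2"}', float_bits(2.0))
tracker.append('foo{pod="bar1"}', STALE_NAN_BITS)  # scrape target went away
print(tracker.stale_count, tracker.stale_ratio())
```

Comparing bit patterns rather than using an `isnan` check matters here, because an ordinary NaN sample value must not be mistaken for the staleness marker.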

jhalterman · 2025-08-12T00:10:26Z

@codesome Having used similar early head compaction in Mimir, this is nice to see.

Even with early compaction, though, we still have a period of time when the old and new series are both in memory, which can lead to large spikes in resource usage, even if they're temporary. For the use case you described, where a rollout happens and some new series are sent that directly replace some old series, it would be great if Prometheus could be made to understand which new series replace which old series, so that fewer resources would be needed internally to track them both (in theory there should be no overlap in samples between the two series). This could take the shape of a separate API that allows Prometheus to be made aware of some relabeling before the new series are pushed. Is this something you've thought about?

SuperQ · 2025-08-12T05:14:35Z

Prometheus already handles direct replacement series by matching the labels and computing the same internal series ID. It can simply mark the series as not stale.
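
To make the label-matching point concrete, here is a small Python sketch under stated assumptions: series identity is taken to be the full label set, and a plain dictionary lookup stands in for whatever hashing the real TSDB does. `SeriesIndex` and `get_or_create` are hypothetical names, not Prometheus APIs.

```python
class SeriesIndex:
    """Toy index (hypothetical) that assigns one ID per distinct label set."""

    def __init__(self):
        self._ids = {}  # canonical label tuple -> series ID
        self._next_id = 1

    def get_or_create(self, labels):
        # Sort label pairs so the key does not depend on insertion order.
        key = tuple(sorted(labels.items()))
        created = key not in self._ids
        if created:
            self._ids[key] = self._next_id
            self._next_id += 1
        return self._ids[key], created


index = SeriesIndex()
first, _ = index.get_or_create({"__name__": "foo", "pod": "bar1"})
# Same labels in a different order resolve to the same series.
again, created = index.get_or_create({"pod": "bar1", "__name__": "foo"})
# Any differing label value yields a distinct series.
other, _ = index.get_or_create({"__name__": "foo", "pod": "bar2"})
print(first == again, created, other != first)
```

In this model a series that reappears with identical labels reuses its existing entry, while a label set that differs in any value gets a new one.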

jhalterman · 2025-08-12T05:42:11Z

@SuperQ I was thinking of something slightly different based on the scenario described in this proposal. For example, after a rollout, some new series could be created, e.g. foo{pod="bar2"}, which effectively replaces foo{pod="bar1"}. At present, even with early compaction, we'd have 2 series in memory for some time. But if we could communicate via an API that something churned, and that one series replaces another, perhaps there could be some savings.

I suspect this is a hard problem since replacements may not always be 1:1, but given the resource spikes that can happen when large numbers of series churn, I thought it was worth mentioning.

SuperQ · 2025-08-12T06:13:29Z

Unfortunately, what you are proposing won't work.

Those are different series. New instances of processes need to be separated, otherwise you can end up with signal attribution that should not happen.

I get what you're trying to do, but it's not workable in reality.

This proposal solves the "GC" problem that occurs when large numbers of metrics churn.

There are also other proposals we are working on that will further improve things without the need for magic.

codesome · 2025-08-25T22:16:50Z

The code for this is now in a ready state in prometheus/prometheus#16929, which I will test against our production traffic.

codesome · 2025-08-29T00:04:40Z

@machine424 @SuperQ I have updated the proposal based on the feedback and also added a solution for WAL replay (which I have implemented in prometheus/prometheus#16929).

bwplotka · 2025-09-09T10:56:28Z

Do you mind a quick rebase? We just fixed CI.

codesome · 2025-09-11T01:48:54Z

@machine424 @SuperQ do you have any more comments on this? cc @bboreham

codesome · 2025-11-02T10:18:43Z

Folks, any further comments? Are we good to ✅ and try this out? It would be nice to unblock this and develop this behind a feature flag.

codesome · 2025-11-26T02:27:56Z

After running prometheus/prometheus#16929 internally at Reddit for 2+ weeks, I have updated the proposal to add the experimental analysis and the trade-offs. I have also simplified the config to only have the immediate trigger, since the experiments showed the other one was of no practical use (it can actually backfire and lead users to misuse it).

@machine424 @jhalterman @SuperQ let me know if you have any further queries. I would like to get this merged soon and the implementation polished.

jesusvazquez

LGTM

codesome requested review from bboreham, bwplotka and jesusvazquez July 4, 2025 00:12

codesome mentioned this pull request Jul 4, 2025

Eager compaction of stale series prometheus/prometheus#13616

Open

SuperQ reviewed Jul 4, 2025

Proposal: Early Compaction of Stale Series from the Head Block

11dd563

Signed-off-by: Ganesh Vernekar <[email protected]>

codesome force-pushed the codesome/stale-series-compaction branch from ebbfe83 to 11dd563 Compare July 8, 2025 19:21

machine424 reviewed Jul 15, 2025

This was referenced Jul 25, 2025

tsdb: Track stale series in the Head block based on stale sample prometheus/prometheus#16925

Merged

tsdb: Early compaction of stale series prometheus/prometheus#16929

Open

Update proposal based on feedback and add WAL replay logic

7cbf88e

Signed-off-by: Ganesh Vernekar <[email protected]>

bwplotka added the proposal label Sep 9, 2025

codesome added 2 commits September 10, 2025 09:38

Merge branch 'main' into codesome/stale-series-compaction

51d3bcb

make fmt

b3e100a

Signed-off-by: Ganesh Vernekar <[email protected]>

SuperQ approved these changes Nov 2, 2025

Simplify the configuration and add experimental analysis

75b3298

Signed-off-by: Ganesh Vernekar <[email protected]>

jesusvazquez approved these changes Nov 26, 2025

Fix review comment

81852fc

Signed-off-by: Ganesh Vernekar <[email protected]>

codesome merged commit 2c8e551 into main Nov 26, 2025
2 checks passed

codesome deleted the codesome/stale-series-compaction branch November 26, 2025 16:20

7 participants