-
Notifications
You must be signed in to change notification settings - Fork 21
Proposal: Early Compaction of Stale Series from the Head Block #55
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Signed-off-by: Ganesh Vernekar <[email protected]>
ebbfe83 to
11dd563
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this.
Some questions/suggestions.
I think we can start with tracking those stale series via a metric #55 (comment).
For the rest of the changes, If it's easy to put together, having a PoC will be really helpful to see clearer and start gathering meaningful measurements.
|
Just noticed the feedback @machine424, thanks! I will respond to them soon. In the meantime, I did a PoC on this and here are the results prometheus/prometheus#16929 (comment) I am adding stale series metrics in prometheus/prometheus#16925 which I will finish soon |
|
The stale series tracking part is ready for review at prometheus/prometheus#16925 Fairly straightforward that should not block on any designing (considers only stale samples for now). |
|
@codesome Having used the similar early head compaction in Mimir, this is nice to see. Even with early compaction though, we still have this period of time when the old and new series are both in memory, which can lead to large spikes in resource usage, even if they're temporary. For the use case you described, where a rollout happens and some new series are sent that directly replace some old series, it would be great if Prometheus could be made to understand which new series replace which old series, so that fewer resources would be needed internally to track them both (in theory there should be no overlap in samples between two series). This could take the shape of a separate API that allows prometheus to be made aware of some relabeling before the new series are pushed. Is this something you've thought about? |
|
Prometheus already handles directly replacement series by matching the labels and computing the same internal series ID. It simply can mark the series as not stale. |
|
@SuperQ I was thinking of something slightly different based on the scenario described in this proposal. For example, after a rollout, some new series could be created, ex: I suspect this is a hard problem since replacements may not always be 1:1, but given the resource spikes that can happen when large numbers of series churn, I thought it was worth mentioning. |
|
Unfortunately, what you are proposing won't work. Those are different series. New instances of processes need to be separated, otherwise you can end up with signal attribution that should not happen. I get what you're trying to do, but it's not workable in reality. This proposal solves the "GC" problem that occurs when large numbers of metrics churn. There are also other proposals we are working on that will further improve things without the need for magic. |
|
I have got the code for this in a ready state now prometheus/prometheus#16929 which I will test it in our prod traffic |
Signed-off-by: Ganesh Vernekar <[email protected]>
|
@machine424 @SuperQ I have updated the proposal based on the feedback and also added a solution to the WAL replay (which I have implemented in prometheus/prometheus#16929). |
|
Do you mind a quick rebase? We just fixed CI. |
Signed-off-by: Ganesh Vernekar <[email protected]>
|
@machine424 @SuperQ do you have any more comments on this? cc @bboreham |
|
Folks, any further comments? Are we good to ✅ and try this out? It would be nice to unblock this and develop this behind a feature flag. |
Signed-off-by: Ganesh Vernekar <[email protected]>
|
After running prometheus/prometheus#16929 internally at Reddit for 2+ weeks, I have updated the proposal to add the experimental analysis and the trade-offs, and I have also simplified the config to only have the immediate trigger, since the other one was of no practical use from the experiments (it can actually work the opposite and lead users to use it wrongly). @machine424 @jhalterman @SuperQ let me know if you have any further queries. Would like to get it merged soon and implementation polished. |
jesusvazquez
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Signed-off-by: Ganesh Vernekar <[email protected]>
For prometheus/prometheus#13616
Code is ready in prometheus/prometheus#16929