Memory leak from new rename feature #95
Hi @jeblair, thanks for reporting this. I may be wrong, but I am not sure this qualifies as a memory leak. Also, unless I am mistaken, the issue you are reporting already existed before this PR, since the exporter keeps metrics in the Prometheus registry indefinitely; this PR just makes it worse. Am I right? May I ask:
The metrics are never removed, unless there is a device renaming event in zigbee2mqtt (example here: https://github.com/kpetremann/mqtt-exporter/pull/89/files#diff-6457339f71452db172c35bd245b87a4593d9a6a59cad2f59f4d4131a3c470f05R431-R435). One way to "fix" this would be to clear all metrics upon collection, but that could break many use cases, such as sensors with a low rate of change/reporting: graphs would become "dots" instead of lines, and alerts would break the same way. Thanks!
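As a minimal illustration of that persistence (the metric and device names below are made up for the example, not taken from the exporter), prometheus_client keeps every labelset it has ever seen until something explicitly removes it:

```python
from prometheus_client import Gauge, generate_latest

# Every labelset ever observed stays in the registry until explicitly removed.
temperature = Gauge("sensor_temperature", "Temperature reported over MQTT", ["device"])

temperature.labels(device="living_room").set(21.5)
temperature.labels(device="bedroom").set(19.0)

# Both series are exported on every scrape, even if "living_room" never
# reports again -- unless something removes it, e.g. on a device rename:
temperature.remove("living_room")

print(generate_latest().decode())  # sensor_temperature now only has "bedroom"
```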
Sorry if I wasn't clear. I do understand the normal memory usage of both the Prometheus client library and the MQTT exporter, in terms of what they need to keep in memory in order to serve the scrape endpoint.
Home monitoring (zigbee2mqtt and esphome sensors).
It grew to 500MB before the oomkiller got it. I'm not sure what the previous steady-state size was.
Yes, I've been using older versions without a problem for about a year (thanks for the program, btw!). I only noticed swapping and the oomkiller on a recent upgrade to 1.6.1. I just reverted to 1.5.0; not enough time has passed to confirm the problem isn't present there (it takes about 24 hours to become a problem).

The specific memory leak I'm referring to is that after #89, the exporter appends a tuple of (PromMetricId, labels) for every sample that it receives. To be clear, the issue is not that we're storing information for every metric, it's that we're appending this tuple to a list for every value.

In order to see the leak in action, you may need a fairly busy message bus. I have sensors that emit several values every second, which means the append on line 154 runs every second for all of those values.

I think in #89 the intent was to store all of the labels for all of the metrics, so they could be removed on rename. I think the bug is that we're storing a copy of that every time we receive a value, instead of just storing unique metric+label pairs.
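To make the failure mode concrete, here is a minimal sketch of the pattern being described (the names are hypothetical and the real code in mqtt_exporter/main.py differs): the list grows with message volume rather than with the number of distinct series.

```python
# Hypothetical illustration of an unconditional per-sample append; the real
# exporter code differs, but the growth pattern is the same.
metric_refs = []

def on_mqtt_message(prom_metric_id, labels, value):
    # Runs once per received value, not once per unique metric+labelset.
    metric_refs.append((prom_metric_id, labels))

# One busy sensor reporting once per second for a day:
for _ in range(86_400):
    on_mqtt_message("sensor_temperature", {"device": "bedroom"}, 21.5)

print(len(metric_refs))  # 86400 entries for a single, unchanging series
```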
many thanks for all the info.
You are welcome :)
Indeed, it is a good candidate for the cause.
Oh my, ok, this is definitely a memory leak issue :/ sorry about that! I'll try to free up some more time if it does not fix the issue.
Sure thing, running it now, thanks. I'll report back tomorrow.
Sorry for the late reply, but 2 days later it's looking good.
This is very good news :)
The new rename feature in #89 appears to cause a memory leak.
It looks like every sample we receive is appended to a list here:
mqtt-exporter/mqtt_exporter/main.py, line 154 at commit a15e01e
And again a few lines below.
One solution to this would be to change the metric_refs data structure to avoid duplication.
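A rough sketch of that option, assuming a hypothetical shape for metric_refs (the exporter's actual structure may differ): keying by metric and storing labelsets in a set keeps one entry per unique series regardless of sample rate.

```python
from collections import defaultdict

# Hypothetical deduplicated structure: one entry per unique metric+labelset.
metric_refs: dict[str, set[tuple]] = defaultdict(set)

def track(prom_metric_id: str, labels: dict) -> None:
    # Dicts are not hashable, so store a canonical tuple of label pairs.
    metric_refs[prom_metric_id].add(tuple(sorted(labels.items())))

def labelsets_to_remove(prom_metric_id: str) -> set[tuple]:
    # On a rename event, return (and forget) the labelsets to drop.
    return metric_refs.pop(prom_metric_id, set())
```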
But would it be easier to use .clear() to drop all the labelsets from the metric, instead of our needing to remember them all?
https://github.com/prometheus/client_python/blob/92b23970f032cbc990aa0e501708c425708e51ea/prometheus_client/metrics.py#L206
Then we can just rely on the prometheus client knowing what needs to be removed instead of keeping track in the exporter.
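For comparison, a sketch of what leaning on the client library could look like; the handler name and the prom_metrics mapping are assumptions for illustration, not the exporter's actual code:

```python
from prometheus_client import Gauge

prom_metrics: dict[str, Gauge] = {}  # prom_metric_id -> metric, kept by the exporter

def handle_device_rename(prom_metric_id: str) -> None:
    # Instead of replaying a remembered list of (metric, labels) tuples,
    # let the client drop every labelset/child of the affected metric.
    metric = prom_metrics.get(prom_metric_id)
    if metric is not None:
        metric.clear()  # MetricWrapperBase.clear() removes all labelsets
```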