Memory leak from new rename feature #95

Closed
jeblair opened this issue Mar 2, 2025 · 6 comments · Fixed by #97

Labels: bug (Something isn't working)

@jeblair commented Mar 2, 2025

The new rename feature in #89 appears to cause a memory leak.

It looks like every sample we receive is appended to a list here:

metric_refs[original_topic].append((prom_metric_id, labels))

And it is appended again a few lines below.

One solution to this would be to change the metric_refs data structure to avoid duplication.

But would it be easier to use .clear() to clear all the labelsets from the metric instead of our needing to remember them all?

https://github.com/prometheus/client_python/blob/92b23970f032cbc990aa0e501708c425708e51ea/prometheus_client/metrics.py#L206

Then we can just rely on the prometheus client knowing what needs to be removed instead of keeping track in the exporter.
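
For illustration, a rough sketch of what I mean (made-up metric and label names, not the exporter's actual code):

from prometheus_client import Gauge

# A labeled gauge standing in for one of the exporter's metrics.
temperature = Gauge("mqtt_temperature", "Last temperature seen per topic", ["topic", "sensor"])

temperature.labels(topic="zigbee2mqtt/office", sensor="th01").set(21.5)
temperature.labels(topic="zigbee2mqtt/kitchen", sensor="th02").set(19.0)

# On a rename event, drop every labelset at once; the next incoming
# samples re-create only the labelsets that are still in use.
temperature.clear()

The trade-off would be that labelsets for devices that were not renamed also disappear until their next sample arrives.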

jeblair added the bug label Mar 2, 2025
@kpetremann (Owner) commented Mar 2, 2025

Hi @jeblair,

Thanks for reporting this.

However, I may be wrong, but I am not sure this qualifies as a memory leak.
The exporter keeps metrics in memory on purpose, as the goal is to expose the last known value for sensors or any other IoT device.

Also, unless I am wrong, the issue you are reporting already existed before this PR, as the exporter keeps metrics in the Prometheus registry indefinitely. This PR just makes it worse, am I right?

May I ask:

  • What is your usage of this exporter?
  • How big is the memory usage on your side? On my side it is about 2 MB and it does not grow (unless I add a new IoT sensor to my network).
  • Do you confirm you had no issues in the past, or are you a new user of this exporter?

But would it be easier to use .clear() to clear all the labelsets from the metric instead of our needing to remember them all?

The metrics are never removed, unless there is a device renaming event in zigbee2mqtt (example here: https://github.com/kpetremann/mqtt-exporter/pull/89/files#diff-6457339f71452db172c35bd245b87a4593d9a6a59cad2f59f4d4131a3c470f05R431-R435)

One way to "fix" this would be to clear all metrics upon collection, but it could break many use cases, like sensors with a low rate of change/reporting. Graphs would become "dots" instead of lines, and alerts would break in the same way.
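
For illustration (made-up metric and label names, not the actual code from #89), a sketch of what targeted removal on a rename event looks like with prometheus_client:

from prometheus_client import Gauge

temperature = Gauge("mqtt_temperature", "Last temperature seen per topic", ["topic", "sensor"])
temperature.labels(topic="zigbee2mqtt/old_name", sensor="th01").set(21.5)
temperature.labels(topic="zigbee2mqtt/kitchen", sensor="th02").set(19.0)

# Labelsets remembered by the exporter, keyed by the original topic.
metric_refs = {
    "zigbee2mqtt/old_name": [(temperature, {"topic": "zigbee2mqtt/old_name", "sensor": "th01"})],
}

def handle_rename(old_topic):
    # Remove only the labelsets tied to the renamed device; everything else
    # stays registered, so graphs and alerts for other sensors keep working.
    for metric, labels in metric_refs.pop(old_topic, []):
        metric.remove(*labels.values())

handle_rename("zigbee2mqtt/old_name")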

Thanks!

@jeblair (Author) commented Mar 2, 2025

Sorry if I wasn't clear. I do understand the normal memory usage of both the prometheus client library and the mqtt exporter, in terms of what they need to keep in memory in order to serve the scrape endpoint.

What is your usage of this exporter?

Home monitoring (zigbee2mqtt and esphome sensors).

How big is the memory usage on your side? On my side it is about 2 MB and it does not grow (unless I add a new IoT sensor to my network).

It grew to 500MB before the oomkiller got it. I'm not sure what the previous steady-state size was.

Do you confirm you had no issues in the past, or are you a new user of this exporter?

Yes, I've been using older versions without a problem for about a year (thanks for the program, btw!). I only noticed swapping and the oomkiller on a recent upgrade to 1.6.1. I just reverted to 1.5.0; not enough time has passed to confirm the problem isn't present (it takes about 24 hours to become a problem).

The specific memory leak I'm referring to is that after #89, the exporter appends a tuple of (PromMetricId, labels) for every sample that it receives. To be clear, the issue is not that we're storing information for every metric, it's that we're appending this tuple to a list for every value.

In order to see the leak in action, you may need a fairly busy message bus. I have sensors that emit several values every second, which means the append in line 154 runs every second for all of those values.

I think in #89 the intent was to store all of the labels for all of the metrics, so they could be removed on rename. I think the bug is that we're storing a copy of that every time we receive a value, instead of just storing unique metric+label pairs.
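
A rough sketch of the dedup idea (hypothetical names, not a patch): keep each unique metric+labelset pair once, instead of appending one entry per received sample.

from collections import defaultdict

# topic -> set of unique (prom_metric_id, frozen labels) pairs
metric_refs = defaultdict(set)

def remember_labels(original_topic, prom_metric_id, labels):
    # frozenset(labels.items()) makes the labelset hashable, so repeated
    # samples for the same metric+labelset collapse into a single entry
    # rather than growing the structure by one tuple per value received.
    metric_refs[original_topic].add((prom_metric_id, frozenset(labels.items())))

With something like that, memory grows with the number of distinct sensors and labelsets rather than with message volume.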

@kpetremann (Owner) commented Mar 2, 2025

Many thanks for all the info.

thanks for the program, btw!

You are welcome :)

the exporter appends a tuple of (PromMetricId, labels) for every sample that it receives

Indeed, it is a good candidate for the cause.

It grew to 500MB before the oomkiller got it

Oh my, ok this is definitely a memory leak issue :/ sorry about that!
Would it be possible for you to test with this branch: https://github.com/kpetremann/mqtt-exporter/tree/fix_ref_mem_leak

I'll try to free up more time if it does not fix the issue.

@jeblair (Author) commented Mar 2, 2025

Oh my, ok this is definitely a memory leak issue :/ sorry about that! Would it be possible for you to test with this branch: https://github.com/kpetremann/mqtt-exporter/tree/fix_ref_mem_leak

I'll try to free up more time if it does not fix the issue.

Sure thing, running it now, thanks. I'll report back tomorrow.

ps output at start:

   PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
888275  1.7  2.3  28084 22692 ?        Ssl  20:16   0:01 python /opt/mqtt-exporter/exporter.py

@jeblair (Author) commented Mar 4, 2025

Sorry for the late reply, but 2 days later, it's looking good:

   PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
888275  0.9  1.6  30328 16136 ?        Ssl  Mar02  29:06 python /opt/mqtt-exporter/exporter.py

@kpetremann (Owner)

This is very good news :)
I'll merge the code to master then.
