lazily initialize ReservoirCells #6851
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #6851 +/- ##
============================================
- Coverage 90.49% 90.28% -0.22%
+ Complexity 6599 6591 -8
============================================
Files 731 729 -2
Lines 19738 19776 +38
Branches 1938 1944 +6
============================================
- Hits 17862 17854 -8
- Misses 1285 1327 +42
- Partials 591 595 +4
...src/main/java/io/opentelemetry/sdk/metrics/internal/exemplar/FixedSizeExemplarReservoir.java
This seems like a good idea. Exemplar reservoirs are initialized for every distinct series an instrument encounters. But the SDK may have an exemplar filter configured that results in no measurements ever being offered to the reservoir. There is no reason to pay for allocating cells that are never used.
The argument against lazy initialization is that it changes the application from allocating on initialization, to allocating when measurements are recorded. But the allocation should happen very early in an app's lifecycle so it shouldn't take long to reach steady state.
It was actually a conscious choice to allocate initially and try to avoid allocations later. The goal is that you "pay" for all your metric memory overhead EARLY and then don't risk allocations in hot paths later. Generally - I think we should push for metrics to have a steady-state of memory usage in "hot paths". If we're seeing a LOT of wasted allocations, generally, I'd be on board. I agree we should stabilize over time to steady-state allocations, but how bad is the problem today? Would like more motivation for this change though (at least a description of why).
I imagine we are in situations when exemplars are turned off or are To see why, check out the code for SumAggregation (the other aggregations follow the same pattern):
We could refactor the code to avoid allocation in the always off case, but I don't see an easy way to avoid unnecessary allocation (besides lazy initialization) in the case where the exemplar filter is trace based but the instrument never has measurements with sampled spans.
@jsuereth @jack-berg thank you for the early reviews. I added more detail in the PR description to justify why we need this change.
…l/exemplar/FixedSizeExemplarReservoir.java Remove extraneous check on storage being null if there are no measurements. Co-authored-by: jack-berg <[email protected]>
FWIW I'm not seeing this show up as an allocation hot spot in my environment, and my service is constantly creating new time series in "steady state" (e.g. high churn). You only pay for the allocation once/metric so that's maybe why.
Ah.. I forgot that we reuse I would say this slightly diminishes the usefulness of lazy initialization of the cells, but it's still true that an SDK would benefit from lazy initialization when the exemplar filter is
Yes, what we're looking to avoid mostly is leaving a bunch of long-tenured exemplar objects on the heap that we never use, rather than the allocation itself (though that is a bonus).
@jsuereth @jack-berg see the additional context/motivation in the PR description and comments. Let me know if there's anything else you'd like to see to move the PR forward.
Resolves #5581.
In high cardinality environments (~1M+ time series), this is taking up a non-trivial amount of space. This is made even worse on machines with a high number of CPUs, as the size of ReservoirCell[] depends on the number of cores. Each cell is ~50 bytes here. On a 17 core machine the overhead is almost 1 KB/metric. In the heap dump below it accounts for roughly 37% of all io.opentelemetry metric objects. We completely disable exemplar sampling so we'd like this overhead to be closer to zero.
The solution here is based on a comment in #5581, which doesn't initialize the cell if it hasn't received any measurements.
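For readers skimming the thread, here is a minimal sketch of the lazy-initialization idea, not the code in this PR. The class `LazyReservoirSketch`, its `Cell` type, and the methods `offer`, `isAllocated`, and `filledCount` are all hypothetical names made up for illustration; the real reservoir in the SDK has a different shape. The sketch assumes a fixed-size array of cells that stays unallocated until the first measurement is offered.

```java
import java.util.concurrent.ThreadLocalRandom;

/** Hypothetical stand-in for an exemplar reservoir; not the SDK's actual class. */
final class LazyReservoirSketch {

  /** Hypothetical cell holding one sampled measurement. */
  static final class Cell {
    long value;
    boolean filled;
  }

  private final int size;

  // Stays null until the first measurement is offered, so a series that
  // never sees an eligible measurement never pays for the cell array.
  private volatile Cell[] storage;

  LazyReservoirSketch(int size) {
    this.size = size;
  }

  /** Records a measurement into a randomly chosen cell, allocating on first use. */
  void offer(long value) {
    Cell[] cells = storage;
    if (cells == null) {
      cells = initStorage();
    }
    Cell cell = cells[ThreadLocalRandom.current().nextInt(size)];
    synchronized (cell) {
      cell.value = value;
      cell.filled = true;
    }
  }

  // Double-checked locking: only one thread allocates, and the volatile
  // write publishes the fully populated array to other threads.
  private synchronized Cell[] initStorage() {
    Cell[] cells = storage;
    if (cells == null) {
      cells = new Cell[size];
      for (int i = 0; i < size; i++) {
        cells[i] = new Cell();
      }
      storage = cells;
    }
    return cells;
  }

  /** True once the cell array has been allocated. */
  boolean isAllocated() {
    return storage != null;
  }

  /** Number of cells that currently hold a measurement; zero if never allocated. */
  int filledCount() {
    Cell[] cells = storage;
    if (cells == null) {
      return 0;
    }
    int count = 0;
    for (Cell cell : cells) {
      synchronized (cell) {
        if (cell.filled) {
          count++;
        }
      }
    }
    return count;
  }
}
```

The trade-off raised in the review shows up directly here: the first call to `offer` allocates on the recording path, while a reservoir that is never offered a measurement keeps only a null reference.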