lazily initialize ReservoirCells #6851

edma2 · 2024-11-06T00:20:47Z

Resolves #5581.

In high cardinality environments (~1M+ time series), this is taking up a non-trivial amount of space. This is made even worse on machines with a high number of CPU, as the size of ReservoirCell[] depends on the number of cores. Each cell is ~50 bytes here. On a 17 core machine the overhead is almost 1 KB/metric.

In the heap dump below it accounts for roughly 37% of all io.opentelemetry metric objects.

We completely disable exemplar sampling so we'd like this overhead be closer to zero.

The solution here is based on a comment in #5581, which doesn't initialize the cell if it hasn't received any measurements.

linux-foundation-easycla · 2024-11-06T00:20:51Z

The committers listed above are authorized under a signed CLA.

✅ login: edma2 / name: Eugene Ma (e2821dd, a25b2e2, 6ec011b)

codecov · 2024-11-06T00:38:10Z

Codecov Report

Attention: Patch coverage is 91.66667% with 1 line in your changes missing coverage. Please review.

Project coverage is 90.28%. Comparing base (ef03239) to head (e2821dd).
Report is 6 commits behind head on main.

Files with missing lines	Patch %	Lines
.../internal/exemplar/FixedSizeExemplarReservoir.java	91.66%	0 Missing and 1 partial ⚠️

Additional details and impacted files

@@             Coverage Diff              @@
##               main    #6851      +/-   ##
============================================
- Coverage     90.49%   90.28%   -0.22%     
+ Complexity     6599     6591       -8     
============================================
  Files           731      729       -2     
  Lines         19738    19776      +38     
  Branches       1938     1944       +6     
============================================
- Hits          17862    17854       -8     
- Misses         1285     1327      +42     
- Partials        591      595       +4

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

...src/main/java/io/opentelemetry/sdk/metrics/internal/exemplar/FixedSizeExemplarReservoir.java

jack-berg

This seems like a good idea. Exemplar reservoirs are initialized for every distinct series an instrument encounters. But the SDK may have an exemplar filter configured that results in the no measurements ever being offered to the reservoir. No reason to pay for allocating the cells which are never used.

The argument against lazy initialization is that it changes the application from allocating on initialization, to allocating when measurements are recorded. But the allocation should happen very early in an app's lifecycle so it shouldn't take long to reach steady state.

jsuereth · 2024-11-07T15:11:11Z

This seems like a good idea. Exemplar reservoirs are initialized for every distinct series an instrument encounters. But the SDK may have an exemplar filter configured that results in the no measurements ever being offered to the reservoir. No reason to pay for allocating the cells which are never used.

It was actually a conscious choice to allocate initially and try to avoid allocations later. The goal is that you "pay" for all your metric memory overhead EARLY and then don't risk allocations in hot paths later. Generally - I think we should push for metrics to have a steady-state of memory usage in "hot paths".

If we're seeing a LOT of wasted allocations, generally, I'd be on board. I agree we should stabilize over time to steady-state allocations, but how bad is the problem today? Would like more motivation for this change though (at least a description of why).

jack-berg · 2024-11-07T16:04:05Z

If we're seeing a LOT of wasted allocations, generally, I'd be on board. I agree we should stabilize over time to steady-state allocations, but how bad is the problem today? Would like more motivation for this change though (at least a description of why).

I imagine we are in situations when exemplars are turned off or are trace_based, but the instrument never has sampled spans active.

To see why, check out the code for SumAggregation (the other aggregations follow the same pattern):

A Supplier<ExemplarReservoir<?> is passed to the aggregator at initialization
This supplier is invoked with each distinct series the aggregator encounters
The supplier always instantiates a reservoir, regardless of the ExemplarFilter
This means that reservoir is initialized w/ memory allocated even when ExemplarFilter.alwaysOff() is used

We could refactor the code to avoid allocation in the always off case, but I don't see an easy way to avoid unnecessary allocation (besides lazy initialization) in the case where the exemplar filter is trace based but the instrument never has measurements with sampled spans.

edma2 · 2024-11-07T19:13:10Z

@jsuereth @jack-berg thank you for the early reviews. I added more detail in the PR description to justify why we need this change.

…l/exemplar/FixedSizeExemplarReservoir.java Remove extraneous check on storage being null if there are no measurements. Co-authored-by: jack-berg <[email protected]>

edma2 · 2024-11-07T19:23:00Z

It was actually a conscious choice to allocate initially and try to avoid allocations later. The goal is that you "pay" for all your metric memory overhead EARLY and then don't risk allocations in hot paths later. Generally - I think we should push for metrics to have a steady-state of memory usage in "hot paths".

FWIW I'm not seeing this show up as an allocation hot spot in my environment, and my service is constantly creating new time series in "steady state" (e.g. high churn). You only pay for the allocation once/metric so that's maybe why.

jack-berg · 2024-11-08T15:15:27Z

You only pay for the allocation once/metric so that's maybe why.

Ah.. I forgot that we reuse AggregatorHandles (and their associated ExemplarReservoir) to reduce allocations 🤦.

I would say this slightly diminishes the usefulness of lazy initialization of the cells, but its still true that an SDK would benefit from lazy initialization when the exemplar filter is always_off or trace_based` (and the instrument receives no measurements with sampled spans). I wouldn't expect to see any difference in memory churn, but I would expect to see an SDK with lazy reservoir cell initialization to have slightly lower overall memory, since it would avoid ever allocating for reservoir cells in these situations.

edma2 · 2024-11-08T22:53:31Z

I wouldn't expect to see any difference in memory churn, but I would expect to see an SDK with lazy reservoir cell initialization to have slightly lower overall memory, since it would avoid ever allocating for reservoir cells in these situations.

Yes, what we're looking to avoid mostly is leaving a bunch of long-tenured exemplar objects on the heap that we never use, rather than the allocation itself (though that is a bonus).

edma2 · 2024-11-12T17:45:05Z

@jsuereth @jack-berg see the additional context/motivation in the PR description and comments. Let me know if there's anything else you'd like to see to move the PR forward.

edma2 added 2 commits November 5, 2024 16:24

initial implementation of lazy storage

a25b2e2

style changes

6ec011b

edma2 force-pushed the lazy-init-reservoir branch from 803ba3d to 6ec011b Compare November 6, 2024 00:24

jack-berg reviewed Nov 6, 2024

View reviewed changes

...src/main/java/io/opentelemetry/sdk/metrics/internal/exemplar/FixedSizeExemplarReservoir.java Outdated Show resolved Hide resolved

jack-berg approved these changes Nov 6, 2024

View reviewed changes

edma2 marked this pull request as ready for review November 7, 2024 19:12

edma2 requested a review from a team as a code owner November 7, 2024 19:12

Update sdk/metrics/src/main/java/io/opentelemetry/sdk/metrics/interna…

e2821dd

…l/exemplar/FixedSizeExemplarReservoir.java Remove extraneous check on storage being null if there are no measurements. Co-authored-by: jack-berg <[email protected]>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

lazily initialize ReservoirCells #6851

lazily initialize ReservoirCells #6851

edma2 commented Nov 6, 2024 •

edited

Loading

linux-foundation-easycla bot commented Nov 6, 2024 •

edited

Loading

codecov bot commented Nov 6, 2024 •

edited

Loading

jack-berg left a comment

jsuereth commented Nov 7, 2024

jack-berg commented Nov 7, 2024

edma2 commented Nov 7, 2024

edma2 commented Nov 7, 2024

jack-berg commented Nov 8, 2024

edma2 commented Nov 8, 2024

edma2 commented Nov 12, 2024

lazily initialize ReservoirCells #6851

Are you sure you want to change the base?

lazily initialize ReservoirCells #6851

Conversation

edma2 commented Nov 6, 2024 • edited Loading

linux-foundation-easycla bot commented Nov 6, 2024 • edited Loading

codecov bot commented Nov 6, 2024 • edited Loading

Codecov Report

jack-berg left a comment

Choose a reason for hiding this comment

jsuereth commented Nov 7, 2024

jack-berg commented Nov 7, 2024

edma2 commented Nov 7, 2024

edma2 commented Nov 7, 2024

jack-berg commented Nov 8, 2024

edma2 commented Nov 8, 2024

edma2 commented Nov 12, 2024

edma2 commented Nov 6, 2024 •

edited

Loading

linux-foundation-easycla bot commented Nov 6, 2024 •

edited

Loading

codecov bot commented Nov 6, 2024 •

edited

Loading