Optimize memory usage around query tracking #27343
Resolved review threads on:
core/trino-main/src/main/java/io/trino/execution/QueryTracker.java
core/trino-main/src/main/java/io/trino/execution/SqlQueryManager.java
core/trino-main/src/main/java/io/trino/memory/ClusterMemoryManager.java
core/trino-main/src/main/java/io/trino/memory/ClusterMemoryLeakDetector.java
core/trino-main/src/main/java/io/trino/memory/ClusterMemoryPool.java
core/trino-main/src/main/java/io/trino/security/AccessControlUtil.java
core/trino-main/src/main/java/io/trino/server/ui/ClusterStatsResource.java
testing/trino-tests/src/test/java/io/trino/memory/TestMemoryManager.java
The PR does exactly what the title says, but what is the motivation for this change?

@martint We process the query lists in a streaming fashion. This PR avoids unnecessary copies and reduces memory allocations when working with long query lists.
Resolved review thread on core/trino-main/src/main/java/io/trino/memory/ClusterMemoryLeakDetector.java
Where do we see those unnecessary allocations? Is that a problem in practice? A cluster will usually not be dealing with enough queries for that to be visible. BTW, my concern with using Stream in a general-purpose API like this is that it forces a style of usage that we don't want to thrust upon every caller. It may also introduce hidden costs if you need to do something with the results twice, especially if the underlying stream has complex filters/transformations applied to it (or it may force the caller to materialize it into a temporary collection).
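To make the "hidden cost" concern concrete, here is a minimal Java sketch (the Query record and the method name are hypothetical, not Trino code). It shows that a java.util.stream.Stream can be traversed only once: a second terminal operation on the same instance throws IllegalStateException. A caller that needs two passes therefore has to request a new stream or collect the elements into a temporary list first.

```java
import java.util.List;
import java.util.stream.Stream;

public class StreamReuseSketch
{
    // Hypothetical query record, for illustration only.
    record Query(String id, long memoryBytes) {}

    // Consumes the stream once, then tries to use the same Stream instance again.
    // Returns true if the second traversal is rejected.
    static boolean secondTraversalFails(Stream<Query> queries)
    {
        queries.mapToLong(Query::memoryBytes).sum(); // first (terminal) pass
        try {
            queries.count(); // second pass over the same Stream instance
            return false;
        }
        catch (IllegalStateException e) {
            // "stream has already been operated upon or closed"
            return true;
        }
    }

    public static void main(String[] args)
    {
        List<Query> queries = List.of(new Query("q1", 100), new Query("q2", 300));
        System.out.println(secondTraversalFails(queries.stream())); // true
    }
}
```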
@martint I was looking at these while running high-concurrency, small-query benchmarks. In the query enforcement code (scan, write, memory, and CPU limits) we process queries like this, effectively copying the list of queries every second (x4) and then processing the queries one by one. The only place where we traverse the stream twice is the memory leak detector, where we run callOomKiller again over the stream, but that only happens when we are out of memory (so not on every pass).
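A rough sketch of the two call-site styles being compared, assuming a simplified query record (TrackedQuery, its fields, and both method names are hypothetical, not Trino's enforcement code). Both methods receive the same Collection; the only difference is whether the running queries are first copied into an intermediate list before the limit check.

```java
import java.util.Collection;
import java.util.List;
import java.util.function.Consumer;

public class EnforcementPassSketch
{
    // Hypothetical stand-in for a tracked query; not Trino's actual class.
    record TrackedQuery(String id, boolean done, long cpuMillis) {}

    // Copying style: materializes a new list of running queries on every pass, then walks it once.
    static void enforceCpuLimitCopying(Collection<TrackedQuery> allQueries, long limitMillis, Consumer<TrackedQuery> kill)
    {
        List<TrackedQuery> running = allQueries.stream()
                .filter(query -> !query.done())
                .toList(); // extra allocation proportional to the number of running queries
        for (TrackedQuery query : running) {
            if (query.cpuMillis() > limitMillis) {
                kill.accept(query);
            }
        }
    }

    // Streaming style: the same checks, applied one query at a time with no intermediate list.
    static void enforceCpuLimitStreaming(Collection<TrackedQuery> allQueries, long limitMillis, Consumer<TrackedQuery> kill)
    {
        allQueries.stream()
                .filter(query -> !query.done())
                .filter(query -> query.cpuMillis() > limitMillis)
                .forEach(kill);
    }

    public static void main(String[] args)
    {
        List<TrackedQuery> queries = List.of(
                new TrackedQuery("q1", false, 5_000),
                new TrackedQuery("q2", true, 9_000),
                new TrackedQuery("q3", false, 50));
        enforceCpuLimitCopying(queries, 1_000, query -> System.out.println("copying: kill " + query.id()));
        enforceCpuLimitStreaming(queries, 1_000, query -> System.out.println("streaming: kill " + query.id()));
    }
}
```

Both variants select only q1 here. Note that the extra allocation in the first variant is made by the caller, not by the collection it receives.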
QueryTracker.getAllQueries() doesn't copy the queries:

    public Collection<T> getAllQueries()
    {
        return unmodifiableCollection(queries.values());
    }

So the copying is confined to the method you posted above. It sounds like we just need a better interaction between that method and ClusterQueryManager.
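A minimal sketch of why the quoted method does not copy: Collections.unmodifiableCollection wraps the map's values() view, so the result is a live, read-only view of the map. The Tracker class below is a hypothetical simplification that mirrors only the getAllQueries() shape quoted above, and backing it with a ConcurrentHashMap is an assumption.

```java
import java.util.Collection;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

import static java.util.Collections.unmodifiableCollection;

public class QueryViewSketch
{
    // Hypothetical, simplified tracker keyed by query ID (assumed ConcurrentHashMap backing).
    static class Tracker<T>
    {
        private final Map<String, T> queries = new ConcurrentHashMap<>();

        void add(String queryId, T query)
        {
            queries.put(queryId, query);
        }

        void remove(String queryId)
        {
            queries.remove(queryId);
        }

        // Returns a read-only view of the map values: no elements are copied.
        public Collection<T> getAllQueries()
        {
            return unmodifiableCollection(queries.values());
        }
    }

    public static void main(String[] args)
    {
        Tracker<String> tracker = new Tracker<>();
        tracker.add("q1", "SELECT 1");
        Collection<String> view = tracker.getAllQueries();

        tracker.add("q2", "SELECT 2"); // changes to the tracker show through the view
        tracker.remove("q1");
        System.out.println(view.size()); // 1

        try {
            view.clear(); // the view itself rejects mutation
        }
        catch (UnsupportedOperationException e) {
            System.out.println("read-only");
        }
    }
}
```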
@martint what's wrong with this change if the elements are processed one by one and we only do filtering/mapping? |
It's too invasive (making SqlQueryTracker return a Stream) when the problem we're trying to solve is how SqlQueryManager processes the queries and feeds them to the ClusterQueryManager. From an abstraction perspective, there's no inherent "streaminess" or "laziness" to SqlQueryTracker. A Stream is also more constraining: it only allows simple one-pass computations and forces callers to adopt a style of processing that may not be appropriate in all situations. Streams are more appropriate for representing computation pipelines, not a data model/data structure. If the caller needs a Stream (for the purpose of filtering/transforming), it can just call stream() on the collection it returns.
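The alternative suggested here keeps the Collection return type and lets each caller call stream() only when it wants a pipeline. A minimal sketch with a hypothetical Query record and hypothetical helper methods: the same collection supports a cheap isEmpty() check and two independent stream passes without a temporary copy.

```java
import java.util.Collection;
import java.util.List;

public class CollectionCallerSketch
{
    // Hypothetical query record, for illustration only.
    record Query(String id, String user, long memoryBytes) {}

    // First pass: the caller opts into a stream pipeline.
    static long totalMemory(Collection<Query> queries)
    {
        return queries.stream().mapToLong(Query::memoryBytes).sum();
    }

    // Second, independent pass: stream() can simply be called again on the same collection.
    static long queriesForUser(Collection<Query> queries, String user)
    {
        return queries.stream().filter(query -> query.user().equals(user)).count();
    }

    public static void main(String[] args)
    {
        Collection<Query> queries = List.of(
                new Query("q1", "alice", 100),
                new Query("q2", "bob", 300),
                new Query("q3", "alice", 50));
        if (!queries.isEmpty()) { // a Stream has no isEmpty(); checking would consume it
            System.out.println(totalMemory(queries)); // 450
            System.out.println(queriesForUser(queries, "alice")); // 2
        }
    }
}
```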
@martint I'm not talking about some future applications but about what we have right now. I disagree that streams are only for representing "computation pipelines". In this case, we are exposing the ability to traverse the list of queries rather than a container for these queries (given that this is a snapshot in time, since the list changes frequently). From the consumer's perspective, this states an intent, "walk through this structure", rather than "stash it/keep it somewhere". A Stream, with its "walk once" semantics, better reflects the fact that this is a heavily mutated structure and you can only peek into its current content, right here, right now.
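The "peek into its current content" behavior comes from the underlying map's iterators rather than from the return type. As a small illustration, assuming the queries live in a java.util.concurrent.ConcurrentHashMap (an assumption; the code quoted in this thread only shows queries.values()), its views are weakly consistent: they tolerate modification during traversal and never throw ConcurrentModificationException, whereas a HashMap iterator is fail-fast. This holds whether the values are exposed as a Collection or wrapped in a Stream. The helper below is hypothetical.

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class WeaklyConsistentSketch
{
    // Removes every entry while iterating over the map's key view.
    // Returns true if the loop completes, false if the iterator fails fast.
    static boolean removeWhileIterating(Map<String, String> queries)
    {
        try {
            for (String queryId : queries.keySet()) {
                queries.remove(queryId); // structural change during iteration
            }
            return true;
        }
        catch (ConcurrentModificationException e) {
            return false;
        }
    }

    public static void main(String[] args)
    {
        Map<String, String> sample = Map.of("q1", "SELECT 1", "q2", "SELECT 2", "q3", "SELECT 3");
        // ConcurrentHashMap iterators are weakly consistent: no exception
        System.out.println(removeWhileIterating(new ConcurrentHashMap<>(sample))); // true
        // HashMap iterators are fail-fast: the step after a removal throws
        System.out.println(removeWhileIterating(new HashMap<>(sample))); // false
    }
}
```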
@dain, can you comment on the original intent and abstraction provided by QueryTracker from when you refactored that code a few years ago (if you recall)? |
Quite simply, to have a master list of queries, so we could handle all of the complex timeout logic in one easy-to-understand place. In the old days, we had abandoned-query detection, timeouts, and history management spread throughout the even more complex query manager system, and we had tons of bugs (memory leaks). I pulled this part out so that we could test it independently and have a better chance of not getting more bugs. For this PR, I'm not sure of the value of this change. It basically forces a Is there some bad behavior users get into with the original design?
@dain No, just optimizing memory usage by reducing the number of unnecessary memory copies/materializations.
@wendigo I don't see how this changes "memory copies/materialization" |
this copies running queries to a new list |
I assume you are talking about: Which has been updated to: But you could have done the same thing with: So my point is I don't see how changing |
@dain Agreed, but then at every usage site I'm calling ::stream, so what's the point?
There are lots of cases where I have collections and only interact with them as streams, but I don't change them to be
@dain Yeah, as an internal API it's easy to change it back when the usage pattern changes.
To avoid primitive boxing/unboxing
It was only used in tests and nowhere else
Description
Additional context and related issues
Release notes
(x) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
( ) Release notes are required, with the following suggested text: