-
Notifications
You must be signed in to change notification settings - Fork 5.4k
[native] Track CPU & Memory overload in native worker. #24949
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
788ce9d
to
fb795d3
Compare
<< ", threshold: " | ||
<< velox::succinctBytes(overloadedThresholdMemBytes); | ||
} | ||
RECORD_METRIC_VALUE(kCounterOverloadedMem, isMemOverloaded ? 100 : 0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Curious why is 0 and 100, but not a range of values in between. or why not just 0 and 1.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@amitkdutta
Because when the counters backend starts averaging the signal over some time period, say 60 seconds the 1 will turn into 0 and we won't see any signal.
100 however won't have such issue and will be easy to reason about how much % of the period the system was overloaded if the number is below 100.
**memoryInfo_.wlock() = std::move(memoryInfo); | ||
} | ||
|
||
void PrestoServer::checkOverload() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe offloading all these in a separate class?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tanjialiang
I would rather keep this logic here.
Next we will need Server to tell TaskManager that we are overloaded or no longer, so it can act on the Task queue.
Doing this in some another side class seems wrong.
Spinning up the class to just decide if we are overloaded based on a couple of metrics seems too much.
If that somehow grows over time, then we can refactor.
New configuration properties should be documented. Suggest adding new doc in https://github.com/prestodb/presto/blob/master/presto-docs/src/main/sphinx/presto_cpp/properties.rst. |
Description
We introduce new set of config properties to setup cpu and memory use thresholds above which we will consider worker overloaded.
Default values are zero, meaning we are not tracking it.
Motivation and Context
We want to spin up a system where Presto is aware of the workers' load and acts accordingly.
Impact
The change is harmless - only tracking the overload (optionally).