VPA - Document the current recommendation algorithm #2747
@bskiba any updates on this? |
This would be awesome!! How can I help? |
I've been digging around the code for a bit; this is what I understand so far. Please correct me where I'm wrong 😃 To answer this:
Recommendations are calculated using a decaying histogram of weighted samples from the metrics server, where newer samples are assigned higher weights; older samples decay and hence have less and less effect on the recommendation. CPU is calculated from the 90th percentile of all CPU samples, and memory is calculated from the 90th percentile of daily peak usage over the 8-day window.
8 days of history are used for the recommendation (1 memory usage sample per day). Prometheus can be used as a history provider in this calculation. By default, the VPA recommender collects data about all controllers, so when new VPA objects are created they already provide stable recommendations (unless you specify the memory-saver flag, which restricts collection to pods that have a matching VPA object).
I saw this in the code here 🙃 ...I'm not sure if it's possible to get a "stable" recommendation before 8 days... |
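To make the mechanics above concrete, here is a minimal, self-contained Go sketch of a decaying histogram with weighted samples and a weighted-percentile lookup. The 24h half-life matches the recommender's default as I understand it; everything else (flat per-sample weights, no exponential bucketing) is a simplification for illustration, not the actual implementation:

```go
package main

import (
	"fmt"
	"math"
	"sort"
	"time"
)

// A toy decaying histogram. Each sample carries a weight that doubles
// every halfLife relative to a fixed reference time; growing the weight
// of newer samples is equivalent to decaying the weight of older ones.

type sample struct {
	value  float64 // e.g. CPU cores or memory bytes
	weight float64
}

type decayingHistogram struct {
	samples  []sample
	halfLife time.Duration
	refTime  time.Time
}

func (h *decayingHistogram) add(value float64, t time.Time) {
	age := t.Sub(h.refTime)
	w := math.Exp2(age.Hours() / h.halfLife.Hours())
	h.samples = append(h.samples, sample{value, w})
}

// percentile returns the smallest value v such that at least a fraction
// p (0..1) of the total sample weight lies at or below v.
func (h *decayingHistogram) percentile(p float64) float64 {
	sorted := append([]sample(nil), h.samples...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].value < sorted[j].value })
	total := 0.0
	for _, s := range sorted {
		total += s.weight
	}
	cum := 0.0
	for _, s := range sorted {
		cum += s.weight
		if cum >= p*total {
			return s.value
		}
	}
	return 0 // empty histogram
}

func main() {
	start := time.Now().Add(-8 * 24 * time.Hour)
	h := &decayingHistogram{halfLife: 24 * time.Hour, refTime: start}
	// 8 days of one CPU sample per day: high usage early, low usage late.
	for day := 0; day < 8; day++ {
		usage := 1.0
		if day >= 4 {
			usage = 0.2 // the workload shrank halfway through the window
		}
		h.add(usage, start.Add(time.Duration(day)*24*time.Hour))
	}
	// Recent samples dominate, so the p90 tracks the new, lower usage.
	fmt.Printf("p90 estimate: %.2f cores\n", h.percentile(0.9))
}
```

Running this prints a p90 of 0.20 cores, showing how the decay makes the recommendation follow recent usage rather than the whole window's average.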
@yashbhutwala Great summary!! I am getting a huge upper bound for my recommendation at startup and am trying to understand the behavior. Below is the VPA object.
I don't want to set an upper limit in the VPA object. I don't have checkpoints, as history is loaded from the Prometheus server. But I noticed this huge upper bound (numbers may be slightly different) irrespective of whether I load from a checkpoint or from Prometheus. Can you tell me why the algorithm gives such a high upper bound? Also, there are no OOM events in the VPA recommender logs. I did the same experiment without the Prometheus server and got similar numbers.

I checked the checkpoint of the VPA. The surprising part is that there is no memory histogram. Is this because it will only appear after 24 hours? I deleted the VPA object and the checkpoint, then restarted the VPA, but I still get huge upper bounds 2 hours after startup. How is it recommending memory without any histogram? Can you please answer this? |
@djjayeeta good questions!! I'm not an expert here, but as far as I understand it, the most important value for you to look at is Target. This is the recommendation for what to set the requests to; there is currently no limit recommendation given by VPA. Lower and upper bounds are only meant to be used by the VPA updater: pods whose requests fall within that range are allowed to keep running and are not evicted. For the upper bound, I suspect this is set by default to the node's capacity (as in your case this is 16Gi). Just FYI, Uncapped Target gives the recommendation before applying the constraints specified in the VPA spec, such as min or max. With this in mind, in your case, the target of
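For reference, a pared-down sketch of the recommendation block being described. Field names follow RecommendedContainerResources in the VPA API types; the map type is a stand-in for corev1.ResourceList so the snippet stays self-contained, and the comments restate the interpretation given above:

```go
package sketch

// resourceList stands in for corev1.ResourceList; the real fields hold
// quantities per resource name (cpu, memory).
type resourceList map[string]string

// recommendedContainerResources mirrors the recommendation block in the
// VPA status, annotated with the meaning of each field.
type recommendedContainerResources struct {
	Target         resourceList // what to set container requests to
	LowerBound     resourceList // updater keeps pods running if requests are at least this...
	UpperBound     resourceList // ...and at most this; very wide while history is short
	UncappedTarget resourceList // target before minAllowed/maxAllowed from the VPA spec
}
```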
Yes, it samples the peak per day |
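A minimal sketch of that idea, assuming we simply keep the maximum observation per 24h window; the real recommender tracks the peak per aggregation interval and also raises it on OOM events:

```go
package main

import (
	"fmt"
	"time"
)

// dailyPeaks collapses raw memory-usage observations into one sample per
// 24h window: the maximum seen in that window. These per-day peaks are
// what feed the memory histogram, one sample per day.
func dailyPeaks(obs map[time.Time]float64) map[time.Time]float64 {
	peaks := make(map[time.Time]float64)
	for t, bytes := range obs {
		day := t.Truncate(24 * time.Hour)
		if bytes > peaks[day] {
			peaks[day] = bytes
		}
	}
	return peaks
}

func main() {
	base := time.Date(2021, 1, 1, 0, 0, 0, 0, time.UTC)
	obs := map[time.Time]float64{
		base.Add(2 * time.Hour):  100e6,
		base.Add(5 * time.Hour):  350e6, // day-1 peak
		base.Add(26 * time.Hour): 120e6, // day-2 peak
	}
	for day, peak := range dailyPeaks(obs) {
		fmt.Printf("%s peak: %.0f MB\n", day.Format("2006-01-02"), peak/1e6)
	}
}
```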
@yashbhutwala, thanks for taking the time to answer here; your answer is very precise. 👍 |
@djjayeeta the high upper bound at startup is due to confidence factor scaling. With more data, it will move closer to the 95th percentile. |
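For the curious, a sketch of the confidence scaling mentioned above: the bound estimate is multiplied by (1 + multiplier/confidence)^exponent, where confidence grows with the days of history available. The constants below are what I believe the upstream defaults to be (upper bound: multiplier=1, exponent=1; lower bound: multiplier=0.001, exponent=-2), so verify against the version you run:

```go
package main

import (
	"fmt"
	"math"
)

// scaleByConfidence multiplies a raw percentile estimate by
// (1 + multiplier/confidence)^exponent. With little history the upper
// bound blows up and the lower bound shrinks; with more history both
// converge towards the raw percentile.
func scaleByConfidence(estimate, multiplier, exponent, confidenceDays float64) float64 {
	return estimate * math.Pow(1+multiplier/confidenceDays, exponent)
}

func main() {
	p95 := 1.0 // raw 95th-percentile estimate, e.g. 1 GiB of memory
	for _, days := range []float64{0.05, 0.5, 2, 8} {
		upper := scaleByConfidence(p95, 1.0, 1.0, days)
		lower := scaleByConfidence(p95, 0.001, -2.0, days)
		fmt.Printf("history=%4.2fd  upper=%6.2f  lower=%.3f\n", days, upper, lower)
	}
}
```

With roughly an hour of history the upper bound comes out about 21x the raw estimate, which matches the huge startup upper bounds reported above; by day 8 it is within about 12%.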
Issues go stale after 90d of inactivity. If this issue is safe to close now please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta. |
/remove-lifecycle stale |
Hi, can I work on this? I think there is enough information in the comments #2747 (comment) and #2747 (comment). I can rephrase it and add it to the FAQ: https://github.com/kubernetes/autoscaler/blob/master/vertical-pod-autoscaler/FAQ.md |
Issues go stale after 90d of inactivity. If this issue is safe to close now please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta. |
/remove-lifecycle stale |
Hi @yashbhutwala, is it still the case that the VPA does not recommend the resource limits and only the requests? |
@Duske yes. |
Issues go stale after 90d of inactivity. If this issue is safe to close now please do so with /close. Send feedback to sig-contributor-experience at kubernetes/community. |
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs. This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:
- Mark this issue or PR as fresh with /remove-lifecycle rotten
- Close this issue or PR with /close
- Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle rotten |
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs. This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:
- Reopen this issue or PR with /reopen
- Mark this issue or PR as fresh with /remove-lifecycle rotten
- Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community. /close |
@k8s-triage-robot: Closing this issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
I cannot find documentation of the Recommender component's algorithm in the project, and it is still a little complicated to learn from the code. Is there no documentation of the algorithm in the community? e.g.: |
I agree. I would like to know what heuristics the algorithm applies to ensure that correct target values are produced for services with different usage patterns. |
/reopen
I don't think this is solved, and it is very hard to find information about this. |
@alvaroaleman: Reopened this issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
/lifecycle frozen |
I am wondering why the VPA recommender uses targetMemoryPeaksPercentile := 0.9. |
@ManuelMueller1st it is a request, not a limit. |
@pierreozoux many people (including @thockin as a general rule, with possible caveats of course) recommend setting memory requests == limits: memory is an incompressible resource, and if you over-allocate memory on a host you may get random OOMs anyway. |
I also think a high-level overview of how the algorithm works would be super helpful for making people comfortable with using VPA, particularly for memory recommendations. |
@mwarkentin, if you want to maintain a 1:1 ratio between memory requests and limits, you can achieve this by setting both memory requests and limits to the same values when you first deploy the pods. Additionally, it's crucial to note that the |
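A sketch of the proportional-limit idea under discussion, assuming VPA preserves the original limit-to-request ratio when it applies a new recommended request; the function name here is made up for illustration, see the upstream limit-scaling code for the real, resource.Quantity-based implementation:

```go
package main

import "fmt"

// proportionalLimit keeps the limit-to-request ratio from the original
// pod spec when a new recommended request is applied.
func proportionalLimit(origRequest, origLimit, recommendedRequest float64) float64 {
	if origRequest == 0 {
		return origLimit // no ratio to preserve
	}
	return recommendedRequest * origLimit / origRequest
}

func main() {
	// Deploying with requests == limits (ratio 1:1): the scaled limit
	// equals the new request, so they stay equal after every update.
	fmt.Println(proportionalLimit(500e6, 500e6, 800e6)) // 8e+08
	// A 1:2 ratio is preserved the same way.
	fmt.Println(proportionalLimit(500e6, 1000e6, 800e6)) // 1.6e+09
}
```

So deploying with requests == limits keeps them equal through every VPA update, which is the 1:1 behavior described above.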
BTW, it would be great to add VPA to the official documentation |
I did some debugging, mainly in the Recommender, because I was interested in how the CPU/Memory samples are aggregated and how the recommendations are calculated. I posted my findings on my personal webpage - those who are interested can check the URL in my GitHub profile. |
Please note that the VPA recommendation algorithm is not part of the API and is subject to change without notice