[VPA] Usage of VPA helm chart >2.0.0 leads to missing recommendations #1296
Comments
How are you pulling these metrics into Grafana? Is it possible there's just an issue with the metrics reporting rather than with the actual VPA recommendation itself? The changes from 1.7.5 to 2.x are almost entirely unrelated to the recommender deployment itself.
Additionally, are you using long-term storage with Prometheus to feed the VPA?
We use kube-state-metrics to scrape the VPA recommendations. The values in the Grafana dashboard are the same as when we check directly. I also cannot understand why this change would lead to this behaviour. Have you not seen anything like this before?
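For reference, a cross-check between the two sources can look like the sketch below. The VPA name and namespace are placeholders, and the metric and label names are assumptions based on what kube-state-metrics has historically exposed for VPAs:

```shell
# Read the recommendation straight from the VPA object
# (placeholders: <vpa-name>, <namespace>).
kubectl get vpa <vpa-name> -n <namespace> \
  -o jsonpath='{.status.recommendation.containerRecommendations[*].target}'

# Then compare against what Prometheus scraped from kube-state-metrics,
# e.g. with a query like (metric and label names assumed):
#   kube_verticalpodautoscaler_status_recommendation_containerrecommendations_target{verticalpodautoscaler="<vpa-name>", resource="memory"}
```

If the two disagree, the problem is in the reporting pipeline; if they agree, the recommender itself is producing the drops.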
Yes, we use Thanos.
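For readers following along, a hedged sketch of what pointing the recommender at a Thanos querier typically looks like with this chart. It assumes the chart passes `recommender.extraArgs` through as recommender flags; the flag names are upstream VPA recommender flags, while the querier address, namespace, and history length are illustrative placeholders, not the reporter's actual values:

```shell
# Sketch only: feed the VPA recommender history from a Thanos querier.
# --storage, --prometheus-address, and --history-length are upstream
# recommender flags; the address and history length are placeholders.
helm upgrade --install vpa fairwinds-stable/vpa --version 2.3.0 \
  --namespace vpa --create-namespace \
  --set 'recommender.extraArgs.storage=prometheus' \
  --set 'recommender.extraArgs.prometheus-address=http://thanos-query.monitoring.svc:9090' \
  --set 'recommender.extraArgs.history-length=8d'
```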
The only time I've seen erratic recommendations is when I'm not using Prometheus data to feed the recommendations and I don't wait long enough for the VPA to generate a good recommendation. Here's a cluster with 53 VPAs, using Prometheus data and the latest chart (also using kube-state-metrics to poll the VPA data).
An additional remark: we have multiple clients using our setup, and all of the EKS clients are suffering from this after the same upgrade, while the AKS customers are not.
Maybe try turning the log level on the recommender up to 10?
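A minimal sketch of doing that through the chart, assuming it passes `recommender.extraArgs` through as recommender flags (the recommender uses klog, so this maps to its `--v` verbosity flag):

```shell
# Raise recommender log verbosity to the maximum; --reuse-values keeps
# the rest of your existing configuration. (The chart parameter name
# recommender.extraArgs is an assumption here.)
helm upgrade vpa fairwinds-stable/vpa --reuse-values \
  --set 'recommender.extraArgs.v=10'
```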
I just realized that the cluster I'm showing in the graph above uses the VPA 0.14.0 image. Perhaps there's a bugfix in that version; worth trying. It would help if you could share your exact values so I can try to reproduce the issue.
Relevant parameters:
Full logs are in your mail :) so as not to leak any sensitive info here.
Helm values are not much different:
Aha. You're using kubernetes/autoscaler#2747 (comment).
I would imagine that switching that metric to …
Well, now I'm at a loss. Perhaps the VPA folks can help explain why the recommendation status would oscillate so much; I personally haven't seen it do this in my various tests. I'm guessing the chart change itself has nothing to do with it, and that something is instead triggered by the re-deploy of the VPA pods. But that's just a hunch.
What happened?
We upgraded Goldilocks (tried 7.0.0, 7.1.0 and 7.1.1) and the VPA chart (2.2.0 and 2.3.0), after which we saw intermittent drops in the recommendations down to the minimum we set (e.g. 25 MB for memory) and sometimes even lower than that (8.33 MB).
Reverting the VPA chart to the latest 1.x.x (1.7.5) seems to undo this behaviour, even though the underlying VPA image version (0.13.0) did not change. We are a bit at a loss here, because the logs are not giving us much helpful information either.
We see a lot of log lines like below, but that is also still the case after downgrading.
Any ideas how to move forward?
Posting here, since the underlying VPA version did not change.
What did you expect to happen?
Continuous recommendations being shown.
How can we reproduce this?
EKS 1.26
Version
VPA helm chart 2.3.0
Additional context
No response