VPA: deleting VPA and recreate it, still gives some recommendation #4682

Open
Pisztrang opened this issue Feb 15, 2022 · 15 comments
Labels
area/vertical-pod-autoscaler, kind/bug, lifecycle/frozen

Comments

@Pisztrang

Which component are you using?:
vertical-pod-autoscaler

What version of the component are you using?:
Component version: 0.10.0

What k8s version are you using (kubectl version)?:

kubectl version Output
$ kubectl version
Client Version: version.Info{Major:"1", Minor:"19", GitVersion:"v1.19.5", GitCommit:"e338cf2c6d297aa603b50ad3a301f761b4173aa6", GitTreeState:"clean", BuildDate:"2020-12-09T11:18:51Z", GoVersion:"go1.15.2", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"19", GitVersion:"v1.19.5", GitCommit:"e338cf2c6d297aa603b50ad3a301f761b4173aa6", GitTreeState:"clean", BuildDate:"2020-12-09T11:10:32Z", GoVersion:"go1.15.2", Compiler:"gc", Platform:"linux/amd64"}

What environment is this in?:

What did you expect to happen?:
We are using VPA to make recommendations for our pods. We started the VPA just as documented; the only difference was that we added the --pod-recommendation-min-memory-mb=10 flag to the recommender deployment. We ran a test with 100 TPS for an hour and got a recommendation. Up to this step everything was fine. Let's say we got a 500m CPU recommendation.

Then, after one day with no test running, the recommendation had gone down, let's say to 130m CPU. I think this is still fine, since the recommender uses the data it has collected since the start.

Then we removed the VPA completely with the vpa-down.sh script, installed it again, and also recreated the VPA resource. When I checked the recommendation it still showed the 130m CPU, whereas I expected, since I had stopped the VPA completely, that it would show nothing or only the minimum value. So it seems that something, somewhere, is storing data.
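A quick way to see whether any recommender state survived the reinstall, assuming the VPA lives in the default namespace, is to list the checkpoint objects discussed further down in this thread:

kubectl get vpacheckpoint -n default
# any entries listed here are history the recommender has persisted for the matching VPAs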

What happened instead?:

The recommendation did not disappear after the VPA uninstall/reinstall.

How to reproduce it (as minimally and precisely as possible):

Anything else we need to know?:

As far as I know, the default vpa-up.sh does not set up Prometheus history data collection. I did not set it up either, so I do not know why this problem happened.

Can you help me understand what we are doing wrong?
Thanks
Akos

@Pisztrang Pisztrang added the kind/bug label Feb 15, 2022
@Pisztrang Pisztrang changed the title from "VPA: deleting VPA and recrate it, still gives some recommendation" to "VPA: deleting VPA and recreate it, still gives some recommendation" Feb 15, 2022
@jbartosik
Collaborator

Please add more information, preferably step-by-step reproduction instructions, and explain what happened vs. what you expected.

@Pisztrang
Author

Hi

What I did earlier is the following:

  1. Install the VPA as described. The only difference was that the minimum memory was set to 10 MB:
kubectl -n kube-system describe deployments.apps vpa-recommender 
Name:                   vpa-recommender
Namespace:              kube-system
CreationTimestamp:      Fri, 04 Mar 2022 13:25:57 +0000
Labels:                 <none>
Annotations:            deployment.kubernetes.io/revision: 1
Selector:               app=vpa-recommender
Replicas:               1 desired | 1 updated | 1 total | 1 available | 0 unavailable
StrategyType:           RollingUpdate
MinReadySeconds:        0
RollingUpdateStrategy:  25% max unavailable, 25% max surge
Pod Template:
  Labels:           app=vpa-recommender
  Service Account:  vpa-recommender
  Containers:
   recommender:
    Image:      k8s.gcr.io/autoscaling/vpa-recommender:0.9.2
    Port:       8942/TCP
    Host Port:  0/TCP
    Args:
      --pod-recommendation-min-memory-mb=10
    Limits:
      cpu:     200m
      memory:  1000Mi
    Requests:
      cpu:        50m
      memory:     500Mi
    Environment:  <none>
    Mounts:       <none>
  Volumes:        <none>
Conditions:
  Type           Status  Reason
  ----           ------  ------
  Available      True    MinimumReplicasAvailable
  Progressing    True    NewReplicaSetAvailable
OldReplicaSets:  <none>
NewReplicaSet:   vpa-recommender-5c5bbcb6cf (1/1 replicas created)
Events:          <none>

  2. Create the VPA resource for my application:
apiVersion: "autoscaling.k8s.io/v1"
kind: VerticalPodAutoscaler
metadata:
  name: parameterprovision
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind: Deployment
    name: parameterprovision
  updatePolicy:
    updateMode: "Off"
  3. I ran a performance test with 100 TPS. I could then see the recommendation showing some value, which seemed to be OK.

  4. Then I stopped the test case and removed the VPA from k8s (vpa-down.sh).

  5. Now I ran vpa-up.sh again and created the same VPA. Describing the VPA shows the same values as before, even though there is currently no load on it, so it seems that it remembers some earlier data (the commands are sketched below).
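For reference, a minimal sketch of steps 4 and 5 as commands, assuming the VPA manifest above is saved as parameterprovision-vpa.yaml (an illustrative filename) and the scripts are run from the vertical-pod-autoscaler directory of the autoscaler repo:

./hack/vpa-down.sh                             # tear down all VPA components
./hack/vpa-up.sh                               # reinstall them
kubectl apply -f parameterprovision-vpa.yaml   # recreate the VPA object
kubectl describe vpa parameterprovision        # still shows the old recommendation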

Actually it became even weirder today. After the weekend I checked the recommendation again and also checked the top pods; I did not run anything on the cluster in the last two days.

kubectl top pods| grep prov
parameterprovision-6b8988c8fd-5m6q4         3m           27Mi

while the VPA describe output shows this:

  Recommendation:
    Container Recommendations:
      Container Name:  nef-parameterprovision
      Lower Bound:
        Cpu:     25m
        Memory:  49530131
      Target:
        Cpu:     25m
        Memory:  109814751
      Uncapped Target:
        Cpu:     25m
        Memory:  109814751
      Upper Bound:
        Cpu:     698m
        Memory:  150053070

The target memory is about 105 MB, but there has been no load on this pod for two days.

And now I deleted the VPA, scaled my deployment to 0, then scaled it back to 1, recreated the VPA, and the result is the following:

nef-parameterprovision-6b8988c8fd-5t46m         2m           20Mi 
  Recommendation:
    Container Recommendations:
      Container Name:  nef-parameterprovision
      Lower Bound:
        Cpu:     25m
        Memory:  33289666
      Target:
        Cpu:     25m
        Memory:  36253748
      Uncapped Target:
        Cpu:     25m
        Memory:  36253748
      Upper Bound:
        Cpu:     490m
        Memory:  1615842163

I see quite a big difference between kubectl top and the recommendation.

So what is the right way to "reset" the VPA and start a new measurement? We actually just want to get a recommendation when the load is 100 TPS, but right now it seems that I cannot reproduce the same result twice.

Is it explained somewhere how the recommendation is calculated?
Thanks

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale label Jun 5, 2022
@jbartosik
Collaborator

/remove-lifecycle stale

I didn't have time to look into the problem but I think it would be good to do that.

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale label Jun 14, 2022
@k8s-triage-robot

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale label Sep 12, 2022
@k8s-triage-robot

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten and removed lifecycle/stale labels Oct 12, 2022
@jbartosik
Collaborator

/remove-lifecycle rotten

@k8s-ci-robot k8s-ci-robot removed the lifecycle/rotten label Oct 14, 2022
@k8s-triage-robot

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale label Jan 12, 2023
@jbartosik
Collaborator

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale label Jan 12, 2023
@jbartosik
Collaborator

/remove-lifecycle rotten

@jbartosik
Collaborator

/lifecycle frozen

@k8s-ci-robot k8s-ci-robot added the lifecycle/frozen label Jan 12, 2023
@2fst4u

2fst4u commented Jan 14, 2023

Curious about this too. Changes have been made to my pod, and therefore to its resource needs, but deleting and recreating the VPA still uses historical data.

@voelzmo
Contributor

voelzmo commented Jan 16, 2023

By default, historical data is stored in a VPACheckpoint resource, and the vpa-recommender also keeps it in memory for a while. So if you want to get rid of all historical information for a Pod, you should probably:

  • remove the corresponding VPACheckpoint resource
  • restart vpa-recommender or wait for the garbage collection interval to pass (1 hour); example commands are sketched below
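A rough sketch of those two steps, assuming the VPA (and therefore its checkpoint) lives in the default namespace and the VPA components run in kube-system:

kubectl get vpacheckpoint -n default                                 # find the checkpoint written for the VPA
kubectl delete vpacheckpoint <checkpoint-name> -n default            # drop the persisted history
kubectl -n kube-system rollout restart deployment vpa-recommender    # drop the in-memory copy as well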

@BalzGuenat

BalzGuenat commented Mar 12, 2024

  • remove the corresponding VPACheckpoint resource
  • restart vpa-recommender or wait for the garbage collection interval to pass (1 hour)

This did not work for me. I had to delete the VPA itself (and before that, disable Argo CD to prevent instant recreation), then restart vpa-recommender, and then recreate the VPA, roughly as sketched below.
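Roughly, that sequence would look something like the following; the VPA name and manifest filename are placeholders, and Argo CD sync is assumed to be paused first:

kubectl delete vpa <my-vpa>
kubectl -n kube-system rollout restart deployment vpa-recommender
kubectl apply -f <my-vpa>.yaml   # recreate the VPA once the recommender is back up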

@augustovictor

Here's what I do (based on others' answers):

kubectl delete vpacheckpoint <my-vpacheckpoint>
kubectl -n kube-system rollout restart deployment vpa-recommender
