cluster autoscaler waits 10 minutes between scaling operations to scale down #4872
Comments
We are noticing the same issue when cordoning some nodes. The cluster autoscaler recognizes they are unused for 10+ minutes, but if a scale-up happens, it neglects to terminate those nodes.
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs, so this bot triages them according to its standard rules. Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale

/remove-lifecycle stale
Those waiting times are configurable.
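For anyone landing here, these are the cluster-autoscaler flags that govern the waits being discussed. The deployment fragment below is only a sketch of where they go; the values shown are what I understand the defaults to be, and the image tag is just illustrative:

```yaml
# Fragment of a cluster-autoscaler Deployment spec (illustrative only).
containers:
  - name: cluster-autoscaler
    image: k8s.gcr.io/autoscaling/cluster-autoscaler:v1.20.2   # image path/tag for illustration
    command:
      - ./cluster-autoscaler
      - --cloud-provider=aws
      - --scale-down-unneeded-time=10m      # node must be unneeded this long before removal (10m default)
      - --scale-down-delay-after-add=10m    # global cooldown after any scale-up; the 10 minute wait described here (10m default)
      - --scale-down-delay-after-delete=0s  # cooldown after a node deletion
      - --scale-down-delay-after-failure=3m # cooldown after a failed scale-down (3m default)
```

Note that --scale-down-delay-after-add is applied globally, not per node group, which is the behaviour being discussed below.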
@carlosjgp The problem is that the wait time should be applied per autoscaling group. That is, just because autoscaling group A scaled up in the last 10 minutes, that should not prevent autoscaling group B from scaling down.
That's how CA works, I'm afraid. If you want the behaviour you are describing, you might need to deploy one CA per ASG and make sure the autodiscovery tags are set properly.
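Roughly, that "one CA per ASG" workaround could look like the sketch below: two autoscaler Deployments, each discovering only its own ASG via a distinct tag. The tag keys (example.com/ca-pool-*), pool names and image tag are made up for illustration:

```yaml
# Two cluster-autoscaler Deployments, each discovering only its own ASG.
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cluster-autoscaler-pool-a
  namespace: kube-system
spec:
  selector:
    matchLabels: {app: cluster-autoscaler-pool-a}
  template:
    metadata:
      labels: {app: cluster-autoscaler-pool-a}
    spec:
      containers:
        - name: cluster-autoscaler
          image: k8s.gcr.io/autoscaling/cluster-autoscaler:v1.20.2
          command:
            - ./cluster-autoscaler
            - --cloud-provider=aws
            # only ASGs tagged with this key are managed by this instance
            - --node-group-auto-discovery=asg:tag=example.com/ca-pool-a
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cluster-autoscaler-pool-b
  namespace: kube-system
spec:
  selector:
    matchLabels: {app: cluster-autoscaler-pool-b}
  template:
    metadata:
      labels: {app: cluster-autoscaler-pool-b}
    spec:
      containers:
        - name: cluster-autoscaler
          image: k8s.gcr.io/autoscaling/cluster-autoscaler:v1.20.2
          command:
            - ./cluster-autoscaler
            - --cloud-provider=aws
            - --node-group-auto-discovery=asg:tag=example.com/ca-pool-b
```

Each instance then applies its scale-down delays independently, because it only ever sees one node group; the trade-off is the overhead of running and upgrading several autoscalers.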
@carlosjgp That sounds like a poor design choice that should be fixed.
Can we expose another parameter to configure scale-down speed per autoscaling group?
Which component are you using?:
cluster-autoscaler

What version of the component are you using?:
Component version: v1.20.2

What k8s version are you using (kubectl version)?:
(kubectl version output omitted)

What environment is this in?:
AWS EKS
What did you expect to happen?:
The cluster autoscaler should terminate nodes when they have been unneeded for > 10 minutes.
The duration since the last scale up/down is completely irrelevant for this operation.
What happened instead?:
It waited 10 minutes after the previous scale up/down before scaling down again.
How to reproduce it (as minimally and precisely as possible):
1. Create two autoscaling groups in AWS. One group should have a bunch of extra nodes that are unused, and marked as such by the cluster autoscaler.
2. Wait a few minutes for the autoscaler to report these extra nodes as unneeded in the logs.
3. Trigger a scale-up on the second autoscaling group via the cluster autoscaler. (This can be achieved via nodeAffinity in the pod spec; see the example manifest below.)
4. Observe that the cluster autoscaler waits 10 minutes after this scale-up before terminating the nodes from the first autoscaling group.
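For step 3, a pod along these lines can force the scale-up. The node label key/value (eks.amazonaws.com/nodegroup: group-b) and the resource requests are only examples and depend on how the second group's nodes are labelled and sized:

```yaml
# Example pod that can only land on the second group, forcing a scale-up there.
apiVersion: v1
kind: Pod
metadata:
  name: force-scale-up
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: eks.amazonaws.com/nodegroup   # example label; use whatever label distinguishes group B
                operator: In
                values:
                  - group-b
  containers:
    - name: pause
      image: k8s.gcr.io/pause:3.5                  # placeholder workload
      resources:
        requests:
          cpu: "1"        # sized so it does not fit on existing free capacity in group B
          memory: 1Gi
```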
Anything else we need to know?: