cluster autoscaler waits 10 minutes between scaling operations to scale down #4872

rittneje · 2022-05-09T17:54:31Z

Which component are you using?:

cluster-autoscaler

What version of the component are you using?:

v1.20.2

Component version:

What k8s version are you using (kubectl version)?:

kubectl version Output

$ kubectl version
Client Version: version.Info{Major:"1", Minor:"20", GitVersion:"v1.20.15", GitCommit:"8f1e5bf0b9729a899b8df86249b56e2c74aebc55", GitTreeState:"clean", BuildDate:"2022-01-19T17:27:39Z", GoVersion:"go1.15.15", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"21+", GitVersion:"v1.21.9-eks-14c7a48", GitCommit:"717bfb2b8ceb809a42a6c0baabde59fae28637ef", GitTreeState:"clean", BuildDate:"2022-04-01T03:17:28Z", GoVersion:"go1.16.12", Compiler:"gc", Platform:"linux/amd64"}

What environment is this in?:

AWS EKS

What did you expect to happen?:

The cluster autoscaler should terminate nodes when they have been unneeded for > 10 minutes.

The duration since the last scale up/down is completely irrelevant for this operation.

What happened instead?:

It waited 10 minutes after the previous scale up/down before scaling down again.

How to reproduce it (as minimally and precisely as possible):

1. Create two autoscaling groups in AWS. One group should have several extra nodes that are unused.
2. Wait a few minutes for the autoscaler to report these extra nodes as unneeded in the logs.
3. Trigger a scale up on the second autoscaling group via the cluster autoscaler. (This can be achieved via nodeAffinity in the pod spec.)
4. Observe that the cluster autoscaler waits 10 minutes after this scale up before terminating the nodes from the first autoscaling group.

Anything else we need to know?:


rittneje · 2022-07-06T17:30:53Z

We are noticing the same issue when cordoning some nodes. The cluster autoscaler recognizes that they have been unused for 10+ minutes, but if a scale up happens, it does not terminate those nodes.

k8s-triage-robot · 2022-10-04T17:52:08Z

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Mark this issue or PR as fresh with /remove-lifecycle stale
Mark this issue or PR as rotten with /lifecycle rotten
Close this issue or PR with /close
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

rittneje · 2022-10-04T18:09:38Z

/remove-lifecycle stale

carlosjgp · 2022-12-14T11:19:09Z

Those waiting times are configurable; see the FAQ:
https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/FAQ.md#troubleshooting

rittneje · 2022-12-14T13:11:09Z

@carlosjgp The problem is that the wait time should be applied per autoscaling group. That is, just because autoscaling group A scaled up in the last 10 minutes, that should not prevent autoscaling group B from scaling down.

carlosjgp · 2022-12-14T13:37:12Z

That's how CA works, I'm afraid. If you want the behaviour you are describing, you might need to deploy one CA per ASG and make sure the autodiscovery tags are set properly.

rittneje · 2022-12-14T13:45:11Z

@carlosjgp That sounds like a poor design choice that should be fixed.

anson627 · 2022-12-22T19:17:22Z

Can we expose another parameter to configure the scale-down speed per autoscaling group?

k8s-triage-robot · 2023-03-22T19:37:22Z

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Mark this issue as fresh with /remove-lifecycle stale
Close this issue with /close
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

rittneje · 2023-03-22T19:41:28Z

/remove-lifecycle stale

k8s-triage-robot · 2023-06-20T20:02:56Z

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Mark this issue as fresh with /remove-lifecycle stale
Close this issue with /close
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

rittneje · 2023-06-20T20:47:50Z

/remove-lifecycle stale

k8s-triage-robot · 2024-01-23T01:40:05Z

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Mark this issue as fresh with /remove-lifecycle stale
Close this issue with /close
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

rittneje · 2024-01-23T02:05:25Z

/remove-lifecycle stale

k8s-triage-robot · 2024-06-19T13:42:52Z

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Mark this issue as fresh with /remove-lifecycle stale
Close this issue with /close
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

rittneje · 2024-06-19T16:18:40Z

/remove-lifecycle stale

k8s-triage-robot · 2024-09-17T16:25:19Z

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Mark this issue as fresh with /remove-lifecycle stale
Close this issue with /close
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

rittneje · 2024-09-17T18:03:20Z

/remove-lifecycle stale

k8s-triage-robot · 2024-12-16T18:07:50Z

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Mark this issue as fresh with /remove-lifecycle stale
Close this issue with /close
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

rittneje · 2024-12-16T19:48:47Z

/remove-lifecycle stale

rittneje added the kind/bug Categorizes issue or PR as related to a bug. label May 9, 2022

jbartosik added the area/cluster-autoscaler label May 11, 2022

k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 4, 2022

k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 4, 2022

k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Mar 22, 2023

k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Mar 22, 2023

k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jun 20, 2023

k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jun 20, 2023

k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 23, 2024

k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 23, 2024

towca added the area/core-autoscaler Denotes an issue that is related to the core autoscaler and is not specific to any provider. label Mar 21, 2024

k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jun 19, 2024

k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jun 19, 2024

k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Sep 17, 2024

k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Sep 17, 2024

k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 16, 2024

k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 16, 2024
