cluster autoscaler should consider availability zone balancing during scaledown #3693

rittneje · 2020-11-14T16:36:57Z

We are running a cluster in AWS EKS that uses nodes from auto-scaling groups. We have noticed that whenever the autoscaler terminates a node during scaledown, the auto-scaling group triggers an availability zone rebalancing shortly thereafter. This in turn leads to a spike in errors. It would be preferable if the cluster autoscaler properly considered availability zones during scaledown, shuffling pods between nodes as necessary to preemptively avoid a rebalancing.
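
To make the request concrete, here is a minimal sketch of what zone-aware selection during scaledown could look like. It is not how the cluster autoscaler works today; the function name and both parameters are hypothetical. Among nodes that are otherwise removable, it prefers one from the zone that currently has the most nodes, so that removing it is least likely to leave the group unbalanced.

```python
from collections import Counter


def pick_scale_down_candidate(node_zones, candidates):
    """Pick the removable node that sits in the most populated zone.

    node_zones: dict mapping node name -> availability zone, for every
                node in the group (hypothetical input shape).
    candidates: node names that are otherwise eligible for removal.
    Returns a node name, or None if there are no candidates.
    """
    per_zone = Counter(node_zones.values())
    best = None
    # Iterate in sorted order so ties are broken deterministically by name.
    for node in sorted(candidates):
        count = per_zone[node_zones[node]]
        if best is None or count > best[0]:
            best = (count, node)
    return None if best is None else best[1]
```

For example, with three nodes in one zone and one node in each of two others, a candidate from the three-node zone would be chosen over a candidate from a one-node zone.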

knkarthik · 2020-11-23T23:00:12Z

We are also seeing this, even after removing --balance-similar-node-groups as suggested in https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/cloudprovider/aws/README.md#common-notes-and-gotchas.

Maybe we'll give the Suspended processes setting in the ASG console a try.
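
The same setting can also be changed through the API rather than the console. The following is a rough sketch, assuming the client passed in is a boto3 Auto Scaling client (`boto3.client("autoscaling")`) and that the group name is a placeholder for your own; it suspends only the AZRebalance process and leaves the group's other processes untouched.

```python
def suspend_az_rebalance(autoscaling_client, asg_name):
    """Suspend only the AZRebalance process on one auto-scaling group.

    autoscaling_client: assumed to be boto3.client("autoscaling").
    asg_name: placeholder for the name of your auto-scaling group.
    """
    autoscaling_client.suspend_processes(
        AutoScalingGroupName=asg_name,
        ScalingProcesses=["AZRebalance"],
    )
```

Usage would look like `suspend_az_rebalance(boto3.client("autoscaling"), "my-node-group")`. Note the trade-off: with rebalancing suspended, the group can stay unevenly spread across zones after a scaledown.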

fejta-bot · 2021-02-21T23:24:11Z

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale

rittneje · 2021-02-21T23:37:01Z

/remove-lifecycle stale

fejta-bot · 2021-05-22T23:44:20Z

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale

rittneje · 2021-05-23T00:04:58Z

/remove-lifecycle stale

k8s-triage-robot · 2021-08-21T00:19:38Z

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Mark this issue or PR as fresh with /remove-lifecycle stale
Mark this issue or PR as rotten with /lifecycle rotten
Close this issue or PR with /close
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

rittneje · 2021-08-21T00:25:57Z

/remove-lifecycle stale

k8s-triage-robot · 2021-12-14T16:02:15Z

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Mark this issue or PR as fresh with /remove-lifecycle stale
Mark this issue or PR as rotten with /lifecycle rotten
Close this issue or PR with /close
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

rittneje · 2021-12-14T18:28:35Z

/remove-lifecycle stale

k8s-triage-robot · 2022-03-14T19:11:18Z

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Mark this issue or PR as fresh with /remove-lifecycle stale
Mark this issue or PR as rotten with /lifecycle rotten
Close this issue or PR with /close
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

rittneje · 2022-03-14T19:41:44Z

/remove-lifecycle stale

k8s-triage-robot · 2022-06-12T19:57:46Z

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Mark this issue or PR as fresh with /remove-lifecycle stale
Mark this issue or PR as rotten with /lifecycle rotten
Close this issue or PR with /close
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

rittneje · 2022-06-12T20:40:56Z

/remove-lifecycle stale

k8s-triage-robot · 2022-09-10T20:49:10Z

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Mark this issue or PR as fresh with /remove-lifecycle stale
Mark this issue or PR as rotten with /lifecycle rotten
Close this issue or PR with /close
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

rittneje · 2022-09-11T01:09:34Z

/remove-lifecycle stale

k8s-triage-robot · 2022-12-10T02:03:21Z

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Mark this issue or PR as fresh with /remove-lifecycle stale
Mark this issue or PR as rotten with /lifecycle rotten
Close this issue or PR with /close
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

rittneje · 2022-12-10T02:52:33Z

/remove-lifecycle stale

k8s-triage-robot · 2023-03-10T03:04:40Z

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Mark this issue as fresh with /remove-lifecycle stale
Close this issue with /close
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

rittneje · 2023-03-10T03:27:26Z

/remove-lifecycle stale

maxgio92 · 2023-05-08T17:14:10Z

Hi all, is there any news in the meantime?
I think we're experiencing this issue: AWS Auto Scaling rebalances across zones and then scales down because of the resulting overprovisioning, independently of the cluster-autoscaler:

MidTerminatingLifecycleAction
	Terminating EC2 instance: i-XYZ	At 2023-05-08T12:08:31Z instances were launched to balance instances in zones eu-west-1b eu-west-1a with other zones resulting in more than desired number of instances in the group.
	At 2023-05-08T12:08:42Z an instance was taken out of service in response to a difference between desired and actual capacity, shrinking the capacity from 12 to 11.
	At 2023-05-08T12:08:42Z instance i-XYZ was selected for termination.

The result is that AWS Auto Scaling seems to launch new instances to balance availability across the AZs. Because that exceeds the desired instance count (which is managed by the cluster-autoscaler), it then terminates an instance to match the desired count, bypassing the cluster-autoscaler.

Furthermore, I have workloads that can't be evicted (for which I set cluster-autoscaler.kubernetes.io/safe-to-evict: "false"), and AWS Auto Scaling is unaware of that annotation.

Am I missing something?

k8s-triage-robot · 2024-01-19T21:05:02Z

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Mark this issue as fresh with /remove-lifecycle stale
Close this issue with /close
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

rittneje · 2024-01-19T23:11:25Z

/remove-lifecycle stale

zioproto · 2024-03-05T08:56:02Z

Pinging repo approvers about this feature request. @mwielgus @MaciekPytel @gjtempleton

This seems to be a valid issue. It is documented in this repo's FAQ:

https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/FAQ.md#im-running-cluster-with-nodes-in-multiple-zones-for-ha-purposes-is-that-supported-by-cluster-autoscaler

Currently the balancing is only done at scale-up. Cluster Autoscaler will still scale down underutilized nodes regardless of the relative sizes of underlying node groups. We plan to take balancing into account in scale-down in the future.

Is the sentence "We plan to take balancing into account in scale-down in the future" still valid?

Is there a roadmap published on GitHub?

Why has this been blocked for so long? Is there a lack of interest in implementing it, or would it require a massive refactoring that is not worth the effort?

Please let the community know what would help here. Thank you!

k8s-triage-robot · 2024-06-19T13:42:50Z

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Mark this issue as fresh with /remove-lifecycle stale
Close this issue with /close
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

rittneje · 2024-06-19T16:18:09Z

/remove-lifecycle stale

k8s-triage-robot · 2024-09-17T16:25:18Z

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Mark this issue as fresh with /remove-lifecycle stale
Close this issue with /close
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

rittneje · 2024-09-17T18:04:14Z

/remove-lifecycle stale

k8s-triage-robot · 2024-12-16T18:07:51Z

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Mark this issue as fresh with /remove-lifecycle stale
Close this issue with /close
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

rittneje · 2024-12-16T19:49:37Z

/remove-lifecycle stale

k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Feb 21, 2021

k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Feb 21, 2021

k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label May 22, 2021

k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label May 23, 2021

k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Aug 21, 2021

k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Aug 21, 2021

jbartosik added the area/cluster-autoscaler label Sep 15, 2021

arroyoh mentioned this issue Oct 19, 2021

Cluster Autoscaler on EKS during scale-down ignores pod annotation & node annotations #3978

Closed

k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 14, 2021

k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 14, 2021

k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Mar 14, 2022

k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Mar 14, 2022

k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jun 12, 2022

k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jun 12, 2022

k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Sep 10, 2022

k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Sep 11, 2022

sdickhoven mentioned this issue Nov 22, 2022

Nodes with safe-to-evict flag set to false gets evicted during scale down #4789

Closed

k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 10, 2022

k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 10, 2022

k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Mar 10, 2023

k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Mar 10, 2023

maxgio92 mentioned this issue May 19, 2023

feat(aws/eks): add new and move jobs nodes to single-az falcosecurity/test-infra#1134

Merged

k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 19, 2024

k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 19, 2024

towca added the area/core-autoscaler Denotes an issue that is related to the core autoscaler and is not specific to any provider. label Mar 21, 2024

k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jun 19, 2024

k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jun 19, 2024

k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Sep 17, 2024

k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Sep 17, 2024

k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 16, 2024

k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 16, 2024
