
Cluster Autoscaler on EKS during scale-down ignores pod annotation & node annotations #3978

Closed
sohrabkhan opened this issue Mar 29, 2021 · 6 comments
Labels
area/cluster-autoscaler kind/bug

Comments

@sohrabkhan

Which component are you using?:

Cluster Autoscaler, running on EKS with 2 managed node groups (one per AZ).

What version of the component are you using?:
v1.18.3
v1.19.2

Component version: 1.18.3

What k8s version are you using (kubectl version)?:

kubectl version Output
$ kubectl version
Environment 1 (Non-prod)
Client Version: version.Info{Major:"1", Minor:"19", GitVersion:"v1.19.3", GitCommit:"1e11e4a2108024935ecfcb2912226cedeafd99df", GitTreeState:"clean", BuildDate:"2020-10-14T12:50:19Z", GoVersion:"go1.15.2", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"18+", GitVersion:"v1.18.9-eks-d1db3c", GitCommit:"d1db3c46e55f95d6a7d3e5578689371318f95ff9", GitTreeState:"clean", BuildDate:"2020-10-20T22:18:07Z", GoVersion:"go1.13.15", Compiler:"gc", Platform:"linux/amd64"}

Environment 2 (Prod)
Client Version: version.Info{Major:"1", Minor:"19", GitVersion:"v1.19.3", GitCommit:"1e11e4a2108024935ecfcb2912226cedeafd99df", GitTreeState:"clean", BuildDate:"2020-10-14T12:50:19Z", GoVersion:"go1.15.2", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"19+", GitVersion:"v1.19.6-eks-49a6c0", GitCommit:"49a6c0bf091506e7bafcdb1b142351b69363355a", GitTreeState:"clean", BuildDate:"2020-12-23T22:10:21Z", GoVersion:"go1.15.5", Compiler:"gc", Platform:"linux/amd64"}

What environment is this in?:
Both non-prod and prod.

AWS EKS with 2 node groups, both on AMI version 1.18.9-20210208.
Platform version: eks.3

What did you expect to happen?:

The Cluster Autoscaler scales out elegantly whenever we have excess workload running. We've annotated our StatefulSet pods with "cluster-autoscaler.kubernetes.io/safe-to-evict": "false" and our nodes with "cluster-autoscaler.kubernetes.io/scale-down-disabled": "true".
We do not expect our original nodes to be removed, since they carry the node annotation and all of them run pods carrying the pod annotation.
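For reference, a minimal sketch of how both annotations can be applied (the `jenkins` StatefulSet and the node name are examples taken from the logs below; adjust to your own resources):

```sh
# Node-level: mark the node so Cluster Autoscaler should not scale it down
kubectl annotate node ip-172-30-59-87.eu-west-2.compute.internal \
  cluster-autoscaler.kubernetes.io/scale-down-disabled=true

# Pod-level: put the annotation on the StatefulSet's pod template
# (the "jenkins" StatefulSet name is an example)
kubectl patch statefulset jenkins --type merge -p \
  '{"spec":{"template":{"metadata":{"annotations":{"cluster-autoscaler.kubernetes.io/safe-to-evict":"false"}}}}}'
```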

What happened instead?:

Cluster Autoscaler always logs lines like `1 cluster.go:168] Fast evaluation: node ip-172-30-59-87.eu-west-2.compute.internal cannot be removed: pod annotated as not safe to evict present: jenkins-0`, and it similarly logs the other nodes that cannot be removed. The Cluster Autoscaler pod itself is also on that node and carries the pod annotation by default to avoid its own removal.

But Cluster Autoscaler always removes the oldest nodes in the cluster. No matter what configuration I set, it is always the oldest nodes that get removed.

How to reproduce it (as minimally and precisely as possible):

Deploy Cluster Autoscaler following the AWS documentation (https://docs.aws.amazon.com/eks/latest/userguide/cluster-autoscaler.html) or the upstream README (https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/cloudprovider/aws/README.md).
Then create a simple nginx Deployment (3 GB RAM, 2 CPU per pod) and scale it out to 50 replicas; Cluster Autoscaler will add new EC2 nodes. Now scale the replicas down to 0 and wait about 10 minutes for Cluster Autoscaler to scale down (a sketch of such a Deployment is shown below).
Cluster Autoscaler will log output like `1 cluster.go:168] Fast evaluation: node ip-172-30-59-87.eu-west-2.compute.internal cannot be removed: pod annotated as not safe to evict present: jenkins-0`, but will still remove that exact node.
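A sketch of the kind of Deployment used for the load test (the name and image tag are placeholders; the requests match the 3 GB / 2 CPU sizing above):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-load-test   # placeholder name
spec:
  replicas: 50
  selector:
    matchLabels:
      app: nginx-load-test
  template:
    metadata:
      labels:
        app: nginx-load-test
    spec:
      containers:
        - name: nginx
          image: nginx:1.19   # placeholder image tag
          resources:
            requests:
              cpu: "2"
              memory: 3Gi
```

Scaling back down afterwards is then just `kubectl scale deployment nginx-load-test --replicas=0`.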

Anything else we need to know?:

Images tried with: 1.18.3 and 1.18.2 on the prod cluster, which is Kubernetes 1.18
Images tried with: 1.19.2 on the non-prod cluster, which is Kubernetes 1.19
Tried with configuration:

```yaml
spec:
  containers:
    - command:
        - ./cluster-autoscaler
        - --v=4
        - --stderrthreshold=info
        - --cloud-provider=aws
        - --skip-nodes-with-local-storage=false
        - --expander=least-waste
        - --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/xxxmyclusternamexxx
        - --balance-similar-node-groups
        - --skip-nodes-with-system-pods=false
```

and

```yaml
spec:
  containers:
    - command:
        - ./cluster-autoscaler
        - --v=4
        - --stderrthreshold=info
        - --cloud-provider=aws
        - --expander=least-waste
        - --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/xxxxmyclusternamexxxxx
        - --balance-similar-node-groups
```

@sohrabkhan added the kind/bug label Mar 29, 2021
@sohrabkhan
Author

It's been 25 days since I opened this issue and there haven't been any comments yet. If the details above are too much, here is the issue in a few words:

The issue is that I don't want nodes running StatefulSet pods that I've annotated with "cluster-autoscaler.kubernetes.io/safe-to-evict": "false" to be removed. The new nodes that were launched to deal with the load should be the ones that get removed.

I also do not expect cluster-autoscaler to remove a node that has the annotation "cluster-autoscaler.kubernetes.io/scale-down-disabled": "true".

Yet the oldest nodes are the ones that are always removed. Why is this happening?

@ccc-56

ccc-56 commented Jul 2, 2021

I have hit the same problem on AWS EKS v1.19. Does anyone know the cause?
I also configured everything according to https://docs.aws.amazon.com/eks/latest/userguide/cluster-autoscaler.html.

When I add load to my EKS cluster it adds nodes, which works fine, but scale-down is the problem: whether I run `kubectl annotate node <node-name> cluster-autoscaler.kubernetes.io/scale-down-disabled=true` on the node or set "cluster-autoscaler.kubernetes.io/safe-to-evict": "false" on the pods, it has no effect and the node is still removed.

According to the docs (https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/FAQ.md#how-can-i-prevent-cluster-autoscaler-from-scaling-down-a-particular-node) this should work, and I don't understand why it doesn't.

After searching for a long time I found this Stack Overflow question: https://stackoverflow.com/questions/67199875/cluster-autoscaler-on-eks-during-scale-down-ignores-pod-annotation-node-annota.

But it seems nobody has hit this problem except @sohrabkhan. Thanks for the detailed write-up. Did you find the reason?

@arroyoh

arroyoh commented Oct 19, 2021

We are also experiencing the same issue described by @sohrabkhan. In our case, we were able to identify that it is caused by the AWS Availability Zone Rebalancing operation, which is described in more detail in issue #3693. Can you verify whether this is your case as well?

Basically, when rebalancing nodes between AZs, the ASG seems to ignore any node or pod annotations on the node it terminates.

Our current workaround is to run sensitive pods in separate node groups, where we have better control over scaling; this issue is impacting us in production environments.
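A rough sketch of that workaround, pinning sensitive workloads to a dedicated node group (the node group name is a placeholder; EKS managed node groups typically label their nodes with `eks.amazonaws.com/nodegroup`):

```yaml
# Pod template snippet: schedule sensitive pods onto a dedicated node group.
# "stateful-ng" is a placeholder node group name.
spec:
  template:
    spec:
      nodeSelector:
        eks.amazonaws.com/nodegroup: stateful-ng
```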

@MaciekPytel
Contributor

Cluster Autoscaler is expected to respect the annotation (and we're not aware of any particular bug where it doesn't do it). However, in many cluster configurations CA is not the only component that can remove nodes and those other components would not respect CA annotations. AZ rebalancing is just one example of this.

CA logs any scale-down activity it performs and exposes a `scaled_down_nodes_total` metric that can be used to monitor CA scale-downs. If a node is removed and it is not mentioned in CA logs/metrics, it was most likely some other component that removed the node.
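For example, a quick way to check whether CA itself performed a removal is to grep its logs (the namespace and deployment name below are assumptions; adjust to your install). The metric mentioned above is typically exposed to Prometheus with the `cluster_autoscaler_` prefix.

```sh
# Look for CA's own scale-down decisions in its logs
# (deployment/namespace names are assumptions; adjust to your install).
kubectl -n kube-system logs deployment/cluster-autoscaler | grep -iE "scale[ _-]?down|removing"
```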

Personally, I would strongly recommend disabling AZ rebalancing when using CA.
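If it helps, a sketch of one way to disable AZ rebalancing on the ASG backing a node group, by suspending only the AZRebalance process (the ASG name is a placeholder):

```sh
# Suspend only the AZRebalance process on the ASG backing the node group
# (ASG name is a placeholder).
aws autoscaling suspend-processes \
  --auto-scaling-group-name eks-my-nodegroup-asg \
  --scaling-processes AZRebalance
```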

@sohrabkhan
Author

Thanks @MaciekPytel. Indeed the issue was related to the AWS ASG. AZ rebalancing was not the cause of the issue I experienced; it was actually the ASG's aggressive termination policies, with the ASG kicking in just a few seconds before CA.

To fix the issue I disabled the ASG scaling policy so that scale-in and scale-out are only done by CA.
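In case it helps others, a sketch of how the ASG's own scaling policies can be listed and removed so that only CA manages the group (the ASG and policy names are placeholders):

```sh
# List any scaling policies attached to the ASG (name is a placeholder)
aws autoscaling describe-policies \
  --auto-scaling-group-name eks-my-nodegroup-asg

# Remove a policy so the ASG no longer scales itself in or out
aws autoscaling delete-policy \
  --auto-scaling-group-name eks-my-nodegroup-asg \
  --policy-name my-scale-in-policy
```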

@adiseshan

> Thanks @MaciekPytel. Indeed the issue was related to the AWS ASG. AZ rebalancing was not the cause of the issue I experienced; it was actually the ASG's aggressive termination policies, with the ASG kicking in just a few seconds before CA.
>
> To fix the issue I disabled the ASG scaling policy so that scale-in and scale-out are only done by CA.

@sohrabkhan Did the problem get fixed when you disabled the "ASG termination policy"?
Or, if you mean the ASG scaling policy, could you please explain how to disable it?
