I think that cluster-autoscaler (CA) 1.3.x in Azure has problems dealing with affinity rules.
I use the following deployment to deploy a "pause" pod with two rules:
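(The original manifest is not preserved on this page; below is a minimal sketch of what it could look like, assuming the two rules are a required nodeAffinity pinning the pods to the "genmlow" agentpool and a required podAntiAffinity that keeps two pause pods off the same node. The agentpool label key, the app label, and the image tag are illustrative assumptions, not taken from the original issue.)

apiVersion: apps/v1
kind: Deployment
metadata:
  name: pause
spec:
  replicas: 1
  selector:
    matchLabels:
      app: pause
  template:
    metadata:
      labels:
        app: pause
    spec:
      affinity:
        # Assumed rule 1: schedule only on nodes of the "genmlow" agentpool.
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: agentpool
                operator: In
                values:
                - genmlow
        # Assumed rule 2: never co-locate two pause pods on the same node.
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchLabels:
                app: pause
            topologyKey: kubernetes.io/hostname
      containers:
      - name: pause
        image: k8s.gcr.io/pause:3.1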
The agentpool "genmlow" uses Standard_DS2_v2 machines (8GB) in a virtual machine scale set.
When I scale the number of replicas to 10 (kubectl scale deployment pause --replicas=10), I see that the cluster autoscaler (version 1.3.9, k8s 1.11.8) creates only one node per iteration, as if it were ignoring the affinity rules. See the cluster-autoscaler logs below, where the node count goes from 0->1->2->...->N.
I0503 14:03:19.299146 1 azure_manager.go:261] Refreshed ASG list, next refresh after 2019-05-03 14:04:19.2991386 +0000 UTC m=+948.211672501
I0503 14:03:19.993383 1 scale_up.go:249] Pod default/pause-66cf84dcdb-2khzb is unschedulable
I0503 14:03:19.993412 1 scale_up.go:249] Pod default/pause-66cf84dcdb-l7587 is unschedulable
I0503 14:03:19.993418 1 scale_up.go:249] Pod default/pause-66cf84dcdb-t5mb8 is unschedulable
I0503 14:03:19.993422 1 scale_up.go:249] Pod default/pause-66cf84dcdb-xp2kn is unschedulable
I0503 14:03:19.993426 1 scale_up.go:249] Pod default/pause-66cf84dcdb-rpskf is unschedulable
I0503 14:03:19.993429 1 scale_up.go:249] Pod default/pause-66cf84dcdb-kkxc5 is unschedulable
I0503 14:03:19.993433 1 scale_up.go:249] Pod default/pause-66cf84dcdb-lbprj is unschedulable
I0503 14:03:19.993437 1 scale_up.go:249] Pod default/pause-66cf84dcdb-lmwmf is unschedulable
I0503 14:03:19.993441 1 scale_up.go:249] Pod default/pause-66cf84dcdb-c8njm is unschedulable
I0503 14:03:19.993446 1 scale_up.go:249] Pod default/pause-66cf84dcdb-gg6xh is unschedulable
...
I0503 14:03:20.071931 1 utils.go:187] Pod pause-66cf84dcdb-kkxc5 can't be scheduled on k8s-genl-24772259-vmss. Used cached predicate check results
I0503 14:03:20.072229 1 utils.go:187] Pod pause-66cf84dcdb-lbprj can't be scheduled on k8s-genl-24772259-vmss. Used cached predicate check results
I0503 14:03:20.072529 1 utils.go:187] Pod pause-66cf84dcdb-lmwmf can't be scheduled on k8s-genl-24772259-vmss. Used cached predicate check results
I0503 14:03:20.073242 1 utils.go:187] Pod pause-66cf84dcdb-c8njm can't be scheduled on k8s-genl-24772259-vmss. Used cached predicate check results
...
I0503 14:03:20.076758 1 scale_up.go:378] Best option to resize: k8s-genmlow-24772259-vmss
I0503 14:03:20.076770 1 scale_up.go:382] Estimated 1 nodes needed in k8s-genmlow-24772259-vmss
I0503 14:03:20.076783 1 scale_up.go:461] Final scale-up plan: [{k8s-genmlow-24772259-vmss 0->1 (max: 1000)}]
I0503 14:03:20.076796 1 scale_up.go:531] Scale-up: setting group k8s-genmlow-24772259-vmss size to 1
...
I0503 14:06:13.334377 1 scale_up.go:378] Best option to resize: k8s-genmlow-24772259-vmss
I0503 14:06:13.334411 1 scale_up.go:382] Estimated 1 nodes needed in k8s-genmlow-24772259-vmss
I0503 14:06:13.334470 1 scale_up.go:461] Final scale-up plan: [{k8s-genmlow-24772259-vmss 1->2 (max: 1000)}]
I0503 14:06:13.334503 1 scale_up.go:531] Scale-up: setting group k8s-genmlow-24772259-vmss size to 2
...
I0503 14:09:02.059191 1 scale_up.go:378] Best option to resize: k8s-genmlow-24772259-vmss
I0503 14:09:02.059243 1 scale_up.go:382] Estimated 1 nodes needed in k8s-genmlow-24772259-vmss
I0503 14:09:02.059310 1 scale_up.go:461] Final scale-up plan: [{k8s-genmlow-24772259-vmss 2->3 (max: 1000)}]
I0503 14:09:02.059350 1 scale_up.go:531] Scale-up: setting group k8s-genmlow-24772259-vmss size to 3
...
I0503 14:11:50.214206 1 scale_up.go:378] Best option to resize: k8s-genmlow-24772259-vmss
I0503 14:11:50.214228 1 scale_up.go:382] Estimated 1 nodes needed in k8s-genmlow-24772259-vmss
I0503 14:11:50.214245 1 scale_up.go:461] Final scale-up plan: [{k8s-genmlow-24772259-vmss 3->4 (max: 1000)}]
I0503 14:11:50.214262 1 scale_up.go:531] Scale-up: setting group k8s-genmlow-24772259-vmss size to 4
...
However, it only behaves this way when the pod has no resource requests. If I add the following requests:
resources:
  requests:
    memory: 5Gi
everything works as expected: the cluster autoscaler creates the 10 virtual machines in a single batch (0->10). I guess this is because the autoscaler now knows that it cannot fit two pods on a single node (5Gi + 5Gi > 8GB), even though it still ignores the affinity rules.
I0503 14:31:36.574678 1 scale_up.go:378] Best option to resize: k8s-genmlow-24772259-vmss
I0503 14:31:36.574722 1 scale_up.go:382] Estimated 10 nodes needed in k8s-genmlow-24772259-vmss
I0503 14:31:36.574752 1 scale_up.go:461] Final scale-up plan: [{k8s-genmlow-24772259-vmss 0->10 (max: 1000)}]
I0503 14:31:36.574786 1 scale_up.go:531] Scale-up: setting group k8s-genmlow-24772259-vmss size to 10
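(For reference, a sketch of where that requests block would sit, under the pause container in the assumed manifest above:)

containers:
- name: pause
  image: k8s.gcr.io/pause:3.1
  resources:
    requests:
      memory: 5Gi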
It looks like a bug to me. Using the same setup in AWS (cluster autoscaler 1.2.x instead of 1.3.x is the only difference) works fine: the CA creates the 10 virtual machines regardless of whether the container memory requests are specified.
It's a known issue with pod affinity / anti-affinity: #257 (comment). The details are in the linked issue, but in general pod affinity and (especially) anti-affinity don't work well with CA. It can cause CA to add nodes only one by one, as you observe, and it completely breaks CA performance on large clusters.
It's not easy to fix, because it's caused by pod affinity being implemented in a way that is conceptually incompatible with how autoscaler works. To fix it we'd need a significant refactor of either scheduler or autoscaler, neither of which is likely to happen soon.