What happened:
I have Security Groups for Pods enabled on my EKS cluster, set up by following the AWS guide here.
I upgraded the AWS CNI plugin on my EKS cluster from v1.16.2 to v1.18.6 and started seeing the issue. I eventually downgraded to v1.16.4, which is the release where the issue first appears.
After the upgrade, some pods got stuck in ContainerCreating, with logs like this:
failed (add): add cmd: failed to assign an IP address to container
(Nodes have been rotated various times while testing and debugging)
After debugging and comparing with other EKS clusters running AWS CNI plugin v1.16.2, I noticed that v1.16.4 introduced a change that prevents pods from being assigned IPs on the "trunk ENI".
Because the trunk ENI can no longer hold IPs for regular pods, the correct "max pods" value per instance type differs from the default.
For an m6i.large node, the EKS AMI calculates a max-pods of 29, which works fine if "Security Groups for EKS Pods" is not enabled.
Once the SG feature is enabled (as per the AWS guide linked above), the max-pods value needs to be reduced, because one ENI is now used as the trunk interface and cannot have IPs for non-SG pods assigned to it.
I can see the max-pods value in the Node spec in K8s
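To make the arithmetic above concrete, here is a minimal sketch of the ENI-based max-pods estimate. It assumes the commonly documented formula for secondary-IP mode (no prefix delegation) and the published EC2 limits for m6i.large (3 ENIs, 10 IPv4 addresses per ENI); neither is stated in this issue. The `trunk_enis` parameter is a hypothetical knob used only to illustrate the reduction described above, not an existing AMI or CNI option.

```python
def max_pods(enis: int, ipv4_per_eni: int, trunk_enis: int = 0) -> int:
    """Estimate max pods for a node in secondary-IP mode.

    Assumed formula: usable_enis * (ipv4_per_eni - 1) + 2
      - the primary IP of each ENI is not handed out to pods
      - +2 covers host-network pods (for example aws-node and kube-proxy)
    trunk_enis is a hypothetical parameter: ENIs reserved as trunk
    interfaces that cannot hold IPs for non-SG pods.
    """
    usable_enis = enis - trunk_enis
    return usable_enis * (ipv4_per_eni - 1) + 2


# Assumed EC2 limits for m6i.large (not taken from this issue).
M6I_LARGE_ENIS = 3
M6I_LARGE_IPV4_PER_ENI = 10

default_value = max_pods(M6I_LARGE_ENIS, M6I_LARGE_IPV4_PER_ENI)
with_trunk = max_pods(M6I_LARGE_ENIS, M6I_LARGE_IPV4_PER_ENI, trunk_enis=1)

print(f"default max-pods:         {default_value}")
print(f"with one ENI as trunk:    {with_trunk}")
```

Under these assumptions the default matches the 29 reported by the EKS AMI, while reserving one ENI as the trunk gives 20 for non-SG pods. The second figure is what the reasoning in this issue implies, not a value confirmed by AWS.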
How to reproduce it (as minimally and precisely as possible):
Scale up the cluster with pods that do not use SG and fill up the nodes.
Some pods should get stuck in ContainerCreating once a node is almost full in terms of "Maximum pods" (not CPU or memory).
Environment:
AWS Region: eu-west-1
Instance Type(s): m6i.large
Cluster Kubernetes version: 1.30
Node Kubernetes version: v1.30.4-eks-a737599
AMI Version: ami-008d7732840c48377 - amazon-eks-node-1.30-v20241109
Notes
This bug is also described here, but that issue was closed by its author. The other difference is that I'm not using Bottlerocket; I'm using the EKS AMI.
I'm not entirely sure if this bug belongs in this repo or in amazon-vpc-cni-k8s - but I have a hunch it belongs here.
This has nothing to do with subnet size. The subnets have plenty of available IPs.
The bug can be triggered by using AWS CNI plugin v1.16.4 or later and fixed by going back to v1.16.2 (rotating nodes when switching versions).
The AWS CNI plugin is managed as an EKS add-on
See below the full spec for the aws-node DaemonSet
What you expected to happen:
I expected the AWS EKS AMI to correctly calculate the amount of "max-pods" it can have once "SG for EKS Pods" is enabled.