Recently, at the company I work for, we had an incident caused by incorrect arguments being passed to the kubelet via --kubelet-extra-args in our EKS Terraform configuration.
These arguments are passed through to bootstrap.sh by the Terraform provider, and the script accepts invalid arguments without later checking whether the kubelet actually started.
Nodes with invalid kubelet arguments cannot start the kubelet and therefore never join the cluster. Despite this, EKS does not consider such nodes unhealthy.
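For context, here is a minimal sketch of how user data typically invokes bootstrap.sh; the cluster name and the misspelled flag below are placeholders for illustration, not the actual values from our incident:
#!/bin/bash
# Hypothetical EC2 user data. bootstrap.sh passes the string through as-is, so a typo
# such as --max-podz (instead of --max-pods) is only rejected later by the kubelet
# itself, which then never starts, and the node never joins the cluster.
/etc/eks/bootstrap.sh my-cluster \
  --kubelet-extra-args '--max-podz=20'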
Proposed solution
Add a check to determine whether the kubelet has started.
Very roughly:
if systemctl is-active --quiet kubelet; then
  log "INFO: kubelet service is active and running."
else
  log "ERROR: kubelet service failed to start."
  exit 1 # Exit if kubelet did not start successfully
fi
I think it'd be a worthwhile improvement to exit bootstrap.sh with a non-zero code if the kubelet unit doesn't become active, but that won't do much in isolation because the outcome of cloud-init user data isn't reported to EC2 (or EKS). Feel free to open a PR 👍 How an orchestrator (EKS managed nodes, Karpenter, cluster-autoscaler, etc.) handles this situation is a separate issue.
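Until something along those lines lands, one way to inspect a node that never joined is to log in via SSM or SSH and check the user-data result and the kubelet unit directly (a sketch; commands run on the affected instance):
cloud-init status --long            # did user data (and thus bootstrap.sh) finish, and with what result
systemctl status kubelet --no-pager # is the unit active, failed, or restarting
journalctl -u kubelet --no-pager | tail -n 50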
AlexNabokikh added a commit to AlexNabokikh/amazon-eks-ami that referenced this issue on May 13, 2024.