Cannot properly delete NodeClass #6462
Disruption refers to voluntary disruption modes, e.g. Drift, Expiration, and Consolidation. None of these can take place when the NodePool or NodeClass does not exist, which is why Karpenter can't disrupt the NodeClaim. That doesn't mean Karpenter can't terminate the NodeClaim. Deleting the NodeClass should result in Karpenter setting a deletion timestamp on each NodeClaim associated with that NodeClass, and those NodeClaims will gracefully terminate. Graceful termination isn't bounded; blocking PDBs can prevent a NodeClaim from terminating indefinitely. If you're able to share the Karpenter logs and the NodeClaim resources, we should be able to determine whether Karpenter is operating correctly. If it is and you want to be able to set an upper bound on termination time, you'll probably be interested in kubernetes-sigs/karpenter#916, which just merged in the upstream repo.
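For reference, a minimal sketch of what setting that upper bound might look like once the change from kubernetes-sigs/karpenter#916 lands in the AWS provider. The field path and the `1h` duration are assumptions based on that PR, and `common` is the NodePool name used elsewhere in this issue; check the release notes for your version before relying on this:

```bash
# Hedged sketch: set an upper bound on graceful termination for nodes
# launched by a NodePool. terminationGracePeriod (per upstream PR #916)
# should forcibly finish terminating a NodeClaim after the given duration,
# even if blocking PDBs would otherwise stall the drain indefinitely.
kubectl patch nodepool common --type merge \
  -p '{"spec":{"template":{"spec":{"terminationGracePeriod":"1h"}}}}'
```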
Hey, thank you for your explanation. I just did some more testing: even without any PDBs in the cluster (except for Karpenter's, but that's running on Fargate), the nodes won't terminate.

That's everything I can find relating to the deletion. How does Karpenter's release process work? There's the merge in kubernetes-sigs/karpenter, and then the cloud-specific providers (AWS in this case) have to implement and release it too?
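In case it helps anyone debugging the same symptom, a minimal diagnostic sketch for gathering what was asked for above. The NodeClaim name is taken from this issue and the `karpenter` namespace is an assumption; adjust both to your install:

```bash
# Inspect the stuck NodeClaims: a deletion timestamp plus a lingering
# finalizer usually means Karpenter is still trying to drain the node.
kubectl get nodeclaims
kubectl get nodeclaim common-xdbj9 -o yaml | grep -A3 -E 'deletionTimestamp|finalizers'

# Check for PodDisruptionBudgets that could block eviction.
kubectl get pdb --all-namespaces

# Tail the Karpenter controller logs for termination errors
# (namespace and deployment name depend on how Karpenter was installed).
kubectl logs -n karpenter deploy/karpenter --since=10m
```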
Chiming in here with the same issue; the message has changed a little. I'm currently on Karpenter 1.0.0. Looking at:

And this happens when the following is executed; the command gets permanently stuck:

It's not clear to me yet whether my Terraform code tried to delete the NodePool first or the EC2NodeClass first, but in any case, deletion is stuck and nothing happens. Can anyone clarify what the correct process is to remove a NodePool and its respective NodeClasses? Should the NodeClasses be deleted first? EDIT: I checked, and there was no reason why the NodePool would have changed, so this issue is triggered by deleting the EC2NodeClass.
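For what it's worth, a hedged sketch of the ordering that seems safest based on the behavior described in this thread (resource names are the ones used in this issue; this is not an officially documented procedure):

```bash
# 1. Delete the NodePool first, so Karpenter still has the EC2NodeClass
#    available while it drains and terminates the owned NodeClaims.
kubectl delete nodepool common

# 2. Wait until the NodeClaims owned by that NodePool are gone.
kubectl get nodeclaims --watch

# 3. Only then delete the EC2NodeClass.
kubectl delete ec2nodeclass default
```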
In my case I saw different related elements that "block" the deletion:

1. NodeClaim termination
2. NodeClaim (and node) can't be deleted because of a Pod Disruption Budget
3. Finalizer

Steps to remove the EC2NodeClass (see the sketch below):

1. Edit the Deployments or Pods with a PDB and change the maxUnavailable from 1 to a higher number (e.g. 10)
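To make those steps concrete, a hedged sketch. The PDB name and namespace are placeholders, the patch assumes the PDB uses `maxUnavailable` rather than `minAvailable`, and forcibly removing a finalizer is a last resort that can leave orphaned EC2 resources behind:

```bash
# Loosen the blocking PDB so pods can be evicted during drain
# (placeholder names; only works if the PDB defines maxUnavailable).
kubectl patch pdb my-app-pdb -n my-namespace --type merge \
  -p '{"spec":{"maxUnavailable":10}}'

# Last resort: inspect, and only if truly stuck, remove the finalizer
# that keeps the EC2NodeClass in a Terminating state.
kubectl get ec2nodeclass default -o jsonpath='{.metadata.finalizers}'
kubectl patch ec2nodeclass default --type json \
  -p '[{"op":"remove","path":"/metadata/finalizers"}]'
```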
Same issue as @GerBriones; it would be nice to be able to force the deletion.
Minimally, I think there's some work we can do here to make sure that we are less noisy in our logs and have better error messaging. cc: @jmdeal
Description
Observed Behavior:
When deleting a NodeClass, Karpenter wants to delete the NodeClaims (`Waiting on NodeClaim termination for common-xdbj9, common-vvclr, common-2ppgb`), but they suddenly can't find their NodePool anymore (`Cannot disrupt NodeClaim: Owning nodepool "common" not found`). Karpenter just logs `resolving node class, ec2nodeclasses.karpenter.sh "default" is terminating, treating as not found` as soon as the deletion is issued.

Expected Behavior:

The NodeClaims delete themselves first, and then the NodeClass.
Reproduction Steps (Please include YAML):
```bash
kubectl delete ec2nodeclasses.karpenter.k8s.aws default
```
Versions:
- Karpenter version: 0.36.0
- Kubernetes version (`kubectl version`): v1.29.4-eks-036c24b