feat(al2023): disable default nodeadm phases #2124
If you can stay on the static user data path, that's a safer road; but we'll see what kind of escape hatch we can add for use cases like this. I think the problem you're having boils down to not being able to disable the default nodeadm phases.
Unfortunately, static configuration would add a lot more overhead, as we would need to maintain or generate different NodeConfigs based on the instance type and application requirements. Being able to do it at boot time is a critical step in keeping things simple. I was able to work around it by creating a basic reboot service that runs after the nodeadm-run service and then disables itself.

I'm curious what sort of nodeadm-related configuration is "expected" to happen in cloud-init, since the documentation mentions it but doesn't provide any examples of what to configure or how to configure it inside cloud-init. Thanks
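For anyone hitting the same thing, here is a minimal sketch of the kind of one-shot reboot unit described above, assuming it is written out by the user data script and ordered after nodeadm-run.service; the unit name and paths are made up for illustration, not the exact unit used here:

```bash
#!/usr/bin/env bash
# Sketch: install a one-shot unit that reboots the node only after nodeadm-run
# has finished, and disables itself so the reboot happens exactly once.
set -euo pipefail

cat >/etc/systemd/system/post-nodeadm-reboot.service <<'EOF'
[Unit]
Description=Reboot once after nodeadm-run completes
After=nodeadm-run.service
Wants=nodeadm-run.service

[Service]
Type=oneshot
# Disable first so the node does not end up in a reboot loop.
ExecStart=/usr/bin/systemctl disable post-nodeadm-reboot.service
ExecStart=/usr/bin/systemctl --no-block reboot

[Install]
WantedBy=multi-user.target
EOF

systemctl daemon-reload
systemctl enable post-nodeadm-reboot.service
```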
Could you just run […]? FWIW, our "expected" usage of […]. If that solves your use case we can help extend the docs around the […].
Completely understand! I'm mostly concerned with avoiding footguns in the default path, not opposed to making this more flexible.
The happy path is you never call nodeadm. On our AL2023 AMIs, your […].
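A minimal sketch of that happy path, assuming (per the nodeadm documentation) that the NodeConfig is supplied directly in user data and the AMI's own units apply it; all cluster values below are placeholders, and field names should be checked against your AMI version:

```yaml
---
apiVersion: node.eks.aws/v1alpha1
kind: NodeConfig
spec:
  cluster:
    name: my-cluster                                  # placeholder
    apiServerEndpoint: https://EXAMPLE.gr7.us-west-2.eks.amazonaws.com
    certificateAuthority: <base64-encoded CA bundle>
    cidr: 10.100.0.0/16
  kubelet:
    config:
      maxPods: 110    # static kubelet settings can live here instead of a script
```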
We do want to avoid this whenever possible, because our experience has been that shell scripts in user data are error-prone, particularly when the EKS tooling touches the same file/service. If we can add something to […]
We didn't really have any issues with the config getting clobbered once we figured out that we could provide a minimal config for […]. In our case the reboot in cloud-init was causing the nodeadm-run service to never run and patch up the ethernet configuration, so we ended up with DHCP enabled on all interfaces, which caused the node to become unreachable.
That makes sense - I was just trying to continue using […].
I'm guessing that most people don't configure the kubelet itself at this stage. If we didn't have to worry about CPU-specific configuration for a wide variety of instance types, it would be a lot easier. Thanks for the responses @cartermckinnon and @ndbaker1 - I think we're doing things "right", or at least as right as we can for now.
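To make the "CPU-specific configuration" point concrete, this is the flavor of boot-time calculation being discussed; the heuristic below is invented for illustration and is not anyone's recommended sizing:

```bash
#!/usr/bin/env bash
# Illustration only: derive a kube-reserved CPU value from the instance's core count.
# The 60m-per-core heuristic is made up; substitute whatever your workloads need.
cores=$(nproc)
reserved_millicores=$(( cores * 60 ))
(( reserved_millicores < 100 )) && reserved_millicores=100
echo "kubeReserved cpu: ${reserved_millicores}m"
```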
I am converting our EKS custom nodegroup configuration from AL2 to AL2023 and I can't find a way to modify the NodeConfig dynamically at runtime in userdata. I'm testing with the AL2023 1.29 0116 EKS AMI.
In AL2 we calculate the values for various kubelet flags in a userdata shell script and pass them to bootstrap.sh on the command line. This includes things like CPU reservation settings, which are really hard to know ahead of time. We also need to reboot the node to activate the CPU reservations and other features that require a reboot, so we do that after we call bootstrap.sh.
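For contrast with AL2023, here is a rough sketch of that AL2 pattern, assuming the stock /etc/eks/bootstrap.sh and its --kubelet-extra-args flag; the flag values and the reboot step are illustrative placeholders, not the exact script used here:

```bash
#!/usr/bin/env bash
# AL2-style user data sketch: compute kubelet flags at boot, hand them to
# bootstrap.sh, then reboot so the CPU reservations actually take effect.
set -euo pipefail

reserved_cpus="0"                        # e.g. pin system daemons to CPU 0
[ "$(nproc)" -ge 8 ] && reserved_cpus="0,1"

/etc/eks/bootstrap.sh my-cluster \
  --kubelet-extra-args "--reserved-cpus=${reserved_cpus} --cpu-manager-policy=static"

systemctl reboot
```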
In AL2023 I have tried generating a NodeConfig dynamically in the userdata script and then calling

nodeadm init --skip run -c file:///var/tmp/nodeconfig.txt

which seems to work, sort of, sometimes. I am seeing issues with the nodes not coming back up after the reboot, which appears to be caused by the issues that ensureEKSNetworkConfiguration was designed to fix, so I changed the code to not skip the run phase. That fixed the networking problem, but now the kubelet is ignoring the custom configuration in the 00-nodeadm.conf file.
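For readers trying to follow along, here is roughly what that approach looks like end to end; the NodeConfig fields and the reservation value are placeholders, and this is a sketch of the pattern described above rather than a supported recipe:

```bash
#!/usr/bin/env bash
# Sketch: build a NodeConfig at boot, then run nodeadm init against it.
# Whether to keep --skip run is exactly the question in this issue.
set -euo pipefail

reserved_millicores=200   # placeholder; computed per instance type in practice

cat >/var/tmp/nodeconfig.txt <<EOF
apiVersion: node.eks.aws/v1alpha1
kind: NodeConfig
spec:
  cluster:
    name: my-cluster
    apiServerEndpoint: https://EXAMPLE.gr7.us-west-2.eks.amazonaws.com
    certificateAuthority: <base64-encoded CA bundle>
    cidr: 10.100.0.0/16
  kubelet:
    config:
      kubeReserved:
        cpu: ${reserved_millicores}m
EOF

nodeadm init --skip run -c file:///var/tmp/nodeconfig.txt
```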
I have a feeling that either I'm missing something simple and there is an easier way to do this, or this use case is not supported and I have to come up with another solution.
Is there a way to generate NodeConfig configuration at boot time and inject that into the kubelet? Thanks in advance.