Skip to content

Conversation

@faiq
Copy link
Contributor

@faiq faiq commented Oct 7, 2025

What type of PR is this?
/kind feature

What this PR does / why we need it:

This PR implements the nodeadm config type outlined by the KEP #5678

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #

Special notes for your reviewer:

Checklist:

  • squashed commits
  • includes documentation
  • includes emoji in title
  • adds unit tests
  • adds or updates e2e tests

Release note:

Adds Nodeadm bootstrapping type

@k8s-ci-robot k8s-ci-robot added kind/feature Categorizes issue or PR as related to a new feature. release-note Denotes a PR that will be considered when it comes time to generate release notes. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Oct 7, 2025
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign neolit123 for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot requested review from AndiDog and nrb October 7, 2025 20:30
@k8s-ci-robot k8s-ci-robot added the size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. label Oct 7, 2025
@faiq
Copy link
Contributor Author

faiq commented Oct 7, 2025

testing with this manifest

apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  name: default
spec:
  clusterNetwork:
    pods:
      cidrBlocks:
      - 192.168.0.0/16
    services:
      cidrBlocks:
      - 10.96.0.0/12
  controlPlaneRef:
    apiVersion: controlplane.cluster.x-k8s.io/v1beta2
    kind: AWSManagedControlPlane
    name: default-control-plane
  infrastructureRef:
    apiVersion: controlplane.cluster.x-k8s.io/v1beta2
    kind: AWSManagedControlPlane
    name: default-control-plane
---
apiVersion: controlplane.cluster.x-k8s.io/v1beta2
kind: AWSManagedControlPlane
metadata:
  name: default-control-plane
spec:
  addons:
  - name: kube-proxy
    version: v1.32.0-eksbuild.2
  network:
    cni:
      cniIngressRules:
      - description: kube-proxy metrics
        fromPort: 10249
        protocol: tcp
        toPort: 10249
      - description: NVIDIA Data Center GPU Manager metrics
        fromPort: 9400
        protocol: tcp
        toPort: 9400
      - description: Prometheus node exporter metrics
        fromPort: 9100
        protocol: tcp
        toPort: 9100
  region: us-west-2
  sshKeyName: ""
  version: v1.33.0
---
apiVersion: infrastructure.cluster.x-k8s.io/v1beta2
kind: AWSMachineTemplate
metadata:
  name: default
spec:
  template:
    spec:
      cloudInit:
        insecureSkipSecretsManager: true
      ami:
        eksLookupType: AmazonLinux2023
      instanceMetadataOptions:
        httpTokens: required
        httpPutResponseHopLimit: 2
      iamInstanceProfile: nodes.cluster-api-provider-aws.sigs.k8s.io
      instanceType: m5a.16xlarge
      rootVolume:
        size: 80
---
apiVersion: bootstrap.cluster.x-k8s.io/v1beta2
kind: NodeadmConfigTemplate
metadata:
  name: default
spec:
  template:
    spec: {}
---
apiVersion: cluster.x-k8s.io/v1beta1
kind: MachineDeployment
metadata:
  name: default
spec:
  clusterName: default
  replicas: 1
  template:
    spec:
      bootstrap:
        configRef:
          apiVersion: bootstrap.cluster.x-k8s.io/v1beta2
          kind: NodeadmConfigTemplate
          name: default
      clusterName: default
      infrastructureRef:
        apiVersion: infrastructure.cluster.x-k8s.io/v1beta2
        kind: AWSMachineTemplate
        name: default
      version: v1.33.0

@faiq
Copy link
Contributor Author

faiq commented Oct 7, 2025

/retest

2 similar comments
@faiq
Copy link
Contributor Author

faiq commented Oct 8, 2025

/retest

@faiq
Copy link
Contributor Author

faiq commented Oct 8, 2025

/retest

@faiq faiq force-pushed the faiq/nodeadm-upstream branch 2 times, most recently from 8f854bd to 81f3664 Compare October 8, 2025 20:43
@faiq
Copy link
Contributor Author

faiq commented Oct 8, 2025

/test ?

@k8s-ci-robot
Copy link
Contributor

@faiq: The following commands are available to trigger required jobs:

/test pull-cluster-api-provider-aws-build
/test pull-cluster-api-provider-aws-build-docker
/test pull-cluster-api-provider-aws-e2e-blocking
/test pull-cluster-api-provider-aws-test
/test pull-cluster-api-provider-aws-verify

The following commands are available to trigger optional jobs:

/test pull-cluster-api-provider-aws-apidiff-main
/test pull-cluster-api-provider-aws-e2e
/test pull-cluster-api-provider-aws-e2e-clusterclass
/test pull-cluster-api-provider-aws-e2e-conformance
/test pull-cluster-api-provider-aws-e2e-conformance-with-ci-artifacts
/test pull-cluster-api-provider-aws-e2e-eks
/test pull-cluster-api-provider-aws-e2e-eks-gc
/test pull-cluster-api-provider-aws-e2e-eks-testing

Use /test all to run the following jobs that were automatically triggered:

pull-cluster-api-provider-aws-apidiff-main
pull-cluster-api-provider-aws-build
pull-cluster-api-provider-aws-build-docker
pull-cluster-api-provider-aws-e2e-blocking
pull-cluster-api-provider-aws-test
pull-cluster-api-provider-aws-verify

In response to this:

/test ?

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@faiq
Copy link
Contributor Author

faiq commented Oct 8, 2025

/test /test pull-cluster-api-provider-aws-e2e-eks

@dsanders1234
Copy link

Does this work with AWSManagedMachinePool ?

@faiq faiq changed the title 🌱 feat: starts bootstrapping new type with kubebuilder 🌱 feat: implements nodeadm bootstrapping type Oct 9, 2025
@faiq faiq force-pushed the faiq/nodeadm-upstream branch from 81f3664 to 59ecae0 Compare October 9, 2025 19:01
@faiq
Copy link
Contributor Author

faiq commented Oct 9, 2025

@dsanders1234 try this

 apiVersion: cluster.x-k8s.io/v1beta1
 kind: MachinePool
 metadata:
   name: default
 spec:
   clusterName: default
   template:
     spec:
       bootstrap:
         #dataSecretName: ""
         configRef:
           apiVersion: bootstrap.cluster.x-k8s.io/v1beta2
           kind: NodeadmConfig
           name: default
       clusterName: default
       infrastructureRef:
         apiVersion: infrastructure.cluster.x-k8s.io/v1beta2
         kind: AWSManagedMachinePool
         name: default
       version: v1.33.0
---
apiVersion: infrastructure.cluster.x-k8s.io/v1beta2
kind: AWSManagedMachinePool
metadata:
  name: default
spec:
  roleName: "nodes.cluster-api-provider-aws.sigs.k8s.io"
  scaling:
    minSize: 1
    maxSize: 3
  amiType: CUSTOM
  awsLaunchTemplate:
    ami:
      eksLookupType: AmazonLinux2023
    instanceMetadataOptions:
      httpTokens: required
      httpPutResponseHopLimit: 2
    instanceType: "m5a.16xlarge"
    rootVolume:
      size: 80
---
apiVersion: bootstrap.cluster.x-k8s.io/v1beta2
kind: NodeadmConfig
metadata:
  name: default
spec:
  kubelet:
    config:
      evictionHard:
        memory.available: "2000Mi"

@faiq
Copy link
Contributor Author

faiq commented Oct 9, 2025

/test pull-cluster-api-provider-aws-e2e-eks

@faiq faiq force-pushed the faiq/nodeadm-upstream branch 2 times, most recently from 10503ae to 476eb43 Compare October 9, 2025 20:20
@faiq
Copy link
Contributor Author

faiq commented Oct 9, 2025

/test pull-cluster-api-provider-aws-e2e-eks

@faiq
Copy link
Contributor Author

faiq commented Oct 9, 2025

/retest

3 similar comments
@faiq
Copy link
Contributor Author

faiq commented Oct 9, 2025

/retest

@faiq
Copy link
Contributor Author

faiq commented Oct 9, 2025

/retest

@faiq
Copy link
Contributor Author

faiq commented Oct 9, 2025

/retest

@faiq faiq force-pushed the faiq/nodeadm-upstream branch from 476eb43 to 234d905 Compare October 10, 2025 02:57
@faiq faiq force-pushed the faiq/nodeadm-upstream branch from 6f464e0 to e89afb2 Compare November 10, 2025 21:32
@faiq
Copy link
Contributor Author

faiq commented Nov 10, 2025

/test pull-cluster-api-provider-aws-e2e-eks

@faiq faiq force-pushed the faiq/nodeadm-upstream branch from e89afb2 to a9bf155 Compare November 10, 2025 22:50
@faiq
Copy link
Contributor Author

faiq commented Nov 10, 2025

/test pull-cluster-api-provider-aws-e2e-eks

@faiq faiq force-pushed the faiq/nodeadm-upstream branch from a9bf155 to d657f69 Compare November 11, 2025 00:25
@faiq
Copy link
Contributor Author

faiq commented Nov 11, 2025

/test pull-cluster-api-provider-aws-e2e-eks

@faiq
Copy link
Contributor Author

faiq commented Nov 11, 2025

The last test failed in deleting the aws managed control plane, which is entirely unrelated to these changes.

EDITS:

see run here: https://prow.k8s.io/view/gs/kubernetes-ci-logs/pr-logs/pull/kubernetes-sigs_cluster-api-provider-aws/5700/pull-cluster-api-provider-aws-e2e-eks/1988044875567730688

i'll run the tests once again to get that sweet ✔️

@faiq
Copy link
Contributor Author

faiq commented Nov 11, 2025

/test pull-cluster-api-provider-aws-e2e-eks

@matzegebbe
Copy link
Contributor

First of all, thanks for the work on this PR - I'm currently testing the build artifacts from this branch of @faiq in our environment.

During evaluation, I encountered RBAC-related errors such as:

nodeadmconfigs.bootstrap.cluster.x-k8s.io is forbidden: User "system:serviceaccount:capa-system:capa-controller-manager" cannot list resource "nodeadmconfigs" in API group "bootstrap.cluster.x-k8s.io" at the cluster scope

but we can of course fix that, along with a few other small stumbling blocks with the new CRDs and configuration.

While I'm working through these issues and completing our internal validation, I wanted to kindly ask whether there are any remaining blockers preventing this PR from being approved and merged.

EKS nodes images based on Amazon Linux 2 will go out of support on November 26 2025, and we are already running into additional problems with AL2 and the latest runc version (see: awslabs/amazon-eks-ami#2498
). In my opinion, ensuring that Cluster API and CAPA offer solid support for running fully supported machine images on AWS is critical for the project’s overall reliability and long-term relevance.

Thanks in advance for any update or clarification on the merge timeline ✌️

@faiq
Copy link
Contributor Author

faiq commented Nov 17, 2025

While I'm working through these issues and completing our internal validation, I wanted to kindly ask whether there are any remaining blockers preventing this PR from being approved and merged.

As far as I can tell, there are none other than the open source maintainers being busy both with their day jobs. If you can please push in the k8s slack for some reviews. Also, feel free to review this as well. All comments welcome.

@faiq faiq force-pushed the faiq/nodeadm-upstream branch from 9f4ef81 to 483bbb3 Compare November 20, 2025 21:26
@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 21, 2025
@faiq faiq force-pushed the faiq/nodeadm-upstream branch from 483bbb3 to 439664b Compare November 26, 2025 17:08
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 26, 2025
@faiq
Copy link
Contributor Author

faiq commented Nov 26, 2025

/test pull-cluster-api-provider-aws-e2e-eks

@faiq
Copy link
Contributor Author

faiq commented Nov 26, 2025

/retest

@faiq faiq force-pushed the faiq/nodeadm-upstream branch from 439664b to b685eea Compare November 26, 2025 17:42
@faiq
Copy link
Contributor Author

faiq commented Nov 26, 2025

/test pull-cluster-api-provider-aws-e2e-eks

@faiq
Copy link
Contributor Author

faiq commented Nov 26, 2025

/retest

1 similar comment
@faiq
Copy link
Contributor Author

faiq commented Nov 26, 2025

/retest

@faiq faiq force-pushed the faiq/nodeadm-upstream branch from b685eea to 0dd3488 Compare November 26, 2025 23:32
@faiq
Copy link
Contributor Author

faiq commented Nov 26, 2025

/test pull-cluster-api-provider-aws-e2e-eks

@k8s-ci-robot
Copy link
Contributor

@faiq: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
pull-cluster-api-provider-aws-test 0dd3488 link true /test pull-cluster-api-provider-aws-test
pull-cluster-api-provider-aws-e2e-eks 0dd3488 link false /test pull-cluster-api-provider-aws-e2e-eks

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/feature Categorizes issue or PR as related to a new feature. needs-priority release-note Denotes a PR that will be considered when it comes time to generate release notes. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants