Clarify guidance for pod network routing, add resources to Overview #1035

Open · wants to merge 1 commit into base: mainline
44 changes: 34 additions & 10 deletions latest/ug/nodes/hybrid-nodes-networking.adoc
@@ -18,26 +18,48 @@ image::images/hybrid-prereq-diagram.png[Hybrid node network connectivity.,scaled
[#hybrid-nodes-networking-on-prem]
== On-premises networking configuration

*Minimum network requirements*
[#hybrid-nodes-networking-min-reqs]
=== Minimum network requirements

For an optimal experience, {aws} recommends reliable network connectivity of at least 100 Mbps and a maximum of 200ms round trip latency for the hybrid nodes connection to the {aws} Region. The bandwidth and latency requirements can vary depending on the number of hybrid nodes and your workload characteristics such as application image size, application elasticity, monitoring and logging configurations, and application dependencies on accessing data stored in other {aws} services.
For an optimal experience, it is recommended to have reliable network connectivity of at least 100 Mbps and a maximum of 200ms round trip latency for the hybrid nodes connection to the {aws} Region. This is general guidance that accommodates most use cases but is not a strict requirement. The bandwidth and latency requirements can vary depending on the number of hybrid nodes and your workload characteristics, such as application image size, application elasticity, monitoring and logging configurations, and application dependencies on accessing data stored in other {aws} services. It is recommended to test with your own applications and environments before deploying to production to validate that your networking setup meets the requirements for your workloads.

*On-premises node and pod CIDRs*
[#hybrid-nodes-networking-on-prem-cidrs]
=== On-premises node and pod CIDRs

Identify the node and pod CIDRs you will use for your hybrid nodes and the workloads running on them. The node CIDR is allocated from your on-premises network and the pod CIDR is allocated from your Container Network Interface (CNI) if you are using an overlay network for your CNI. You pass your on-premises node CIDRs and optionally pod CIDRs as inputs when you create your EKS cluster with the `RemoteNodeNetwork` and `RemotePodNetwork` fields.
Identify the node and pod CIDRs you will use for your hybrid nodes and the workloads running on them. The node CIDR is allocated from your on-premises network, and the pod CIDR is allocated from your Container Network Interface (CNI) if you are using an overlay network for your CNI. You pass your on-premises node CIDRs and pod CIDRs as inputs when you create your EKS cluster with the `RemoteNodeNetwork` and `RemotePodNetwork` fields. Your on-premises node CIDRs must be routable on your on-premises network. See the following section for information about on-premises pod CIDR routability.

The on-premises node and pod CIDR blocks must meet the following requirements:

1. Be within one of the following `IPv4` RFC-1918 ranges: `10.0.0.0/8`, `172.16.0.0/12`, or `192.168.0.0/16`.
2. Not overlap with each other, the VPC CIDR for your EKS cluster, or your Kubernetes service `IPv4` CIDR.
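
As an illustrative sketch, the node and pod CIDRs can be passed at cluster creation with the {aws} CLI. The cluster name, Region, IAM role, subnets, and CIDR values below are placeholders, not recommendations; verify the `--remote-network-config` shape against the current EKS API reference.

[source,bash]
----
# Sketch: create a hybrid nodes-enabled EKS cluster with illustrative
# on-premises node (10.2.0.0/16) and pod (10.3.0.0/16) CIDRs.
aws eks create-cluster \
  --name my-hybrid-cluster \
  --region us-west-2 \
  --role-arn arn:aws:iam::111122223333:role/eks-cluster-role \
  --resources-vpc-config subnetIds=subnet-aaaa1111,subnet-bbbb2222 \
  --remote-network-config '{"remoteNodeNetworks":[{"cidrs":["10.2.0.0/16"]}],"remotePodNetworks":[{"cidrs":["10.3.0.0/16"]}]}'
----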

If your CNI performs Network Address Translation (NAT) for pod traffic as it leaves your on-premises hosts, you do not need to make your pod CIDR routable on your on-premises network or configure your EKS cluster with your _remote pod network_ for hybrid nodes to become ready to workloads. If your CNI does not use NAT for pod traffic as it leaves your on-premises hosts, your pod CIDR must be routable on your on-premises network and you must configure your EKS cluster with your remote pod network for hybrid nodes to become ready to workloads.
[#hybrid-nodes-networking-on-prem-pod-routing]
=== On-premises pod network routing

There are several techniques you can use to make your pod CIDR routable on your on-premises network including Border Gateway Protocol (BGP), static routes, or other custom routing solutions. BGP is the recommended solution as it is more scalable and easier to manage than alternative solutions that require custom or manual route configuration. {aws} supports the BGP capabilities of Cilium and Calico for advertising hybrid nodes pod CIDRs, see <<hybrid-nodes-cni, Configure CNI for hybrid nodes>> for more information.
When using EKS Hybrid Nodes, it is generally recommended to make your on-premises pod CIDRs routable on your on-premises network to enable full cluster communication and functionality between cloud and on-premises environments.

If you are running webhooks on hybrid nodes, your pod CIDR must be routable on your on-premises network and you must configure your EKS cluster with your remote pod network so the EKS control plane can directly communicate with the webhooks running on hybrid nodes. If you cannot make your pod CIDR routable on your on-premises network but need to run webhooks, it is recommended to run webhooks on cloud nodes in the same EKS cluster. For more information on running webhooks on cloud nodes, see <<hybrid-nodes-webhooks, Configure webhooks for hybrid nodes>>.
*Routable pod networks*

*Access required during hybrid node installation and upgrade*
If you are able to make your pod network routable on your on-premises network, follow the guidance below.

1. Configure the `RemotePodNetwork` field for your EKS cluster, your VPC route tables, and your EKS cluster security group with your on-premises pod CIDR.
2. There are several techniques you can use to make your on-premises pod CIDR routable on your on-premises network, including Border Gateway Protocol (BGP), static routes, or other custom routing solutions. BGP is the recommended solution as it is more scalable and easier to manage than alternative solutions that require custom or manual route configuration. {aws} supports the BGP capabilities of Cilium and Calico for advertising pod CIDRs, see <<hybrid-nodes-cni>> and <<hybrid-nodes-concepts-k8s-pod-cidrs>> for more information. A BGP configuration sketch follows this list.
3. Webhooks can run on hybrid nodes as the EKS control plane is able to communicate with the Pod IP addresses assigned to the webhooks.
4. Workloads running on cloud nodes are able to communicate directly with workloads running on hybrid nodes in the same EKS cluster.
5. Other {aws} services, such as {aws} Application Load Balancers and Amazon Managed Service for Prometheus, are able to communicate with workloads running on hybrid nodes to balance network traffic and scrape pod metrics.
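
For example, the following is a minimal sketch of advertising pod CIDRs with Cilium's BGP control plane. The ASNs, peer address, and policy name are illustrative assumptions, and the node label assumes hybrid nodes carry `eks.amazonaws.com/compute-type: hybrid`; consult your Cilium version's documentation for the current BGP API.

[source,yaml]
----
apiVersion: cilium.io/v2alpha1
kind: CiliumBGPPeeringPolicy
metadata:
  name: onprem-pod-cidr-advertisement
spec:
  # Assumption: hybrid nodes carry the eks.amazonaws.com/compute-type=hybrid label
  nodeSelector:
    matchLabels:
      eks.amazonaws.com/compute-type: hybrid
  virtualRouters:
  - localASN: 64512              # illustrative private ASN for the nodes
    exportPodCIDR: true          # advertise each node's pod CIDR allocation
    neighbors:
    - peerAddress: 10.80.0.1/32  # illustrative on-premises router address
      peerASN: 64513             # illustrative router ASN
----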

*Unroutable pod networks*

If you are _not_ able to make your pod networks routable on your on-premises network, follow the guidance below.

1. Webhooks cannot run on hybrid nodes because webhooks require connectivity from the EKS control plane to the Pod IP addresses assigned to the webhooks. In this case, it is recommended to run webhooks on cloud nodes in the same EKS cluster as your hybrid nodes, see <<hybrid-nodes-webhooks>> for more information.
2. Workloads running on cloud nodes are not able to communicate directly with workloads running on hybrid nodes when using the VPC CNI for cloud nodes and Cilium or Calico for hybrid nodes.
3. Use Service Traffic Distribution to keep traffic local to the zone it is originating from. For more information on Service Traffic Distribution, see <<hybrid-nodes-service-traffic-distribution>>.
4. Configure your CNI to use egress masquerading or Network Address Translation (NAT) for pod traffic as it leaves your on-premises hosts. This is enabled by default in Cilium. Calico requires `natOutgoing` to be set to `true`, as shown in the sketch after this list.
5. Other {aws} services, such as {aws} Application Load Balancers and Amazon Managed Service for Prometheus, are not able to communicate with workloads running on hybrid nodes.
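
As a sketch of the Calico setting mentioned above, the pool name, CIDR, and encapsulation mode below are illustrative assumptions:

[source,yaml]
----
apiVersion: projectcalico.org/v3
kind: IPPool
metadata:
  name: onprem-pod-pool
spec:
  cidr: 10.3.0.0/16    # illustrative on-premises pod CIDR
  natOutgoing: true    # masquerade pod traffic as it leaves the host
  vxlanMode: Always    # assumes a VXLAN overlay network
  blockSize: 26
----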

[#hybrid-nodes-networking-access-reqs]
=== Access required during hybrid node installation and upgrade

You must have access to the following domains during the installation process, when you install the hybrid nodes dependencies on your hosts. This can be done once when you are building your operating system images, or it can be done on each host at runtime. This access is required for the initial installation and when you upgrade the Kubernetes version of your hybrid nodes.
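
A quick way to validate this access from a host is a connectivity check such as the sketch below. The domains shown are illustrative examples of the endpoints covered in this section; substitute your Region and the full domain list for your configuration.

[source,bash]
----
# Sketch: verify outbound HTTPS reachability to a few required endpoints.
for host in eks.us-west-2.amazonaws.com api.ecr.us-west-2.amazonaws.com ssm.us-west-2.amazonaws.com; do
  curl -sS --connect-timeout 5 -o /dev/null "https://${host}" \
    && echo "OK   ${host}" \
    || echo "FAIL ${host}"
done
----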

@@ -96,7 +118,8 @@ You must have access to the following domains during the installation process wh
^2^ Access to the {aws} IAM endpoints are only required if you are using {aws} IAM Roles Anywhere for your on-premises IAM credential provider.
====

*Access required for ongoing cluster operations*
[#hybrid-nodes-networking-access-reqs-ongoing]
=== Access required for ongoing cluster operations

The following network access for your on-premises firewall is required for ongoing cluster operations.

@@ -201,7 +224,8 @@ Depending on your choice of CNI, you need to configure additional network access
^1^ The IPs of the EKS cluster. See the following section on Amazon EKS elastic network interfaces.
====

*Amazon EKS network interfaces*
[#hybrid-nodes-networking-eks-network-interfaces]
=== Amazon EKS network interfaces

Amazon EKS attaches network interfaces to the subnets in the VPC you pass during cluster creation to enable the communication between the EKS control plane and your VPC. The network interfaces that Amazon EKS creates can be found after cluster creation in the Amazon EC2 console or with the {aws} CLI. The original network interfaces are deleted and new network interfaces are created when changes are applied on your EKS cluster, such as Kubernetes version upgrades. You can restrict the IP range for the Amazon EKS network interfaces by using constrained subnet sizes for the subnets you pass during cluster creation, which makes it easier to configure your on-premises firewall to allow inbound/outbound connectivity to this known, constrained set of IPs. To control which subnets network interfaces are created in, you can limit the number of subnets you specify when you create a cluster or you can update the subnets after creating the cluster.
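
For example, the following sketch lists the network interfaces EKS created for a cluster with the {aws} CLI. It assumes the interfaces carry a description of the form `Amazon EKS <cluster-name>`; verify the description format in your account before relying on the filter.

[source,bash]
----
# Sketch: find the network interfaces EKS attached for cluster "my-cluster".
aws ec2 describe-network-interfaces \
  --filters "Name=description,Values=Amazon EKS my-cluster" \
  --query 'NetworkInterfaces[].{Id:NetworkInterfaceId,Ip:PrivateIpAddress,Subnet:SubnetId}' \
  --output table
----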

2 changes: 1 addition & 1 deletion latest/ug/nodes/hybrid-nodes-os.adoc
@@ -10,7 +10,7 @@ include::../attributes.txt[]
Prepare operating system for use with Hybrid Nodes
--

Bottlerocket, Ubuntu, Red Hat Enterprise Linux (RHEL), and Amazon Linux 2023 (AL2023) are validated on an ongoing basis for use as the node operating system for hybrid nodes. {aws} supports the hybrid nodes integration with these operating systems but, with the exception of Bottlerocket, does not provide support for the operating systems itself. AL2023 is not covered by {aws} Support Plans when run outside of Amazon EC2. AL2023 can only be used in on-premises virtualized environments, reference the link:linux/al2023/ug/outside-ec2.html[Amazon Linux 2023 User Guide,type="documentation"] for more information.
Bottlerocket, Amazon Linux 2023 (AL2023), Ubuntu, and RHEL are validated on an ongoing basis for use as the node operating system for hybrid nodes. Bottlerocket is supported by {aws} in VMware vSphere environments only. AL2023 is not covered by {aws} Support Plans when run outside of Amazon EC2. AL2023 can only be used in on-premises virtualized environments, see the link:linux/al2023/ug/outside-ec2.html[Amazon Linux 2023 User Guide,type="documentation"] for more information. {aws} supports the hybrid nodes integration with Ubuntu and RHEL operating systems but does not provide support for the operating systems themselves.

You are responsible for operating system provisioning and management. When you are testing hybrid nodes for the first time, it is easiest to run the Amazon EKS Hybrid Nodes CLI (`nodeadm`) on an already provisioned host. For production deployments, we recommend that you include `nodeadm` in your operating system images with it configured to run as a systemd service to automatically join hosts to Amazon EKS clusters at host startup. If you are using Bottlerocket as your node operating system on vSphere, you do not need to use `nodeadm` as Bottlerocket already contains the dependencies required for hybrid nodes and will automatically connect to the cluster you configure upon host startup.
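
A minimal sketch of that runtime flow on an already provisioned host follows. The Kubernetes version, credential provider, and config path are illustrative assumptions; check the `nodeadm` reference for the options that apply to your setup.

[source,bash]
----
# Sketch: install hybrid nodes dependencies, then join the cluster (run as root).
nodeadm install 1.31 --credential-provider ssm
nodeadm init --config-source file:///etc/nodeadm/nodeConfig.yaml
----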

11 changes: 9 additions & 2 deletions latest/ug/nodes/hybrid-nodes-overview.adoc
@@ -12,7 +12,7 @@ Join nodes from your data centers to Amazon EKS Kubernetes clusters with Amazon

With _Amazon EKS Hybrid Nodes_, you can use your on-premises and edge infrastructure as nodes in Amazon EKS clusters. {aws} manages the {aws}-hosted Kubernetes control plane of the Amazon EKS cluster, and you manage the hybrid nodes that run in your on-premises or edge environments. This unifies Kubernetes management across your environments and offloads Kubernetes control plane management to {aws} for your on-premises and edge applications.

Amazon EKS Hybrid Nodes works with any on-premises hardware or virtual machines, bringing the efficiency, scalability, and availability of Amazon EKS to wherever your applications need to run. You can use a wide range of Amazon EKS features with Amazon EKS Hybrid Nodes including Amazon EKS add-ons, Amazon EKS Pod Identity, cluster access entries, cluster insights, and extended Kubernetes version support. Amazon EKS Hybrid Nodes natively integrates with {aws} services including {aws} Systems Manager, {aws} IAM Roles Anywhere, Amazon Managed Service for Prometheus, Amazon CloudWatch, and Amazon GuardDuty for centralized monitoring, logging, and identity management.
Amazon EKS Hybrid Nodes works with any on-premises hardware or virtual machines, bringing the efficiency, scalability, and availability of Amazon EKS to wherever your applications need to run. You can use a wide range of Amazon EKS features with Amazon EKS Hybrid Nodes including Amazon EKS add-ons, Amazon EKS Pod Identity, cluster access entries, cluster insights, and extended Kubernetes version support. Amazon EKS Hybrid Nodes natively integrates with {aws} services including {aws} Systems Manager, {aws} IAM Roles Anywhere, Amazon Managed Service for Prometheus, and Amazon CloudWatch for centralized monitoring, logging, and identity management.

With Amazon EKS Hybrid Nodes, there are no upfront commitments or minimum fees, and you are charged per hour for the vCPU resources of your hybrid nodes when they are attached to your Amazon EKS clusters. For more pricing information, see link:eks/pricing/[Amazon EKS Pricing,type="marketing"].

@@ -35,10 +35,17 @@ EKS Hybrid Nodes has the following high-level features:

* EKS Hybrid Nodes can be used with new or existing EKS clusters.
* EKS Hybrid Nodes is available in all {aws} Regions, except the {aws} GovCloud (US) Regions and the {aws} China Regions.
* EKS Hybrid Nodes must have a reliable connection between your on-premises environment and {aws}. EKS Hybrid Nodes is not a fit for disconnected, disrupted, intermittent or limited (DDIL) environments. If you are running in a DDIL environment, consider link:eks/eks-anywhere/[Amazon EKS Anywhere,type="marketing"].
* EKS Hybrid Nodes must have a reliable connection between your on-premises environment and {aws}. EKS Hybrid Nodes is not a fit for disconnected, disrupted, intermittent or limited (DDIL) environments. If you are running in a DDIL environment, consider link:eks/eks-anywhere/[Amazon EKS Anywhere,type="marketing"]. Reference the link:eks/latest/best-practices/hybrid-nodes-network-disconnections.html[Best Practices for EKS Hybrid Nodes,type="documentation"] for information on how hybrid nodes behave during network disconnection scenarios.
* Running EKS Hybrid Nodes on cloud infrastructure, including {aws} Regions, {aws} Local Zones, {aws} Outposts, or in other clouds, is not supported. You will be charged the hybrid nodes fee if you run hybrid nodes on Amazon EC2 instances.
* Billing for hybrid nodes starts when the nodes join the EKS cluster and stops when the nodes are removed from the cluster. Be sure to remove your hybrid nodes from your EKS cluster if you are not using them.

[#hybrid-nodes-resources]
== Additional resources

* link:https://www.eksworkshop.com/docs/networking/eks-hybrid-nodes/[**EKS Hybrid Nodes workshop**]: Step-by-step instructions for deploying EKS Hybrid Nodes in a demo environment.
* link:https://www.youtube.com/watch?v=ZxC7SkemxvU[**AWS re:Invent: EKS Hybrid Nodes**]: AWS re:Invent session introducing the EKS Hybrid Nodes launch, featuring a customer showing how they use EKS Hybrid Nodes in their environment.
* link:https://repost.aws/articles/ARL44xuau6TG2t-JoJ3mJ5Mw/unpacking-the-cluster-networking-for-amazon-eks-hybrid-nodes[**AWS re:Post: Cluster networking for EKS Hybrid Nodes**]: Article explaining various methods for setting up networking for EKS Hybrid Nodes.
* link:https://aws.amazon.com/blogs/containers/run-genai-inference-across-environments-with-amazon-eks-hybrid-nodes/[**AWS blog: Run GenAI inference across environments with EKS Hybrid Nodes**]: Blog post showing how to run GenAI inference across environments with EKS Hybrid Nodes.

include::hybrid-nodes-prereqs.adoc[leveloffset=+1]

15 changes: 7 additions & 8 deletions latest/ug/nodes/hybrid-nodes-prereqs.adoc
@@ -25,26 +25,25 @@ image::images/hybrid-prereq-diagram.png[Hybrid node network connectivity.,scaled

The communication between the Amazon EKS control plane and hybrid nodes is routed through the VPC and subnets you pass during cluster creation, which builds on the https://aws.github.io/aws-eks-best-practices/networking/subnets/[existing mechanism] in Amazon EKS for control plane to node networking. There are several link:whitepapers/latest/aws-vpc-connectivity-options/network-to-amazon-vpc-connectivity-options.html[documented options,type="documentation"] available for you to connect your on-premises environment with your VPC including {aws} Site-to-Site VPN, {aws} Direct Connect, or your own VPN connection. Reference the link:vpn/latest/s2svpn/VPC_VPN.html[{aws} Site-to-Site VPN,type="documentation"] and link:directconnect/latest/UserGuide/Welcome.html[{aws} Direct Connect,type="documentation"] user guides for more information on how to use those solutions for your hybrid network connection.

For an optimal experience, {aws} recommends reliable network connectivity of at least 100 Mbps and a maximum of 200ms round trip latency for the hybrid nodes connection to the {aws} Region. The bandwidth and latency requirements can vary depending on the number of hybrid nodes and your workload characteristics, such as application image size, application elasticity, monitoring and logging configurations, and application dependencies on accessing data stored in other {aws} services. We recommend that you test with your own applications and environments before deploying to production to validate that your networking setup meets the requirements for your workloads.
For an optimal experience, it is recommended to have reliable network connectivity of at least 100 Mbps and a maximum of 200ms round trip latency for the hybrid nodes connection to the {aws} Region. This is general guidance that accommodates most use cases but is not a strict requirement. The bandwidth and latency requirements can vary depending on the number of hybrid nodes and your workload characteristics, such as application image size, application elasticity, monitoring and logging configurations, and application dependencies on accessing data stored in other {aws} services. It is recommended to test with your own applications and environments before deploying to production to validate that your networking setup meets the requirements for your workloads.


[#hybrid-nodes-prereqs-onprem]
== On-premises network configuration

You must enable inbound network access from the Amazon EKS control plane to your on-premises environment to allow the Amazon EKS control plane to communicate with the `kubelet` running on hybrid nodes and optionally with webhooks running on your hybrid nodes. Additionally, you must enable outbound network access for your hybrid nodes and components running on them to communicate with the Amazon EKS control plane. You can configure this communication to stay fully private to your {aws} Direct Connect, {aws} Site-to-Site VPN, or your own VPN connection. For a full list of the required ports and protocols that you must enable in your firewall and on-premises environment, see <<hybrid-nodes-networking>>.

The Classless Inter-Domain Routing (CIDR) ranges you use for your on-premises node and pod networks must use IPv4 RFC1918 address ranges. When you create your hybrid nodes-enabled Amazon EKS cluster, you pass your on-premises node and optionally pod CIDRs to enable communication from the Amazon EKS control plane to your hybrid nodes and the resources running on them. Your on-premises router must be configured with routes to your on-premises nodes and optionally pods. You can use Border Gateway Protocol (BGP) or static configurations to advertise pod IPs to your router.
You must enable inbound network access from the Amazon EKS control plane to your on-premises environment to allow the Amazon EKS control plane to communicate with the `kubelet` running on hybrid nodes and optionally with webhooks running on your hybrid nodes. Additionally, you must enable outbound network access for your hybrid nodes and components running on them to communicate with the Amazon EKS control plane. You can configure this communication to stay fully private to your {aws} Direct Connect, {aws} Site-to-Site VPN, or your own VPN connection.

The Classless Inter-Domain Routing (CIDR) ranges you use for your on-premises node and pod networks must use IPv4 RFC-1918 address ranges. Your on-premises router must be configured with routes to your on-premises nodes and optionally pods. See <<hybrid-nodes-networking-on-prem>> for more information on the on-premises network requirements, including the full list of required ports and protocols that must be enabled in your firewall and on-premises environment.


[#hybrid-nodes-prereqs-cluster]
== EKS cluster configuration

To minimize latency, it is recommended to create your Amazon EKS cluster in the {aws} Region closest to your on-premises or edge environment. You pass your on-premises node and pod CIDRs during Amazon EKS cluster creation via two API fields: `RemoteNodeNetwork` and `RemotePodNetwork`. You may need to discuss with your on-premises network team to identify your on-premises node and pod CIDRs. The node CIDR is allocated from your on-premises network and the pod CIDR is allocated from the Container Network Interface (CNI) you use if you are using an overlay network for your CNI.
To minimize latency, it is recommended to create your Amazon EKS cluster in the {aws} Region closest to your on-premises or edge environment. You pass your on-premises node and pod CIDRs during Amazon EKS cluster creation via two API fields: `RemoteNodeNetwork` and `RemotePodNetwork`. You may need to discuss with your on-premises network team to identify your on-premises node and pod CIDRs. The node CIDR is allocated from your on-premises network and the pod CIDR is allocated from the Container Network Interface (CNI) you use if you are using an overlay network for your CNI. Cilium and Calico use overlay networks by default.

The on-premises node and pod CIDRs are used to configure the Amazon EKS control plane to route traffic through your VPC to the `kubelet` and the pods running on your hybrid nodes. Your on-premises node and pod CIDRs cannot overlap with each other, the VPC CIDR you pass during cluster creation, or the service IPv4 configuration for your Amazon EKS cluster. The pod CIDR is optional. You must configure your pod CIDR if your CNI does not use Network Address Translation (NAT) or masquerading for pod IP addresses when pod traffic leaves your on-premises hosts. You additionally must configure your pod CIDR if you are running _Kubernetes webhooks_ on hybrid nodes. For example, {aws} Distro for Open Telemetry (ADOT) uses webhooks.
The on-premises node and pod CIDRs you configure via the `RemoteNodeNetwork` and `RemotePodNetwork` fields are used to configure the Amazon EKS control plane to route traffic through your VPC to the `kubelet` and the pods running on your hybrid nodes. Your on-premises node and pod CIDRs cannot overlap with each other, the VPC CIDR you pass during cluster creation, or the service IPv4 configuration for your Amazon EKS cluster.
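
For illustration, a sketch of these fields in an eksctl cluster config is shown below. It assumes eksctl's hybrid nodes schema; the cluster name, Region, and CIDRs are placeholders, so verify the field names against the eksctl documentation for your version.

[source,yaml]
----
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: my-hybrid-cluster
  region: us-west-2
remoteNetworkConfig:
  remoteNodeNetworks:
  - cidrs: ["10.2.0.0/16"]   # illustrative on-premises node CIDR
  remotePodNetworks:
  - cidrs: ["10.3.0.0/16"]   # illustrative on-premises pod CIDR
----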

It is recommended to use either public or private endpoint access for the Amazon EKS Kubernetes API server endpoint. If you choose “Public and Private”, the Amazon EKS Kubernetes API server endpoint will always resolve to the public IPs for hybrid nodes running outside of your VPC, which can prevent your hybrid nodes from joining the cluster. You can use either public or private endpoint access for the Amazon EKS Kubernetes API server endpoint. You cannot choose “Public and Private”. When you use public endpoint access, the Kubernetes API server endpoint is resolved to public IPs and the communication from hybrid nodes to the Amazon EKS control plane will be routed over the internet. When you choose private endpoint access, the Kubernetes API server endpoint is resolved to private IPs and the communication from hybrid nodes to the Amazon EKS control plane will be routed over your private connectivity link, in most cases {aws} Direct Connect or {aws} Site-to-Site VPN.
It is recommended to use either public or private endpoint access for the Amazon EKS Kubernetes API server endpoint. If you choose “Public and Private”, the Amazon EKS Kubernetes API server endpoint will always resolve to the public IPs for hybrid nodes running outside of your VPC, which can prevent your hybrid nodes from joining the cluster. When you use public endpoint access, the Kubernetes API server endpoint is resolved to public IPs and the communication from hybrid nodes to the Amazon EKS control plane will be routed over the internet. When you choose private endpoint access, the Kubernetes API server endpoint is resolved to private IPs and the communication from hybrid nodes to the Amazon EKS control plane will be routed over your private connectivity link, in most cases {aws} Direct Connect or {aws} Site-to-Site VPN.
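
For example, the following sketch switches an existing cluster to private-only endpoint access with the {aws} CLI; the cluster name is a placeholder.

[source,bash]
----
# Sketch: resolve the Kubernetes API server endpoint to private IPs only.
aws eks update-cluster-config \
  --name my-hybrid-cluster \
  --resources-vpc-config endpointPrivateAccess=true,endpointPublicAccess=false
----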


[#hybrid-nodes-prereqs-vpc]
@@ -115,7 +114,7 @@ You must have bare metal servers or virtual machines available to use as hybrid
[#hybrid-nodes-prereqs-os]
== Operating system

Amazon Linux 2023 (AL2023), Ubuntu, and RHEL are validated on an ongoing basis for use as the node operating system for hybrid nodes. {aws} supports the hybrid nodes integration with these operating systems but does not provide support for the operating systems itself. AL2023 is not covered by {aws} Support Plans when run outside of Amazon EC2. AL2023 can only be used in on-premises virtualized environments, see the link:linux/al2023/ug/outside-ec2.html[Amazon Linux 2023 User Guide,type="documentation"] for more information.
Bottlerocket, Amazon Linux 2023 (AL2023), Ubuntu, and RHEL are validated on an ongoing basis for use as the node operating system for hybrid nodes. Bottlerocket is supported by {aws} in VMware vSphere environments only. AL2023 is not covered by {aws} Support Plans when run outside of Amazon EC2. AL2023 can only be used in on-premises virtualized environments, see the link:linux/al2023/ug/outside-ec2.html[Amazon Linux 2023 User Guide,type="documentation"] for more information. {aws} supports the hybrid nodes integration with Ubuntu and RHEL operating systems but does not provide support for the operating systems themselves.

You are responsible for operating system provisioning and management. When you are testing hybrid nodes for the first time, it is easiest to run the Amazon EKS Hybrid Nodes CLI (`nodeadm`) on an already provisioned host. For production deployments, it is recommended to include `nodeadm` in your golden operating system images with it configured to run as a systemd service to automatically join hosts to Amazon EKS clusters at host startup.
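
A minimal sketch of a `nodeadm` configuration for joining a cluster follows. The cluster name, Region, and the SSM credential provider are illustrative assumptions; see the `nodeadm` reference for the full schema and for the IAM Roles Anywhere alternative.

[source,yaml]
----
apiVersion: node.eks.aws/v1alpha1
kind: NodeConfig
spec:
  cluster:
    name: my-hybrid-cluster
    region: us-west-2
  hybrid:
    ssm:
      activationCode: <activation-code>   # from your SSM hybrid activation
      activationId: <activation-id>
----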

38 changes: 23 additions & 15 deletions latest/ug/nodes/hybrid-nodes-webhooks.adoc
@@ -13,27 +13,30 @@ Configure webhooks for hybrid nodes

This page details considerations for running webhooks with hybrid nodes. Webhooks are used in Kubernetes applications and open source projects, such as the {aws} Load Balancer Controller and CloudWatch Observability Agent, to perform mutating and validating admission control at runtime.

If you are running webhooks on hybrid nodes, your on-premises pod CIDR must be routable on your on-premises network. Also you must configure your EKS cluster with your remote pod network so the EKS control plane can communicate with the webhooks running on hybrid nodes.
*Routable pod networks*

There are several techniques you can use to make your on-premises pod CIDR routable on your on-premises network including Border Gateway Protocol (BGP), static routes, or other custom routing solutions. BGP is the recommended solution as it is more scalable and easier to manage than alternative solutions that require custom or manual route configuration. {aws} supports the BGP capabilities of Cilium and Calico for advertising hybrid nodes pod CIDRs, see <<hybrid-nodes-cni, Configure CNI for hybrid nodes>> for more information.
If you are able to make your on-premises pod CIDR routable on your on-premises network, you can run webhooks on hybrid nodes. There are several techniques you can use to make your on-premises pod CIDR routable on your on-premises network including Border Gateway Protocol (BGP), static routes, or other custom routing solutions. BGP is the recommended solution as it is more scalable and easier to manage than alternative solutions that require custom or manual route configuration. {aws} supports the BGP capabilities of Cilium and Calico for advertising pod CIDRs, see <<hybrid-nodes-cni>> and <<hybrid-nodes-concepts-k8s-pod-cidrs>> for more information.

If you _cannot_ make your on-premises pod CIDR routable on your on-premises network and need to run webhooks, we recommend that you run all of your webhooks in the {aws} Cloud. To function, a webhook must run in the same EKS cluster as your hybrid nodes.
*Unroutable pod networks*

If you _cannot_ make your on-premises pod CIDR routable on your on-premises network and need to run webhooks, it is recommended to run all webhooks on cloud nodes in the same EKS cluster as your hybrid nodes.

[#hybrid-nodes-considerations-mixed-mode]
== Considerations for mixed mode clusters

_Mixed mode clusters_ are defined as EKS clusters that have both hybrid nodes and nodes running in {aws} Cloud. When running a mixed mode cluster, consider the following recommendations:

- Run the VPC CNI on nodes in {aws} Cloud and either Cilium or Calico on hybrid nodes. Cilium and Calico are not supported by {aws} when running on nodes in {aws} Cloud.
- If your applications require pods running on nodes in {aws} Cloud to directly communicate with pods running on hybrid nodes ("east-west communication"), and you are using the VPC CNI on nodes in {aws} Cloud and Cilium or Calico in overlay/tunnel mode on hybrid nodes, then your on-premises pod CIDR must be routable on your on-premises network.
- Run at least one replica of CoreDNS on nodes in {aws} Cloud and at least one replica of CoreDNS on hybrid nodes, see <<hybrid-nodes-mixed-mode, Configure add-ons and webhooks for mixed mode clusters>> for configuration steps.
- Configure webhooks to run on nodes in {aws} Cloud. See <<hybrid-nodes-webhooks-add-ons, Configuring webhooks for add-ons>> for how to configure the webhooks used by {aws} and community add-ons when running mixed mode clusters.
- If you are using Application Load Balancers (ALB) or Network Load Balancers (NLB) for workload traffic running on hybrid nodes, then the IP target(s) used with the ALB or NLB must be routable from {aws}.
- Configure webhooks to run on nodes in {aws} Cloud. See <<hybrid-nodes-webhooks-add-ons>> for how to configure the webhooks for {aws} and community add-ons, and see the node affinity sketch after this list.
- If your applications require pods running on nodes in {aws} Cloud to directly communicate with pods running on hybrid nodes ("east-west communication"), and you are using the VPC CNI on nodes in {aws} Cloud, and Cilium or Calico on hybrid nodes, then your on-premises pod CIDR must be routable on your on-premises network.
- Run at least one replica of CoreDNS on nodes in {aws} Cloud and at least one replica of CoreDNS on hybrid nodes.
- Configure Service Traffic Distribution to keep Service traffic local to the zone it is originating from. For more information on Service Traffic Distribution, see <<hybrid-nodes-service-traffic-distribution>>.
- If you are using {aws} Application Load Balancers (ALB) or Network Load Balancers (NLB) for workload traffic running on hybrid nodes, then the IP target(s) used with the ALB or NLB must be routable from {aws}.
- The Metrics Server add-on requires connectivity from the EKS control plane to the Metrics Server pod IP address. If you are running the Metrics Server add-on on hybrid nodes, then your on-premises pod CIDR must be routable on your on-premises network.
- To collect metrics for hybrid nodes using Amazon Managed Service for Prometheus (AMP) managed collectors, your on-premises pod CIDR must be routable on your on-premises network. Or, you can use the AMP managed collector for EKS control plane metrics and nodes running in {aws} Cloud, and the {aws} Distro for OpenTelemetry (ADOT) add-on to collect metrics for hybrid nodes.
- To collect metrics for hybrid nodes using Amazon Managed Service for Prometheus (AMP) managed collectors, your on-premises pod CIDR must be routable on your on-premises network. Or, you can use the AMP managed collector for EKS control plane metrics and resources running in {aws} Cloud, and the {aws} Distro for OpenTelemetry (ADOT) add-on to collect metrics for hybrid nodes.
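
As a sketch, a webhook-serving Deployment's pod template can be pinned away from hybrid nodes with node affinity. This assumes hybrid nodes carry the `eks.amazonaws.com/compute-type: hybrid` label; the fragment below would go in the Deployment's pod spec or the add-on's Helm values.

[source,yaml]
----
# Sketch: schedule a webhook Deployment's pods only on cloud nodes.
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: eks.amazonaws.com/compute-type
          operator: NotIn
          values:
          - hybrid
----
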
[#hybrid-nodes-mixed-mode]
== Configure add-ons and webhooks for mixed mode clusters
== Configure mixed mode clusters

To view the mutating and validating webhooks running on your cluster, you can view the *Extensions* resource type in the *Resources* panel of the EKS console for your cluster, or you can use the following commands. EKS also reports webhook metrics in the cluster observability dashboard, see <<observability-dashboard>> for more information.

@@ -47,15 +50,20 @@ kubectl get mutatingwebhookconfigurations
kubectl get validatingwebhookconfigurations
----

[#hybrid-nodes-mixed-coredns]
=== Configure CoreDNS replicas
[#hybrid-nodes-mixed-service-traffic-distribution]
=== Configure Service Traffic Distribution

If you are running a mixed mode cluster with both hybrid nodes and nodes in {aws} Cloud, we recommend that you have at least one CoreDNS replica on hybrid nodes and at least one CoreDNS replica on your nodes in {aws} Cloud. To prevent latency and network issues in a mixed mode cluster setup, you can configure the CoreDNS Service to prefer the closest CoreDNS replica with link:https://kubernetes.io/docs/reference/networking/virtual-ips/#traffic-distribution[Service Traffic Distribution].

_Service Traffic Distribution_ (available for Kubernetes versions 1.31 and later in EKS) is the recommended solution over link:https://kubernetes.io/docs/concepts/services-networking/topology-aware-routing/[Topology Aware Routing] because it is more predictable. In Service Traffic Distribution, healthy endpoints in the zone will receive all of the traffic for that zone. In Topology Aware Routing, each service must meet several conditions in that zone to apply the custom routing, otherwise it routes traffic evenly to all endpoints. The following steps configure Service Traffic Distribution.
When running mixed mode clusters, it is recommended to use link:https://kubernetes.io/docs/reference/networking/virtual-ips/#traffic-distribution[_Service Traffic Distribution_] to keep Service traffic local to the zone it is originating from. Service Traffic Distribution (available for Kubernetes versions 1.31 and later in EKS) is the recommended solution over link:https://kubernetes.io/docs/concepts/services-networking/topology-aware-routing/[Topology Aware Routing] because it is more predictable. With Service Traffic Distribution, healthy endpoints in the zone will receive all of the traffic for that zone. With Topology Aware Routing, each service must meet several conditions in that zone to apply the custom routing, otherwise it routes traffic evenly to all endpoints.

If you are using Cilium as your CNI, you must run the CNI with the `enable-service-topology` flag set to `true` to enable Service Traffic Distribution. You can pass this configuration with the Helm install flag `--set loadBalancer.serviceTopology=true`, or you can update an existing installation with the Cilium CLI command `cilium config set enable-service-topology true`. The Cilium agent running on each node must be restarted after updating the configuration for an existing installation.
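
The following sketch consolidates those commands; the restart command assumes Cilium's default DaemonSet name of `cilium` in the `kube-system` namespace.

[source,bash]
----
# Enable at install/upgrade time via Helm values.
helm upgrade cilium cilium/cilium \
  --namespace kube-system --reuse-values \
  --set loadBalancer.serviceTopology=true

# Or flip the flag on a running installation, then restart the agents.
cilium config set enable-service-topology true
kubectl --namespace kube-system rollout restart daemonset/cilium
----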

An example of how to configure Service Traffic Distribution for the CoreDNS Service is shown in the sections below. It is recommended to enable the same setting for all Services in your cluster to avoid unintended cross-environment traffic.

[#hybrid-nodes-mixed-coredns]
=== Configure CoreDNS replicas

When running mixed mode clusters, it is recommended to have at least one CoreDNS replica on hybrid nodes and at least one CoreDNS replica on nodes in {aws} Cloud.

. Add a topology zone label for each of your hybrid nodes, for example `topology.kubernetes.io/zone: onprem`. Or, you can set the label at the `nodeadm init` phase by specifying the label in your `nodeadm` configuration, see <<hybrid-nodes-nodeadm-kubelet>>. Note, nodes running in {aws} Cloud automatically get a topology zone label applied to them that corresponds to the availability zone (AZ) of the node.
+
[source,bash,subs="verbatim,attributes,quotes"]
@@ -100,7 +108,7 @@ spec:
...
----
+
. Add the setting `trafficDistribution: PreferClose` to the `kube-dns` Service configuration to enable Topology Aware Routing.
. Add the setting `trafficDistribution: PreferClose` to the `kube-dns` Service configuration to enable Service Traffic Distribution.
+
[source,bash,subs="verbatim,attributes"]
----