[Hybrid Nodes] add details to Service Traffic Distribution guidance #994

Open · wants to merge 4 commits into base: mainline
2 changes: 2 additions & 0 deletions latest/ug/nodes/hybrid-nodes-cni.adoc
@@ -104,6 +104,8 @@ ipam:
    clusterPoolIPv4MaskSize: [.replaceable]`25`
    clusterPoolIPv4PodCIDRList:
    - [.replaceable]`POD_CIDR`
loadBalancer:
  serviceTopology: true
operator:
  affinity:
    nodeAffinity:
1 change: 1 addition & 0 deletions latest/ug/nodes/hybrid-nodes-nodeadm.adoc
@@ -538,6 +538,7 @@ spec:
privateKeyPath: # Path to the private key file for the certificate
----

[#hybrid-nodes-nodeadm-kubelet]
== Node Config for customizing kubelet (Optional)

You can pass kubelet configuration and flags in your `nodeadm` configuration. See the example below for how to add an additional node label `abc.amazonaws.com/test-label` and config for setting `shutdownGracePeriod` to 30 seconds.
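A minimal sketch of the shape such a configuration can take, assuming the `node.eks.aws/v1alpha1` NodeConfig schema with kubelet `config` and `flags` fields (the values shown are illustrative):

[source,yaml,subs="verbatim,attributes"]
----
apiVersion: node.eks.aws/v1alpha1
kind: NodeConfig
spec:
  kubelet:
    # Inline kubelet configuration; gives pods up to 30 seconds to terminate on node shutdown
    config:
      shutdownGracePeriod: 30s
    # Additional kubelet flags; adds the example node label from the paragraph above
    flags:
      - --node-labels=abc.amazonaws.com/test-label=true
----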
74 changes: 54 additions & 20 deletions latest/ug/nodes/hybrid-nodes-webhooks.adoc
@@ -11,26 +11,26 @@ include::../attributes.txt[]
Configure webhooks for hybrid nodes
--

This page details considerations for running webhooks with hybrid nodes. Webhooks are used in Kubernetes applications and open-source projects, such as the {aws} Load Balancer Controller and CloudWatch Observability Agent, to perform mutating and validation capabilities at runtime.
This page details considerations for running webhooks with hybrid nodes. Webhooks are used in Kubernetes applications and open source projects, such as the {aws} Load Balancer Controller and CloudWatch Observability Agent, to perform mutating and validation capabilities at runtime.

If you are running webhooks on hybrid nodes, your on-premises pod CIDR must be routable on your on-premises network and you must configure your EKS cluster with your remote pod network so the EKS control plane can communicate with the webhooks running on hybrid nodes.
If you are running webhooks on hybrid nodes, your on-premises pod CIDR must be routable on your on-premises network. You must also configure your EKS cluster with your remote pod network so that the EKS control plane can communicate with the webhooks running on hybrid nodes.

There are several techniques you can use to make your on-premises pod CIDR routable on your on-premises network including Border Gateway Protocol (BGP), static routes, or other custom routing solutions. BGP is the recommended solution as it is more scalable and easier to manage than alternative solutions that require custom or manual route configuration. {aws} supports the BGP capabilities of Cilium and Calico for advertising hybrid nodes pod CIDRs, see <<hybrid-nodes-cni, Configure CNI for hybrid nodes>> for more information.
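As one illustration, a rough sketch of advertising the hybrid nodes pod CIDR with Cilium's BGP control plane; the peer address, ASNs, and node label in this sketch are assumptions, and your on-premises router must be configured to peer accordingly:

[source,yaml,subs="verbatim,attributes"]
----
apiVersion: cilium.io/v2alpha1
kind: CiliumBGPPeeringPolicy
metadata:
  name: hybrid-nodes-bgp
spec:
  # Apply this policy only to nodes carrying this label (example value)
  nodeSelector:
    matchLabels:
      topology.kubernetes.io/zone: onprem
  virtualRouters:
  - localASN: 64512
    # Advertise the pod CIDR allocated to each node
    exportPodCIDR: true
    neighbors:
    - peerAddress: "10.0.0.1/32"   # on-premises router, example address
      peerASN: 64512
----

For this resource to take effect, Cilium's BGP control plane must also be enabled in your installation, for example with the Helm value `bgpControlPlane.enabled=true`.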

If you _cannot_ make your on-premises pod CIDR routable on your on-premises network and need to run webhooks, it is recommended to run webhooks on EC2 instances in the same EKS cluster as your hybrid nodes.
If you _cannot_ make your on-premises pod CIDR routable on your on-premises network and need to run webhooks, we recommend that you run all of your webhooks in the {aws} Cloud. To function, a webhook must run in the same EKS cluster as your hybrid nodes.

[#hybrid-nodes-considerations-mixed-mode]
== Considerations for mixed mode clusters

Mixed mode clusters are defined as EKS clusters that have both hybrid nodes and nodes running in {aws} Cloud. When running a mixed mode cluster, consider the following recommendations:
_Mixed mode clusters_ are defined as EKS clusters that have both hybrid nodes and nodes running in {aws} Cloud. When running a mixed mode cluster, consider the following recommendations:

- Run the VPC CNI on nodes in {aws} Cloud and either Cilium or Calico on hybrid nodes. Cilium and Calico are not supported by {aws} when running on nodes in {aws} Cloud.
- If your applications require pods running on nodes in {aws} Cloud to directly communicate with pods running on hybrid nodes ("east-west communication"), and you are using the VPC CNI on nodes in {aws} Cloud and Cilium or Calico in overlay/tunnel mode on hybrid nodes, then your on-premises pod CIDR must be routable on your on-premises network.
- Run at least one replica of CoreDNS on nodes in {aws} Cloud and at least one replica of CoreDNS on hybrid nodes. See <<hybrid-nodes-mixed-mode, Configure add-ons and webhooks for mixed mode clusters>> for configuration steps.
- Configure webhooks to run on nodes in {aws} Cloud. See <<hybrid-nodes-webhooks-add-ons, Configuring webhooks for add-ons>> for how to configure the webhooks used by {aws} and community add-ons when running mixed mode clusters.
- If you are using Application Load Balancers (ALB) or Network Load Balancers (NLB) for workload traffic running on hybrid nodes, then the IP target(s) used with the ALB or NLB must be routable from {aws}.
- If you are using Application Load Balancers (ALB) or Network Load Balancers (NLB) for workload traffic running on hybrid nodes, then the IP target(s) used with the ALB or NLB must be routable from {aws}.
- The Metrics Server add-on requires connectivity from the EKS control plane to the Metrics Server pod IP address. If you are running the Metrics Server add-on on hybrid nodes, then your on-premises pod CIDR must be routable on your on-premises network.
- To collect metrics for hybrid nodes using Amazon Managed Service for Prometheus (AMP) managed collectors, your on-premises pod CIDR must be routable on your on-premises network. You can alternatively use the AMP managed collector for EKS control plane metrics and nodes running in {aws} Cloud, and the {aws} Distro for OpenTelemetry (ADOT) add-on to collect metrics for hybrid nodes.
- To collect metrics for hybrid nodes using Amazon Managed Service for Prometheus (AMP) managed collectors, your on-premises pod CIDR must be routable on your on-premises network. Or, you can use the AMP managed collector for EKS control plane metrics and nodes running in {aws} Cloud, and the {aws} Distro for OpenTelemetry (ADOT) add-on to collect metrics for hybrid nodes.

[#hybrid-nodes-mixed-mode]
== Configure add-ons and webhooks for mixed mode clusters
@@ -50,16 +50,20 @@ kubectl get validatingwebhookconfigurations
[#hybrid-nodes-mixed-coredns]
=== Configure CoreDNS replicas

If you are running a mixed mode cluster with both hybrid nodes and nodes in {aws} Cloud, it is recommended to have at least one CoreDNS replica on hybrid nodes and at least one CoreDNS replica on your nodes in {aws} Cloud. The CoreDNS Service can be configured to prefer the closest CoreDNS replica to prevent latency and network issues in a mixed mode cluster setup with the following steps.
If you are running a mixed mode cluster with both hybrid nodes and nodes in {aws} Cloud, we recommend that you have at least one CoreDNS replica on hybrid nodes and at least one CoreDNS replica on your nodes in {aws} Cloud. To prevent latency and network issues in a mixed mode cluster setup, you can configure the CoreDNS Service to prefer the closest CoreDNS replica with link:https://kubernetes.io/docs/reference/networking/virtual-ips/#traffic-distribution[Service Traffic Distribution].

. Add a topology zone label for each of your hybrid nodes, for example `topology.kubernetes.io/zone: onprem`. This can alternatively be done at the `nodeadm init` phase by specifying the label in your `nodeadm` configuration. Note, nodes running in {aws} Cloud automatically get a topology zone label applied to them that corresponds to the availability zone (AZ) of the node.
_Service Traffic Distribution_ (available for Kubernetes versions 1.31 and later in EKS) is recommended over link:https://kubernetes.io/docs/concepts/services-networking/topology-aware-routing/[Topology Aware Routing] because it is more predictable. With Service Traffic Distribution, healthy endpoints in a zone receive all of the traffic for that zone. With Topology Aware Routing, each Service must meet several conditions in the zone before the custom routing applies; otherwise, traffic is routed evenly to all endpoints. The following steps configure Service Traffic Distribution.
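For reference, `trafficDistribution` is a field on the Service `spec`. A generic sketch follows; the Service name, selector, and port are illustrative and not taken from this guide:

[source,yaml,subs="verbatim,attributes"]
----
apiVersion: v1
kind: Service
metadata:
  name: my-service
spec:
  selector:
    app: my-app
  ports:
  - port: 80
  # Prefer endpoints in the same zone as the client while healthy endpoints exist there
  trafficDistribution: PreferClose
----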

If you are using Cilium as your CNI, you must run the CNI with `enable-service-topology` set to `true` to enable Service Traffic Distribution. You can pass this configuration with the Helm install flag `--set loadBalancer.serviceTopology=true`, or you can update an existing installation with the Cilium CLI command `cilium config set enable-service-topology true`. For an existing installation, the Cilium agent running on each node must be restarted after you update the configuration.
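For example, a sketch of both approaches, assuming Cilium was installed with Helm as the release `cilium` in the `kube-system` namespace:

[source,bash,subs="verbatim,attributes"]
----
# Option 1: set the value through Helm
helm upgrade cilium cilium/cilium \
  --namespace kube-system \
  --reuse-values \
  --set loadBalancer.serviceTopology=true

# Option 2: update an existing installation with the Cilium CLI
cilium config set enable-service-topology true

# For an existing installation, restart the Cilium agents so the setting takes effect
kubectl --namespace kube-system rollout restart daemonset cilium
----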

. Add a topology zone label for each of your hybrid nodes, for example `topology.kubernetes.io/zone: onprem`. Or, you can set the label at the `nodeadm init` phase by specifying it in your `nodeadm` configuration (see <<hybrid-nodes-nodeadm-kubelet>> and the sketch after these steps). Note that nodes running in {aws} Cloud automatically get a topology zone label that corresponds to the availability zone (AZ) of the node.
+
[source,bash,subs="verbatim,attributes,quotes"]
----
kubectl label node [.replaceable]`hybrid-node-name` topology.kubernetes.io/zone=[.replaceable]`zone`
----
+
. Add `podAntiAffinity` to the CoreDNS deployment configuration with the topology zone key. You can alternatively configure the CoreDNS deployment during installation with EKS add-ons.
. Add `podAntiAffinity` to the CoreDNS deployment with the topology zone key. Or, you can configure the CoreDNS deployment during installation with EKS add-ons.
+
[source,bash,subs="verbatim,attributes,quotes"]
----
@@ -96,35 +100,65 @@ spec:
...
----
+
. Add `trafficDistribution` to the kube-dns Service configuration.
. Add the setting `trafficDistribution: PreferClose` to the `kube-dns` Service configuration to enable Service Traffic Distribution.

+
[source,bash,subs="verbatim,attributes"]
----
kubectl patch svc kube-dns -n kube-system --type=merge -p '{
"spec": {
"trafficDistribution": "PreferClose"
}
}'
----
+
. Confirm that Service Traffic Distribution is enabled by viewing the endpoint slices for the `kube-dns` Service. The endpoint slices must show `hints` with your topology zone labels for each endpoint address. If the `hints` are missing for an endpoint address, Service Traffic Distribution is not enabled.
+
[source,bash,subs="verbatim,attributes"]
----
kubectl edit service kube-dns -n kube-system
kubectl get endpointslice -A | grep "kube-dns"
----
+
[source,bash,subs="verbatim,attributes"]
----
kubectl get endpointslice [.replaceable]`kube-dns-<id>` -n kube-system -o yaml
----
+
[source,yaml,subs="verbatim,attributes"]
----
spec:
  ...
  trafficDistribution: PreferClose
addressType: IPv4
apiVersion: discovery.k8s.io/v1
endpoints:
- addresses:
  - <your-hybrid-node-pod-ip>
  hints:
    forZones:
    - name: onprem
  nodeName: <your-hybrid-node-name>
  zone: onprem
- addresses:
  - <your-cloud-node-pod-ip>
  hints:
    forZones:
    - name: us-west-2a
  nodeName: <your-cloud-node-name>
  zone: us-west-2a
----
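As referenced in the first step, a minimal `nodeadm` sketch for applying the topology zone label at `nodeadm init` time (the zone value `onprem` is an example):

[source,yaml,subs="verbatim,attributes"]
----
apiVersion: node.eks.aws/v1alpha1
kind: NodeConfig
spec:
  kubelet:
    # Adds the topology zone label when the node registers
    flags:
      - --node-labels=topology.kubernetes.io/zone=onprem
----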

[#hybrid-nodes-webhooks-add-ons]
=== Configure webhooks for add-ons

The following add-ons use webhooks and are supported for use with hybrid nodes.
The following add-ons use webhooks and are supported for use with hybrid nodes.

- {aws} Load Balancer Controller
- CloudWatch Observability Agent
- {aws} Distro for OpenTelemetry (ADOT)

See the sections below for configuring the webhooks used by these add-ons to run on nodes in {aws} Cloud.
See the following sections for configuring the webhooks used by these add-ons to run on nodes in {aws} Cloud.

[#hybrid-nodes-mixed-lbc]
==== {aws} Load Balancer Controller

To run the {aws} Load Balancer Controller on nodes in {aws} Cloud in a mixed mode cluster setup, add the following to your Helm values configuration or specify the values using EKS add-on configuration.
To use the {aws} Load Balancer Controller in a mixed mode cluster setup, you must run the controller on nodes in {aws} Cloud. To do so, add the following to your Helm values configuration or specify the values by using EKS add-on configuration.

[source,yaml,subs="verbatim,attributes"]
----
@@ -142,7 +176,7 @@ affinity:
[#hybrid-nodes-mixed-cwagent]
==== CloudWatch Observability Agent

The CloudWatch Observability Agent add-on has an operator that uses webhooks. To run the operator on nodes in {aws} Cloud in a mixed mode cluster setup, edit the CloudWatch Observability Agent operator configuration. The ability to configure operator affinity during installation with Helm and EKS add-ons is planned for a future release (see link:https://github.com/aws/containers-roadmap/issues/2431[containers-roadmap issue #2431]).
The CloudWatch Observability Agent add-on has a Kubernetes Operator that uses webhooks. To run the operator on nodes in {aws} Cloud in a mixed mode cluster setup, edit the CloudWatch Observability Agent operator configuration. You can't configure the operator affinity during installation with Helm and EKS add-ons (see link:https://github.com/aws/containers-roadmap/issues/2431[containers-roadmap issue #2431]).

[source,bash,subs="verbatim,attributes"]
----
@@ -170,7 +204,7 @@ spec:
[#hybrid-nodes-mixed-adot]
==== {aws} Distro for OpenTelemetry (ADOT)

The {aws} Distro for OpenTelemetry (ADOT) add-on has an operator that uses webhooks. To run the operator on nodes in {aws} Cloud in a mixed mode cluster setup, add the following to your Helm values configuration or specify the values using EKS add-on configuration.
The {aws} Distro for OpenTelemetry (ADOT) add-on has a Kubernetes Operator that uses webhooks. To run the operator on nodes in {aws} Cloud in a mixed mode cluster setup, add the following to your Helm values configuration or specify the values by using EKS add-on configuration.

[source,yaml,subs="verbatim,attributes"]
----
@@ -185,7 +219,7 @@ affinity:
- hybrid
----

If your pod CIDR is not routable on your on-premises network, configure the ADOT collector Custom Resource Definition (CRD) to run on your hybrid nodes so it can scrape the metrics from your hybrid nodes and the workloads running on them.
If your pod CIDR is not routable on your on-premises network, then the ADOT collector must run on hybrid nodes to scrape the metrics from your hybrid nodes and the workloads running on them. To do so, edit the Custom Resource Definition (CRD).

[source,bash,subs="verbatim,attributes"]
----
4 changes: 3 additions & 1 deletion vale/styles/config/vocabularies/EksDocsVocab/accept.txt
@@ -23,4 +23,6 @@ ENIs?
IPs?
CSRs?
routable
ARN
ARN
CloudWatch
CoreDNS