Hi all,
I'm not sure if this is a bug or I'm doing something wrong here...
Our k8s v1.31 cluster setup is fairly simple: three worker nodes on which we run some services, with NGINX Gateway Fabric as a reverse proxy and a few routes set up for those services. aws-load-balancer-controller v2.11.0 is installed with Helm and configured as an NLB in instance mode.
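For reference, the LoadBalancer Service in front of the gateway carries roughly the standard aws-load-balancer-controller annotations for an external NLB in instance mode. A minimal sketch (the name, namespace, and ports below are illustrative, not copied from my manifests):

```yaml
# Sketch of the gateway's LoadBalancer Service that aws-load-balancer-controller
# reconciles into an NLB in instance mode.
apiVersion: v1
kind: Service
metadata:
  name: nginx-gateway              # assumed name of the NGINX Gateway Fabric Service
  namespace: nginx-gateway         # assumed namespace
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-type: "external"
    service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: "instance"
    service.beta.kubernetes.io/aws-load-balancer-scheme: "internet-facing"
spec:
  type: LoadBalancer
  ports:
    - name: https
      port: 443
      targetPort: 443
      protocol: TCP
```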
This all works, including cert-manager provisioning Let's Encrypt certs and external-dns taking care of our service FQDNs by setting CNAME records that point at the AWS load balancer's FQDN.
NGINX Gateway Fabric is installed with practically default settings (we enable the Gateway API experimental features) and spins up a single pod.
The problem I'm having is that the AWS load balancer's target groups include all three worker nodes' EC2 instances, but only the one on which the nginx pod is actually running shows as healthy. The target groups run the default checks: HTTP on path /healthz and port 30632.
Each EC2 instance has two private IPv4 addresses, both in the same subnet, e.g. 10.0.6.47 and 10.0.6.195. When I log into each node and run curl http://<private IP addr>:30632/healthz, I get the following results: when a node connects to its own IPs, both IPs work. When a node connects to another node, only one of that node's IPs works (and it's the same IP for node X whether node Y or Z is connecting). The "successful" IP is the one the instance's private IP DNS name resolves to (e.g. ip-10-0-6-47.eu-west-1.compute.internal), i.e. the one on the network interface with index 0.
netstat on all three nodes shows kube-proxy listening on port 30632:
tcp6 0 0 :::30632 :::* LISTEN 2125/kube-proxy
There are no custom firewall rules on the nodes, and all AWS security groups are defaults.
It looks like the target group health checks connect to one of the two private IPs, failing for some nodes and succeeding for others.
And to reply to myself: it's because NGINX Gateway Fabric (like the NGINX ingress controller) sets the Service's externalTrafficPolicy to Local in order to preserve source IPs, but as a consequence requests are not routed to nodes that aren't running nginx gateway/ingress pods.
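For context, with Local policy the gateway Service gets a dedicated health check node port, which is exactly the port the NLB target groups probe. A trimmed sketch of the relevant spec fields (the port number is the one from my cluster; names may differ in yours):

```yaml
# Relevant fields of the gateway's LoadBalancer Service. With
# externalTrafficPolicy: Local, kube-proxy serves /healthz on
# healthCheckNodePort on every node, but only returns 200 on nodes
# that host a ready gateway pod (503 elsewhere); that is why only
# the node running the nginx pod passes the NLB health check.
spec:
  type: LoadBalancer
  externalTrafficPolicy: Local
  healthCheckNodePort: 30632
```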
So we can either:

1. Run the nginx gateway with the Cluster external traffic policy, which makes our lives easier but masks source IPs. In this case the health checks are TCP checks on the traffic ports.
2. Run the nginx gateway with the Local external traffic policy if we need to preserve source IPs. In that case we need the AWS load balancer to select only nodes running nginx gateway pods for its target groups (see the sketch after this list). In this case the health checks are HTTP checks on the health check port and the /healthz path.
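For option 2, aws-load-balancer-controller can restrict which nodes it registers as targets via the target-node-labels Service annotation. A minimal sketch, assuming a made-up ingress=nginx-gateway node label that you would also apply to the node(s) running the gateway pod (name and namespace are illustrative):

```yaml
# Keep Local traffic policy, but only register labelled nodes in the NLB
# target groups. Label the node(s) hosting the gateway pod to match, e.g.:
#   kubectl label node ip-10-0-6-47.eu-west-1.compute.internal ingress=nginx-gateway
apiVersion: v1
kind: Service
metadata:
  name: nginx-gateway              # assumed name
  namespace: nginx-gateway         # assumed namespace
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-target-node-labels: "ingress=nginx-gateway"
spec:
  type: LoadBalancer
  externalTrafficPolicy: Local
```

The caveat with the label approach is that the target set won't follow the pod if it gets rescheduled onto an unlabelled node. Option 1 is just flipping externalTrafficPolicy to Cluster on the same Service, via the chart's service values or a kubectl patch.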