
External-DNS in GKE fails to insert Ingress A records when DNS provider is AWS Route 53 since helm chart version 6.28.2 #4707

Open
edison-vflow opened this issue Aug 27, 2024 · 3 comments
Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments

@edison-vflow

What happened:

We have a GKE cluster running ExternalDNS chart version 6.23.3.
We use AWS Route 53 as the DNS provider.

When updating ExternalDNS to the latest chart version, 8.3.5, the ExternalDNS pods log the following errors:

external-dns {"level":"info","msg":"Desired change: CREATE realtime.cluster-prefix.company-domain.com A [Id: /hostedzone/*******]","time":"2024-08-23T23:42:52Z"}
external-dns {"level":"info","msg":"Desired change: CREATE realtime.cluster-prefix.company-domain.com TXT [Id: /hostedzone/*******]","time":"2024-08-23T23:42:52Z"}
external-dns {"level":"error","msg":"Failure in zone company-domain.com. [Id: /hostedzone/*******] when submitting change batch: InvalidChangeBatch: [Tried to create an alias that targets 74.17.111.38., type A in zone *******, but the alias target name does not lie within the target zone, Tried to create an alias that targets 74.17.111.38., type A in zone *******, but the alias target name does not lie within the target zone, status code: 400, request id: 101e36ce-821e-45f5-9ead-a7ab6d0ea373","time":"2024-08-23T23:42:52Z"}
external-dns {"level":"error","msg":"Failed submitting change (error: InvalidChangeBatch: [Tried to create an alias that targets 74.17.111.38., type A in zone *******, but the alias target name does not lie within the target zone]\n\tstatus code: 400, request id: 7ad567af-abee-4894-905d-4770f5a708be), it will be retried in a separate change batch in the next iteration","time":"2024-08-23T23:42:53Z"}
external-dns {"level":"error","msg":"Failed submitting change (error: InvalidChangeBatch: [Tried to create an alias that targets 74.17.111.38., type A in zone *******, but the alias target name does not lie within the target zone]\n\tstatus code: 400, request id: 5bb64b3b-ed06-4911-8a5e-ec4c006d35c9), it will be retried in a separate change batch in the next iteration","time":"2024-08-23T23:42:53Z"}
external-dns {"level":"error","msg":"Failed to do run once: soft error\nfailed to submit all changes for the following zones: [/hostedzone/*******]","time":"2024-08-23T23:42:57Z"}

From our investigation, I can confirm that the issue starts with Helm chart version 6.28.2.
From 6.23.3 to 6.28.1, ExternalDNS correctly adds all GKE ingress records to AWS Route 53 as A records.

From 6.28.2 to 8.3.5, ExternalDNS fails with the error above.
The error shows that, from version 6.28.2, ExternalDNS interprets the GKE load balancer IP address as a domain name.
It then tries to create a Route 53 A record with Alias = Yes.
This cannot succeed, because the GKE load balancer IP address is not a domain name within the hosted zone.
The attempt would not happen if ExternalDNS treated the GKE load balancer target as a plain A record without an alias.
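To make the expected behavior concrete, the rule described here (an IP address target is never a Route 53 alias) can be sketched in Python. This is a hypothetical illustration, not ExternalDNS's actual logic (ExternalDNS is written in Go), and the function names are made up. Because the logged alias target, 74.17.111.38., carries a trailing dot as if it were a fully qualified domain name, the sketch strips that dot before checking whether the target is an IP literal.

```python
import ipaddress


def is_ip_literal(target: str) -> bool:
    """Return True if target is an IPv4 or IPv6 address (a trailing dot is ignored)."""
    try:
        ipaddress.ip_address(target.rstrip("."))
    except ValueError:
        return False
    return True


def may_use_alias(target: str) -> bool:
    """Hypothetical rule: only hostname targets are candidates for a Route 53 alias.

    An IP literal such as the GKE load balancer address must instead be
    written as a plain A (or AAAA) record with Alias = No.
    """
    return not is_ip_literal(target)


if __name__ == "__main__":
    # Example targets: the IP from the log above and a made-up hostname.
    for target in ["74.17.111.38.", "lb.example.com."]:
        kind = "alias candidate" if may_use_alias(target) else "plain A record"
        print(f"{target} -> {kind}")
```

A hostname passing this check only means an alias is possible; whether one is actually wanted still depends on the target and configuration.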

What you expected to happen:

In the working versions, ExternalDNS correctly determines that the GKE ingress entries must be inserted into Route 53 as A records with Alias = No.

That is, since the GKE load balancer that Route 53 points to is an IP address, the record should contain that IP directly rather than an alias.
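For comparison, here is roughly how the two record styles differ in a Route 53 ChangeResourceRecordSets request, written as the Python dicts that boto3's change_resource_record_sets accepts in its ChangeBatch argument. The helper names, the 300-second TTL, and the hosted zone ID are illustrative assumptions. A plain A record carries the IP in ResourceRecords together with a TTL; an alias record has no TTL or ResourceRecords and instead names another DNS name in AliasTarget, which is why an IP address cannot serve as the alias target.

```python
def plain_a_change(name: str, ip: str, ttl: int = 300) -> dict:
    """Non-alias A record (Alias = No): the IP goes in ResourceRecords, with a TTL."""
    return {
        "Action": "CREATE",
        "ResourceRecordSet": {
            "Name": name,
            "Type": "A",
            "TTL": ttl,
            "ResourceRecords": [{"Value": ip}],
        },
    }


def alias_a_change(name: str, target_dns_name: str, target_zone_id: str) -> dict:
    """Alias A record (Alias = Yes): no TTL/ResourceRecords; AliasTarget names a DNS name."""
    return {
        "Action": "CREATE",
        "ResourceRecordSet": {
            "Name": name,
            "Type": "A",
            "AliasTarget": {
                "HostedZoneId": target_zone_id,
                "DNSName": target_dns_name,
                "EvaluateTargetHealth": False,
            },
        },
    }


# The kind of record the working chart versions are reported to create for the
# GKE load balancer IP (TTL value assumed for illustration).
batch = {
    "Changes": [
        plain_a_change("realtime.cluster-prefix.company-domain.com", "74.17.111.38"),
    ]
}
print(batch)
```

Submitting such a batch would be a call like client.change_resource_record_sets(HostedZoneId=..., ChangeBatch=batch) on a boto3 Route 53 client; the sketch stops at building the payload.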


How to reproduce it (as minimally and precisely as possible):

Happy path

  • Create a GKE cluster
  • Use NGINX as the Ingress controller
  • Annotate the Ingress objects with ExternalDNS annotations
  • Deploy ExternalDNS using any Helm chart version from 6.23.3 to 6.28.1
  • In the ExternalDNS Helm chart, configure AWS Route 53 as your DNS provider (complete all the setup described in the documentation for using AWS as a provider)
  • Deploy the Helm chart to your GKE cluster
  • Observe that with 6.23.3 to 6.28.1, ExternalDNS adds all GKE ingress records to AWS Route 53 as A records
  • Uninstall the Helm chart
  • Delete the Route53 records inserted by ExternalDNS

Breaking path

  • Create a GKE cluster (you can reuse the happy-path cluster)
  • Use NGINX as the Ingress controller (you can reuse the happy-path cluster and settings)
  • Annotate the Ingress objects with ExternalDNS annotations (you can reuse the happy-path settings)
  • Deploy ExternalDNS using any Helm chart version from 6.28.2 to 8.3.5
  • In the ExternalDNS Helm chart, configure AWS Route 53 as your DNS provider (complete all the setup described in the documentation for using AWS as a provider)
  • Deploy the Helm chart to your GKE cluster
  • Observe the error:
external-dns {"level":"error","msg":"Failed submitting change (error: InvalidChangeBatch: [Tried to create an alias that targets 74.17.111.38., type A in zone *******, but the alias target name does not lie within the target zone]\n\tstatus code: 400, request id: 5bb64b3b-ed06-4911-8a5e-ec4c006d35c9), it will be retried in a separate change batch in the next 

Anything else we need to know?:

Environment: GKE, Kubernetes version 1.30

  • External-DNS version (use external-dns --version):
    • Breaking Helm chart versions: 6.28.2 to 8.3.5
  • DNS provider: AWS Route 53
  • Others:
@edison-vflow edison-vflow added the kind/bug Categorizes issue or PR as related to a bug. label Aug 27, 2024
@xavidop

xavidop commented Aug 28, 2024

Hi, we are seeing the same issue.

@stephanpelikan

Me too, using Helm chart 8.3.5:

Failure in zone my-hosted-zone.my-company.com. [Id: /hostedzone/*******] when submitting change batch: InvalidChangeBatch: [Tried to create an alias that targets k8s-wordpres-wpdemowo-******.eu-central-1.elb.amazonaws.com., type A in zone *******, but the alias target name does not lie within the target zone]\n\tstatus code: 400, request id: 6d3b6622-8246-4a79-a338-da642d4158db

@edison-vflow: Thank you for figuring out that the old charts work.

Using an old version will be OK for now (I will see), but it would be great to use the most recent version.

@leonardocaylent
Contributor

@edison-vflow Can you test with these versions and share the results with us?
  • 7.0.1
  • 7.0.2
