Mitigating Let's Encrypt Rate Limiting Issues #174

Open
osterman opened this issue Jul 14, 2018 · 1 comment

@osterman
Member

osterman commented Jul 14, 2018

what

We're concerned about Let's Encrypt rate limiting issues. It's fair enough to switch our staging environment over to Let's Encrypt's staging environment, but I'm concerned about this in production.

why

It basically means we could be blocked from making changes to our infrastructure if Let's Encrypt rate-limits us again, so we need some kind of solution. Naively, we could switch to a wildcard certificate for *.example.net and just make sure all of the servers use DNS names like server-123-123.example.net.
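A wildcard name only covers a single DNS label in the leftmost position, so *.example.net covers server-123-123.example.net but neither the bare example.net nor a deeper name such as ourapp.us-west-2.staging.example.net. A simplified sketch of that matching rule (wildcard_matches is a hypothetical helper; it ignores IDNA and partial-label wildcards):

```python
def wildcard_matches(pattern: str, hostname: str) -> bool:
    """Return True if the certificate name `pattern` covers `hostname`.

    Simplified sketch: case-insensitive, either an exact match or a single
    leftmost "*" label that stands in for exactly one DNS label.
    """
    pattern = pattern.lower().rstrip(".")
    hostname = hostname.lower().rstrip(".")
    if not pattern.startswith("*."):
        return pattern == hostname
    # "*.example.net" -> suffix "example.net"; the wildcard must replace
    # exactly one non-empty label in front of that suffix.
    suffix = pattern[2:]
    head, sep, rest = hostname.partition(".")
    return bool(head) and sep == "." and rest == suffix


if __name__ == "__main__":
    for host in [
        "server-123-123.example.net",            # one extra label: covered
        "example.net",                           # apex: not covered
        "ourapp.us-west-2.staging.example.net",  # several labels: not covered
    ]:
        print(host, wildcard_matches("*.example.net", host))
```

This is also why a certificate for a deeper name, like the one in option 1, lists both the wildcard and the apex name as SANs.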

@osterman
Member Author

osterman commented Jul 14, 2018

There are a few options.

option 1

Use an ACM certificate provisioned with Terraform and associated with the nginx-ingress controller.

https://github.com/cloudposse/terraform-aws-acm-request-certificate

Reference implementation here: https://github.com/cloudposse/terraform-root-modules/tree/master/aws/acm

Then set the nginx-ingress annotations to use this ACM certificate (e.g., a certificate with SANs for *.ourapp.us-west-2.staging.example.net and ourapp.us-west-2.staging.example.net).

AWS Service annotations


These are passed to the Helm chart in helmfile.yaml:
https://github.com/cloudposse/geodesic/blob/master/rootfs/conf/kops/helmfile.yaml#L556-L557
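For illustration only, here is a sketch of Helm values that would terminate TLS on the AWS load balancer with an ACM certificate. It assumes the stable/nginx-ingress chart's controller.service.annotations and controller.service.targetPorts values and the in-tree AWS cloud provider's ELB annotations; the ARN is a placeholder, and the values actually used in the linked helmfile.yaml may differ. JSON is valid YAML, so the printed output could be saved as a values file.

```python
import json


def nginx_ingress_values(acm_cert_arn: str) -> dict:
    """Build Helm values for TLS termination on an AWS ELB with an ACM cert.

    Assumes the stable/nginx-ingress chart layout (controller.service.*).
    """
    annotations = {
        # ACM certificate the ELB presents to clients.
        "service.beta.kubernetes.io/aws-load-balancer-ssl-cert": acm_cert_arn,
        # Only the Service port named "https" gets a TLS listener.
        "service.beta.kubernetes.io/aws-load-balancer-ssl-ports": "https",
        # After terminating TLS, the ELB speaks plain HTTP to the backend.
        "service.beta.kubernetes.io/aws-load-balancer-backend-protocol": "http",
    }
    return {
        "controller": {
            "service": {
                "annotations": annotations,
                # Send decrypted traffic from the ELB's 443 listener to
                # nginx's plain-HTTP port instead of its TLS port.
                "targetPorts": {"http": "http", "https": "http"},
            }
        }
    }


if __name__ == "__main__":
    # Hypothetical placeholder ARN, not a real certificate.
    arn = "arn:aws:acm:us-west-2:123456789012:certificate/EXAMPLE"
    print(json.dumps(nginx_ingress_values(arn), indent=2))
```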

option 2

Use a different operational domain for production to reduce sharing across stages. E.g., treat example.net as a staging domain and example.co as the production operations domain. This is what another one of our customers does. They incidentally use ACM certs as well, but only because we started this journey before kube-lego existed.

other considerations

The likelihood of getting rate limited in production is small for a few reasons:

  1. Very few new services are launched
  2. Namespaces are seldom, if ever, destroyed
  3. Certificates are still long-lived, so requests to the Let's Encrypt API are few and far between. They can be renewed well before the 90-day expiry, so rate limits would have to stay in effect for several days before a renewal ultimately fails or times out.
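To put point 3 in numbers, a small sketch. It assumes renewal is first attempted 30 days before expiry (a common client default, but an assumption here), that a rate limit blocks issuance for block_days days (7, one rolling week, in the example), and that retries are frequent enough to succeed as soon as the block lifts. days_of_slack is a hypothetical helper and these figures are parameters, not numbers from this thread.

```python
def days_of_slack(lifetime_days: int = 90,
                  renew_before_days: int = 30,
                  block_days: int = 7) -> int:
    """Days left on the old certificate when renewal finally succeeds.

    A negative result means the certificate expires before renewal succeeds.
    """
    first_attempt_day = lifetime_days - renew_before_days  # e.g. day 60 of 90
    # Assumes retries succeed immediately once the rate limit lifts.
    first_success_day = first_attempt_day + block_days
    return lifetime_days - first_success_day


if __name__ == "__main__":
    print(days_of_slack())               # 23: a one-week block is harmless
    print(days_of_slack(block_days=35))  # -5: block outlasts the renewal window
```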

You're at elevated risk in staging because running "unlimited staging environments" produces a large number of publicly exposed services. Moving staging to Let's Encrypt's staging environment reduces the risk of triggering rate limits in production. Using an entirely separate domain in production mitigates the impact even further.
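The benefit of a separate production domain comes from Let's Encrypt counting its main issuance limit per registered domain over a rolling week, so every staging subdomain of example.net draws from the same bucket as production names under example.net. The sketch below is a toy counter, not Let's Encrypt's implementation: it approximates the registered domain as the last two labels (real code should use the Public Suffix List), the weekly limit is a parameter because the actual figure has changed over time, and the prod.* hostnames are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def registered_domain(hostname: str) -> str:
    # Naive approximation: the last two labels. Wrong for suffixes like
    # "co.uk"; a real implementation would consult the Public Suffix List.
    return ".".join(hostname.lower().rstrip(".").split(".")[-2:])


class IssuanceCounter:
    """Toy model of a per-registered-domain, rolling-window issuance limit."""

    def __init__(self, limit_per_week: int):
        self.limit = limit_per_week
        self.window = timedelta(days=7)
        self.issued = defaultdict(list)  # registered domain -> issuance times

    def try_issue(self, hostname: str, now: datetime) -> bool:
        bucket = self.issued[registered_domain(hostname)]
        # Forget issuances that have aged out of the rolling window.
        bucket[:] = [t for t in bucket if now - t < self.window]
        if len(bucket) >= self.limit:
            return False  # rate limited
        bucket.append(now)
        return True


if __name__ == "__main__":
    counter = IssuanceCounter(limit_per_week=3)  # illustrative limit only
    now = datetime(2018, 7, 14)
    # Three staging environments exhaust the example.net bucket...
    for env in ("pr-1", "pr-2", "pr-3"):
        counter.try_issue(f"{env}.us-west-2.staging.example.net", now)
    print(counter.try_issue("api.us-west-2.prod.example.net", now))  # False: shared bucket
    print(counter.try_issue("api.us-west-2.prod.example.co", now))   # True: separate bucket
```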

@osterman osterman self-assigned this Jul 16, 2018