Skip to content

Conversation

@weshinsley
Copy link

@weshinsley weshinsley commented Oct 23, 2025

NOT FOR MERGING - needs to go in montagu-system
This isn't tested in the wild (ie UAT) yet - but hopefully is close. It can wait for when I'm back!

@codecov
Copy link

codecov bot commented Oct 23, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 99.44%. Comparing base (6f19418) to head (cc6a1c4).

Additional details and impacted files
@@            Coverage Diff             @@
##             main      #22      +/-   ##
==========================================
+ Coverage   97.10%   99.44%   +2.34%     
==========================================
  Files           5        4       -1     
  Lines         414      362      -52     
  Branches       52       44       -8     
==========================================
- Hits          402      360      -42     
+ Misses          5        0       -5     
+ Partials        7        2       -5     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@weshinsley weshinsley marked this pull request as ready for review October 26, 2025 16:45
Copy link
Contributor

@EmmaLRussell EmmaLRussell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, before we were using some sort of acme setup, but all through the proxy container?

Are there any implications for switching an existing deployment from the previous set up to the new one?

We'll need to update https://github.com/vimc/montagu-config for UAT, Science and Prod configurations. Currently UAT and Science are static certs, but presumably the idea is to get those set up with acme buddy too if static certs are no longer supported?

Let's talk about this when you get back from holiday!

obj.stop(kill=True, remove_volumes=True)


def test_proxy_configured_ssl():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be better to modify this test for the new config rather than just deleting it.

Copy link
Author

@weshinsley weshinsley Oct 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure there's anything left to test in this function - I don't think acme-buddy container is actually going to get as far as producing a cert to populate /etc/montagu/proxy with during tests. (is it?)

Comment on lines +174 to +176
env_dict = dict(e.split("=", 1) for e in env)
assert "hdb-us3r" in env_dict["HDB_ACME_USERNAME"]
assert "hdb-p@assword" in env_dict["HDB_ACME_PASSWORD"]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So we're now just testing that the env vars have been correctly set in the container from the vault, rather than that we have a valid cert. Can we really not do more substantial tests here any more?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess I've assumed that once acme-buddy is up and running (and we know it is if it gets this far), then the other behaviour of fetching/renewing the lets-encrypt certs is covered by the tests on acme-buddy itself, and I haven't tested them again here.

I think it would also be a bit tricky with the DNS work that acme-buddy does...?

Copy link
Contributor

@plietar plietar Oct 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah getting integration tests working with DNS is a bit of a pain. You need a fake ACME service and a fake DNS service. See https://github.com/reside-ic/acme-buddy/tree/main/e2e for how I did it.

What we could do here is run just the fake ACME service (pebble) and tell it to skip all validation, see https://github.com/letsencrypt/pebble?tab=readme-ov-file#skipping-validation.
It would still test the interaction between montagu-deploy, acme-buddy and the proxy but it would not test the DNS provider implementation of acme-buddy. I think that is fine since the acme-buddy integration tests cover it.

You still need to give acme-buddy a dns provider to contact, even if we don't run anything. Simplest might be to add a "null" DNS provider implementation in acme-buddy, which would do nothing and of course never work on a production ACME server, but it would work with a pebble service that has disabled its validation.

Copy link
Contributor

@plietar plietar Oct 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Getting acme-buddy to talk to pebble is little tricky because it will reject pebble's self-signed cert. You need a way to work around this. The easiest way would be to disable TLS verification (ie. the equivalent of the no_verify_ssl=true) that was used in the old test.

I never added support for that in acme-buddy but it should be feasible. Replicate https://github.com/go-acme/lego/blob/master/cmd/setup.go#L65-L72 into acme-buddy.

In acme-buddy's own tests I extracted the TLS root cert from pebble and injected it into the acme-buddy container. It's a nice approach because it doesn't require any support from the acme-buddy binary itself, but it is a bit tedious to setup in the integration test: https://github.com/reside-ic/acme-buddy/blob/9c8c42a1f87e4d955a5a6ae19d79cd0d61f97a57/e2e/e2e_test.go

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another option is to modify acme-buddy to have a special "self-signed" mode. In this case it would not contact an ACME server at all but would instead just generate sign its own certificates. It makes testing here much easier since we wouldn't have to run pebble at all.

Even though this diverges from the production setup, acme-buddy's externally visible behaviour remains fairly identical.

I guess I've assumed that once acme-buddy is up and running (and we know it is if it gets this far), then the other behaviour of fetching/renewing the lets-encrypt certs is covered by the tests on acme-buddy itself, and I haven't tested them again here.

I think the integration between acme-buddy and the proxy (eg. making sure the certificates produced by acme-buddy are picked up by nginx) is useful to test.

@weshinsley
Copy link
Author

weshinsley commented Oct 27, 2025

So before this PR, we are using a method that can only get Lets Encrypt certs for public facing websites using the web-based acme-challenge method, where we can temporarily put a secret somewhere that the Lets Encrypt servers can see. Hence that works (and is live) on production, but wouldn't be the solution for UAT or Science which aren't public facing, so can't host that secret themselves.

acme-buddy supercedes the above method and can handle public and private servers in exactly the same way, by talking to IC's hardware database (? HDB) server, and effectively setting DNS tags, which LetsEncrypt can view from the outside, even if it can't get to port 443 on the machine for some web content. And in general acme-buddy is the more polished solution, which we use everywhere else; montagu-deploy and packit-deploy are the last deployments requring certs that don't yet use it.

There is a PR on montagu-config to go with this one - but yes, this can wait til next week!
vimc/montagu-config#34

@plietar
Copy link
Contributor

plietar commented Oct 27, 2025

The montagu-deploy repo is retired and these changes should go to montagu-system. It is on the TODO list to archive this repo.

@plietar
Copy link
Contributor

plietar commented Oct 27, 2025

HDB will only work for UAT and Science. For Production we need to use Cloudflare as the DNS provider.

acme-buddy already supports this. Use --dns-provider=cloudflare and then see the lego docs for the relevant environment variables. The different variations on keys and tokens is a bit confusing, I'll try to see if I can find what I had used when developing acme-buddy.

The YAML syntax we expose will need to support both. Rather than hardcode hdb and cloudflare into the tool, we are better off keeping these flexible and just passing the values to acme-buddy. Something like:

acme_buddy:
  [ ... ]
  dns_provider: hdb
  env:
    HDB_ACME_USERNAME: VAULT:secret/certbot-hdb/credentials:username
    HDB_ACME_PASSWORD: VAULT:secret/certbot-hdb/credentials:password

and

acme_buddy:
  [ ... ]
  dns_provider: cloudflare
  env:
    CLOUDFLARE_DNS_API_TOKEN: VAULT:secret/montagu/something/something

return constellation.ConstellationContainer(name, cfg.fake_smtp_ref, ports=[1025, 1080])


def acme_buddy_container(cfg, proxy):
Copy link
Contributor

@plietar plietar Oct 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At this point I think we've defined this enough times that it is worth refactoring it into constellation itself? We have a good idea now of all the patterns and how how we want to use it?
It will ensure all of our deploy projects expose it in a consistent way.

@EmmaLRussell
Copy link
Contributor

The montagu-deploy repo is retired and these changes should go to montagu-system. It is on the TODO list to archive this repo.

Yeah, sorry @weshinsley I forgot we'd already moved montagu-deploy into montagu-system!

@weshinsley
Copy link
Author

The montagu-deploy repo is retired and these changes should go to montagu-system. It is on the TODO list to archive this repo.

Yeah, sorry @weshinsley I forgot we'd already moved montagu-deploy into montagu-system!

Bums. Never mind - looks like a bit of a rework next week!

@weshinsley
Copy link
Author

See vimc/montagu-system#17

@weshinsley weshinsley closed this Dec 10, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants