Server Attestation #5950

Open
kfox1111 opened this issue Mar 13, 2025 · 0 comments

kfox1111 (Contributor) commented Mar 13, 2025

In the agent, we have Node and Workload Attestors.

Today we conflate two different things: the current TrustBundle used for the TrustDomain, and Server Attestation.

This creates a problem for sysadmins.

Sysadmins don't generally want to maintain CAs. One of the great features of SPIRE over something like ACME is that it manages very short-lived CAs in addition to short-lived certificates. So it's possible to make the argument to sysadmins that they don't have to worry about managing CAs, as SPIRE is just doing it. The CAs are just an implementation detail.

Machines go down unexpectedly. If they are to recover without a lot of painful effort from a sysadmin, today they must come back up within the validity window of the TrustBundle they held when they went down. But conflating the lifecycle of the spire-server's certs and Server Attestation with the TrustDomain's CAs means the sysadmin has a tough choice to make:

A. Make the TrustDomain's CA TTL long enough to allow automatic recovery of machines that are down for a while (maybe a TTL of 1 month or more?), and now the sysadmin has to worry about all the negative effects of having a longish-lived CA.
B. Live with the Sword of Damocles over their heads: a power outage or other issue at the wrong time or place could make huge amounts of work for them.

It doesn't have to be this way. If we had a new type of attestor, ServerAttestor, we could decouple the Server Attestation process from the workloads' trust. That would allow longer-lived CAs for Server Attestation while still keeping very short-lived CAs for workloads.

This would require two changes to the SPIRE Agent:

  1. Add a mechanism for the agent to bootstrap again if it connects to the spire-server and the server's X.509 cert no longer matches the cached trust bundle
  2. Add some kind of pluggable mechanism to get a trust bundle for the spire-server, so the agent can attest/reattest and fetch the current workload trust bundle

A first stab at item 1 is being done here: #5892

For the user, it may make sense to create a new plugin type, say ServerAttestor, for Server Attestation.
An initial exploratory stab at that was made here: spiffe/spire-plugin-sdk#58, with some discussion in this issue: #5881

Externalizing the attestor via plugins would allow multiple ways of attesting, along with more advanced policy-based plugins, while keeping this policy out of the main spire-server so it can be easily extended by the end user.

There are levels of safety in the SPIRE system when checking the spire-server's cert:

  1. server cert in the established trust bundle - very, very unlikely there is a security issue
  2. bootstrapping - happens very infrequently, and often with a sysadmin involved in the process, so someone is actively looking for funny business.
  3. rebootstrapping - can happen at any time, so it is harder to protect against badness. It is up to each organization to decide the tradeoff between system unavailability and the risk of recovering from a compromised server.
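
One way to picture the three levels is as an explicit value the agent could hand to an organization-defined policy. This is only a sketch: `TrustPath`, `Policy`, and `AllowRebootstrap` are invented for illustration and are not existing SPIRE configuration.

```go
package main

import "fmt"

// TrustPath names how the server's cert came to be trusted (hypothetical).
type TrustPath int

const (
	EstablishedBundle TrustPath = iota // cert chains to the cached bundle
	Bootstrap                          // first contact, usually supervised
	Rebootstrap                        // cached bundle no longer matches
)

// Policy is a hypothetical per-organization knob: whether to trade some
// risk for automatic recovery.
type Policy struct {
	AllowRebootstrap bool
}

// Accept reports whether the agent should proceed for the given path.
// Treating Bootstrap as always acceptable is an assumption of this sketch.
func (p Policy) Accept(path TrustPath) bool {
	switch path {
	case EstablishedBundle, Bootstrap:
		return true
	case Rebootstrap:
		return p.AllowRebootstrap
	default:
		return false
	}
}

func main() {
	strict := Policy{AllowRebootstrap: false}
	lenient := Policy{AllowRebootstrap: true}
	fmt.Println(strict.Accept(Rebootstrap), lenient.Accept(Rebootstrap))
}
```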

As an alternative to plugins, the mechanism could be outsourced via the existing trust_bundle_url mechanism if unix socket support were added and some additional metadata passed. That option is explored a bit here: #5932

The trust_bundle_url approach has some drawbacks:

  • it would have a separate lifecycle from the rest of the system. Plugins fork off with the agent, so they are a bit easier to manage that way
  • we have plugins for everything else. It's odd to use a completely different mechanism for this type of extension, and this will lead to user confusion
  • configuration for it would be completely different and not stored in the spire-agent config like everything else
  • with a formal plugin API for ServerAttestors, a policy-based plugin can call out to other ServerAttestor plugins to share code; the URL mechanism gives up that option

amartinezfayo self-assigned this Mar 18, 2025
amartinezfayo added the triage/in-progress (Issue triage is in progress) label Mar 18, 2025