From 4db63ea2240f2eca0b1acc8a13f17342eac49eba Mon Sep 17 00:00:00 2001
From: Paul O'Fallon
Date: Sat, 30 May 2026 13:13:41 +0000
Subject: [PATCH 1/2] docs(spec): add 002 laptop-push-secrets; mark 001 superseded
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

After end-to-end testing of remo's 005-credential-broker (PR #32) on
2026-05-29, the bootstrap-token-on-instance design implemented in this
repo as v0.1.0 was identified as carrying a residual on-disk credential
that contradicts the supply-chain threat model the broker was built to
defend. Closed PR #32 in remo; superseding 001 here with 002.

The new design strips the entire external-backend integration:

- delete src/backend.rs (114 LOC) and src/bootstrap.rs (819 LOC)
- drop fnox-core dependency; this cascades to delete Cross.toml, empty
  deny.toml's [advisories].ignore (all 6 entries reach via fnox-core →
  AWS SDK), and shrink the binary from ~32 MiB toward the original
  15 MiB NFR target
- replace BackendSession with InMemorySecretStore populated from
  /var/lib/remo-broker/secrets.enc (age ciphertext), decrypted at
  startup using a key from $CREDENTIALS_DIRECTORY/secrets-key (systemd
  LoadCredentialEncrypted=)
- new admin ops: push-creds, clear-creds, get-public-key
- remove rotate-bootstrap and BootstrapMode; wire protocol bumps to v2
  per docs/wire-protocol.md §4 (removing ops/fields = breaking)
- ship schema/remo-broker.v2.json as release artifact

~80% of the daemon chassis carries forward unchanged (proto framing,
manifest, registry, audit, cache, server lifecycle, systemd hardening).

Cross-repo: paired with remo spec 006-credential-broker-laptop-push
(landed on remo via separate PR).

Co-Authored-By: Claude Opus 4.7
---
 specs/001-broker-daemon/spec.md       |   2 +-
 specs/002-laptop-push-secrets/spec.md | 132 ++++++++++++++++++++++++++
 2 files changed, 133 insertions(+), 1 deletion(-)
 create mode 100644 specs/002-laptop-push-secrets/spec.md

diff --git a/specs/001-broker-daemon/spec.md b/specs/001-broker-daemon/spec.md
index 1848dbe..8a73827 100644
--- a/specs/001-broker-daemon/spec.md
+++ b/specs/001-broker-daemon/spec.md
@@ -2,7 +2,7 @@

 **Feature Branch**: `001-broker-daemon`
 **Created**: 2026-05-24
-**Status**: In Progress
+**Status**: Superseded as of 2026-05-30 by [`002-laptop-push-secrets`](../002-laptop-push-secrets/spec.md). The external-backend / on-instance bootstrap-token model this spec describes was implemented through v0.1.0 but turned out to carry a residual on-disk credential (the bootstrap token itself) that contradicts the supply-chain threat model. The redesign drops `fnox-core`, the backend integration, and the bootstrap-token concept; the laptop now pushes an age-encrypted secrets bundle to the instance and the broker decrypts in memory via systemd-credentials. Kept intact as historical reference.
 **Last Updated**: 2026-05-24 (commit `beead1f`; 130 tests passing, clippy + cargo deny + systemd-analyze + NFR measurements + soak smoke + killtest smoke + bench compile all green)

 **Input**: User description: "A long-lived Rust daemon for Linux instances that holds a per-instance bootstrap token, authenticates upward to a credential backend (1Password / Vault / AWS Secrets Manager / age / OS keychain via the fnox-core library), and serves per-project Unix sockets enforcing per-project allowlists. Built to be the on-instance half of Remo's credential-broker feature (see Remo `005-credential-broker/spec.md`)."
diff --git a/specs/002-laptop-push-secrets/spec.md b/specs/002-laptop-push-secrets/spec.md
new file mode 100644
index 0000000..fd872be
--- /dev/null
+++ b/specs/002-laptop-push-secrets/spec.md
@@ -0,0 +1,132 @@
+# Feature Specification: Laptop-Push Secrets Daemon
+
+**Feature Branch**: `002-laptop-push-secrets`
+**Created**: 2026-05-30
+**Status**: Draft
+**Supersedes**: [`001-broker-daemon`](../001-broker-daemon/) (external-backend / bootstrap-token model)
+**Cross-repo dependency**: [`remo` spec 006](https://github.com/get2knowio/remo/tree/main/specs/006-credential-broker-laptop-push) (the laptop CLI half)
+
+**Input**: Redesign the `remo-broker` daemon for a model where the developer's laptop pushes an encrypted, age-bundled set of project secrets to the instance at create time (and on subsequent `remo push-creds` calls), the daemon decrypts the bundle in memory using a systemd-credentials-sourced key (TPM2 → host-key → plaintext-mode-0600 fallback ladder), and serves cleartext secrets to devcontainers via the existing per-project Unix socket protocol. No external secret backend, no on-disk bootstrap token, no `fnox-core` dependency, no AWS-SM / Vault / 1Password integration.
+
+## Why the redesign
+
+`001-broker-daemon` was built around an external secret backend (1P / Vault / AWS-SM via `fnox-core`) and a per-instance bootstrap token on disk at `/etc/remo-broker/bootstrap-token` that the daemon used to fetch on demand. End-to-end testing on remo on 2026-05-29 surfaced that this design carries a residual on-disk credential (the bootstrap token) that contradicts the supply-chain threat model the broker was built to defend — and that the operational complexity of running a backend is unnecessary for the actual audience.
+
+See [`remo:specs/006-credential-broker-laptop-push/spec.md`](https://github.com/get2knowio/remo/tree/main/specs/006-credential-broker-laptop-push) for the full motivation, threat model, and laptop-side requirements.
+
+## What this changes in the daemon
+
+The wire protocol bumps to **v2** per the additive-only-within-major rule in [`docs/wire-protocol.md` §4](../../docs/wire-protocol.md): removing the `rotate-bootstrap` admin op and the `bootstrap_mode` field from `StatusResponse` are breaking. A new artifact `schema/remo-broker.v2.json` ships alongside the v0.2.0 release.
+
+## Requirements
+
+### Functional
+
+| ID | Requirement |
+|---|---|
+| FR-001 | The daemon reads its encrypted secrets bundle from `/var/lib/remo-broker/secrets.enc` (under `StateDirectory=remo-broker`, owned by the service user, mode 0600) at startup. |
+| FR-002 | The decryption key is loaded via systemd's `LoadCredentialEncrypted=secrets-key:`, exposed to the daemon at `$CREDENTIALS_DIRECTORY/secrets-key`. The daemon never reads the key from any other location. |
+| FR-003 | The encryption primitive is `age` (X25519 + ChaCha20-Poly1305). The on-disk file is a standard age ciphertext encrypted to the instance's age public recipient. |
+| FR-004 | The plaintext, once decrypted, is a TOML map of `{ secret_name = "value" }` (string values only; binary out of scope for v1). The map is held in memory as `Arc>>` (zeroize-on-drop via the existing `secrecy` crate). |
+| FR-005 | If the encrypted bundle is absent at startup, the daemon binds its sockets and runs in a "no-secrets" mode; every `get` returns a `not_found` outcome. If the key is absent, the daemon refuses to start (hard error). |
+| FR-006 | The admin socket exposes a new operation `push-creds` (NDJSON). Request: `{ "op": "push-creds", "ciphertext_b64": "" }`. Response: `{ "ok": true, "loaded_at": "", "secret_count": N }` or an `ErrorResponse` with code `decrypt_failed` / `invalid_payload`. |
+| FR-007 | `push-creds` writes the ciphertext to `secrets.enc.tmp`, calls `fsync`, then `rename`s atomically over `secrets.enc`. After the on-disk swap, the in-memory `ArcSwap` is replaced atomically.
In-flight `get` requests complete against whichever snapshot they loaded. |
+| FR-008 | The admin socket exposes a new operation `clear-creds`. Request: `{ "op": "clear-creds" }`. Response: `{ "ok": true }`. Effect: in-memory map is replaced with an empty map; `secrets.enc` is zeroized on disk (overwritten with zeros, then unlinked). |
+| FR-009 | The admin socket exposes a new operation `get-public-key`. Request: `{ "op": "get-public-key" }`. Response: `{ "ok": true, "recipient": "age1..." }`. Effect: returns the instance's age public recipient so the laptop can encrypt to it. |
+| FR-010 | The admin operation `rotate-bootstrap` (and its `BootstrapMode` companion type, and the `bootstrap_mode` field in `StatusResponse`) is removed. The `StatusResponse` gains `{ secrets_loaded_at: >, secret_count: , decryption_key_source: <"tpm2" | "host-key" | "plaintext"> }`. |
+| FR-011 | A new audit event `AuditEvent::SecretsPushed { timestamp, secret_count, source: "push-creds" }` is emitted on successful `push-creds`. A new event `AuditEvent::SecretsCleared` on successful `clear-creds`. Values are not written; only counts. |
+| FR-012 | The per-project socket protocol (`get` / `ping` / `info`), per-project manifest enforcement, per-project bounded cache, and audit log format from 001 carry forward unchanged except that `Outcome::BackendError` and `Outcome::BackendUnreachable` become unreachable in practice (kept in the enum to avoid wire-protocol churn for downstream parsers). |
+| FR-013 | The daemon advertises `PROTOCOL_VERSION = 2` in admin `status` and project `ping` responses. |
+| FR-014 | Wire schema `schema/remo-broker.v2.json` is generated by an extended `schema-gen` Cargo feature and published as a release artifact alongside the binaries.
|
+
+### Non-functional
+
+| ID | Requirement |
+|---|---|
+| NFR-001 | Stripped Linux binary ≤ 15 MiB (the original 001 NFR target, missed at v0.1.0 due to `fnox-core` transitive deps including `hidapi`, `libudev`, AWS SDK, hyper/rustls/webpki). |
+| NFR-002 | `Cross.toml` is deleted. `cargo build --target aarch64-unknown-linux-gnu` succeeds against the standard cross-rs image with no pre-build hooks. |
+| NFR-003 | `deny.toml`'s `[advisories].ignore` list (currently 6 entries: RUSTSEC-2024-0375 atty, RUSTSEC-2023-0071 rsa Marvin, RUSTSEC-2025-0134 rustls-pemfile, RUSTSEC-2026-0098/-0099/-0104 webpki) is empty after the redesign; all 6 reach the broker via `fnox-core → AWS SDK`. |
+| NFR-004 | `push-creds` admin op completes (decrypt + atomic-swap + audit) in < 50ms for a 10 KiB ciphertext, measured on a stock Debian 13 LXC. |
+| NFR-005 | `get` request latency from a per-project socket connection is unchanged from 001 (sub-millisecond p99 against the in-memory store, since no backend roundtrip is involved). |
+| NFR-006 | All other 001 NFRs (FR-023 systemd hardening profile, FR-022 graceful shutdown drain, FR-018 audit-log degraded buffer) carry forward unchanged.
|
+
+## What carries forward from 001 (the chassis)
+
+Source files that stay essentially as-is (renames + import updates only):
+
+- `src/proto/mod.rs` (NDJSON framing, 64 KiB cap, smoke-fuzz tests)
+- `src/proto/project.rs` (`ProjectRequest::{Get, Ping, Info}`, `GetResponse`, `ProjectErrorCode`)
+- `src/manifest.rs` (TOML parser, validators, manifest discovery, `MANIFEST_CANDIDATES`)
+- `src/registry.rs` (`ProjectRegistry`, `Project`, per-project socket bind, atomic reload)
+- `src/audit.rs` (NDJSON append-only writer, bounded channel + degraded buffer)
+- `src/cache.rs` (`BoundedCache`, `SecretString`, zeroize semantics — kept for derived/decrypted values)
+- `src/server.rs` core lifecycle (admin socket bind, accept loop, sigterm handling, `JoinSet` drain) — ~80% of the 1676 lines
+- `src/config.rs` (after `BootstrapSource` / `BOOTSTRAP_ENV_VAR` / `backend_fetch_timeout` / `fnox_config_path` are removed)
+- `packaging/systemd/remo-broker.service` (rename `LoadCredentialEncrypted=bootstrap-token` to `secrets-key`; everything else holds)
+- `packaging/sysusers.d/remo-broker.conf` + `tmpfiles.d/remo-broker.conf` (verbatim)
+- `schema/remo-broker.v1.json` (this is the **manifest** schema, not wire — unaffected)
+
+## What gets ripped out
+
+Source files deleted entirely:
+
+- `src/backend.rs` (114 LOC) — only call site for `fnox_core::*`
+- `src/bootstrap.rs` (819 LOC, including hand-rolled IMDSv2 HTTP/1.1 client + ~500 LOC of mock tests)
+
+Code surgically excised from kept files:
+
+- `src/main.rs`: `--bootstrap-source`, `--bootstrap-token-path`, `--fnox-config`, `--backend-fetch-timeout-ms` CLI flags; `fetch_token` startup validation; `BackendSession::open`/`discover` branch
+- `src/config.rs`: `BootstrapSource` enum, `BootstrapSourceKind`, `BOOTSTRAP_ENV_VAR`, `DEFAULT_BOOTSTRAP_TOKEN_PATH`, `DEFAULT_BACKEND_FETCH_TIMEOUT_MS`, related `Overrides` / `RawConfig` fields, `ConfigError::BackendTimeoutZero`, 8 unit tests
+- `src/server.rs`:
`dispatch_rotate_bootstrap`, `bootstrap_mode()` helper, `AdminRequest::RotateBootstrap` arm, `backend: Option` field + clone in fallback `Server`, `BackendSession` and `fetch_token` imports, 3 unit tests
+- `src/proto/admin.rs`: `AdminRequest::RotateBootstrap`, `RotateBootstrapResponse`, `BackendAuthState`, `BootstrapMode`, `AdminErrorCode::BootstrapError`, `StatusResponse.bootstrap_mode`, 3 unit tests
+
+Build / dependency artifacts:
+
+- `Cross.toml` — delete entire file (libudev is gone with `fnox-core`)
+- `Cargo.toml`: remove `fnox-core = "1.25"`; keep `secrecy` (still used by `cache.rs`); add `age` (~v0.10) for ciphertext handling
+- `deny.toml` lines 10-34: delete the entire `[advisories].ignore` array
+- `.github/workflows/release.yml` lines 49-58: drop the "Install native libudev (x86_64 fast path)" step; `cross` install can stay or be replaced with bare cargo + linker
+- `.github/workflows/ci.yml` line 33-34: drop libudev install step
+
+Examples / benches (logic carries; constructor changes):
+
+- `examples/soak.rs`, `examples/killtest.rs`, `benches/latency.rs` — rewrite the harness to construct an `InMemorySecretStore` instead of a `BackendSession`
+
+## What's new
+
+- `src/store.rs` — new module. `InMemorySecretStore { inner: Arc>> }`. Constructed from a decrypted plaintext map; supports `get(name) -> Option` and `swap(new_map)`.
+- `src/crypto.rs` — new module. age decrypt of `secrets.enc` ciphertext using the identity loaded from `$CREDENTIALS_DIRECTORY/secrets-key`. age encrypt is NOT needed in the daemon (only the laptop encrypts).
+- Admin ops `push-creds`, `clear-creds`, `get-public-key` — new variants in `AdminRequest`, new response types, dispatch logic in `src/server.rs`.
+- `AuditEvent::SecretsPushed` and `AuditEvent::SecretsCleared` — new variants in `src/audit.rs`.
+- `schema/remo-broker.v2.json` — extended `schema-gen` feature emits the wire schema (currently only manifest is schema'd).
The schema describes the v2 admin + project protocol.
+
+## Cross-cutting decisions (mirrored from remo spec 006)
+
+1. **`age` for encryption** — audited, multi-recipient native, mature Rust crate
+2. **Decryption-key fallback ladder** — TPM2 → host-key → plaintext-mode-0600. The Ansible role on the remo side decides which tier; the daemon just reads whatever ends up at `$CREDENTIALS_DIRECTORY/secrets-key`. The chosen tier is surfaced in `StatusResponse.decryption_key_source` so operators can audit posture.
+3. **Wire protocol v2** with published `schema/remo-broker.v2.json`
+4. **No backward-compat shims with v0.1** — clean break; documentation flags the migration path for any existing user (which is essentially "wipe `/etc/remo-broker/`, install v0.2, re-push from laptop")
+
+## Sequencing
+
+| Day | Work |
+|---|---|
+| 1 | Delete `backend.rs`, `bootstrap.rs`, related config + tests. Verify `cargo build` clean without `fnox-core`. |
+| 2 | Simplify `Cross.toml` (delete), `release.yml`, `ci.yml`, `deny.toml`. Verify cross-builds + `cargo deny check` green. |
+| 3-4 | Implement `src/store.rs` + `src/crypto.rs` + systemd-credentials loading wiring. Unit tests for decrypt + atomic swap. |
+| 5-6 | Implement `push-creds`, `clear-creds`, `get-public-key` admin ops + `AuditEvent::SecretsPushed/Cleared` variants. Update `dispatch_get` cache-miss path to hit `InMemorySecretStore`. Tests. |
+| 7 | Rewrite `examples/soak.rs`, `examples/killtest.rs`, `benches/latency.rs` against new constructor. Update `StatusResponse` shape + downstream tests. Generate + commit `schema/remo-broker.v2.json`. |
+
+Total: ~7 days focused work.
+
+## What happens to 001 artifacts
+
+- **`specs/001-broker-daemon/`** stays intact as historical reference. A "Status" header note is added marking it superseded by this spec.
+- **`v0.1.0` release** stays published; `v0.2.0` will supersede on the remo side via `BROKER_PINNED_VERSION` bump.
+- **`docs/wire-protocol.md`** rewritten as part of the implementation (removing `rotate-bootstrap` section, adding `push-creds` / `clear-creds` / `get-public-key`, documenting v2 schema).
+- **`README.md`, `REMO_HANDOFF.md`, `docs/binary-size.md`, `CONTRIBUTING.md`** rewritten as part of the implementation.
+
+## See also
+
+- [remo spec 006](https://github.com/get2knowio/remo/tree/main/specs/006-credential-broker-laptop-push) — the laptop CLI half
+- [001-broker-daemon spec](../001-broker-daemon/spec.md) — the superseded design

From d0fcba54161b5adfad2b22c2f1ed396ee3c189b5 Mon Sep 17 00:00:00 2001
From: Paul O'Fallon
Date: Sun, 31 May 2026 00:53:42 +0000
Subject: [PATCH 2/2] docs(spec): pivot 002 from laptop-push to sidecar-push (in-memory only)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Mirrors remo's 006 second pivot. The first 002 draft (2026-05-30,
laptop pushes age-encrypted blob; daemon decrypts at startup from
/var/lib/remo-broker/secrets.enc via systemd LoadCredentialEncrypted=)
is replaced by a much simpler model: the daemon is purely in-memory.
A sidecar devcontainer on the same LXC pushes plaintext to the broker's
admin socket whenever its fnox storage changes. Push is over a local
Unix socket — no network in transit, no encryption needed, no pubkey
trust. On broker restart, in-memory store is empty; sidecar re-pushes
as part of its own startup.

What disappears compared to the first 002 draft:

- On-disk secrets.enc blob (no persistence in the broker)
- LoadCredentialEncrypted= block in the systemd unit (broker loads no
  credential from systemd at startup)
- age decrypt in the daemon (no encryption anywhere in the broker)
- get-public-key admin op (no pubkey because no encryption)
- Atomic write-to-tmp + fsync + rename dance (no on-disk blob)
- The src/crypto.rs module entirely

What remains: the same chassis described in the first 002 draft (~80%
of v0.1.0 carries forward), plus:

- src/store.rs — simple Arc>>
- push-creds + clear-creds admin ops (plaintext input now)
- AuditEvent::SecretsPushed / SecretsCleared
- StatusResponse v2 (drops decryption_key_source — no key to source)
- MAX_MESSAGE_BYTES raised to 1 MiB for push-creds (typical payload of
  ~10 secrets exceeds the v0.1.0 64 KiB cap)

Wire protocol still bumps to v2 (removing rotate-bootstrap and
bootstrap_mode are breaking per docs/wire-protocol.md §4).

Estimate down from ~7 days to ~5 days focused work.

Also updates specs/001-broker-daemon/spec.md status line to reflect
both pivots.

Co-Authored-By: Claude Opus 4.7
---
 specs/001-broker-daemon/spec.md       |   2 +-
 specs/002-laptop-push-secrets/spec.md | 141 +++++++++++++++-----------
 2 files changed, 81 insertions(+), 62 deletions(-)

diff --git a/specs/001-broker-daemon/spec.md b/specs/001-broker-daemon/spec.md
index 8a73827..c17102a 100644
--- a/specs/001-broker-daemon/spec.md
+++ b/specs/001-broker-daemon/spec.md
@@ -2,7 +2,7 @@

 **Feature Branch**: `001-broker-daemon`
 **Created**: 2026-05-24
-**Status**: Superseded as of 2026-05-30 by [`002-laptop-push-secrets`](../002-laptop-push-secrets/spec.md). The external-backend / on-instance bootstrap-token model this spec describes was implemented through v0.1.0 but turned out to carry a residual on-disk credential (the bootstrap token itself) that contradicts the supply-chain threat model.
The redesign drops `fnox-core`, the backend integration, and the bootstrap-token concept; the laptop now pushes an age-encrypted secrets bundle to the instance and the broker decrypts in memory via systemd-credentials. Kept intact as historical reference.
+**Status**: Superseded as of 2026-05-30 by [`002-laptop-push-secrets`](../002-laptop-push-secrets/spec.md), which was itself pivoted on 2026-05-31 to a sidecar-devcontainer-push model. The 001 design (external backend via `fnox-core`, on-instance bootstrap token, broker fetches on demand) was implemented through v0.1.0 but turned out to carry a residual on-disk credential (the bootstrap token itself) that contradicts the supply-chain threat model. v0.2.0 drops `fnox-core`, the backend integration, the bootstrap-token concept, and all encryption from the daemon itself: the broker becomes a purely in-memory secrets server populated by `push-creds` admin calls from a sidecar devcontainer on the same LXC. Kept intact as historical reference.
 **Last Updated**: 2026-05-24 (commit `beead1f`; 130 tests passing, clippy + cargo deny + systemd-analyze + NFR measurements + soak smoke + killtest smoke + bench compile all green)

 **Input**: User description: "A long-lived Rust daemon for Linux instances that holds a per-instance bootstrap token, authenticates upward to a credential backend (1Password / Vault / AWS Secrets Manager / age / OS keychain via the fnox-core library), and serves per-project Unix sockets enforcing per-project allowlists. Built to be the on-instance half of Remo's credential-broker feature (see Remo `005-credential-broker/spec.md`)."
diff --git a/specs/002-laptop-push-secrets/spec.md b/specs/002-laptop-push-secrets/spec.md
index fd872be..685e552 100644
--- a/specs/002-laptop-push-secrets/spec.md
+++ b/specs/002-laptop-push-secrets/spec.md
@@ -1,18 +1,23 @@
-# Feature Specification: Laptop-Push Secrets Daemon
+# Feature Specification: In-Memory Secrets Daemon (Sidecar-Push Model)

-**Feature Branch**: `002-laptop-push-secrets`
-**Created**: 2026-05-30
+**Feature Branch**: `002-laptop-push-secrets` *(branch name retained for PR continuity; the model has since been simplified — see [§Why the design pivoted twice](#why-the-design-pivoted-twice))*
+**Created**: 2026-05-30 (laptop-push model)
+**Pivoted**: 2026-05-31 (sidecar-push model)
 **Status**: Draft
 **Supersedes**: [`001-broker-daemon`](../001-broker-daemon/) (external-backend / bootstrap-token model)
-**Cross-repo dependency**: [`remo` spec 006](https://github.com/get2knowio/remo/tree/main/specs/006-credential-broker-laptop-push) (the laptop CLI half)
+**Cross-repo dependency**: [`remo` spec 006](https://github.com/get2knowio/remo/tree/main/specs/006-credential-broker-laptop-push) (the laptop + sidecar half)

-**Input**: Redesign the `remo-broker` daemon for a model where the developer's laptop pushes an encrypted, age-bundled set of project secrets to the instance at create time (and on subsequent `remo push-creds` calls), the daemon decrypts the bundle in memory using a systemd-credentials-sourced key (TPM2 → host-key → plaintext-mode-0600 fallback ladder), and serves cleartext secrets to devcontainers via the existing per-project Unix socket protocol. No external secret backend, no on-disk bootstrap token, no `fnox-core` dependency, no AWS-SM / Vault / 1Password integration.
+**Input**: Redesign the `remo-broker` daemon as a purely in-memory secrets server. Project devcontainers fetch via the existing per-project Unix sockets with manifest allowlist enforcement (unchanged from v0.1.0).
The secrets are pushed into the broker's memory by a *sidecar devcontainer* running on the same LXC instance, over a local Unix admin socket, as plaintext (no encryption-in-transit because there is no network in transit). The broker has no persistent storage of secrets, no external backend integration, no bootstrap token, no `fnox-core` dependency.

-## Why the redesign
+## Why the design pivoted twice

-`001-broker-daemon` was built around an external secret backend (1P / Vault / AWS-SM via `fnox-core`) and a per-instance bootstrap token on disk at `/etc/remo-broker/bootstrap-token` that the daemon used to fetch on demand. End-to-end testing on remo on 2026-05-29 surfaced that this design carries a residual on-disk credential (the bootstrap token) that contradicts the supply-chain threat model the broker was built to defend — and that the operational complexity of running a backend is unnecessary for the actual audience.
+`001-broker-daemon` was built around an external secret backend (1P / Vault / AWS-SM via `fnox-core`) with a per-instance bootstrap token on disk. End-to-end testing of remo's 005 spec on 2026-05-29 surfaced that the bootstrap token at `/etc/remo-broker/bootstrap-token` is itself an on-disk credential, contradicting the supply-chain threat model.

-See [`remo:specs/006-credential-broker-laptop-push/spec.md`](https://github.com/get2knowio/remo/tree/main/specs/006-credential-broker-laptop-push) for the full motivation, threat model, and laptop-side requirements.
+The first 002 redesign (2026-05-30) replaced the external backend with a "laptop pushes age-encrypted blob to instance over SSH" model. The daemon would decrypt at startup from `/var/lib/remo-broker/secrets.enc` using a key sourced via systemd `LoadCredentialEncrypted=`. Cleaner — no backend, no bootstrap token — but still keyed on the laptop being the source-of-truth, with significant new daemon-side machinery (encrypted-blob reader, age decrypt, atomic file swap).
+
+On 2026-05-31, the remo-side design pivoted again: the source-of-truth moves into a *sidecar devcontainer* on the same LXC instance. The sidecar holds the encrypted-at-rest fnox storage; it pushes plaintext to the broker over a local Unix socket; the broker is purely in-memory.
+
+The cascading simplification on the daemon side is significant. **The encrypted-blob reader, age decrypt, atomic file swap, and `LoadCredentialEncrypted` for secrets all go away.** The daemon is now smaller than 001 was, despite gaining `push-creds` and `clear-creds` admin ops.

 ## What this changes in the daemon

@@ -24,47 +29,45 @@ The wire protocol bumps to **v2** per the additive-only-within-major rule in [`d

 | ID | Requirement |
 |---|---|
-| FR-001 | The daemon reads its encrypted secrets bundle from `/var/lib/remo-broker/secrets.enc` (under `StateDirectory=remo-broker`, owned by the service user, mode 0600) at startup. |
-| FR-002 | The decryption key is loaded via systemd's `LoadCredentialEncrypted=secrets-key:`, exposed to the daemon at `$CREDENTIALS_DIRECTORY/secrets-key`. The daemon never reads the key from any other location. |
-| FR-003 | The encryption primitive is `age` (X25519 + ChaCha20-Poly1305). The on-disk file is a standard age ciphertext encrypted to the instance's age public recipient. |
-| FR-004 | The plaintext, once decrypted, is a TOML map of `{ secret_name = "value" }` (string values only; binary out of scope for v1). The map is held in memory as `Arc>>` (zeroize-on-drop via the existing `secrecy` crate). |
-| FR-005 | If the encrypted bundle is absent at startup, the daemon binds its sockets and runs in a "no-secrets" mode; every `get` returns a `not_found` outcome. If the key is absent, the daemon refuses to start (hard error). |
-| FR-006 | The admin socket exposes a new operation `push-creds` (NDJSON). Request: `{ "op": "push-creds", "ciphertext_b64": "" }`.
Response: `{ "ok": true, "loaded_at": "", "secret_count": N }` or an `ErrorResponse` with code `decrypt_failed` / `invalid_payload`. |
-| FR-007 | `push-creds` writes the ciphertext to `secrets.enc.tmp`, calls `fsync`, then `rename`s atomically over `secrets.enc`. After the on-disk swap, the in-memory `ArcSwap` is replaced atomically. In-flight `get` requests complete against whichever snapshot they loaded. |
-| FR-008 | The admin socket exposes a new operation `clear-creds`. Request: `{ "op": "clear-creds" }`. Response: `{ "ok": true }`. Effect: in-memory map is replaced with an empty map; `secrets.enc` is zeroized on disk (overwritten with zeros, then unlinked). |
-| FR-009 | The admin socket exposes a new operation `get-public-key`. Request: `{ "op": "get-public-key" }`. Response: `{ "ok": true, "recipient": "age1..." }`. Effect: returns the instance's age public recipient so the laptop can encrypt to it. |
-| FR-010 | The admin operation `rotate-bootstrap` (and its `BootstrapMode` companion type, and the `bootstrap_mode` field in `StatusResponse`) is removed. The `StatusResponse` gains `{ secrets_loaded_at: >, secret_count: , decryption_key_source: <"tpm2" | "host-key" | "plaintext"> }`. |
-| FR-011 | A new audit event `AuditEvent::SecretsPushed { timestamp, secret_count, source: "push-creds" }` is emitted on successful `push-creds`. A new event `AuditEvent::SecretsCleared` on successful `clear-creds`. Values are not written; only counts. |
-| FR-012 | The per-project socket protocol (`get` / `ping` / `info`), per-project manifest enforcement, per-project bounded cache, and audit log format from 001 carry forward unchanged except that `Outcome::BackendError` and `Outcome::BackendUnreachable` become unreachable in practice (kept in the enum to avoid wire-protocol churn for downstream parsers). |
-| FR-013 | The daemon advertises `PROTOCOL_VERSION = 2` in admin `status` and project `ping` responses.
|
-| FR-014 | Wire schema `schema/remo-broker.v2.json` is generated by an extended `schema-gen` Cargo feature and published as a release artifact alongside the binaries. |
+| FR-001 | The daemon starts with an empty in-memory secrets store. There is no on-disk secrets blob to read at startup. |
+| FR-002 | The daemon does not require any secrets-related credential from systemd-credentials at startup. The systemd unit's `LoadCredentialEncrypted=` directive (used in v0.1.0 for the bootstrap token) is removed. |
+| FR-003 | The admin socket exposes a new operation `push-creds` (NDJSON). Request: `{ "op": "push-creds", "secrets": { "": "", ... } }`. The map is plaintext — no encryption envelope. Response: `{ "ok": true, "loaded_at": "", "secret_count": N }` or `ErrorResponse` with code `invalid_payload` / `payload_too_large`. |
+| FR-004 | `push-creds` atomically replaces the in-memory secrets map via `ArcSwap`. In-flight `get` requests complete against whichever snapshot they loaded. |
+| FR-005 | The admin socket exposes a new operation `clear-creds`. Request: `{ "op": "clear-creds" }`. Response: `{ "ok": true }`. Effect: in-memory map is replaced with an empty map. |
+| FR-006 | The admin operation `rotate-bootstrap` (and its `BootstrapMode` companion type, and the `bootstrap_mode` field in `StatusResponse`) is removed. The `StatusResponse` gains `{ secrets_loaded_at: >, secret_count: }`. The `decryption_key_source` field considered in the first 002 draft is removed (no key is sourced in the daemon now). |
+| FR-007 | A new audit event `AuditEvent::SecretsPushed { timestamp, secret_count }` is emitted on successful `push-creds`. A new event `AuditEvent::SecretsCleared { timestamp }` on successful `clear-creds`. Values are not written; only counts.
|
+| FR-008 | The per-project socket protocol (`get` / `ping` / `info`), per-project manifest enforcement, per-project bounded cache, and audit log format from 001 carry forward unchanged except that `Outcome::BackendError` and `Outcome::BackendUnreachable` become unreachable in practice (kept in the enum to avoid wire-protocol churn for downstream parsers). |
+| FR-009 | The daemon advertises `PROTOCOL_VERSION = 2` in admin `status` and project `ping` responses. |
+| FR-010 | Wire schema `schema/remo-broker.v2.json` is generated by an extended `schema-gen` Cargo feature and published as a release artifact alongside the binaries. |
+| FR-011 | The admin socket's UNIX permissions (mode 0660, root-owned, group-accessible) must allow the sidecar devcontainer's bind-mount user to call admin ops. The remo Ansible role handles group setup; the daemon itself just binds the socket with the configured mode. |
+| FR-012 | Push payload size limit: the daemon accepts `push-creds` requests up to 1 MiB (raised from the v0.1.0 `MAX_MESSAGE_BYTES = 64 KiB`, since a realistic credential bundle can exceed that cap). |

 ### Non-functional

 | ID | Requirement |
 |---|---|
-| NFR-001 | Stripped Linux binary ≤ 15 MiB (the original 001 NFR target, missed at v0.1.0 due to `fnox-core` transitive deps including `hidapi`, `libudev`, AWS SDK, hyper/rustls/webpki). |
+| NFR-001 | Stripped Linux binary ≤ 15 MiB (the original 001 NFR target, missed at v0.1.0 due to `fnox-core` transitive deps). |
 | NFR-002 | `Cross.toml` is deleted. `cargo build --target aarch64-unknown-linux-gnu` succeeds against the standard cross-rs image with no pre-build hooks. |
-| NFR-003 | `deny.toml`'s `[advisories].ignore` list (currently 6 entries: RUSTSEC-2024-0375 atty, RUSTSEC-2023-0071 rsa Marvin, RUSTSEC-2025-0134 rustls-pemfile, RUSTSEC-2026-0098/-0099/-0104 webpki) is empty after the redesign; all 6 reach the broker via `fnox-core → AWS SDK`.
| -| NFR-004 | `push-creds` admin op completes (decrypt + atomic-swap + audit) in < 50ms for a 10 KiB ciphertext, measured on a stock Debian 13 LXC. | -| NFR-005 | `get` request latency from a per-project socket connection is unchanged from 001 (sub-millisecond p99 against the in-memory store, since no backend roundtrip is involved). | +| NFR-003 | `deny.toml`'s `[advisories].ignore` list (currently 6 entries reaching via `fnox-core → AWS SDK`) is empty after the redesign. | +| NFR-004 | `push-creds` admin op completes (validate + atomic-swap + audit) in < 20 ms for a typical 10-secret payload, measured on a stock Debian LXC. | +| NFR-005 | `get` request latency from a per-project socket is unchanged from 001 (sub-millisecond p99 against the in-memory store; no backend roundtrip). | | NFR-006 | All other 001 NFRs (FR-023 systemd hardening profile, FR-022 graceful shutdown drain, FR-018 audit-log degraded buffer) carry forward unchanged. | ## What carries forward from 001 (the chassis) Source files that stay essentially as-is (renames + import updates only): -- `src/proto/mod.rs` (NDJSON framing, 64 KiB cap, smoke-fuzz tests) -- `src/proto/project.rs` (`ProjectRequest::{Get, Ping, Info}`, `GetResponse`, `ProjectErrorCode`) -- `src/manifest.rs` (TOML parser, validators, manifest discovery, `MANIFEST_CANDIDATES`) -- `src/registry.rs` (`ProjectRegistry`, `Project`, per-project socket bind, atomic reload) -- `src/audit.rs` (NDJSON append-only writer, bounded channel + degraded buffer) -- `src/cache.rs` (`BoundedCache`, `SecretString`, zeroize semantics — kept for derived/decrypted values) -- `src/server.rs` core lifecycle (admin socket bind, accept loop, sigterm handling, `JoinSet` drain) — ~80% of the 1676 lines -- `src/config.rs` (after `BootstrapSource` / `BOOTSTRAP_ENV_VAR` / `backend_fetch_timeout` / `fnox_config_path` are removed) -- `packaging/systemd/remo-broker.service` (rename `LoadCredentialEncrypted=bootstrap-token` to `secrets-key`; everything else holds) 
-- `packaging/sysusers.d/remo-broker.conf` + `tmpfiles.d/remo-broker.conf` (verbatim) -- `schema/remo-broker.v1.json` (this is the **manifest** schema, not wire — unaffected) +- `src/proto/mod.rs` — NDJSON framing, smoke-fuzz tests. `MAX_MESSAGE_BYTES` raised from 64 KiB to 1 MiB (or a per-op cap with `push-creds` higher than others). +- `src/proto/project.rs` — `ProjectRequest::{Get, Ping, Info}`, `GetResponse`, `ProjectErrorCode` +- `src/manifest.rs` — TOML parser, validators, manifest discovery. The remo side may extend the manifest schema with `fetch_as` per-secret directives; the broker does not interpret those (they're for the project devcontainer's `remo-fetch-secrets` helper) but the schema generator will need to know about them. +- `src/registry.rs` — `ProjectRegistry`, `Project`, per-project socket bind, atomic reload +- `src/audit.rs` — NDJSON append-only writer, bounded channel + degraded buffer (extended with new event variants) +- `src/cache.rs` — `BoundedCache`, `SecretString`, zeroize semantics (kept for the per-project cache between `get` calls) +- `src/server.rs` core lifecycle (~80% of the 1676 lines) +- `src/config.rs` (after the bootstrap-related fields are removed) +- `packaging/systemd/remo-broker.service` — **remove** the `LoadCredentialEncrypted=` block entirely; the daemon doesn't load any secret from systemd at startup. Keep the rest of the FR-023 hardening profile. 
+- `packaging/sysusers.d/remo-broker.conf` + `tmpfiles.d/remo-broker.conf` — verbatim +- `schema/remo-broker.v1.json` — this is the **manifest** schema; carries forward with `fetch_as` extension ## What gets ripped out @@ -75,37 +78,48 @@ Source files deleted entirely: Code surgically excised from kept files: -- `src/main.rs`: `--bootstrap-source`, `--bootstrap-token-path`, `--fnox-config`, `--backend-fetch-timeout-ms` CLI flags; `fetch_token` startup validation; `BackendSession::open`/`discover` branch -- `src/config.rs`: `BootstrapSource` enum, `BootstrapSourceKind`, `BOOTSTRAP_ENV_VAR`, `DEFAULT_BOOTSTRAP_TOKEN_PATH`, `DEFAULT_BACKEND_FETCH_TIMEOUT_MS`, related `Overrides` / `RawConfig` fields, `ConfigError::BackendTimeoutZero`, 8 unit tests -- `src/server.rs`: `dispatch_rotate_bootstrap`, `bootstrap_mode()` helper, `AdminRequest::RotateBootstrap` arm, `backend: Option` field + clone in fallback `Server`, `BackendSession` and `fetch_token` imports, 3 unit tests -- `src/proto/admin.rs`: `AdminRequest::RotateBootstrap`, `RotateBootstrapResponse`, `BackendAuthState`, `BootstrapMode`, `AdminErrorCode::BootstrapError`, `StatusResponse.bootstrap_mode`, 3 unit tests +- `src/main.rs`: all bootstrap-related CLI flags; `fetch_token` startup validation; `BackendSession::open`/`discover` branch +- `src/config.rs`: `BootstrapSource` enum, related fields, ~8 unit tests +- `src/server.rs`: `dispatch_rotate_bootstrap`, `bootstrap_mode()`, `AdminRequest::RotateBootstrap` arm, `backend: Option` field, related imports, 3 unit tests +- `src/proto/admin.rs`: `RotateBootstrap` request/response/error variants, `BackendAuthState`, `BootstrapMode`, `StatusResponse.bootstrap_mode`, related tests Build / dependency artifacts: - `Cross.toml` — delete entire file (libudev is gone with `fnox-core`) -- `Cargo.toml`: remove `fnox-core = "1.25"`; keep `secrecy` (still used by `cache.rs`); add `age` (~v0.10) for ciphertext handling +- `Cargo.toml`: remove `fnox-core = "1.25"`; remove `age` / 
`pyrage` / similar (none were added — but the first 002 draft would have added `age`; this rewrite removes that addition) - `deny.toml` lines 10-34: delete the entire `[advisories].ignore` array -- `.github/workflows/release.yml` lines 49-58: drop the "Install native libudev (x86_64 fast path)" step; `cross` install can stay or be replaced with bare cargo + linker -- `.github/workflows/ci.yml` line 33-34: drop libudev install step +- `.github/workflows/release.yml`: drop the libudev fast-path step; `cross` install can stay or be replaced with bare cargo + linker +- `.github/workflows/ci.yml`: drop libudev install step + +Compared to the first 002 draft (laptop-push with age encryption), additionally NOT building: + +- `src/store.rs` as designed with encrypted-blob reader — instead a much simpler `InMemorySecretStore { inner: ArcSwap> }` with no decrypt logic +- `src/crypto.rs` for age decrypt — not needed +- Atomic write-to-tmp-then-rename for the on-disk blob — no on-disk blob +- `LoadCredentialEncrypted=secrets-key` in the systemd unit — removed entirely +- `get-public-key` admin op — not needed (no encryption, no pubkey) +- The on-disk-blob fsync + rename + verify dance — not needed Examples / benches (logic carries; constructor changes): -- `examples/soak.rs`, `examples/killtest.rs`, `benches/latency.rs` — rewrite the harness to construct an `InMemorySecretStore` instead of a `BackendSession` +- `examples/soak.rs`, `examples/killtest.rs`, `benches/latency.rs` — rewrite the harness to construct a daemon with an empty `InMemorySecretStore` and to call `push-creds` admin op as part of the workload ## What's new -- `src/store.rs` — new module. `InMemorySecretStore { inner: Arc>> }`. Constructed from a decrypted plaintext map; supports `get(name) -> Option` and `swap(new_map)`. -- `src/crypto.rs` — new module. age decrypt of `secrets.enc` ciphertext using the identity loaded from `$CREDENTIALS_DIRECTORY/secrets-key`. 
age encrypt is NOT needed in the daemon (only the laptop encrypts). -- Admin ops `push-creds`, `clear-creds`, `get-public-key` — new variants in `AdminRequest`, new response types, dispatch logic in `src/server.rs`. -- `AuditEvent::SecretsPushed` and `AuditEvent::SecretsCleared` — new variants in `src/audit.rs`. -- `schema/remo-broker.v2.json` — extended `schema-gen` feature emits the wire schema (currently only manifest is schema'd). The schema describes the v2 admin + project protocol. +Smaller surface than the first 002 draft: -## Cross-cutting decisions (mirrored from remo spec 006) +- `src/store.rs` — new module. `InMemorySecretStore { inner: Arc>> }`. `get(name) -> Option`, `swap(new_map)`. That's it. No file I/O, no crypto. +- Admin ops `push-creds`, `clear-creds` — new variants in `AdminRequest`, new response types, dispatch logic in `src/server.rs` +- `AuditEvent::SecretsPushed` and `AuditEvent::SecretsCleared` — new variants in `src/audit.rs` +- `schema/remo-broker.v2.json` — extended `schema-gen` feature emits the v2 wire schema -1. **`age` for encryption** — audited, multi-recipient native, mature Rust crate -2. **Decryption-key fallback ladder** — TPM2 → host-key → plaintext-mode-0600. The Ansible role on the remo side decides which tier; the daemon just reads whatever ends up at `$CREDENTIALS_DIRECTORY/secrets-key`. The chosen tier is surfaced in `StatusResponse.decryption_key_source` so operators can audit posture. +## Cross-cutting decisions + +1. **No encryption in the daemon at all.** Push is plaintext over local Unix socket; in-memory store is plaintext. The sidecar handles encryption-at-rest on its own side (with its own fnox storage); the broker is downstream of that boundary and doesn't need to know. +2. **No `LoadCredentialEncrypted` in the systemd unit.** Daemon doesn't load any credential from systemd at startup. 
(The remo Ansible side still uses `systemd-creds` to encrypt the sidecar's fnox-storage decryption key on the LXC host — but that's a sidecar concern, not a broker concern.) 3. **Wire protocol v2** with published `schema/remo-broker.v2.json` -4. **No backward-compat shims with v0.1** — clean break; documentation flags the migration path for any existing user (which is essentially "wipe `/etc/remo-broker/`, install v0.2, re-push from laptop") +4. **`MAX_MESSAGE_BYTES` raised to 1 MiB** for `push-creds` specifically. Other ops keep their existing limits. A per-op cap is cleaner than a global one; implementer's call. +5. **No backward-compat shims with v0.1.** Clean break; documentation flags the migration path. ## Sequencing @@ -113,20 +127,25 @@ Examples / benches (logic carries; constructor changes): |---|---| | 1 | Delete `backend.rs`, `bootstrap.rs`, related config + tests. Verify `cargo build` clean without `fnox-core`. | | 2 | Simplify `Cross.toml` (delete), `release.yml`, `ci.yml`, `deny.toml`. Verify cross-builds + `cargo deny check` green. | -| 3-4 | Implement `src/store.rs` + `src/crypto.rs` + systemd-credentials loading wiring. Unit tests for decrypt + atomic swap. | -| 5-6 | Implement `push-creds`, `clear-creds`, `get-public-key` admin ops + `AuditEvent::SecretsPushed/Cleared` variants. Update `dispatch_get` cache-miss path to hit `InMemorySecretStore`. Tests. | -| 7 | Rewrite `examples/soak.rs`, `examples/killtest.rs`, `benches/latency.rs` against new constructor. Update `StatusResponse` shape + downstream tests. Generate + commit `schema/remo-broker.v2.json`. | +| 3 | Implement `src/store.rs` (~50 LOC). Update `dispatch_get` to use it. Update systemd unit (remove `LoadCredentialEncrypted=`). | +| 4 | Implement `push-creds` + `clear-creds` admin ops + `AuditEvent::SecretsPushed/Cleared` variants. Update `StatusResponse` shape. Tests. | +| 5 | Raise message-size cap (1 MiB for push-creds). Rewrite `examples/soak.rs`, `killtest.rs`, `benches/latency.rs`. 
Update wire-protocol doc + generate `schema/remo-broker.v2.json`. | -Total: ~7 days focused work. +**Total**: ~5 days focused work. (Down from ~7 in the first 002 draft.) ## What happens to 001 artifacts -- **`specs/001-broker-daemon/`** stays intact as historical reference. A "Status" header note is added marking it superseded by this spec. +- **`specs/001-broker-daemon/`** stays intact as historical reference; status header marked superseded by this spec. - **`v0.1.0` release** stays published; `v0.2.0` will supersede on the remo side via `BROKER_PINNED_VERSION` bump. -- **`docs/wire-protocol.md`** rewritten as part of the implementation (removing `rotate-bootstrap` section, adding `push-creds` / `clear-creds` / `get-public-key`, documenting v2 schema). +- **`docs/wire-protocol.md`** rewritten as part of the implementation (remove `rotate-bootstrap` section, add `push-creds` / `clear-creds`, document v2 schema). - **`README.md`, `REMO_HANDOFF.md`, `docs/binary-size.md`, `CONTRIBUTING.md`** rewritten as part of the implementation. +## What happens to the first 002 draft + +- The first 002 draft (2026-05-30, laptop-push with age encryption) is captured in this PR's commit history (commit `4db63ea`) for archival reference +- This rewrite (2026-05-31, sidecar-push) supersedes it within the same spec dir; no separate spec number used + ## See also -- [remo spec 006](https://github.com/get2knowio/remo/tree/main/specs/006-credential-broker-laptop-push) — the laptop CLI half +- [remo spec 006](https://github.com/get2knowio/remo/tree/main/specs/006-credential-broker-laptop-push) — the laptop + sidecar + project-devcontainer half - [001-broker-daemon spec](../001-broker-daemon/spec.md) — the superseded design
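
Illustrative sketch (not part of the spec): the snapshot semantics that FR-004 and the `src/store.rs` description ask for could look roughly like the following. This is a minimal sketch under stated assumptions. It uses the standard library's `RwLock<Arc<HashMap<..>>>` as a stand-in for the `ArcSwap` named in the spec so that it builds without external crates, and it stores plain `String` values where the daemon would presumably use `SecretString` from `src/cache.rs`. The `Snapshot` alias and the `snapshot()` and `clear()` methods are hypothetical names introduced only for illustration.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};

/// Immutable snapshot of the secrets map (name -> value).
/// Hypothetical alias; the spec does not name this type.
type Snapshot = Arc<HashMap<String, String>>;

/// Hypothetical stand-in for the spec's `InMemorySecretStore`.
/// Assumption: `RwLock<Arc<..>>` replaces `ArcSwap` to stay std-only.
pub struct InMemorySecretStore {
    inner: RwLock<Snapshot>,
}

impl InMemorySecretStore {
    /// FR-001: the store starts empty; nothing is read from disk.
    pub fn new() -> Self {
        Self {
            inner: RwLock::new(Arc::new(HashMap::new())),
        }
    }

    /// Clone the pointer to the current snapshot. The read lock is held
    /// only for the pointer clone, never while a caller uses the map.
    /// A poisoned lock panics here; a real daemon would handle that.
    pub fn snapshot(&self) -> Snapshot {
        let guard = self.inner.read().unwrap();
        Arc::clone(&*guard)
    }

    /// Look up one secret against whichever snapshot is current.
    pub fn get(&self, name: &str) -> Option<String> {
        self.snapshot().get(name).cloned()
    }

    /// FR-004: replace the whole map at once. Readers that already hold
    /// an older snapshot keep using it until they drop it.
    /// Returns the new secret count (what `push-creds` would report).
    pub fn swap(&self, new_map: HashMap<String, String>) -> usize {
        let count = new_map.len();
        *self.inner.write().unwrap() = Arc::new(new_map);
        count
    }

    /// FR-005: `clear-creds` is a swap with an empty map.
    pub fn clear(&self) {
        self.swap(HashMap::new());
    }
}
```

A `get` handler that calls `snapshot()` once and then reads from the returned `Arc` is unaffected by a concurrent `swap`, which is the "in-flight requests complete against whichever snapshot they loaded" behaviour. With `ArcSwap` the same shape should apply, with lock-free loads instead of a brief read lock.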