41 commits
2b7a80f
feat(obo): Phase 1A foundation - envelope-crypto + user-context-store
chkraw Jun 3, 2026
62987b9
feat(obo): Phase 1B orchestration plumbing + portal/CLI envelope wiring
chkraw Jun 4, 2026
f9a7a95
chore(obo): fix corrupted JSDoc on UserEnvelopeCarrier
chkraw Jun 4, 2026
0f40d84
feat(obo): Phase 2 user-context store lookup + worker-affined public API
chkraw Jun 8, 2026
72de0d1
Phase 3: portal MSAL downstream-scope acquisition + envelope encryption
chkraw Jun 8, 2026
42d995e
Phase 4: structured tool outcomes (interaction_required, service_unav…
chkraw Jun 8, 2026
c1c9bc4
Phase 5: OBO smoke plugin + live-tenant smoke checklist
chkraw Jun 8, 2026
c5c48df
Phase 6: KEK provisioning, deploy wiring, docs, versions -> 0.1.36
chkraw Jun 8, 2026
c327312
Phase 6 review fixes: missing dep, worker scope, kid versioning, buil…
chkraw Jun 8, 2026
5c5bae0
Phase 6 final-review fixes: smoke principal access, worker.stop final…
chkraw Jun 9, 2026
3338fc8
Land deferred OBO skills + contributor doc updates
chkraw Jun 9, 2026
f05e1a4
Add unit tests asserting plaintext-mode startup warning (envelope-cry…
chkraw Jun 9, 2026
cbedaf3
Land deferred OBO deploy-skill + agent updates
chkraw Jun 9, 2026
e74f996
Phase 7: live-smoke primitives + final-review must-fixes
chkraw Jun 9, 2026
c37d978
Final-review fixes: pin interactionRequired reason-code taxonomy + KE…
chkraw Jun 9, 2026
6c97b7b
Phase 7 deploy plumbing: project OBO_SMOKE_WORKER_APP_* through worke…
chkraw Jun 10, 2026
fdd41c7
Docs: pin interactionRequired reason-code taxonomy + Phase 7 deploy p…
chkraw Jun 10, 2026
698213c
Docs: surface OBO_KEK_KID + OBO_SMOKE_* in deploy/scripts/README.md
chkraw Jun 10, 2026
23b731c
Phase 8: auto-provision OBO smoke worker AAD app
chkraw Jun 10, 2026
8d0aa9b
Final-review polish: fix doc API-name drift + lock InteractionRequire…
chkraw Jun 10, 2026
d807965
Drop PAW-phase labels from code, docs, skills, and tests
chkraw Jun 10, 2026
df01c88
Persist OBO spec and neutralize internal-product references
chkraw Jun 10, 2026
71a812e
Remove non-runnable live-smoke GHA workflow scaffold
chkraw Jun 10, 2026
1ba1b8d
Slim OBO smoke checklist; tighten plugin README
chkraw Jun 10, 2026
22fd8f9
feat(sdk): plugin tools contract + obo-smoke plugin migration
chkraw Jun 11, 2026
4d4b484
chore(deploy): remove obo-smoke bloat from default deploy surface
chkraw Jun 11, 2026
cb0a915
build(deploy): multi-stage worker Dockerfile with opt-in smoke variant
chkraw Jun 11, 2026
f28459f
docs: align operator/builder docs with new plugin contract + smoke op…
chkraw Jun 11, 2026
12fe822
feat(sdk): export PluginManifest as a public type for plugin authors
chkraw Jun 11, 2026
878113a
chore(obo-smoke-plugin): drop internal phase-numbering leak in tools.…
chkraw Jun 11, 2026
337a431
docs: clarify OBO smoke runs as the operator, not a dedicated test user
chkraw Jun 11, 2026
2096d93
docs: drop `ADO is the first consumer` framing from OBO docs
chkraw Jun 11, 2026
1e8b6e7
docs: restore pre-existing references and genericize new local-env leaks
chkraw Jun 11, 2026
ea84a40
refactor: two-phase OBO smoke worker app (app-shell + patch-fic modes)
chkraw Jun 11, 2026
bb50716
docs: run OBO patch-fic at end of deploy, not mid-sandwich
chkraw Jun 11, 2026
ea8c4c1
docs: remove 'skip if .env already pasted' escape hatch from OBO smok…
chkraw Jun 11, 2026
d8a4a9e
docs: reframe OBO smoke consent - per-user default, admin-consent opt…
chkraw Jun 11, 2026
667560e
refactor(bicep): extract OBO KEK creation + RBAC into obo-kek.bicep
chkraw Jun 11, 2026
cfe038a
OBO live-smoke: pivot worker app FIC to MSI-as-FIC pattern
chkraw Jun 11, 2026
9ac6f78
PR #51 live-validation cleanup: native overlay tool reachability, OBO…
chkraw Jun 12, 2026
a84586d
Merge branch 'main' into feature/user-obo-propagation
chkraw Jun 22, 2026
15 changes: 15 additions & 0 deletions .env.example
@@ -35,6 +35,21 @@ PORTAL_AUTH_PROVIDER=entra
PORTAL_AUTH_ENTRA_TENANT_ID=<your-entra-tenant-id>
PORTAL_AUTH_ENTRA_CLIENT_ID=<your-entra-client-id>

# User OBO: when set, the portal acquires an additional access
# token at sign-in / RPC time and forwards it via the per-RPC envelope so
# worker tools can perform OAuth2 On-Behalf-Of flows. Format is the
# downstream worker app's API scope, e.g.
# `api://<worker-app>/.default`. Leave unset to disable OBO entirely;
# the portal continues to operate with the existing admission-only flow.
# Pair with OBO_KEK_KID (AKV key URL) for production envelope encryption,
# or with OBO_ENVELOPE_PLAINTEXT_MODE=1 for non-production dev/test.
# PORTAL_AUTH_ENTRA_DOWNSTREAM_SCOPE=api://<your-worker-app>/.default
# OBO_KEK_KID is the un-versioned AKV key URL — the wrap call returns the
# current key version, and that version is stored alongside the ciphertext
# so rotation just means adding a new key version in AKV (no env change).
# OBO_KEK_KID=https://<your-kv>.vault.azure.net/keys/<key-name>
# OBO_ENVELOPE_PLAINTEXT_MODE=0

# Optional portal authz email allowlists.
# Use normalized user email addresses.
# If omitted, any successfully authenticated user is allowed in.
128 changes: 127 additions & 1 deletion .github/agents/pilotswarm-npm-deployer.agent.md
@@ -1,4 +1,6 @@
---
schemaVersion: 1
version: 1.3.0
name: pilotswarm-npm-deployer
description: "Use when deploying PilotSwarm via the npm Bicep/GitOps orchestrator at `deploy/scripts/deploy.mjs` — bringing up a fresh isolated environment (new-env), rolling out updates against an already-deployed new-env stamp, or running the optional Entra app-registration pre-step. Routes between the fresh-scaffold and rollout-to-existing paths, enforces the DO NOT WIPE handshake on destructive ops, and drives interactive resource-naming + edge/TLS selection for new envs. For the legacy bash path (`scripts/deploy-aks.sh`, `scripts/deploy-portal.sh`), use `pilotswarm-aks-deployer` instead."
---
@@ -53,11 +55,12 @@ If after those cues it's still ambiguous, ask the user one clarifying question b

- `.github/skills/pilotswarm-new-env-deploy/SKILL.md` — for any npm new-env work (fresh or rollout)
- `.github/skills/pilotswarm-portal-app-reg/SKILL.md` — Entra app registration for portal auth (optional new-env pre-step)
- `.github/skills/pilotswarm-obo-smoke-app-reg/SKILL.md` — Entra app registration for the OBO live-smoke worker app (optional pre-step for stamps that run OBO live-smoke)
- `.github/skills/pilotswarm-portal-auth-assignments/SKILL.md` — assign / revoke / list app-role assignments (mandatory follow-up to app-reg when posture is roles-driven)
- `.github/skills/pilotswarm-vpn-client-profile/SKILL.md` — download the Azure VPN client profile (`azurevpnconfig.xml`) for VPN-enabled stamps; offer to run automatically at the end of a first successful VPN-enabled deploy
- `.github/copilot-instructions.md` — source of truth for DO NOT WIPE, repo-scope boundary, sensitive-files rule
- `deploy/scripts/README.md` — canonical orchestrator reference (services, steps, EDGE_MODE × TLS_SOURCE, troubleshooting)
- `deploy/scripts/auth/README.md` — portal app-registration scripts
- `deploy/scripts/auth/README.md` — portal + OBO-smoke app-registration scripts
- `deploy/envs/template.env` — every operator-settable env key with inline documentation

## New-Env Rollout to Existing Stamp
@@ -78,6 +81,8 @@ Match the change to a service and a minimal step set. Always invoke via `node de
| Cert refresh after AKV cert rotation | `node deploy/scripts/deploy.mjs portal <stamp> --force-module portal --steps bicep` |
| Worker-t3 (StatefulSet) manifest change | `node deploy/scripts/deploy.mjs worker-t3 <stamp> --steps manifests,rollout` |
| End-to-end re-render after multi-service change | `node deploy/scripts/deploy.mjs all <stamp>` (filters by EDGE_MODE/TLS_SOURCE automatically) |
| Toggle OBO User Context on a stamp (`OBO_ENABLED=true`) | `node deploy/scripts/deploy.mjs base-infra <stamp> --steps bicep` then `node deploy/scripts/deploy.mjs all <stamp> --steps manifests,rollout` (re-renders overlay .env with the new `OBO_KEK_KID` bicep output and re-projects worker + portal ConfigMaps). Operator must edit `deploy/envs/local/<stamp>/.env` to set `OBO_ENABLED=true` and `PORTAL_AUTH_ENTRA_DOWNSTREAM_SCOPE=api://<worker-app>/.default` before re-running base-infra. See `pilotswarm-new-env-deploy` §"User OBO Propagation" + `docs/operations/obo-kek-runbook.md`. |
| Enable OBO live-smoke on a stamp | Build/push the worker image with `--variant smoke`, compose `deploy/envs/template.smoke.env` into `deploy/envs/local/<stamp>/.env`, then run **Step 0.b-early** (app-shell) before bicep to provision the worker app + scope + pre-auth and paste the emitted env lines (`PORTAL_AUTH_ENTRA_DOWNSTREAM_SCOPE`, `OBO_SMOKE_WORKER_APP_TENANT_ID/_CLIENT_ID/_GRAPH_SCOPE`, and `PLUGIN_DIRS=/app/packages/obo-smoke-plugin`). Run the full deploy (bicep + manifests + rollout). When the stamp is up, run **Step 0.b-late** (patch-fic) just before `pilotswarm smoke` to wire the default MSI-as-FIC trust on the Entra app: `WORKLOAD_IDENTITY_CLIENT_ID` from the bicep cache → UAMI object id subject → eSTS issuer `https://login.microsoftonline.com/<tenant>/v2.0`. No `.env` or k8s changes, no pod restart. Use `-FicPattern aks-direct` only where direct AKS-on-app FICs are allowed; Microsoft CORP requires MSI-as-FIC. `OBO_SMOKE_ENABLED=true` is the smoke-driver marker; worker tool registration is governed by `PLUGIN_DIRS`. `OBO_SMOKE_TEST_USER_UPN` is an optional UPN-assertion knob — leave it empty to accept whichever user signs in. After patch-fic, run `pilotswarm smoke <stamp> --profile obo` from a workstation; the default `--auth device-code` flow lets the operator sign in as themselves (no dedicated test user required — see `docs/operations/live-smoke.md` for MFA / Conditional Access notes). Default production stamps should use the default image and omit the smoke overlay. |

### Pre-flight (mandatory before invoking)

@@ -226,6 +231,127 @@ role-authoritative branch ignores it when `roles[]` is present in the
JWT. Without the assignment step, every sign-in is denied at the
portal engine (deny-by-default) because no one has a role claim yet.

### Step 0.b — Auto-provision OBO smoke worker app (two-phase; only for OBO live-smoke stamps)

Skip this step entirely for default production stamps or any stamp that
will not run `pilotswarm smoke <stamp> --profile obo`. For smoke
stamps, build the worker with `--variant smoke` and compose the smoke
env overlay first. This step uses the two-phase wrapper so nothing has
to wait for bicep.

**Prerequisite (both phases)**: Step 0 (portal app-reg) must already
have run for the stamp — the wrapper reads
`deploy/envs/local/<stamp>/entra-app.json` to pre-authorize the portal
app. (Operators can override via `-PortalClientId` if they have a
non-standard portal-app source.)

#### Step 0.b-early — `-Mode app-shell` (before bicep)

Runs alongside Step 0; **no OIDC dependency**. Creates the worker app,
mints the OAuth2 scope, declares Microsoft Graph `User.Read` delegated
permission, pre-authorizes the portal app, and emits the `.env` paste
block.

```pwsh
pwsh -NoProfile -ExecutionPolicy Bypass `
-File deploy/scripts/auth/Setup-OboSmokeWorkerApp.ps1 `
-Mode app-shell `
-ServiceTreeId <id> `
-EnvName <stamp>
```

The script writes a sidecar JSON at
`deploy/envs/local/<stamp>/obo-smoke-worker-app.json` (with
`fic.issuer` / `fic.subject` null until patch-fic runs) and prints the
smoke `.env` paste block to stdout:

```
PORTAL_AUTH_ENTRA_DOWNSTREAM_SCOPE=api://<worker-app-id>/.default offline_access
OBO_SMOKE_WORKER_APP_TENANT_ID=<tenant-id>
OBO_SMOKE_WORKER_APP_CLIENT_ID=<worker-app-id>
OBO_SMOKE_WORKER_APP_GRAPH_SCOPE=https://graph.microsoft.com/User.Read
PLUGIN_DIRS=/app/packages/obo-smoke-plugin
```

**The script never edits `.env`** — same workflow as the portal
`entra-app.json` paste step. Use the `edit` tool to paste the lines
into `deploy/envs/local/<stamp>/.env` after the script returns,
replacing any `__PS_UNSET__` sentinels or empty values for these keys.
If `PLUGIN_DIRS` already contains other plugin directories, append the
smoke path comma-separated. Bicep can now run with the final overlay.
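
The `PLUGIN_DIRS` merge rule above can be sketched as a small helper. This is an illustration only: `appendPluginDir` is a hypothetical name that does not exist in the deploy scripts, and treating `__PS_UNSET__` as an empty value is an assumption based on the sentinel handling described in this step.

```typescript
// Hypothetical helper (not part of the deploy scripts): merges the smoke
// plugin path into an existing comma-separated PLUGIN_DIRS value.
const UNSET_SENTINEL = "__PS_UNSET__";

function appendPluginDir(existing: string, pluginDir: string): string {
  // Assumption: the sentinel means "no value yet", same as an empty string.
  const current = existing.trim() === UNSET_SENTINEL ? "" : existing;
  const dirs = current
    .split(",")
    .map((d) => d.trim())
    .filter((d) => d.length > 0);
  // Keep the merge idempotent so re-pasting does not duplicate the path.
  if (!dirs.includes(pluginDir)) dirs.push(pluginDir);
  return dirs.join(",");
}
```

Making the merge idempotent keeps repeated paste passes from growing the list, which matches the re-run-friendly posture of the wrapper itself.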

#### Step 0.b-late — `-Mode patch-fic` (after the full deploy completes; just before smoke)

Looks up the worker app by display name (errors out if Step 0.b-early
hasn't run) and create-or-patches the default MSI-as-FIC trust using
`WORKLOAD_IDENTITY_CLIENT_ID` from
`deploy/.tmp/<stamp>/bicep-outputs.cache.json`: eSTS issuer
`https://login.microsoftonline.com/<tenant>/v2.0`, subject
`<uami-object-id>`, audience `api://AzureADTokenExchange`. **No `.env`
or k8s changes** — the worker pod is already using the UAMI and will
start accepting OBO exchanges as soon as the app FIC exists in AAD (no
pod restart required). Run this just before
`pilotswarm smoke <stamp> --profile obo`. Use `-FicPattern aks-direct`
only in tenants that explicitly allow direct AKS-on-app FICs; Microsoft
CORP requires the default MSI-as-FIC pattern.

```pwsh
pwsh -NoProfile -ExecutionPolicy Bypass `
-File deploy/scripts/auth/Setup-OboSmokeWorkerApp.ps1 `
-Mode patch-fic `
-ServiceTreeId <id> `
-EnvName <stamp>
```

The wrapper updates `fic.pattern`, `fic.issuer`, and `fic.subject` in
the existing sidecar JSON and prints a short confirmation pointing at
the smoke command.
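
For orientation, the trust that patch-fic writes can be pictured as a Microsoft Graph `federatedIdentityCredential` body. The sketch below only assembles that object from the values named above; `buildMsiAsFic` is a hypothetical helper, the UAMI object id is taken as an input (how the wrapper resolves it from `WORKLOAD_IDENTITY_CLIENT_ID` is not shown here), and the `pilotswarm-worker-<stamp>` name is the deterministic FIC name this document mentions under re-runs.

```typescript
// Field names follow the Microsoft Graph federatedIdentityCredential resource.
interface FederatedIdentityCredential {
  name: string;
  issuer: string;
  subject: string;
  audiences: string[];
}

// Hypothetical helper: builds the default MSI-as-FIC trust described above.
// It makes no Graph call; it only shows which value lands in which field.
function buildMsiAsFic(
  stamp: string,
  tenantId: string,
  uamiObjectId: string,
): FederatedIdentityCredential {
  return {
    name: `pilotswarm-worker-${stamp}`,
    // eSTS issuer for the tenant.
    issuer: `https://login.microsoftonline.com/${tenantId}/v2.0`,
    // The subject is the UAMI's object id, not its client id.
    subject: uamiObjectId,
    audiences: ["api://AzureADTokenExchange"],
  };
}
```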

#### Single-shot fallback — `-Mode all` (back-compat default)

For operator re-runs against an already-deployed stamp, omit `-Mode`
to run both phases in a single invocation. Requires bicep to have
produced the selected FIC inputs already (`WORKLOAD_IDENTITY_CLIENT_ID`
for default MSI-as-FIC).

**Tightened verification gate (before `worker manifests,rollout`)**:
for OBO live-smoke stamps, the standard Step 3b grep is *not
sufficient* because it only checks key presence. The smoke plugin will
fail at runtime if any of the five keys checked below is empty or still
set to the `__PS_UNSET__` sentinel. Run this stricter check and require
zero matches:

```bash
grep -E '^(PORTAL_AUTH_ENTRA_DOWNSTREAM_SCOPE|OBO_SMOKE_WORKER_APP_(TENANT_ID|CLIENT_ID|GRAPH_SCOPE)|PLUGIN_DIRS)=(__PS_UNSET__)?$' deploy/envs/local/<stamp>/.env
```

If any line matches, you forgot to paste — re-read the wrapper's
stdout from Step 0.b-early and apply the paste block via `edit` before
invoking `worker manifests,rollout`.
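
The same gate can be expressed in TypeScript, for example inside a pre-flight script. This is a sketch that mirrors the grep above rather than an existing repo utility; `findUnsetKeys` is a hypothetical name. Like the grep, it only flags keys that are present with an empty or sentinel value, so a key that is missing entirely is not reported.

```typescript
// Keys covered by the stricter grep above.
const REQUIRED_KEYS: string[] = [
  "PORTAL_AUTH_ENTRA_DOWNSTREAM_SCOPE",
  "OBO_SMOKE_WORKER_APP_TENANT_ID",
  "OBO_SMOKE_WORKER_APP_CLIENT_ID",
  "OBO_SMOKE_WORKER_APP_GRAPH_SCOPE",
  "PLUGIN_DIRS",
];

// Hypothetical helper: returns the keys whose value is empty or still the
// `__PS_UNSET__` sentinel. An empty result corresponds to "zero matches".
function findUnsetKeys(envText: string): string[] {
  const offenders: string[] = [];
  for (const line of envText.split(/\r?\n/)) {
    const eq = line.indexOf("=");
    if (eq < 0) continue; // not a KEY=VALUE line
    const key = line.slice(0, eq);
    const value = line.slice(eq + 1);
    if (REQUIRED_KEYS.includes(key) && (value === "" || value === "__PS_UNSET__")) {
      offenders.push(key);
    }
  }
  return offenders;
}
```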

**Consent**: the wrapper declares Microsoft Graph `User.Read`
delegated permission on the worker app. **Per-user consent at portal
sign-in is the default and recommended path** — each user accepts the
"Sign you in and read your profile" prompt once for themselves on
their first OBO smoke sign-in. No tenant admin involvement required.
For shared stamps, you can optionally pre-grant tenant-wide consent
by passing `-GrantAdminConsent` (Global Admin) or by having a Cloud
Application Administrator run `az ad app permission admin-consent
--id <worker-app-id>` once. In highly restricted tenants where user
consent is blocked even for Graph `User.Read`, admin consent becomes
mandatory and the OBO exchange returns `AADSTS65001` until granted.

**Re-runs**: idempotent by display name (`PilotSwarm OBO Smoke Worker -
<stamp>`). The wrapper re-reads the existing OAuth2 scope id rather
than minting a new GUID, overwrites `preAuthorizedApplications` with
the current portal clientId, and create-or-patches the FIC by
deterministic name (`pilotswarm-worker-<stamp>`). If you renamed the
app in the Entra portal, the wrapper creates a fresh app and logs that
the old one was orphaned — clean it up manually.

See the `pilotswarm-obo-smoke-app-reg` skill for the full reference
(parameters, troubleshooting, sidecar shape).

### Step 1 — Discover environment defaults

Before opening the dialogue, run a quick discovery so the user sees
25 changes: 25 additions & 0 deletions .github/copilot-instructions.md
@@ -167,6 +167,31 @@ Current overlap to preserve unless intentionally changed:
- `f` in the logs inspector opens the log-filter dialog, `f` in the files inspector opens the files-filter dialog, and `f` in the stats inspector cycles between session, fleet, and users views
- `Shift+A` opens or closes the per-user Admin Console (profile + GitHub Copilot key); inside the console `e` edits the key, `c` clears it, `r` refreshes the profile, and `Esc` returns to the workspace

## User OBO (User-On-Behalf-Of) Propagation

PilotSwarm propagates the signed-in portal user's identity (and, when configured, an envelope-encrypted downstream access token) to worker tool handlers so downstream consumer apps can perform OAuth2 OBO flows (e.g. Microsoft Graph, Azure DevOps, or any Entra-protected resource) as the engineer rather than as the worker UAMI. This is a generic propagation surface — PilotSwarm itself does not call any specific downstream resource; consumer apps that build on PilotSwarm do.

Architecture invariants (do not break these without explicit cross-repo coordination):

- **Wire field is `envelope`** (carrying plaintext `principal` claims plus optional `accessTokenCipher`), not `envelopeCipher`. Plaintext principal flows on every worker-bound RPC; only the access token is encrypted.
- **Envelope encryption** uses AKV-wrapped DEK + AES-256-GCM ciphertext. KEK selection is via `OBO_KEK_KID` (full versioned or unversioned AKV key URL); on encrypt the cipher records `wrapResult.keyID` (versioned URL) so KEK rotation with prior-version retention works correctly.
- **Three crypto backends** in `packages/sdk/src/envelope-crypto.ts` selected by `selectEnvelopeCrypto(env)`: `AkvEnvelopeCrypto` (production; AKV SDKs lazy-loaded so non-OBO consumers don't pull deps), `InMemoryEnvelopeCrypto` (tests), `PlaintextEnvelopeCrypto` (dev-only, sentinel `kekKid: "plaintext-mode"` — workers must refuse cross-mode interpretation).
- **Worker lookup contract**: tool handlers call `getUserContextForSession(sessionId)` from `pilotswarm-sdk` (worker side). Returns `{ principal: { provider, subject, email, displayName }, accessToken, accessTokenExpiresAt } | null`. The lookup is synchronous, O(1), worker-affined, and resolves through chain resolution (sub-agent sessions → root portal-bound parent at lookup time, not at spawn time) so re-rooting works correctly.
- **`accessToken: null`** is the universal absence signal (no token configured, system/orchestration session, AKV unwrap failure). Tools that need only the principal continue to work; tools that need the token emit `serviceUnavailable` for unwrap failure and `interactionRequired` for AAD interaction-required errors.
- **Structured tool outcomes** in `packages/sdk/src/tool-outcomes.ts`: `interactionRequired({ reasonCode, message?, claims? })` with pinned reason codes (`reauth_required` | `mfa_refresh` | `conditional_access` | `consent_required`) and `serviceUnavailable({ reasonCode, retryAfter?, message? })`. Three-way machine-distinguishable from generic tool failure. The `claims` blob is opaque AAD plumbing and must never reach the LLM transcript; portal re-auth UI keys off `reasonCode`, not message text.
- **Portal-side refresh, not worker-side**: portal MSAL re-acquires silently when the cached token is within ~5 min of expiry at RPC time. The worker never persists or refreshes tokens. Refresh token (`offline_access`) lives only in the in-memory MSAL session cache portal-side.
- **Single-tenant** assumption (configured `https://login.microsoftonline.com/<tenant-id>` authority). Scope minimization: only the configured `PORTAL_AUTH_ENTRA_DOWNSTREAM_SCOPE` is acquired.
- **System / non-portal sessions**: lookup returns `null`. Local-TUI hosts have no portal envelope and thus no user context.
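
To make the envelope-encryption invariant concrete, here is a minimal sketch of the wrapped-DEK pattern: a fresh data-encryption key (DEK) encrypts the token with AES-256-GCM, the KEK wraps the DEK, and the cipher records the versioned key id returned by the wrap call. Everything named here (`SketchCipher`, `KekWrapper`, `InMemoryKek`, `encryptToken`, `decryptToken`) is hypothetical and is not the API of `envelope-crypto.ts`. The in-memory KEK stands in for Azure Key Vault, and the sketch is synchronous for brevity where a Key Vault backend would be asynchronous.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

// Hypothetical cipher layout; the real `accessTokenCipher` shape is not shown here.
interface SketchCipher {
  kekKid: string; // versioned key id reported by the wrap call
  wrappedDek: string; // base64
  iv: string; // base64
  tag: string; // base64
  ciphertext: string; // base64
}

// Stand-in for the KEK wrap/unwrap operations.
interface KekWrapper {
  wrap(dek: Buffer): { keyID: string; wrapped: Buffer };
  unwrap(keyID: string, wrapped: Buffer): Buffer;
}

function aesGcmSeal(key: Buffer, plaintext: Buffer) {
  const iv = randomBytes(12); // 96-bit nonce, fresh per encryption
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext), cipher.final()]);
  return { iv, tag: cipher.getAuthTag(), ciphertext };
}

function aesGcmOpen(key: Buffer, iv: Buffer, tag: Buffer, ciphertext: Buffer): Buffer {
  const decipher = createDecipheriv("aes-256-gcm", key, iv);
  decipher.setAuthTag(tag);
  // final() throws if the tag does not authenticate the ciphertext.
  return Buffer.concat([decipher.update(ciphertext), decipher.final()]);
}

// Test-style KEK held in memory; it knows exactly one key version.
class InMemoryKek implements KekWrapper {
  private readonly kek = randomBytes(32);
  constructor(private readonly versionedKeyId: string) {}
  wrap(dek: Buffer) {
    const { iv, tag, ciphertext } = aesGcmSeal(this.kek, dek);
    return { keyID: this.versionedKeyId, wrapped: Buffer.concat([iv, tag, ciphertext]) };
  }
  unwrap(keyID: string, wrapped: Buffer): Buffer {
    if (keyID !== this.versionedKeyId) throw new Error("unknown KEK version");
    return aesGcmOpen(this.kek, wrapped.subarray(0, 12), wrapped.subarray(12, 28), wrapped.subarray(28));
  }
}

function encryptToken(token: string, kek: KekWrapper): SketchCipher {
  const dek = randomBytes(32); // one DEK per envelope
  const { iv, tag, ciphertext } = aesGcmSeal(dek, Buffer.from(token, "utf8"));
  const { keyID, wrapped } = kek.wrap(dek);
  return {
    kekKid: keyID,
    wrappedDek: wrapped.toString("base64"),
    iv: iv.toString("base64"),
    tag: tag.toString("base64"),
    ciphertext: ciphertext.toString("base64"),
  };
}

// Returns null on any unwrap or decrypt failure, mirroring the
// `accessToken: null` absence signal described above.
function decryptToken(c: SketchCipher, kek: KekWrapper): string | null {
  try {
    const dek = kek.unwrap(c.kekKid, Buffer.from(c.wrappedDek, "base64"));
    const plain = aesGcmOpen(
      dek,
      Buffer.from(c.iv, "base64"),
      Buffer.from(c.tag, "base64"),
      Buffer.from(c.ciphertext, "base64"),
    );
    return plain.toString("utf8");
  } catch {
    return null;
  }
}
```

Storing the versioned key id next to the ciphertext is what lets a later unwrap target the exact KEK version that produced it, which is why rotation with prior-version retention keeps old envelopes readable.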

Trust boundary: the portal-issued envelope is the trust root. Worker tools must not synthesize their own principal from CMS owner fields when an envelope is absent — they must refuse the operation or emit `serviceUnavailable`/`interactionRequired` per the outcome contract.
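
A tool handler that honors this boundary might look like the sketch below. To stay self-contained it defines local stand-ins instead of importing from `pilotswarm-sdk`: the `ToolResult` shape, the `lookup` parameter (standing in for `getUserContextForSession`), the `serviceUnavailable` reason codes `no_user_context` and `token_unavailable`, the type of `accessTokenExpiresAt`, and the `isInteractionRequired` check are all assumptions made for illustration. Only the four `interactionRequired` reason codes come from this document. The downstream call is synchronous here for brevity; a real one would be asynchronous.

```typescript
// Local stand-ins; field names follow the lookup contract described above.
type Principal = { provider: string; subject: string; email: string; displayName: string };
type UserContext = {
  principal: Principal;
  accessToken: string | null;
  accessTokenExpiresAt: number | null; // assumption: the real type is not shown
};
type InteractionReason = "reauth_required" | "mfa_refresh" | "conditional_access" | "consent_required";

// Hypothetical result shape; the SDK's actual outcome objects may differ.
type ToolResult =
  | { kind: "ok"; actingAs: string }
  | { kind: "interaction_required"; reasonCode: InteractionReason }
  | { kind: "service_unavailable"; reasonCode: string };

// Hypothetical marker for an AAD interaction-required failure.
function isInteractionRequired(err: unknown): boolean {
  return err instanceof Error && err.name === "InteractionRequired";
}

function handleTool(
  sessionId: string,
  lookup: (sessionId: string) => UserContext | null,
  callDownstream: (accessToken: string) => void,
): ToolResult {
  const ctx = lookup(sessionId);
  // No envelope: refuse instead of synthesizing a principal from owner fields.
  if (ctx === null) return { kind: "service_unavailable", reasonCode: "no_user_context" };
  // Principal present but no usable token (for example, an unwrap failure).
  if (ctx.accessToken === null) return { kind: "service_unavailable", reasonCode: "token_unavailable" };
  try {
    callDownstream(ctx.accessToken);
  } catch (err) {
    if (isInteractionRequired(err)) return { kind: "interaction_required", reasonCode: "reauth_required" };
    throw err; // anything else stays a generic tool failure
  }
  return { kind: "ok", actingAs: ctx.principal.subject };
}
```

Keeping the three outcomes as distinct variants is the point of the sketch: a caller can branch on the outcome kind and reason code without parsing message text.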

Operator-visible config:
- Portal: `PORTAL_AUTH_PROVIDER=entra`, `PORTAL_AUTH_ENTRA_TENANT_ID`, `PORTAL_AUTH_ENTRA_CLIENT_ID`, `PORTAL_AUTH_ENTRA_DOWNSTREAM_SCOPE` (e.g. `api://<worker-app>/.default offline_access`).
- Worker: `OBO_KEK_KID` (AKV key URL), `WORKLOAD_IDENTITY_CLIENT_ID` for the federated-credential exchange.
- Both pods must hold `Key Vault Crypto User` on the OBO KEK AKV. Bicep accepts an array `oboKekUamiPrincipalIds` so that both single-UAMI deployments and dual-UAMI deployments (the PilotSwarm reference shape) work.
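
As a rough illustration of how the portal-side keys combine, the sketch below classifies an environment into an envelope mode and reports obvious misconfigurations. It is not `selectEnvelopeCrypto`: the function name `checkPortalOboConfig`, the mode labels, and the decision to flag "scope set without a KEK or plaintext mode" and "both KEK and plaintext mode set" as problems are assumptions, since the actual precedence rules are not spelled out here.

```typescript
type Env = Record<string, string | undefined>;
type EnvelopeMode = "principal-only" | "akv" | "plaintext";

interface OboConfigCheck {
  mode: EnvelopeMode;
  problems: string[];
}

// Hypothetical portal-side pre-flight check.
function checkPortalOboConfig(env: Env): OboConfigCheck {
  const scope = (env.PORTAL_AUTH_ENTRA_DOWNSTREAM_SCOPE ?? "").trim();
  const kekKid = (env.OBO_KEK_KID ?? "").trim();
  const plaintext = env.OBO_ENVELOPE_PLAINTEXT_MODE === "1";
  const problems: string[] = [];

  // No downstream scope: no token is acquired, only the principal flows.
  if (scope === "") return { mode: "principal-only", problems };

  if (kekKid !== "" && plaintext) {
    // Assumption: treating this combination as a mistake, with the KEK winning.
    problems.push("OBO_KEK_KID and OBO_ENVELOPE_PLAINTEXT_MODE=1 are both set");
  }
  if (kekKid !== "") return { mode: "akv", problems };
  if (plaintext) return { mode: "plaintext", problems };

  problems.push(
    "PORTAL_AUTH_ENTRA_DOWNSTREAM_SCOPE is set but neither OBO_KEK_KID nor OBO_ENVELOPE_PLAINTEXT_MODE=1 is configured",
  );
  return { mode: "principal-only", problems };
}
```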

Live-tenant smoke is the npm publish gate for OBO changes — see `packages/obo-smoke-plugin/` (`obo_smoke_whoami` against Graph `/me`, `obo_smoke_force_reauth`) and `docs/operations/obo-kek-runbook.md`. The smoke plugin is opt-in through the `--variant smoke` worker image plus `PLUGIN_DIRS=/app/packages/obo-smoke-plugin`; `OBO_SMOKE_ENABLED=true` is only the smoke-driver stamp marker. Reference smoke env vars are read at handler-time, not at module-load time, so a loaded smoke plugin still functions correctly once configured.
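
The handler-time read mentioned above can be illustrated with a short sketch. It is not the smoke plugin's code: `makeWhoamiHandler` and its return shape are hypothetical, and the environment is passed in as a plain object so the example stays self-contained (a real handler would typically read `process.env`).

```typescript
type Env = Record<string, string | undefined>;

// Anti-pattern for this use case: capturing the value once at module load,
// e.g. `const CLIENT_ID = process.env.OBO_SMOKE_WORKER_APP_CLIENT_ID;`,
// would freeze whatever was (or was not) configured at import time.

// Hypothetical handler factory: configuration is read on every invocation.
function makeWhoamiHandler(env: Env) {
  return function handler(): { configured: boolean; clientId?: string } {
    const clientId = env.OBO_SMOKE_WORKER_APP_CLIENT_ID;
    const tenantId = env.OBO_SMOKE_WORKER_APP_TENANT_ID;
    if (!clientId || !tenantId) return { configured: false };
    return { configured: true, clientId };
  };
}
```

Because the lookup happens inside the handler body, a plugin that was loaded before its configuration arrived starts working on the next call without a reload.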

## TUI Maintenance

The shared terminal UI is a maintained product surface, not an experiment.
2 changes: 2 additions & 0 deletions .github/skills/pilotswarm-aks-deploy/SKILL.md
@@ -66,6 +66,8 @@ Do not hard-code `ACR_NAME` on the deploy command line — `scripts/deploy-aks.s
- When starting all workers simultaneously against a fresh DB, duroxide migrations can race. Duroxide 0.1.19+ uses advisory locks to handle this safely — workers that lose the race will retry and succeed. Earlier versions crash on duplicate migration keys.
- Portal listens on port 3001 (HTTP) internally; TLS termination happens at the app-routing nginx ingress.
- Portal is publicly accessible with Entra ID as the sole access gate.
- OBO live-smoke is opt-in via the smoke worker image variant (`--variant smoke`) plus the smoke env overlay (`deploy/envs/template.smoke.env`, including `PLUGIN_DIRS=/app/packages/obo-smoke-plugin`). Default deploys are smoke-free; `OBO_SMOKE_ENABLED=true` is a smoke-driver marker, not a worker startup gate.
- User OBO Propagation is opt-in and lives on the npm/Bicep deploy path, not on this legacy bash path. If you roll the new SDK forward to `waldemort-aks` via `scripts/deploy-aks.sh`, the worker / portal start in non-OBO mode (backwards-compatible: `selectEnvelopeCrypto` returns null when `OBO_KEK_KID` is unset, principal-only envelopes engage). To enable OBO on this cluster, the operator must (a) provision the OBO KEK in Key Vault out-of-band (or migrate this stamp to the npm Bicep flow), and (b) add `OBO_KEK_KID=<un-versioned AKV key URL>` and `PORTAL_AUTH_ENTRA_DOWNSTREAM_SCOPE=<api://<worker-app>/.default>` to `.env.remote` so the deploy script picks them up into the K8s secret. See `docs/operations/obo-kek-runbook.md` for the canonical rotation / RBAC checklist regardless of which deploy path provisioned the key.
- VPN Gateway P2S is a feature of the **GitOps IaC path** (`deploy/scripts/deploy.mjs` + base-infra bicep), not this legacy `scripts/deploy-aks.sh` flow. If a user mentions VPN-enabled stamps, route them to the `pilotswarm-new-env-deploy` skill ("Optional: VPN Gateway P2S" section) and `docs/deploying-to-aks.md`. Two operator-visible costs to surface up-front when discussing VPN: **45+ minutes** added to the first deploy (gateway provisioning is the long pole) and **~$450/month** runtime cost for `VpnGw2AZ` + Azure Private DNS Resolver (~$280 gateway + ~$170 resolver inbound endpoint). The Resolver is co-provisioned with the VPN gateway because P2S clients cannot reach 168.63.129.16 through the tunnel — without it, clients cannot resolve the portal Private DNS Zone hostname. Generation1 SKUs including `VpnGw1AZ` are excluded — they silently drop OpenVPN+AAD HardResetClientV2 packets. Subsequent param-change deploys are minutes, not 45+.

## Default Deploy Workflow