Merged
22 commits
cd48571
feat(vpn-p2s): Phase 1 - base-infra bicep + global-infra frontDoorId
chkraw Jun 15, 2026
59c80e9
feat(vpn-p2s): Phase 2 - validator + env threading + AppGw WAF file +…
Jun 15, 2026
14805a9
fix(vpn-p2s): Phase 2 review follow-ups (SHOULD-FIX-1, SHOULD-FIX-2, …
chkraw Jun 15, 2026
a402914
feat(vpn-p2s): Phase 3 - scaffolder VPN UX (new-env.mjs prompts + tests)
chkraw Jun 15, 2026
2831b7d
fix(vpn-p2s): Phase 3 review follow-ups (BLOCKING + 3 SHOULD-FIX)
chkraw Jun 16, 2026
18f91bd
docs(vpn-p2s): Phase 4 - skill + canonical-doc updates
chkraw Jun 16, 2026
fa8f2f6
docs(vpn-p2s): Phase 4 review follow-ups
chkraw Jun 16, 2026
d108d8e
fix(vpn-p2s): final-review follow-ups (C-1, S-2, S-3)
chkraw Jun 16, 2026
430d359
Scrub Microsoft-internal terms from VPN-introduced files
ChrisKrawczyk Jun 17, 2026
c746b57
Strip PAW phase/spec references from VPN-introduced comments
ChrisKrawczyk Jun 17, 2026
61b9afb
Add VPN reliability context to npm-deployer agent
ChrisKrawczyk Jun 17, 2026
56d55d5
scaffolder: prompt for SSL_CERT_DOMAIN_SUFFIX on AKV TLS source
chkraw Jun 17, 2026
de234ba
skill: add VPN block to new-env defaults table + one-liner intake gui…
chkraw Jun 17, 2026
174535a
agent+skill: forbid auto-suggesting Entra app-reg reuse from sibling …
chkraw Jun 17, 2026
4b5d4df
parseEnvFile: strip inline '# comment' from unquoted values
chkraw Jun 17, 2026
e674858
vpn-gateway: switch default SKU to VpnGw1AZ (non-AZ SKUs deprecated)
chkraw Jun 17, 2026
4971cf7
vpn-gateway: make VPN Gateway PIP zone-redundant for AZ SKUs
chkraw Jun 17, 2026
6714e65
vpn-gateway: seamless P2S DNS via Private DNS Resolver + dual-URI por…
chkraw Jun 18, 2026
63be9fc
docs(proposals): VPN access management via per-stamp custom audience app
chkraw Jun 18, 2026
bc521b8
vpn-gateway: add Get-VpnClientProfile.ps1 + pilotswarm-vpn-client-pro…
chkraw Jun 19, 2026
a7b90a5
vpn-gateway: docs-consistency pass before merge
chkraw Jun 19, 2026
e591b3f
vpn-access-mgmt proposal: add Phase 6 for iOS cert-auth support
chkraw Jun 19, 2026
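
Commit 4b5d4df changes `parseEnvFile` to strip an inline `# comment` from unquoted values. The diff for that change is not shown on this page, so the following is only a minimal sketch of the described behavior. `parseEnvText` is a hypothetical name, and the rule that a comment must either start the value or follow whitespace is an assumption, not the repository's confirmed logic.

```javascript
// Hypothetical sketch; NOT the repository's parseEnvFile implementation.
// Assumption: in an unquoted value, a comment starts at '#' only when the '#'
// begins the value or follows whitespace. Quoted values keep '#' verbatim.
function parseEnvText(text) {
  const env = {};
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === "" || line.startsWith("#")) continue; // blank or full-line comment
    const eq = line.indexOf("=");
    if (eq === -1) continue; // not a KEY=VALUE line
    const key = line.slice(0, eq).trim();
    let value = line.slice(eq + 1).trim();
    const quote = value[0];
    const close = quote === '"' || quote === "'" ? value.indexOf(quote, 1) : -1;
    if (close !== -1) {
      // Quoted: keep everything between the quotes, including any '#'.
      value = value.slice(1, close);
    } else {
      // Unquoted: drop the inline comment and the whitespace before it.
      value = value.replace(/(^|\s+)#.*$/, "");
    }
    env[key] = value;
  }
  return env;
}
```

Under these assumptions, `VPN_CLIENT_ADDRESS_POOL=172.16.200.0/24 # P2S pool` parses to `172.16.200.0/24`, while a `#` inside quotes or without preceding whitespace is kept as part of the value.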
35 changes: 25 additions & 10 deletions .github/agents/pilotswarm-npm-deployer.agent.md
@@ -38,6 +38,7 @@ On your first turn, identify which sub-path the user wants. Ask if ambiguous:
|---|---|---|
| "new env", "sandbox", "stamp", `new-env`, fresh RG name, `chkrawps*`-style names | **new-env (fresh)** | `pilotswarm-new-env-deploy` |
| "redeploy / roll out / update / patch <service> to/on `ps<stamp>`", "rebuild and push the worker image to my stamp", any reference to an existing `deploy/envs/local/<stamp>/` directory or `ps<stamp>-*` resource | **new-env (rollout to existing)** | `pilotswarm-new-env-deploy` — §"Per-service redeploys" |
| Anything mentioning **VPN**, P2S, Entra-ID VPN auth, `VPN_GATEWAY_ENABLED`, `GatewaySubnet`, "trusted-bypass for off-allow-list users", or hybrid AFD+VPN ingress | **new-env (fresh or rollout)** — VPN-enabled variant | `pilotswarm-new-env-deploy` — §"Optional: VPN Gateway P2S" (Step 4) |
| Bare "the cluster" / "prod" / "live" with no stamp qualifier, references to `scripts/deploy-aks.sh` or `scripts/deploy-portal.sh`, or to k8s manifests under `deploy/k8s/` | **legacy bash** (out of scope here) | hand off to `pilotswarm-aks-deployer` agent (skills: `pilotswarm-aks-deploy`, `pilotswarm-aks-reset`) |

Disambiguation cues, in order of strength:
@@ -53,6 +54,7 @@ If after those cues it's still ambiguous, ask the user one clarifying question b
- `.github/skills/pilotswarm-new-env-deploy/SKILL.md` — for any npm new-env work (fresh or rollout)
- `.github/skills/pilotswarm-portal-app-reg/SKILL.md` — Entra app registration for portal auth (optional new-env pre-step)
- `.github/skills/pilotswarm-portal-auth-assignments/SKILL.md` — assign / revoke / list app-role assignments (mandatory follow-up to app-reg when posture is roles-driven)
- `.github/skills/pilotswarm-vpn-client-profile/SKILL.md` — download the Azure VPN client profile (`azurevpnconfig.xml`) for VPN-enabled stamps; offer to run automatically at the end of a first successful VPN-enabled deploy
- `.github/copilot-instructions.md` — source of truth for DO NOT WIPE, repo-scope boundary, sensitive-files rule
- `deploy/scripts/README.md` — canonical orchestrator reference (services, steps, EDGE_MODE × TLS_SOURCE, troubleshooting)
- `deploy/scripts/auth/README.md` — portal app-registration scripts
@@ -129,14 +131,18 @@ Ask, in order:
- `entra` → continue.
2. **"Do you already have a `PORTAL_AUTH_ENTRA_CLIENT_ID` (an existing
Entra app registration), or shall I provision one for this stamp?"**
- - **Default recommendation: provision a new dedicated app for this
- stamp.** One app per stamp keeps redirect URI lists clean, lets
- each environment be retired (and its app deleted) independently,
- and avoids the "shared app blast radius" where revoking one
- stamp's access touches every other stamp on the same client id.
- Only reuse a shared existing app when the user explicitly asks
- for it.
- - "Provision one" (recommended) → invoke the
+ - **Each stamp gets its own dedicated Entra app — always provision
+ a new one.** One app per stamp keeps redirect URI lists clean,
+ lets each environment be retired (and its app deleted)
+ independently, and avoids the "shared app blast radius" where
+ revoking one stamp's access touches every other stamp on the
+ same client id.
+ - **Never auto-suggest copying `PORTAL_AUTH_ENTRA_CLIENT_ID` from a
+ sibling stamp's `.env` file.** When a previous local stamp is
+ used as a reference (for subscription, tenant, region, etc.),
+ pull non-auth values only — the client id is bound to that
+ stamp's lifecycle and redirect URIs and must not be reused.
+ - The only valid path is: invoke the
`pilotswarm-portal-app-reg` skill **before** Step 1. That skill
produces the `clientId` and writes it to
`deploy/envs/local/<stamp>/entra-app.json`. The script requires
@@ -146,8 +152,11 @@ Ask, in order:
The script auto-derives the display name `"PilotSwarm Portal -
<stamp>"`; do not pass `-DisplayName` unless the user wants to
override.
- - "I have one / I want to share" → take the client id directly, or
- invoke the skill in append mode (`-ExistingAppId <appId> -EnvName <stamp>`).
+ - The only exception is if the user **explicitly and unprompted**
+ asks to reuse a specific existing app. In that case take the
+ client id directly from them, or invoke the skill in append mode
+ (`-ExistingAppId <appId> -EnvName <stamp>`). Do not infer this
+ intent from the presence of a sibling stamp.
3. **"Should sign-in be locked down to assigned users only, or open to
any tenant member?"** (only when `entra` and provisioning new)
- **Production stamp (recommended)** → `-CreateAppRoles` + assign
@@ -320,6 +329,8 @@ gate. Pre-fill from the discovered UPN unless the user overrides.

Validate the EDGE_MODE × TLS_SOURCE combination against the supported matrix before running anything (see `pilotswarm-new-env-deploy` skill §"Edge mode × TLS source selection"). The combos `afd+akv-selfsigned` and `private+letsencrypt` are rejected by `deploy.mjs` itself — call them out before the user hits a `UNSUPPORTED_COMBOS` error.

When the user wants `VPN_GATEWAY_ENABLED=true`, also validate the VPN combo gates up-front (see skill §"Optional: VPN Gateway P2S"). The only valid combo is `EDGE_MODE=afd + TLS_SOURCE=akv`; anything else surfaces a named `[vpn-incompatible-combo]` / `vpn-requires-afd` / `vpn-requires-akv` error from `new-env.mjs` or `validateVpnGatewayCombo()` in `deploy.mjs`. Surface the refusal reason and ask the user to revise — do not retry, and do not silently scaffold without VPN. Likewise, the `VPN_CLIENT_ADDRESS_POOL` (default `172.16.200.0/24`) must not overlap the VNet (default `10.20.0.0/16`); a pool clash surfaces as `[vpn-pool-overlap]`. `SSL_CERT_DOMAIN_SUFFIX` is required for AKV TLS (interactive scaffolder prompts for it when `--tls-source akv`; non-interactive runs must pass `--ssl-cert-domain-suffix <suffix>`) — never instruct the user to hand-edit it post-scaffold.
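
The gates described above can be sketched roughly as follows. This is an illustration only: `checkVpnSettings`, `cidrsOverlap`, and `cidrToRange` are hypothetical names, the returned array of error codes is an assumed shape, and none of it mirrors the actual `validateVpnGatewayCombo()` in `deploy.mjs`. Only the valid combo, the default CIDRs, and the error names are taken from the paragraph above.

```javascript
// Hypothetical pre-flight sketch; NOT the real validateVpnGatewayCombo().

// Convert "a.b.c.d/len" into an inclusive [start, end] range of integers.
function cidrToRange(cidr) {
  const [ip, lenText] = cidr.split("/");
  const len = Number(lenText);
  const octets = ip.split(".").map(Number);
  const badOctet = octets.some((o) => !Number.isInteger(o) || o < 0 || o > 255);
  if (octets.length !== 4 || badOctet || !Number.isInteger(len) || len < 0 || len > 32) {
    throw new Error(`invalid IPv4 CIDR: ${cidr}`);
  }
  const addr = octets.reduce((acc, o) => acc * 256 + o, 0);
  const size = 2 ** (32 - len);
  const start = Math.floor(addr / size) * size; // clear the host bits
  return [start, start + size - 1];
}

// Two CIDR blocks overlap when their address ranges intersect.
function cidrsOverlap(a, b) {
  const [aStart, aEnd] = cidrToRange(a);
  const [bStart, bEnd] = cidrToRange(b);
  return aStart <= bEnd && bStart <= aEnd;
}

// Returns every violated gate (assumed shape: array of error-name strings).
function checkVpnSettings({
  edgeMode,
  tlsSource,
  clientPool = "172.16.200.0/24", // default VPN_CLIENT_ADDRESS_POOL per the text
  vnetCidr = "10.20.0.0/16", // default VNet range per the text
}) {
  const problems = [];
  if (edgeMode !== "afd") problems.push("vpn-requires-afd");
  if (tlsSource !== "akv") problems.push("vpn-requires-akv");
  if (cidrsOverlap(clientPool, vnetCidr)) problems.push("vpn-pool-overlap");
  return problems;
}
```

An empty result means the combination passes these three gates; any non-empty result is a reason to stop and ask the user to revise rather than retry.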

Only proceed after explicit confirmation. The resource prefix written by the scaffolder is `ps<name>` (e.g. `psmysandbox-wus3-rg`, `psmysandboxglobal`). The env file lands at `deploy/envs/local/<name>/.env` — note the `/local/` subdir.

If your first invocation form fails (e.g. you tried the `npm run deploy:new-env -- … --location …` form and npm stripped the flag), **re-confirm the mode with the user** before retrying with a different form. Do not silently switch from interactive to non-interactive — the prompt surface differs materially.
@@ -425,6 +436,10 @@ rendered service manifests:

**Portal sign-in loop after deploy.** Redirect URI on the Entra app reg doesn't match the deployed AFD endpoint. Run `az ad app show --id <clientId> --query "spa.redirectUris"` and compare. If the app was created before the AFD endpoint was known, re-run `Setup-PortalAuth.ps1 -ExistingAppId <appId> -EnvName <stamp>` to append the now-known redirect URI.

**VPN gateway provisioning lead time (`VPN_GATEWAY_ENABLED=true` only).** First-time provisioning of the Azure VPN Gateway adds **45+ minutes** to the `base-infra` step — gateway hours are the long pole, not anything in our control. During that window, `az network vnet-gateway show -g <rg> -n <name> --query provisioningState` will sit at `Updating` and `deploy.mjs` will appear stalled. This is **expected, not a failure**. Do not interrupt, re-run, or `--force-module` the bicep step. Subsequent `base-infra` re-deploys against an existing gateway are minutes, not 45+. Confirmation that the deploy is healthy = `provisioningState: Succeeded` on both the gateway and its Public IP; only then move to client-profile distribution.
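
A read-only way to wait out that window, instead of interrupting the deploy, might look like the sketch below. `waitForGateway` and `azGatewayState` are hypothetical helpers that are not part of `deploy.mjs`; the one-minute poll interval and 90-minute ceiling are assumptions, and `-o tsv` is added to the quoted `az` command only to get a bare string back.

```javascript
// Hypothetical helpers; NOT part of deploy.mjs.
// getState is any async function that resolves to the provisioningState string.
async function waitForGateway(
  getState,
  { intervalMs = 60_000, timeoutMs = 90 * 60_000 } = {} // assumed defaults
) {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    const state = (await getState()).trim();
    if (state === "Succeeded") return state;
    if (state === "Failed") throw new Error("gateway provisioning failed");
    if (Date.now() >= deadline) {
      throw new Error(`timed out while gateway was still ${state}`);
    }
    // "Updating" is expected during first-time provisioning: keep waiting.
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}

// Read-only query; it never re-runs or forces the bicep step.
async function azGatewayState(resourceGroup, gatewayName) {
  const { execFile } = await import("node:child_process");
  const { promisify } = await import("node:util");
  const { stdout } = await promisify(execFile)("az", [
    "network", "vnet-gateway", "show",
    "-g", resourceGroup,
    "-n", gatewayName,
    "--query", "provisioningState",
    "-o", "tsv",
  ]);
  return stdout;
}
```

A caller would pass something like `() => azGatewayState(rg, name)` as `getState`, with `rg` and `name` supplied by the operator. The same loop can be pointed at the gateway's Public IP, since the text treats both reaching `Succeeded` as the health signal.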

**VPN access depends on a tenant-admin Conditional Access policy.** `deploy.mjs` cannot create or verify the CA policy that gates the Azure VPN Client app (`c632b3df-fb67-4d84-bdcf-b95ad541b5c8`, or the legacy `41b23e61-...` audience if overridden in `.env`). Without it, the deploy itself succeeds but every first-time connect attempt fails with an opaque AAD error. Before declaring a VPN-enabled stamp ready for use, confirm with the user that a tenant admin has created the CA policy (named users group + require MFA, do **not** require device compliance). The post-scaffold reminder block in `new-env.mjs` re-prints these requirements; surface them again on rollout if VPN is being enabled for the first time on an existing stamp.

## Constraints

- Never propagate PilotSwarm changes into downstream consumer repos (e.g. apps that vendor or consume PilotSwarm as an SDK) unless the user explicitly asks.
1 change: 1 addition & 0 deletions .github/skills/pilotswarm-aks-deploy/SKILL.md
@@ -66,6 +66,7 @@ Do not hard-code `ACR_NAME` on the deploy command line — `scripts/deploy-aks.s
- When starting all workers simultaneously against a fresh DB, duroxide migrations can race. Duroxide 0.1.19+ uses advisory locks to handle this safely — workers that lose the race will retry and succeed. Earlier versions crash on duplicate migration keys.
- Portal listens on port 3001 (HTTP) internally; TLS termination happens at the app-routing nginx ingress.
- Portal is publicly accessible with Entra ID as the sole access gate.
- VPN Gateway P2S is a feature of the **GitOps IaC path** (`deploy/scripts/deploy.mjs` + base-infra bicep), not this legacy `scripts/deploy-aks.sh` flow. If a user mentions VPN-enabled stamps, route them to the `pilotswarm-new-env-deploy` skill ("Optional: VPN Gateway P2S" section) and `docs/deploying-to-aks.md`. Two operator-visible costs to surface up-front when discussing VPN: **45+ minutes** added to the first deploy (gateway provisioning is the long pole) and **~$450/month** runtime cost for `VpnGw2AZ` + Azure Private DNS Resolver (~$280 gateway + ~$170 resolver inbound endpoint). The Resolver is co-provisioned with the VPN gateway because P2S clients cannot reach 168.63.129.16 through the tunnel — without it, clients cannot resolve the portal Private DNS Zone hostname. Generation1 SKUs including `VpnGw1AZ` are excluded — they silently drop OpenVPN+AAD HardResetClientV2 packets. Subsequent param-change deploys are minutes, not 45+.

## Default Deploy Workflow
