diff --git a/.github/agents/pilotswarm-npm-deployer.agent.md b/.github/agents/pilotswarm-npm-deployer.agent.md index f9031fac..2f2f6470 100644 --- a/.github/agents/pilotswarm-npm-deployer.agent.md +++ b/.github/agents/pilotswarm-npm-deployer.agent.md @@ -38,6 +38,7 @@ On your first turn, identify which sub-path the user wants. Ask if ambiguous: |---|---|---| | "new env", "sandbox", "stamp", `new-env`, fresh RG name, `chkrawps*`-style names | **new-env (fresh)** | `pilotswarm-new-env-deploy` | | "redeploy / roll out / update / patch to/on `ps`", "rebuild and push the worker image to my stamp", any reference to an existing `deploy/envs/local//` directory or `ps-*` resource | **new-env (rollout to existing)** | `pilotswarm-new-env-deploy` — §"Per-service redeploys" | +| Anything mentioning **VPN**, P2S, Entra-ID VPN auth, `VPN_GATEWAY_ENABLED`, `GatewaySubnet`, "trusted-bypass for off-allow-list users", or hybrid AFD+VPN ingress | **new-env (fresh or rollout)** — VPN-enabled variant | `pilotswarm-new-env-deploy` — §"Optional: VPN Gateway P2S" (Step 4) | | Bare "the cluster" / "prod" / "live" with no stamp qualifier, references to `scripts/deploy-aks.sh` or `scripts/deploy-portal.sh`, or to k8s manifests under `deploy/k8s/` | **legacy bash** (out of scope here) | hand off to `pilotswarm-aks-deployer` agent (skills: `pilotswarm-aks-deploy`, `pilotswarm-aks-reset`) | Disambiguation cues, in order of strength: @@ -53,6 +54,7 @@ If after those cues it's still ambiguous, ask the user one clarifying question b - `.github/skills/pilotswarm-new-env-deploy/SKILL.md` — for any npm new-env work (fresh or rollout) - `.github/skills/pilotswarm-portal-app-reg/SKILL.md` — Entra app registration for portal auth (optional new-env pre-step) - `.github/skills/pilotswarm-portal-auth-assignments/SKILL.md` — assign / revoke / list app-role assignments (mandatory follow-up to app-reg when posture is roles-driven) +- `.github/skills/pilotswarm-vpn-client-profile/SKILL.md` — download the Azure VPN client profile (`azurevpnconfig.xml`) for VPN-enabled stamps; offer to run automatically at the end of a first successful VPN-enabled deploy - `.github/copilot-instructions.md` — source of truth for DO NOT WIPE, repo-scope boundary, sensitive-files rule - `deploy/scripts/README.md` — canonical orchestrator reference (services, steps, EDGE_MODE × TLS_SOURCE, troubleshooting) - `deploy/scripts/auth/README.md` — portal app-registration scripts @@ -129,14 +131,18 @@ Ask, in order: - `entra` → continue. 2. **"Do you already have a `PORTAL_AUTH_ENTRA_CLIENT_ID` (an existing Entra app registration), or shall I provision one for this stamp?"** - - **Default recommendation: provision a new dedicated app for this - stamp.** One app per stamp keeps redirect URI lists clean, lets - each environment be retired (and its app deleted) independently, - and avoids the "shared app blast radius" where revoking one - stamp's access touches every other stamp on the same client id. - Only reuse a shared existing app when the user explicitly asks - for it. - - "Provision one" (recommended) → invoke the + - **Each stamp gets its own dedicated Entra app — always provision + a new one.** One app per stamp keeps redirect URI lists clean, + lets each environment be retired (and its app deleted) + independently, and avoids the "shared app blast radius" where + revoking one stamp's access touches every other stamp on the + same client id. + - **Never auto-suggest copying `PORTAL_AUTH_ENTRA_CLIENT_ID` from a + sibling stamp's `.env` file.** When a previous local stamp is + used as a reference (for subscription, tenant, region, etc.), + pull non-auth values only — the client id is bound to that + stamp's lifecycle and redirect URIs and must not be reused. + - The only valid path is: invoke the `pilotswarm-portal-app-reg` skill **before** Step 1. That skill produces the `clientId` and writes it to `deploy/envs/local//entra-app.json`. The script requires @@ -146,8 +152,11 @@ Ask, in order: The script auto-derives the display name `"PilotSwarm Portal - "`; do not pass `-DisplayName` unless the user wants to override. - - "I have one / I want to share" → take the client id directly, or - invoke the skill in append mode (`-ExistingAppId -EnvName `). + - The only exception is if the user **explicitly and unprompted** + asks to reuse a specific existing app. In that case take the + client id directly from them, or invoke the skill in append mode + (`-ExistingAppId -EnvName `). Do not infer this + intent from the presence of a sibling stamp. 3. **"Should sign-in be locked down to assigned users only, or open to any tenant member?"** (only when `entra` and provisioning new) - **Production stamp (recommended)** → `-CreateAppRoles` + assign @@ -320,6 +329,8 @@ gate. Pre-fill from the discovered UPN unless the user overrides. Validate the EDGE_MODE × TLS_SOURCE combination against the supported matrix before running anything (see `pilotswarm-new-env-deploy` skill §"Edge mode × TLS source selection"). The combos `afd+akv-selfsigned` and `private+letsencrypt` are rejected by `deploy.mjs` itself — call them out before the user hits a `UNSUPPORTED_COMBOS` error. +When the user wants `VPN_GATEWAY_ENABLED=true`, also validate the VPN combo gates up-front (see skill §"Optional: VPN Gateway P2S"). The only valid combo is `EDGE_MODE=afd + TLS_SOURCE=akv`; anything else surfaces a named `[vpn-incompatible-combo]` / `vpn-requires-afd` / `vpn-requires-akv` error from `new-env.mjs` or `validateVpnGatewayCombo()` in `deploy.mjs`. Surface the refusal reason and ask the user to revise — do not retry, and do not silently scaffold without VPN. Likewise, the `VPN_CLIENT_ADDRESS_POOL` (default `172.16.200.0/24`) must not overlap the VNet (default `10.20.0.0/16`); a pool clash surfaces as `[vpn-pool-overlap]`. `SSL_CERT_DOMAIN_SUFFIX` is required for AKV TLS (interactive scaffolder prompts for it when `--tls-source akv`; non-interactive runs must pass `--ssl-cert-domain-suffix `) — never instruct the user to hand-edit it post-scaffold. + Only proceed after explicit confirmation. The resource prefix written by the scaffolder is `ps` (e.g. `psmysandbox-wus3-rg`, `psmysandboxglobal`). The env file lands at `deploy/envs/local//.env` — note the `/local/` subdir. If your first invocation form fails (e.g. you tried the `npm run deploy:new-env -- … --location …` form and npm stripped the flag), **re-confirm the mode with the user** before retrying with a different form. Do not silently switch from interactive to non-interactive — the prompt surface differs materially. @@ -425,6 +436,10 @@ rendered service manifests: **Portal sign-in loop after deploy.** Redirect URI on the Entra app reg doesn't match the deployed AFD endpoint. Run `az ad app show --id --query "spa.redirectUris"` and compare. If the app was created before the AFD endpoint was known, re-run `Setup-PortalAuth.ps1 -ExistingAppId -EnvName ` to append the now-known redirect URI. +**VPN gateway provisioning lead time (`VPN_GATEWAY_ENABLED=true` only).** First-time provisioning of the Azure VPN Gateway adds **45+ minutes** to the `base-infra` step — gateway hours are the long pole, not anything in our control. During that window, `az network vnet-gateway show -g -n --query provisioningState` will sit at `Updating` and `deploy.mjs` will appear stalled. This is **expected, not a failure**. Do not interrupt, re-run, or `--force-module` the bicep step. Subsequent `base-infra` re-deploys against an existing gateway are minutes, not 45+. Confirmation that the deploy is healthy = `provisioningState: Succeeded` on both the gateway and its Public IP; only then move to client-profile distribution. + +**VPN access depends on a tenant-admin Conditional Access policy.** `deploy.mjs` cannot create or verify the CA policy that gates the Azure VPN Client app (`c632b3df-fb67-4d84-bdcf-b95ad541b5c8`, or the legacy `41b23e61-...` audience if overridden in `.env`). Without it, the deploy itself succeeds but every first-time connect attempt fails with an opaque AAD error. Before declaring a VPN-enabled stamp ready for use, confirm with the user that a tenant admin has created the CA policy (named users group + require MFA, do **not** require device compliance). The post-scaffold reminder block in `new-env.mjs` re-prints these requirements; surface them again on rollout if VPN is being enabled for the first time on an existing stamp. + ## Constraints - Never propagate PilotSwarm changes into downstream consumer repos (e.g. apps that vendor or consume PilotSwarm as an SDK) unless the user explicitly asks. diff --git a/.github/skills/pilotswarm-aks-deploy/SKILL.md b/.github/skills/pilotswarm-aks-deploy/SKILL.md index 43048645..386eb591 100644 --- a/.github/skills/pilotswarm-aks-deploy/SKILL.md +++ b/.github/skills/pilotswarm-aks-deploy/SKILL.md @@ -66,6 +66,7 @@ Do not hard-code `ACR_NAME` on the deploy command line — `scripts/deploy-aks.s - When starting all workers simultaneously against a fresh DB, duroxide migrations can race. Duroxide 0.1.19+ uses advisory locks to handle this safely — workers that lose the race will retry and succeed. Earlier versions crash on duplicate migration keys. - Portal listens on port 3001 (HTTP) internally; TLS termination happens at the app-routing nginx ingress. - Portal is publicly accessible with Entra ID as the sole access gate. +- VPN Gateway P2S is a feature of the **GitOps IaC path** (`deploy/scripts/deploy.mjs` + base-infra bicep), not this legacy `scripts/deploy-aks.sh` flow. If a user mentions VPN-enabled stamps, route them to the `pilotswarm-new-env-deploy` skill ("Optional: VPN Gateway P2S" section) and `docs/deploying-to-aks.md`. Two operator-visible costs to surface up-front when discussing VPN: **45+ minutes** added to the first deploy (gateway provisioning is the long pole) and **~$450/month** runtime cost for `VpnGw2AZ` + Azure Private DNS Resolver (~$280 gateway + ~$170 resolver inbound endpoint). The Resolver is co-provisioned with the VPN gateway because P2S clients cannot reach 168.63.129.16 through the tunnel — without it, clients cannot resolve the portal Private DNS Zone hostname. Generation1 SKUs including `VpnGw1AZ` are excluded — they silently drop OpenVPN+AAD HardResetClientV2 packets. Subsequent param-change deploys are minutes, not 45+. ## Default Deploy Workflow diff --git a/.github/skills/pilotswarm-new-env-deploy/SKILL.md b/.github/skills/pilotswarm-new-env-deploy/SKILL.md index 39ade600..1a0e4d69 100644 --- a/.github/skills/pilotswarm-new-env-deploy/SKILL.md +++ b/.github/skills/pilotswarm-new-env-deploy/SKILL.md @@ -34,6 +34,7 @@ updated in lockstep with the code, this skill is a procedural overlay: | T2 | Control AKS, ACR, Postgres Flex, Storage, Key Vault, UAMIs, Flux | Always | | T2 edge (afd) | AppGw v2 + WAF + Private Link Service + AGIC | `EDGE_MODE=afd` | | T2 edge (private) | AKS web-app-routing (NGINX) on ILB + Private DNS Zone | `EDGE_MODE=private` | +| T2 ingress (vpn) | Azure VPN Gateway P2S (OpenVPN + Entra ID) + `GatewaySubnet` + managed Private DNS zone + auto-seeded AppGw WAF guard rules | `VPN_GATEWAY_ENABLED=true` (additive; requires `EDGE_MODE=afd` + `TLS_SOURCE=akv`) | | T3 | Ephemeral worker AKS + workload-SA UAMI + Flux + `worker-t3-manifests` blob container | Always | | Cross-cluster | T2 csi UAMI gets `AKS Cluster User Role` on T3 | For T2 worker → T3 kubeconfig minting | @@ -61,22 +62,31 @@ Ask, in order: - `entra` → continue. 2. **"Do you already have a `PORTAL_AUTH_ENTRA_CLIENT_ID` (an existing Entra app registration), or shall I provision one for this stamp?"** - - **Default recommendation: provision a new dedicated app for this - stamp.** One app per stamp keeps redirect URI lists clean, lets - each environment be retired (and its app deleted) independently, - and avoids the "shared app blast radius" where revoking one - stamp's access touches every other stamp on the same client id. - Only reuse a shared existing app when the user explicitly asks - for it (e.g. they want a single SSO consent prompt across all - dev stamps, or tenant policy makes app creation expensive). - - "Provision one" (recommended) → invoke the - `pilotswarm-portal-app-reg` skill **before** Step 1. That skill + - **Each stamp gets its own dedicated Entra app — always provision + a new one.** One app per stamp keeps redirect URI lists clean, + lets each environment be retired (and its app deleted) + independently, and avoids the "shared app blast radius" where + revoking one stamp's access touches every other stamp on the + same client id. + - **Never auto-suggest copying `PORTAL_AUTH_ENTRA_CLIENT_ID` from a + sibling stamp's `.env` file.** Even when a previous local stamp + (e.g. `chkrawps9/.env`) shows a working client id, that app is + bound to that stamp's lifecycle and redirect URIs. Pulling + non-auth values (subscription, tenant, region) from a reference + stamp is fine; pulling the client id is not. + - The only valid path is: invoke the `pilotswarm-portal-app-reg` + skill **before** Step 1 to provision a fresh app. That skill produces the `clientId` and writes it to `deploy/envs/local//entra-app.json`. The skill requires `-ServiceTreeId` — ask the user for theirs before invoking; do not invent a placeholder. - - "I have one / I want to share" → take the client id directly, or - invoke the skill in append mode (`-ExistingAppId -EnvName `). + - The only exception is if the user **explicitly and unprompted** + asks to reuse a specific existing app (e.g. tenant policy makes + app creation expensive, or they want a single SSO consent prompt + across all dev stamps). In that case take the client id directly + from them, or invoke the skill in append mode + (`-ExistingAppId -EnvName `). Do not infer this + intent from the presence of a sibling stamp. 3. **"Should sign-in be locked down to assigned users only, or open to any tenant member?"** (only when `entra` and provisioning new) - **Production stamp (recommended)** → `-CreateAppRoles` + assign @@ -164,6 +174,12 @@ Edge / TLS host portal (default) # only when edge-mode=private private-dns-zone # only when edge-mode=private +VPN (optional — only when user asks for VPN access / off-network ingress) + vpn-enabled n (default) # n | y; when 'y', auto-implies edge-mode=afd + tls-source=akv + ssl-cert-domain-suffix # DNS suffix whose AKV cert this stamp will serve (e.g. dev.contoso.example) + vpn-client-address-pool 172.16.200.0/24 (default) # must not overlap VNet 10.20.0.0/16 + vpn-aad-audience c632b3df-fb67-4d84-bdcf-b95ad541b5c8 (default) # current Azure VPN Client app; override only for legacy-audience tenants + Per-stamp secrets (Key Vault) GITHUB_TOKEN # optional; sentinel if empty AZURE_MODEL_ROUTER_KEY # optional @@ -202,6 +218,19 @@ Portal auth (ConfigMap) — fields depend on auth posture USER_ASSIGNMENTS # UPNs / object ids / group display names, comma-separated ``` +**When the user asks for VPN access (e.g. "spin up a VPN-enabled env", +"I need off-network access", "trusted-bypass lane"):** populate the VPN +block above and **auto-fill `edge-mode=afd` + `tls-source=akv` as the +defaults** for the Edge/TLS block — these are the only values that pass +the VPN combo gate, so the user shouldn't have to discover that +separately. Then ask for `ssl-cert-domain-suffix` (the one new value the +agent cannot infer), confirm `vpn-client-address-pool` (default is fine +unless their corp VPN already uses 172.16.200.0/24), and leave +`vpn-aad-audience` at the default unless they explicitly mention legacy +VPN client builds. Surface the 45-min provisioning lead time and the +tenant-admin Conditional Access requirement (see §"Optional: VPN +Gateway P2S" for both) *before* invoking the scaffolder, not after. + **Pick one mechanism per stamp; don't mix roles + email allowlist.** The portal authz engine treats the JWT `roles` claim as authoritative when present (see `packages/portal/auth/authz/engine.js`): the @@ -330,17 +359,93 @@ after non-interactive runs too. ## Step 4 — Edge mode × TLS source selection -| `EDGE_MODE` | `TLS_SOURCE` | Supported? | When | -|---|---|---|---| -| `afd` | `letsencrypt` | ✅ default | OSS-friendly public endpoint, ACME HTTP-01. | -| `afd` | `akv` | ✅ | Enterprise-internal OneCertV2-PublicCA cert via AKV. | -| `afd` | `akv-selfsigned` | ❌ | AppGw can't consume an AKV `Self` chain end-to-end. | -| `private` | `akv` | ✅ | Enterprise-internal OneCertV2-PrivateCA via AKV. | -| `private` | `akv-selfsigned` | ✅ | OSS-friendly private demo; AKV `Self`-issued cert. | -| `private` | `letsencrypt` | ❌ | ACME HTTP-01 cannot reach a private/ILB endpoint. | +| `EDGE_MODE` | `TLS_SOURCE` | `VPN_GATEWAY_ENABLED` | Supported? | When | +|---|---|---|---|---| +| `afd` | `letsencrypt` | `false` | ✅ default | OSS-friendly public endpoint, ACME HTTP-01. | +| `afd` | `akv` | `false` | ✅ | Enterprise-internal OneCertV2-PublicCA cert via AKV. | +| `afd` | `akv` | `true` | ✅ | Hybrid AFD + VPN P2S (trusted-bypass for authenticated users not matched by the AFD WAF allow-list). AKV-only — see "Optional: VPN Gateway P2S" below. | +| `afd` | `akv-selfsigned` | any | ❌ | AppGw can't consume an AKV `Self` chain end-to-end. | +| `afd` | `letsencrypt` | `true` | ❌ | VPN path requires an AKV cert; ACME HTTP-01 cannot reach a VPN-only client. Preflight error: `vpn-requires-akv`. | +| `private` | `akv` | `false` | ✅ | Enterprise-internal OneCertV2-PrivateCA via AKV. | +| `private` | `akv-selfsigned` | `false` | ✅ | OSS-friendly private demo; AKV `Self`-issued cert. | +| `private` | `letsencrypt` | any | ❌ | ACME HTTP-01 cannot reach a private/ILB endpoint. | +| `private` | any | `true` | ❌ | VPN path is additive to AFD only (its WAF guard rules assume AFD as the public ingress). Preflight error: `vpn-requires-afd`. | The orchestrator validates the combo against `UNSUPPORTED_COMBOS` in -`deploy.mjs` before any Bicep runs. +`deploy.mjs` and `validateVpnGatewayCombo()` in +`deploy/scripts/lib/overlay-contracts.mjs` before any Bicep runs. + +### Optional: VPN Gateway P2S (hybrid AFD + VPN) + +Tenant users with a valid Entra ID token can still be blocked at the +public edge by AFD WAF allow-lists that the operator configures +(typically service-tag, IP-range, or header-based rules that gate the +public ingress to a known managed-network population). Enabling +`VPN_GATEWAY_ENABLED=true` adds an Azure VPN Gateway (Point-to-Site, +OpenVPN, Microsoft Entra ID auth) that terminates at the same AppGw +private listener as the AFD path, with the same AKV cert — a +"trusted-bypass" lane to the same stamp for authenticated tenant users +who don't match the public allow-list (e.g. remote / off-network / +unmanaged-device users). + +- **Constraints**: `EDGE_MODE=afd` is required (code: `vpn-requires-afd`). + `validateVpnGatewayCombo()` accepts any AKV-family `TLS_SOURCE` (`akv` + or `akv-selfsigned`; code: `vpn-requires-akv`); however, + `akv-selfsigned` is also rejected on AFD stamps end-to-end by + `deploy.mjs` `UNSUPPORTED_COMBOS`, so the only effective combo for the + trusted-bypass is `TLS_SOURCE=akv`. `letsencrypt` is rejected because + ACME HTTP-01 cannot reach a VPN-only client. + `SSL_CERT_DOMAIN_SUFFIX` must be set (the managed Private DNS zone + uses it). The scaffolder prompts for it interactively when + `TLS_SOURCE=akv`, or pass `--ssl-cert-domain-suffix ` for + non-interactive runs (e.g. `--ssl-cert-domain-suffix + dev.contoso.example`). `VPN_CLIENT_ADDRESS_POOL` must not overlap the VNet (default + `10.20.0.0/16`). VPN uses the existing stamp `AZURE_TENANT_ID` — same + Entra tenant as the rest of the deploy. +- **Cost / time**: ~$450/month total for `VpnGw2AZ` + Azure Private DNS Resolver inbound endpoint (~$280 gateway + PIP + ~$170 resolver). The Resolver is co-provisioned with the gateway so P2S clients can resolve Private DNS Zone records (e.g. portal hostname) through the tunnel without hosts-file edits. + First-deploy time is **45+ minutes** (gateway provisioning is the long + pole). Subsequent param changes are still measured in minutes. +- **AAD audience**: defaults to `c632b3df-fb67-4d84-bdcf-b95ad541b5c8` + (current Microsoft-registered Azure VPN Client app). Set + `VPN_AAD_AUDIENCE=41b23e61-6c1e-4545-b367-cd054e0ed4b4` in `.env` only on + tenants that must interop with older Azure VPN client builds registered + against the legacy audience. +- **Auto-seeded AppGw WAF guards**: when `VPN_GATEWAY_ENABLED=true` the + base-infra bicep prepends three custom rules at priorities 90/91/92 to + the AppGw WAF policy — `AllowAfd` (matches `X-Azure-FDID = + `), `AllowVpn` (matches `RemoteAddr ∈ + VPN_CLIENT_ADDRESS_POOL`), and `BlockOther` (catch-all `Block`). + Operator rules loaded from `APPGW_WAF_CUSTOM_RULES_FILE` (mirrors the + AFD-side `WAF_CUSTOM_RULES_FILE`) MUST start at priority ≥ 100 — the + 90–92 band is reserved. + +#### Setting up Conditional Access + +Required out-of-band step before first VPN connect (tenant admin only). +Create one CA policy targeting the Azure VPN Client app +`c632b3df-fb67-4d84-bdcf-b95ad541b5c8` (or the legacy +`41b23e61-6c1e-4545-b367-cd054e0ed4b4` if you set the override above), +assigned to a NAMED users group (do not target "all users"), with the +grant control set to **require MFA**. Explicitly do **NOT** require device +compliance — the VPN client cannot satisfy that grant and the connect +attempt will fail with an opaque AAD error. The post-scaffold reminder +block in `new-env.mjs` re-prints these requirements when `VPN_GATEWAY_ENABLED=true`. + +#### Distributing the VPN client profile + +After the first deploy completes, hand operators the OpenVPN client +profile. Preferred path is the repo helper: +`pwsh -File deploy/scripts/auth/Get-VpnClientProfile.ps1 -EnvName ` +(see the `pilotswarm-vpn-client-profile` skill). It wraps +`az network vnet-gateway vpn-client generate --authentication-method EAPTLS`, +downloads the signed zip, and extracts it under the gitignored +`deploy/envs/local//vpn-client/` folder. As fallbacks: the Azure +portal — `Resource group → → Point-to-site configuration → +Download VPN client` — or the raw CLI `az network vnet-gateway vpn-client +generate --resource-group --name +--authentication-method EAPTLS`. All three emit the same `.zip` that +imports directly into the Azure VPN Client app (Windows / macOS / iOS / +Android). ## Step 5 — Deploy diff --git a/.github/skills/pilotswarm-portal-app-reg/SKILL.md b/.github/skills/pilotswarm-portal-app-reg/SKILL.md index c9af3178..cec95385 100644 --- a/.github/skills/pilotswarm-portal-app-reg/SKILL.md +++ b/.github/skills/pilotswarm-portal-app-reg/SKILL.md @@ -232,7 +232,11 @@ unless the user wants a non-standard name. This auto-discovers the redirect URI from `deploy/envs//bicep-outputs.cache.json` and writes a JSON summary -to `deploy/envs//entra-app.json`. If the bicep cache doesn't +to `deploy/envs//entra-app.json`. **AFD + VPN stamps** +(`VPN_GATEWAY_ENABLED=true` with `EDGE_MODE=afd`) get **two** redirect +URIs registered automatically: the AFD endpoint (public path) and the +portal `PORTAL_HOSTNAME` (VPN-private path, served by the AppGw +listener behind the Private DNS A record). If the bicep cache doesn't exist yet (you're running BEFORE first deploy), keep `-EnvName` for the display name and per-stamp output path, and add `-RedirectUri` once you know the AFD endpoint (or omit it entirely to create the app shell now diff --git a/.github/skills/pilotswarm-vpn-client-profile/SKILL.md b/.github/skills/pilotswarm-vpn-client-profile/SKILL.md new file mode 100644 index 00000000..b8844cb9 --- /dev/null +++ b/.github/skills/pilotswarm-vpn-client-profile/SKILL.md @@ -0,0 +1,165 @@ +--- +name: pilotswarm-vpn-client-profile +description: "Use after deploying a PilotSwarm stamp with VPN_GATEWAY_ENABLED=true when an operator needs the Azure VPN client profile (azurevpnconfig.xml). Wraps deploy/scripts/auth/Get-VpnClientProfile.ps1 — downloads the gateway-issued profile zip via 'az network vnet-gateway vpn-client generate' and extracts it under the gitignored deploy/envs/local//vpn-client/ folder. The XML is the same for every user (no per-user credentials), and end users still authenticate with their own Entra ID at connect time." +--- + +# pilotswarm-vpn-client-profile + +Downloads and stages the Azure VPN client profile for a PilotSwarm +VPN-enabled stamp. This is a small operational helper that wraps an +`az network vnet-gateway vpn-client generate` call, downloads the +returned signed zip, and extracts the `AzureVPN/azurevpnconfig.xml` +file that the Azure VPN Client app imports. + +## When to use this skill + +| User signal | Use this skill? | +|---|---| +| "give me the VPN client profile for " | YES | +| "download the VPN config" / "get the VPN file to share" | YES | +| First-time bring-up of a VPN-enabled stamp, end users need profile | YES | +| Re-issued audience or re-deployed gateway, profile needs refresh | YES (with `-Force`) | +| Stamp does NOT have VPN enabled (`VPN_GATEWAY_ENABLED` unset / false) | NO — the script will fail-fast on the missing `VPN_GATEWAY_ID` | +| Stamp deployed but `deploy/.tmp//bicep-outputs.cache.json` is gone | NO — re-run deploy first to repopulate the cache | + +## When the deployer agent should invoke this automatically + +For VPN-enabled stamps (`VPN_GATEWAY_ENABLED=true` resolved from the +env file), the `pilotswarm-npm-deployer` flow should offer to run this +skill at the end of the first successful deploy — after `Setup-PortalAuth` +and `Set-PortalAuthAssignments`, before the final operator-handoff +banner. The profile is small (~3 KB), gitignored, and pulling it +proactively saves a round-trip when operators are about to test the +new VPN gateway. Skip silently for non-VPN stamps. + +## Underlying tooling + +| Script | Path | Purpose | +|---|---|---| +| `Get-VpnClientProfile.ps1` | `deploy/scripts/auth/` | Download + extract the profile zip | + +The script reads the gateway resource id from +`deploy/.tmp//bicep-outputs.cache.json` (`VPN_GATEWAY_ID`), +parses the RG and gateway name, and calls +`az network vnet-gateway vpn-client generate --authentication-method EAPTLS` +to mint a fresh signed SAS URL (typically valid ~1 hour). It then +downloads the zip with `Invoke-WebRequest` and expands it under the +per-stamp gitignored folder. The first generate per gateway can take +~30–60 seconds; subsequent calls are faster. + +## Output + +``` +deploy/envs/local//vpn-client/ + AzureVPN/ + azurevpnconfig.xml <-- the file end users import + Generic/ + VpnServerRoot.cer_0 (server cert) + VpnSettings.xml (raw P2S settings — not needed by the client app) +``` + +The full local env folder (`deploy/envs/local/`) is gitignored via +`deploy/envs/.gitignore`, so the profile never reaches the repo. + +## Sensitivity + +The `azurevpnconfig.xml` is **semi-sensitive but not credential-bearing**: + +- Contains: gateway public hostname (already public via DNS), Entra + tenant ID, AAD audience GUID, server certificate, OpenVPN protocol + parameters. +- Does NOT contain: any user credential, secret, token, or pre-shared + key. End users authenticate with their own Entra ID via the Azure + VPN Client app at connect time. + +You can safely distribute the same XML to every authorized user — +treat it like a configuration file, not a key. Do not commit it to +git (the gitignore already prevents this), do not paste contents +into public chat, and do not include it in screenshots that show +the gateway hostname or audience GUID unless those are already public +for your environment. + +## Identifier formats + +The script takes one identifier: `-EnvName `. Examples: + +- `chkrawvpn` (matches `deploy/envs/local/chkrawvpn/.env`) +- `pschkrawvpn` (same stamp under a different name) + +## Worked examples + +### Fresh download for a stamp + +```pwsh +pwsh -NoProfile -ExecutionPolicy Bypass ` + -File deploy/scripts/auth/Get-VpnClientProfile.ps1 ` + -EnvName +``` + +Output: profile extracted under `deploy/envs/local//vpn-client/`, +key XML path printed for the operator to import or distribute. + +### Re-download after re-deploying the gateway + +```pwsh +pwsh -NoProfile -ExecutionPolicy Bypass ` + -File deploy/scripts/auth/Get-VpnClientProfile.ps1 ` + -EnvName ` + -Force +``` + +`-Force` overwrites the existing folder. Use whenever the gateway +audience, address pool, or AAD config changes — old profiles cached +on user laptops will continue working until rotated, but new users +should always get the fresh XML. + +### Download and open the folder + +```pwsh +pwsh -NoProfile -ExecutionPolicy Bypass ` + -File deploy/scripts/auth/Get-VpnClientProfile.ps1 ` + -EnvName ` + -OpenFolder +``` + +Opens Explorer (Windows) at the extracted folder for quick attach-to-email. + +### Custom output directory + +```pwsh +pwsh -NoProfile -ExecutionPolicy Bypass ` + -File deploy/scripts/auth/Get-VpnClientProfile.ps1 ` + -EnvName ` + -OutDir C:\Users\me\Desktop\vpn-share +``` + +Useful when staging profiles for several stamps in one shareable +location. The `-OutDir` override is **not** gitignored automatically — +keep it outside the repo. + +## End-user instructions to bundle with the profile + +When sharing `azurevpnconfig.xml` with an end user, include: + +1. Install the **Azure VPN Client**: + - Windows: Microsoft Store + - macOS: App Store +2. Open the Azure VPN Client app. +3. Choose **Import** and select the `azurevpnconfig.xml` file. +4. Click **Save** — the new connection appears in the left pane. +5. Click **Connect**. Sign in with your work Entra ID account. + +If the user gets `Couldn't load packet length bytes` or +`Server did not respond properly to VPN Control Packets` errors, +their network egress (often a corp managed proxy) is intercepting +TCP 443 and killing the OpenVPN handshake — see PR #53 history and +`docs/deploying-to-aks.md` for the diagnostic. The profile is fine; +the problem is the network they are on. + +## Cross-references + +- `deploy/scripts/auth/Get-VpnClientProfile.ps1` — the script +- `deploy/scripts/auth/Setup-PortalAuth.ps1` + skill `pilotswarm-portal-app-reg` — companion portal auth setup +- `deploy/scripts/auth/Set-PortalAuthAssignments.ps1` + skill `pilotswarm-portal-auth-assignments` — companion portal access management +- `docs/deploying-to-aks.md` (§ "Optional: VPN Gateway P2S") — full VPN ingress documentation +- `docs/proposals/vpn-access-management.md` — future-work proposal to fold VPN access into the deployer-owned per-stamp app model diff --git a/deploy/envs/template.env b/deploy/envs/template.env index b7542d43..9f020e13 100644 --- a/deploy/envs/template.env +++ b/deploy/envs/template.env @@ -81,6 +81,17 @@ LOG_ANALYTICS_RETENTION_DAYS=30 # ] WAF_CUSTOM_RULES_FILE= +# Optional path (relative to repo root or absolute) to a JSON file containing +# an array of AppGw WAF custom rules merged into the AppGw WAF policy's +# `customRules.rules` array at deploy time. Recommended location: +# `deploy/envs/local//appgw-waf-custom-rules.json` (gitignored). When +# VPN_GATEWAY_ENABLED=true the bicep auto-seeds three guard rules at +# priorities 90/91/92 (AllowAfd / AllowVpn / BlockOther) — operator rules +# loaded from this file MUST start at priority >= 100. Leave unset to deploy +# with no operator-supplied AppGw rules. Mirrors the AFD-side +# WAF_CUSTOM_RULES_FILE machinery above. +APPGW_WAF_CUSTOM_RULES_FILE= + # ─── Edge / public-ingress topology ─── # Two modes: # afd — Azure Front Door + Private Link → AppGw private listener. @@ -174,3 +185,47 @@ FOUNDRY_SKU=S0 # `deploy/envs/local/` tree is gitignored, so the file is per-stamp and # never checked in. Required when FOUNDRY_ENABLED=true; ignored otherwise. FOUNDRY_DEPLOYMENTS_FILE= + +# ─── Optional: VPN Gateway P2S (hybrid AFD + VPN) ─── +# Adds an Azure VPN Gateway (Point-to-Site, Microsoft Entra ID auth, OpenVPN +# protocol) as an additive, optional ingress path that coexists with the AFD +# edge mode. The trusted-bypass pattern for authenticated tenant users who +# hold a valid Entra ID token but are blocked at the public edge by +# operator-defined AFD WAF allow-lists (typically service-tag, IP-range, or +# header-based rules that gate the public ingress to a known managed-network +# population). +# +# Hard requirements (enforced by deploy/scripts/lib/overlay-contracts.mjs +# validateVpnGatewayCombo()): +# * EDGE_MODE=afd — the auto-seeded AppGw WAF guard rules at +# priorities 90/91/92 only make sense when AFD is +# the public ingress and AppGw is private-FE. +# * TLS_SOURCE=akv — Let's Encrypt is unsupported on the VPN path +# (HTTP-01 cannot reach a VPN-only client). +# * SSL_CERT_DOMAIN_SUFFIX must be set (managed Private DNS zone uses it). +# * VPN_CLIENT_ADDRESS_POOL must NOT overlap the VNet (default 10.20.0.0/16). +# +# Cost: ~$450/mo for VpnGw2AZ + Azure Private DNS Resolver inbound endpoint +# (~$280 VPN + ~$170 resolver). The Resolver is co-provisioned unconditionally +# when VPN_GATEWAY_ENABLED=true because P2S clients cannot reach 168.63.129.16 +# through the tunnel — without the Resolver inbound endpoint advertised via +# the VNet's dhcpOptions.dnsServers (the supported P2S DNS-push path), clients +# cannot resolve the portal Private DNS Zone hostname over the VPN. +# First-deploy time: 45+ min (gateway provisioning is the long pole). +# Subsequent param changes are faster but VPN gateway updates are still +# measured in minutes. +# +# Tenant: VPN runs in the same Entra ID tenant as the rest of the stamp +# (uses existing AZURE_TENANT_ID — threaded into base-infra bicep as the +# `tenantId` param and consumed by the VPN gateway module's AAD config). +# +# Required Conditional Access policy (tenant admin): the Azure VPN Client +# app `c632b3df-fb67-4d84-bdcf-b95ad541b5c8` must be permitted for the user +# groups you want to allow over the VPN. The legacy audience +# `41b23e61-6c1e-4545-b367-cd054e0ed4b4` is supported as an override only +# for stamps that have to interop with older Azure VPN client builds — see +# the `pilotswarm-new-env-deploy` skill for the documented override flow. +VPN_GATEWAY_ENABLED=false +VPN_GATEWAY_SKU=VpnGw2AZ # @allowed: VpnGw2AZ / VpnGw3AZ / VpnGw4AZ / VpnGw5AZ. Generation2 AZ SKUs only. VpnGw1AZ (Generation1) is excluded — silently drops OpenVPN+AAD HardResetClientV2 packets ~5s after TCP accept. Non-AZ SKUs (VpnGw1/2/3) and Basic are also rejected. +VPN_CLIENT_ADDRESS_POOL=172.16.200.0/24 # MUST NOT overlap the VNet (default 10.20.0.0/16). /24 gives ~250 concurrent clients. +VPN_AAD_AUDIENCE=c632b3df-fb67-4d84-bdcf-b95ad541b5c8 # Microsoft-registered Azure VPN Client app ID. Legacy 41b23e61-6c1e-4545-b367-cd054e0ed4b4 supported as documented override. diff --git a/deploy/scripts/auth/Get-VpnClientProfile.ps1 b/deploy/scripts/auth/Get-VpnClientProfile.ps1 new file mode 100644 index 00000000..3c9459cc --- /dev/null +++ b/deploy/scripts/auth/Get-VpnClientProfile.ps1 @@ -0,0 +1,178 @@ +<# +.SYNOPSIS + Downloads the Azure VPN Client profile (azurevpnconfig.xml) for a + PilotSwarm VPN-enabled stamp and extracts it into the per-stamp + local env folder. + +.DESCRIPTION + For stamps deployed with VPN_GATEWAY_ENABLED=true, this script + resolves the gateway from the bicep outputs cache, calls + `az network vnet-gateway vpn-client generate` to mint a fresh + signed SAS URL, downloads the profile zip, and extracts it to: + + deploy/envs/local//vpn-client/ + + The key artifact for end users is: + + deploy/envs/local//vpn-client/AzureVPN/azurevpnconfig.xml + + which imports directly into the Azure VPN Client app (Windows/macOS). + + The local env folder is gitignored (deploy/envs/.gitignore -> `local/`), + so the profile never reaches git. Treat the XML as semi-sensitive: it + contains the gateway public endpoint + AAD audience GUID, but no user + credentials. + +.PARAMETER EnvName + The PilotSwarm stamp name (e.g. `chkrawvpn`). Required. + The script reads VPN_GATEWAY_ID from + deploy/.tmp//bicep-outputs.cache.json. + +.PARAMETER OutDir + Optional override for the output directory. Default: + deploy/envs/local//vpn-client/ + +.PARAMETER Force + Overwrite an existing vpn-client folder. Default: refuses to clobber. + +.PARAMETER OpenFolder + Open the output folder in Explorer after extraction (Windows only). + +.EXAMPLE + .\Get-VpnClientProfile.ps1 -EnvName chkrawvpn + + Downloads and extracts the VPN client profile to + deploy/envs/local/chkrawvpn/vpn-client/. + +.EXAMPLE + .\Get-VpnClientProfile.ps1 -EnvName chkrawvpn -Force -OpenFolder + + Re-downloads (clobbers existing), opens the folder when done. +#> + +[CmdletBinding()] +param( + [Parameter(Mandatory=$true)][string]$EnvName, + [Parameter(Mandatory=$false)][string]$OutDir, + [Parameter(Mandatory=$false)][switch]$Force, + [Parameter(Mandatory=$false)][switch]$OpenFolder +) + +$ErrorActionPreference = 'Stop' + +function Get-RepoRoot { + $here = $PSScriptRoot + if (-not $here) { $here = Split-Path -Parent $MyInvocation.MyCommand.Path } + $cur = Resolve-Path $here + while ($cur -and -not (Test-Path (Join-Path $cur '.git'))) { + $parent = Split-Path -Parent $cur + if ($parent -eq $cur) { throw "Could not locate repo root from $here" } + $cur = $parent + } + return $cur +} + +function Test-AzureCliReady { + $az = Get-Command az -ErrorAction SilentlyContinue + if (-not $az) { Write-Error "Azure CLI ('az') not found in PATH."; return $false } + $acct = az account show 2>$null + if ($LASTEXITCODE -ne 0 -or [string]::IsNullOrWhiteSpace($acct)) { + Write-Error "Not signed in. Run 'az login' first." + return $false + } + return $true +} + +Write-Host "Get-VpnClientProfile - Azure VPN client profile downloader" -ForegroundColor Green +Write-Host "" + +if (-not (Test-AzureCliReady)) { throw "Azure CLI not ready." } + +$repo = Get-RepoRoot +$cache = Join-Path $repo "deploy/.tmp/$EnvName/bicep-outputs.cache.json" +if (-not (Test-Path $cache)) { + throw "bicep-outputs.cache.json not found at $cache. Has '$EnvName' been deployed yet?" +} + +$outputs = Get-Content $cache -Raw | ConvertFrom-Json +if (-not ($outputs.PSObject.Properties.Name -contains 'VPN_GATEWAY_ID')) { + throw "VPN_GATEWAY_ID missing from $cache. Is VPN_GATEWAY_ENABLED=true for '$EnvName'?" +} +$gwId = [string]$outputs.VPN_GATEWAY_ID +if ([string]::IsNullOrWhiteSpace($gwId)) { + throw "VPN_GATEWAY_ID is empty in $cache. Is VPN_GATEWAY_ENABLED=true for '$EnvName'?" +} + +# Parse `/subscriptions//resourceGroups//providers/Microsoft.Network/virtualNetworkGateways/` +$gwName = $gwId.Split('/')[-1] +$gwRg = ($gwId -split '/resourceGroups/')[1].Split('/')[0] + +Write-Host "Stamp : $EnvName" +Write-Host "VPN gateway : $gwName" +Write-Host "Resource grp : $gwRg" +Write-Host "" + +if ([string]::IsNullOrWhiteSpace($OutDir)) { + $OutDir = Join-Path $repo "deploy/envs/local/$EnvName/vpn-client" +} +$OutDir = [System.IO.Path]::GetFullPath($OutDir) + +if (Test-Path $OutDir) { + if (-not $Force) { + Write-Error "Output directory already exists: $OutDir. Pass -Force to overwrite." + exit 1 + } + Write-Host "Removing existing $OutDir (-Force)..." -ForegroundColor Yellow + Remove-Item $OutDir -Recurse -Force +} + +Write-Host "Generating VPN client profile (this can take ~30-60s on first call)..." -ForegroundColor Yellow +$url = az network vnet-gateway vpn-client generate ` + --resource-group $gwRg ` + --name $gwName ` + --authentication-method EAPTLS ` + -o tsv 2>&1 | Where-Object { $_ -notmatch 'WARNING' } | Select-Object -Last 1 +if ($LASTEXITCODE -ne 0 -or [string]::IsNullOrWhiteSpace($url) -or ($url -notmatch '^https?://')) { + throw "az network vnet-gateway vpn-client generate failed. Output: $url" +} +Write-Host " OK: signed URL retrieved (SAS, ~1h validity)" -ForegroundColor Green + +$tempZip = [System.IO.Path]::GetTempFileName() + ".zip" +try { + Write-Host "Downloading profile zip..." -ForegroundColor Yellow + Invoke-WebRequest -Uri $url -OutFile $tempZip -UseBasicParsing + $zipBytes = (Get-Item $tempZip).Length + Write-Host " OK: $zipBytes bytes" -ForegroundColor Green + + Write-Host "Extracting to $OutDir ..." -ForegroundColor Yellow + New-Item -ItemType Directory -Force -Path $OutDir | Out-Null + Expand-Archive -Path $tempZip -DestinationPath $OutDir -Force + Write-Host " OK: extracted" -ForegroundColor Green +} finally { + Remove-Item $tempZip -Force -ErrorAction SilentlyContinue +} + +$xmlPath = Join-Path $OutDir "AzureVPN\azurevpnconfig.xml" +if (-not (Test-Path $xmlPath)) { + Write-Warning "Expected AzureVPN/azurevpnconfig.xml not found under $OutDir. Inspect the contents manually." +} else { + Write-Host "" + Write-Host "=== VPN client profile ready ===" -ForegroundColor Green + Write-Host "Profile XML : $xmlPath" + Write-Host "Folder : $OutDir" + Write-Host "" + Write-Host "Next steps:" -ForegroundColor Cyan + Write-Host " 1. Install the Azure VPN Client (Microsoft Store on Windows; App Store on macOS)." + Write-Host " 2. In the client: Import -> select $xmlPath" + Write-Host " 3. Connect using your Entra ID account." + Write-Host "" + Write-Host "Distribution:" + Write-Host " - The profile XML is the same for every user (no per-user credentials)." + Write-Host " - Each end user still authenticates with their own Entra ID." + Write-Host " - The folder is under deploy/envs/local/ which is gitignored — never commit it." + Write-Host "" +} + +if ($OpenFolder -and $IsWindows -ne $false) { + try { Start-Process explorer.exe -ArgumentList $OutDir } catch { Write-Warning "Could not open Explorer: $_" } +} diff --git a/deploy/scripts/auth/Setup-PortalAuth.ps1 b/deploy/scripts/auth/Setup-PortalAuth.ps1 index 497776a6..002d46b4 100644 --- a/deploy/scripts/auth/Setup-PortalAuth.ps1 +++ b/deploy/scripts/auth/Setup-PortalAuth.ps1 @@ -60,7 +60,9 @@ deploy/.tmp//bicep-outputs.cache.json if -RedirectUri is not given. .PARAMETER RedirectUri - Explicit SPA redirect URI (https://...). Overrides EnvName auto-discovery. + Explicit SPA redirect URI(s) (https://...). Overrides EnvName auto-discovery. + Accepts an array — pass multiple values to register more than one redirect URI + (e.g. an AFD endpoint and a VPN-private portal hostname on the same stamp). .PARAMETER ExistingAppId If provided, the script will NOT create a new app. Instead it appends @@ -184,7 +186,7 @@ param( [Parameter(Mandatory=$true)][string]$ServiceTreeId, [Parameter(Mandatory=$false)][string]$DisplayName, [Parameter(Mandatory=$false)][string]$EnvName, - [Parameter(Mandatory=$false)][string]$RedirectUri, + [Parameter(Mandatory=$false)][string[]]$RedirectUri, [Parameter(Mandatory=$false)][string]$ExistingAppId, [Parameter(Mandatory=$false)][string]$Owner, [Parameter(Mandatory=$false)][switch]$SkipGroupsClaim = $false, @@ -218,37 +220,57 @@ function Resolve-RedirectUriFromEnv { $cache = Join-Path $repo "deploy/.tmp/$Env/bicep-outputs.cache.json" if (-not (Test-Path $cache)) { Write-Warning "bicep-outputs.cache.json not found at $cache - cannot auto-discover redirect URI." - return $null + return @() } try { $outputs = Get-Content $cache -Raw | ConvertFrom-Json } catch { Write-Warning "Failed to parse ${cache}: $_" - return $null + return @() } # The deploy orchestrator's bicep cache uses UPPER_SNAKE keys # (deploy/scripts/lib/bicep-outputs-cache.mjs). For EDGE_MODE=afd the # public-facing URL is the AFD endpoint; for private/AppGw modes it - # is the PORTAL_HOSTNAME (AppGw FQDN). Try EDGE_MODE-aware order. + # is the PORTAL_HOSTNAME (AppGw FQDN). When VPN_GATEWAY_ENABLED=true + # on an AFD stamp we need BOTH: the AFD endpoint (public path) AND + # the PORTAL_HOSTNAME (VPN-private path resolved via the AppGw + # Private DNS A record — matches the AppGw HTTPS listener / AKV cert). $edgeMode = if ($outputs.PSObject.Properties.Name -contains 'EDGE_MODE') { [string]$outputs.EDGE_MODE } else { '' } + $vpnEnabled = $false + if ($outputs.PSObject.Properties.Name -contains 'VPN_GATEWAY_ID') { + $vpnId = [string]$outputs.VPN_GATEWAY_ID + if (-not [string]::IsNullOrWhiteSpace($vpnId)) { $vpnEnabled = $true } + } + $orderedKeys = if ($edgeMode -ieq 'afd') { - @('FRONT_DOOR_ENDPOINT_HOST_NAME', 'PORTAL_HOSTNAME') + if ($vpnEnabled) { + # AFD + VPN: register both endpoints on the same app reg. + @('FRONT_DOOR_ENDPOINT_HOST_NAME', 'PORTAL_HOSTNAME') + } else { + @('FRONT_DOOR_ENDPOINT_HOST_NAME', 'PORTAL_HOSTNAME') + } } else { @('PORTAL_HOSTNAME', 'FRONT_DOOR_ENDPOINT_HOST_NAME') } - $portalHost = $null + + $uris = New-Object System.Collections.Generic.List[string] foreach ($k in $orderedKeys) { if ($outputs.PSObject.Properties.Name -contains $k) { $v = [string]$outputs.$k - if (-not [string]::IsNullOrWhiteSpace($v)) { $portalHost = $v; break } + if (-not [string]::IsNullOrWhiteSpace($v)) { + if ($v -notmatch '^https?://') { $v = "https://$v" } + $v = $v.TrimEnd('/') + if (-not $uris.Contains($v)) { $uris.Add($v) } + # Non-AFD+VPN modes: stop at the first match (preserves old behavior). + if (-not ($edgeMode -ieq 'afd' -and $vpnEnabled)) { break } + } } } - if (-not $portalHost) { + if ($uris.Count -eq 0) { Write-Warning "Could not find a portal hostname (FRONT_DOOR_ENDPOINT_HOST_NAME / PORTAL_HOSTNAME) in $cache. Pass -RedirectUri explicitly." - return $null + return @() } - if ($portalHost -notmatch '^https?://') { $portalHost = "https://$portalHost" } - return $portalHost.TrimEnd('/') + return @($uris) } function Build-RequiredResourceAccessJson { @@ -299,14 +321,18 @@ function Build-AppRolesJson { } function Build-PlatformPatchBodyJson { - param([string]$RedirectUriArg) + param([string[]]$RedirectUris) # Literal JSON to guarantee redirectUris is always a JSON array, even when empty # or single-element. - $urisJson = if ([string]::IsNullOrWhiteSpace($RedirectUriArg)) { + $items = @($RedirectUris | Where-Object { -not [string]::IsNullOrWhiteSpace($_) }) + $urisJson = if ($items.Count -eq 0) { "[]" } else { - $escaped = $RedirectUriArg.Replace('\', '\\').Replace('"', '\"') - "[`"$escaped`"]" + $escapedItems = $items | ForEach-Object { + $esc = $_.Replace('\', '\\').Replace('"', '\"') + "`"$esc`"" + } + "[" + ($escapedItems -join ",") + "]" } return @" { @@ -360,14 +386,19 @@ if ([string]::IsNullOrWhiteSpace($Owner)) { } } -# Resolve redirect URI -if ([string]::IsNullOrWhiteSpace($RedirectUri) -and -not [string]::IsNullOrWhiteSpace($EnvName)) { +# Resolve redirect URIs +if (($null -eq $RedirectUri -or $RedirectUri.Count -eq 0) -and -not [string]::IsNullOrWhiteSpace($EnvName)) { $RedirectUri = Resolve-RedirectUriFromEnv -Env $EnvName - if ($RedirectUri) { Write-Host "Resolved redirect URI from env '$EnvName': $RedirectUri" } + if ($RedirectUri -and $RedirectUri.Count -gt 0) { + Write-Host "Resolved redirect URI(s) from env '$EnvName':" + foreach ($u in $RedirectUri) { Write-Host " - $u" } + } } -if (-not [string]::IsNullOrWhiteSpace($RedirectUri)) { - if ($RedirectUri -notmatch '^https://') { - throw "RedirectUri must be https://. Got: $RedirectUri" +if ($RedirectUri -and $RedirectUri.Count -gt 0) { + foreach ($u in $RedirectUri) { + if ($u -notmatch '^https://') { + throw "RedirectUri must be https://. Got: $u" + } } } @@ -391,8 +422,9 @@ if (-not [string]::IsNullOrWhiteSpace($ExistingAppId)) { Write-Host "" Write-Host "Mode: ADD REDIRECT URI to existing app" -ForegroundColor Cyan Write-Host " Existing App ID: $ExistingAppId" - Write-Host " New redirect URI: $RedirectUri" - if ([string]::IsNullOrWhiteSpace($RedirectUri)) { + Write-Host " New redirect URI(s):" + foreach ($u in @($RedirectUri)) { Write-Host " - $u" } + if ($null -eq $RedirectUri -or $RedirectUri.Count -eq 0) { throw "-RedirectUri (or -EnvName for auto-discovery) is required when -ExistingAppId is set." } @@ -401,16 +433,17 @@ if (-not [string]::IsNullOrWhiteSpace($ExistingAppId)) { $objectId = $existing.id $currentUris = @() if ($existing.spa -and $existing.spa.redirectUris) { $currentUris = @($existing.spa.redirectUris) } - if ($currentUris -contains $RedirectUri) { - Write-Host "Redirect URI already present - no change." -ForegroundColor Yellow + $toAdd = @($RedirectUri | Where-Object { $currentUris -notcontains $_ }) + if ($toAdd.Count -eq 0) { + Write-Host "All redirect URIs already present - no change." -ForegroundColor Yellow } else { - $newUris = @($currentUris + $RedirectUri | Select-Object -Unique) + $newUris = @(($currentUris + $RedirectUri) | Select-Object -Unique) $escapedUris = $newUris | ForEach-Object { '"' + ($_.Replace('\', '\\').Replace('"', '\"')) + '"' } $urisArrJson = "[" + ($escapedUris -join ",") + "]" $body = "{`"spa`":{`"redirectUris`":$urisArrJson}}" - Invoke-GraphPatch -ObjectId $objectId -BodyJson $body -Description "Append SPA redirect URI" + Invoke-GraphPatch -ObjectId $objectId -BodyJson $body -Description "Append SPA redirect URI(s)" } $clientId = $existing.appId @@ -419,7 +452,7 @@ if (-not [string]::IsNullOrWhiteSpace($ExistingAppId)) { Write-Host "Mode: CREATE NEW app registration" -ForegroundColor Cyan Write-Host " Display name : $DisplayName" Write-Host " Tenant : $tenantId (single-tenant)" - Write-Host " Redirect URI : $(if ($RedirectUri) { $RedirectUri } else { '' })" + Write-Host " Redirect URI(s) : $(if ($RedirectUri -and $RedirectUri.Count -gt 0) { ($RedirectUri -join ', ') } else { '' })" Write-Host " Service Tree ID : $ServiceTreeId" Write-Host " Groups claim : $(if ($SkipGroupsClaim) { 'NO' } else { 'YES (idToken+accessToken+saml2Token)' })" Write-Host " App roles : $(if ($CreateAppRoles) { 'YES (admin + user)' } else { 'NO' })" @@ -464,7 +497,7 @@ if (-not [string]::IsNullOrWhiteSpace($ExistingAppId)) { } # Configure SPA platform + implicit grant via Graph PATCH (az ad app create does not expose these) - $platformBody = Build-PlatformPatchBodyJson -RedirectUriArg $RedirectUri + $platformBody = Build-PlatformPatchBodyJson -RedirectUris @($RedirectUri) Invoke-GraphPatch -ObjectId $objectId -BodyJson $platformBody -Description "Configure SPA platform + implicit grant (id+access tokens)" # Set owner @@ -504,7 +537,7 @@ $summary = [ordered]@{ clientId = $clientId objectId = $objectId displayName = $DisplayName - redirectUri = $RedirectUri + redirectUris = @($RedirectUri) envName = $EnvName serviceTreeId = $ServiceTreeId appRolesCreated = [bool]$CreateAppRoles @@ -523,7 +556,9 @@ Write-Host "" Write-Host "=== PilotSwarm Portal App Registration ===" -ForegroundColor Green Write-Host "PORTAL_AUTH_ENTRA_TENANT_ID = $tenantId" Write-Host "PORTAL_AUTH_ENTRA_CLIENT_ID = $clientId" -if ($RedirectUri) { Write-Host "Redirect URI = $RedirectUri" } +if ($RedirectUri -and $RedirectUri.Count -gt 0) { + foreach ($u in $RedirectUri) { Write-Host "Redirect URI = $u" } +} Write-Host "==========================================" -ForegroundColor Green Write-Host "" Write-Host "Next steps:" -ForegroundColor Cyan diff --git a/deploy/scripts/deploy.mjs b/deploy/scripts/deploy.mjs index d258030e..6dda1f6f 100644 --- a/deploy/scripts/deploy.mjs +++ b/deploy/scripts/deploy.mjs @@ -133,6 +133,13 @@ function printHelp() { "", "Spec: .paw/work/oss-deploy-script/Spec.md", "", + "Notes:", + " • VPN_GATEWAY_ENABLED=true adds 45+ min to the first `bicep` step", + " and ~$450/mo (VpnGw2AZ + Private DNS Resolver) to the running stamp.", + " Coexists with the AFD edge mode — does NOT replace it. See", + " deploy/envs/template.env for the full env-var roster and the", + " AKV-only constraint.", + "", ].join("\n"), ); } @@ -395,15 +402,24 @@ async function main() { // off-path keys get auto-stubbed to `unused`. Adding a new overlay key // is a one-line change in overlay-contracts.mjs — deploy.mjs picks it // up automatically. - const missingRequired = validateRequiredEnv({ edgeMode, tlsSource, env }); - if (missingRequired.length > 0) { - log( - "err", - `EDGE_MODE='${edgeMode}' TLS_SOURCE='${tlsSource}' requires ${missingRequired.join(", ")} to be set ` + - `(per overlay contract in deploy/scripts/lib/overlay-contracts.mjs). ` + - `Re-run \`npm run deploy:new-env -- ${envName} --force\` and supply values when prompted, ` + - `or hand-edit deploy/envs/local/${envName}/.env.`, - ); + const { missing: missingRequired, combo: comboErrors } = validateRequiredEnv({ edgeMode, tlsSource, env }); + if (missingRequired.length > 0 || comboErrors.length > 0) { + if (missingRequired.length > 0) { + log( + "err", + `EDGE_MODE='${edgeMode}' TLS_SOURCE='${tlsSource}' requires ${missingRequired.join(", ")} to be set ` + + `(per overlay contract in deploy/scripts/lib/overlay-contracts.mjs). ` + + `Re-run \`npm run deploy:new-env -- ${envName} --force\` and supply values when prompted, ` + + `or hand-edit deploy/envs/local/${envName}/.env.`, + ); + } + // Combo errors render as named errors with a hint that points operators + // at the env file / docs — NOT at the scaffolder (re-running new-env.mjs + // would clobber operator edits, and the underlying problem isn't an + // unset key, it's a bad combination). + for (const e of comboErrors) { + log("err", `[${e.code}] ${e.message} ${e.hint}`); + } process.exit(1); } // TLS_SOURCE=akv no longer requires a pre-set PORTAL_TLS_ISSUER_NAME — diff --git a/deploy/scripts/lib/appgw-waf-rules.mjs b/deploy/scripts/lib/appgw-waf-rules.mjs new file mode 100644 index 00000000..cb9ea3ed --- /dev/null +++ b/deploy/scripts/lib/appgw-waf-rules.mjs @@ -0,0 +1,113 @@ +// Mirror of the AppGw WAF custom-rules merge logic in +// `deploy/services/base-infra/bicep/application-gateway.bicep:82-142`. +// +// The bicep file remains the source of truth for the runtime WAF policy; +// this JS helper exists so the merge shape can be unit-tested without +// shelling out to `az`. The two implementations are kept in lockstep by +// code review — any change to the bicep `autoSeedRules` / `mergedCustomRules` +// vars MUST be reflected here, and vice versa. +// +// The bicep `var` shape is mirrored exactly so the four merged-shape cases +// (VPN on/off × operator rules present/absent) can assert the exact rule +// objects rather than just lengths. + +import { existsSync, readFileSync } from "node:fs"; +import { isAbsolute, join } from "node:path"; +import { REPO_ROOT } from "./common.mjs"; + +// Build the auto-seeded AppGw WAF guard rules. Returns [] when +// vpnGatewayEnabled is false, mirroring the bicep ternary. +// +// frontDoorId / vpnClientAddressPool are inlined verbatim into the +// matchValues arrays, matching the bicep — passing empty strings produces +// rules with empty matchValues which Azure will reject at deploy time, +// but the validator gate (validateVpnGatewayCombo + the FRONT_DOOR_ID +// alias being non-empty when VPN is enabled) guards against that case +// before we get here. +export function buildAutoSeedRules({ vpnGatewayEnabled, frontDoorId, vpnClientAddressPool }) { + if (!vpnGatewayEnabled) return []; + return [ + { + name: "AllowAfd", + priority: 90, + ruleType: "MatchRule", + action: "Allow", + matchConditions: [ + { + matchVariables: [ + { variableName: "RequestHeaders", selector: "X-Azure-FDID" }, + ], + operator: "Equal", + matchValues: [frontDoorId], + }, + ], + }, + { + name: "AllowVpn", + priority: 91, + ruleType: "MatchRule", + action: "Allow", + matchConditions: [ + { + matchVariables: [{ variableName: "RemoteAddr" }], + operator: "IPMatch", + matchValues: [vpnClientAddressPool], + }, + ], + }, + { + name: "BlockOther", + priority: 92, + ruleType: "MatchRule", + action: "Block", + matchConditions: [ + { + matchVariables: [{ variableName: "RemoteAddr" }], + operator: "IPMatch", + matchValues: ["0.0.0.0/0"], + }, + ], + }, + ]; +} + +// Mirror of bicep `var mergedCustomRules = concat(autoSeedRules, appgwWafCustomRules)`. +export function buildMergedCustomRules({ vpnGatewayEnabled, frontDoorId, vpnClientAddressPool, operatorRules }) { + const seed = buildAutoSeedRules({ vpnGatewayEnabled, frontDoorId, vpnClientAddressPool }); + const ops = Array.isArray(operatorRules) ? operatorRules : []; + return [...seed, ...ops]; +} + +// Resolve APPGW_WAF_CUSTOM_RULES_FILE to an absolute path, parse its JSON +// contents, and return `{ absPath, rules }`. Mirrors the AFD-side +// WAF_CUSTOM_RULES_FILE machinery in deploy-bicep.mjs:226-243. +// +// Throws with the same error shape as the AFD path when the file is +// missing or malformed (so the operator sees one diagnostic shape across +// both edges). +export function resolveAppgwWafCustomRulesFile(raw) { + if (raw === undefined || raw === null || String(raw).trim() === "") { + return null; // unset = no file = no operator rules + } + const abs = isAbsolute(raw) ? raw : join(REPO_ROOT, raw); + if (!existsSync(abs)) { + throw new Error( + `APPGW_WAF_CUSTOM_RULES_FILE points to a missing file: ${abs}. ` + + `Either unset it or create the JSON array file (gitignored under deploy/envs/local/).`, + ); + } + let parsed; + try { + parsed = JSON.parse(readFileSync(abs, "utf8")); + } catch (e) { + throw new Error( + `APPGW_WAF_CUSTOM_RULES_FILE is not valid JSON (${abs}): ${e.message}`, + ); + } + if (!Array.isArray(parsed)) { + throw new Error( + `APPGW_WAF_CUSTOM_RULES_FILE must contain a JSON array of WAF custom rules (${abs}).`, + ); + } + return { absPath: abs, rules: parsed }; +} diff --git a/deploy/scripts/lib/common.mjs b/deploy/scripts/lib/common.mjs index 9d29025c..ccfcdd0d 100644 --- a/deploy/scripts/lib/common.mjs +++ b/deploy/scripts/lib/common.mjs @@ -41,11 +41,17 @@ export function parseEnvFile(path) { const key = line.slice(0, eq).trim(); if (!/^[A-Za-z_][A-Za-z0-9_]*$/.test(key)) continue; let value = line.slice(eq + 1).trim(); - if ( + const quoted = (value.startsWith('"') && value.endsWith('"')) || - (value.startsWith("'") && value.endsWith("'")) - ) { + (value.startsWith("'") && value.endsWith("'")); + if (quoted) { value = value.slice(1, -1); + } else { + // Strip inline `# comment` for unquoted values. Standard dotenv + // behavior: the `#` must be preceded by whitespace so values + // containing `#` (e.g. URLs with fragments) aren't truncated. + const commentMatch = value.match(/\s+#.*$/); + if (commentMatch) value = value.slice(0, commentMatch.index).trimEnd(); } out[key] = value; } diff --git a/deploy/scripts/lib/deploy-bicep.mjs b/deploy/scripts/lib/deploy-bicep.mjs index 017bccf2..ad5f5625 100644 --- a/deploy/scripts/lib/deploy-bicep.mjs +++ b/deploy/scripts/lib/deploy-bicep.mjs @@ -21,6 +21,7 @@ import { saveMarker, } from "./deploy-marker.mjs"; import { assertFoundryDeploymentsValid } from "./validate-foundry-deployments.mjs"; +import { resolveAppgwWafCustomRulesFile } from "./appgw-waf-rules.mjs"; // Bicep main.bicep paths and params templates are derived by convention from // the module name: deploy/services//bicep/{main.bicep,.params.template.json}. @@ -221,6 +222,32 @@ async function deployOne({ moduleName, service, envName, env, region, stagingDir baseArgs.push("--parameters", `foundryDeployments=@${abs}`); log("info", `[${moduleName}] applying Foundry deployments from ${abs}`); } + // AppGw WAF custom rules: optional JSON array file at + // APPGW_WAF_CUSTOM_RULES_FILE. Mirrors the AFD-side WAF_CUSTOM_RULES_FILE + // pattern below (gitignored location, same error shape on missing file). + // When VPN_GATEWAY_ENABLED=true the bicep auto-seeds three guard rules + // at priorities 90/91/92 — operator rules from this file MUST start at + // priority >= 100. The bicep param defaults to [] so this is purely + // additive for non-VPN stamps. + if (env.APPGW_WAF_CUSTOM_RULES_FILE) { + const raw = env.APPGW_WAF_CUSTOM_RULES_FILE; + const abs = isAbsolute(raw) ? raw : join(REPO_ROOT, raw); + if (!existsSync(abs)) { + throw new Error( + `APPGW_WAF_CUSTOM_RULES_FILE points to a missing file: ${abs}. ` + + `Either unset it or create the JSON array file (gitignored under deploy/envs/local/).`, + ); + } + // Parse + structural validation BEFORE we shell out to `az`. Mirrors + // the AFD-side WAF_CUSTOM_RULES_FILE precedent (fail-closed, named + // error). The existsSync check above stays as the first gate so a + // missing-file diagnostic is distinct from a malformed-JSON one. + // Final-review S-2 follow-up — wires the validated helper into the + // deploy path (was previously dead code reachable only via tests). + resolveAppgwWafCustomRulesFile(raw); + baseArgs.push("--parameters", `appgwWafCustomRules=@${abs}`); + log("info", `[${moduleName}] applying AppGw WAF custom rules from ${abs}`); + } } // global-infra optionally accepts a custom-rules JSON file via env var diff --git a/deploy/scripts/lib/overlay-contracts.mjs b/deploy/scripts/lib/overlay-contracts.mjs index 036c369c..f69f2eaa 100644 --- a/deploy/scripts/lib/overlay-contracts.mjs +++ b/deploy/scripts/lib/overlay-contracts.mjs @@ -146,8 +146,81 @@ export function getContract({ edgeMode, tlsSource }) { return c; } -// Pre-deploy validator. Asserts every userRequiredEnvKey is present in -// `env` and non-empty. Returns an array of missing keys (empty = OK). +// Human-readable rendering for VPN combo error codes. Keeps the message +// and remediation hint adjacent to the code so deploy.mjs / new-env.mjs +// don't have to re-derive them. The hint NEVER points at the scaffolder +// (re-running new-env.mjs would clobber operator edits) — it points at +// the env file and the VPN docs. +const VPN_COMBO_ERROR_DETAILS = Object.freeze({ + "vpn-requires-afd": Object.freeze({ + message: + "VPN gateway requires EDGE_MODE='afd' (the auto-seeded AppGw WAF guard at " + + "priorities 90/91/92 only makes sense when AFD is the public ingress and " + + "AppGw is the private front-end).", + hint: + "Set EDGE_MODE=afd in deploy/envs/local//.env, or disable the VPN " + + "gateway by setting VPN_GATEWAY_ENABLED=false. See docs/deploying-to-aks.md.", + }), + "vpn-requires-akv": Object.freeze({ + message: + "VPN gateway requires TLS_SOURCE='akv' or 'akv-selfsigned' " + + "(Let's Encrypt's HTTP-01 challenge cannot reach a VPN-only client).", + hint: + "Set TLS_SOURCE=akv (or akv-selfsigned) in deploy/envs/local//.env, " + + "or disable the VPN gateway by setting VPN_GATEWAY_ENABLED=false. " + + "See docs/deploying-to-aks.md.", + }), + "vpn-requires-domain-suffix": Object.freeze({ + message: + "VPN gateway requires SSL_CERT_DOMAIN_SUFFIX to be set (the managed " + + "Private DNS zone is named from this suffix).", + hint: + "Set SSL_CERT_DOMAIN_SUFFIX in deploy/envs/local//.env. " + + "See docs/deploying-to-aks.md.", + }), + "vpn-pool-overlap": Object.freeze({ + message: + "VPN_CLIENT_ADDRESS_POOL overlaps the stamp VNet CIDR (or is malformed). " + + "The client pool MUST be disjoint from the VNet address space — overlap " + + "would black-hole intra-stamp traffic from connected VPN clients.", + hint: + "Pick a non-overlapping IPv4 CIDR for VPN_CLIENT_ADDRESS_POOL in " + + "deploy/envs/local//.env (the default stamp VNet is 10.20.0.0/16; " + + "set VNET_CIDR if you've customised it). See docs/deploying-to-aks.md.", + }), + "vpn-requires-tenant-id": Object.freeze({ + message: + "VPN gateway requires AZURE_TENANT_ID to be set (the P2S Entra ID " + + "authentication flow needs the tenant GUID; an empty value would fall " + + "through to bicep's empty default and produce a confusing late failure).", + hint: + "Set AZURE_TENANT_ID in deploy/envs/local//.env (it ships populated " + + "in template.env — restore it if it was blanked). See docs/deploying-to-aks.md.", + }), +}); + +// Build a rich combo-error object from a code returned by +// validateVpnGatewayCombo(). Unknown codes get a generic message so a +// future combo-error addition isn't silently rendered as "[object Object]". +function describeVpnComboError(code) { + const detail = VPN_COMBO_ERROR_DETAILS[code]; + if (!detail) { + return { + code, + message: `VPN gateway combo error: ${code}.`, + hint: "See docs/deploying-to-aks.md.", + }; + } + return { code, message: detail.message, hint: detail.hint }; +} + +// Pre-deploy validator. Returns `{ missing, combo }`: +// missing — array of userRequiredEnvKey names that are unset/blank (and +// ACME_EMAIL when malformed on letsencrypt). These render as +// "requires to be set" with a scaffolder hint. +// combo — array of `{ code, message, hint }` objects describing VPN +// gateway combo errors. These render distinctly: a named +// error + a hint that does NOT point at the scaffolder. // Callers decide whether to throw or log — deploy.mjs throws via // process.exit(1); new-env.mjs warn-and-continues at scaffold time. export function validateRequiredEnv({ edgeMode, tlsSource, env }) { @@ -168,7 +241,136 @@ export function validateRequiredEnv({ edgeMode, tlsSource, env }) { missing.push("ACME_EMAIL"); // present but malformed → treat as missing } } - return missing; + // VPN gateway combo validation. Returned as a SEPARATE channel from + // `missing` because the rendering and remediation + // differ — combo errors are not "set this key" errors and should not + // direct operators at the scaffolder. Gated on VPN_GATEWAY_ENABLED=true + // inside validateVpnGatewayCombo so non-VPN stamps are unaffected. + const vpnCodes = validateVpnGatewayCombo({ edgeMode, tlsSource, env }); + const combo = vpnCodes.map(describeVpnComboError); + return { missing, combo }; +} + +// =========================================================================== +// VPN gateway combo validator. +// +// Gated on `env.VPN_GATEWAY_ENABLED === 'true'`. Returns an array of error +// codes (empty = OK). Codes: +// vpn-requires-afd — EDGE_MODE must be 'afd' (the auto-seeded +// AppGw WAF guard at priorities 90/91/92 +// only makes sense when AFD is the public +// ingress and AppGw is private-FE). +// vpn-requires-akv — TLS_SOURCE must be 'akv' or +// 'akv-selfsigned'; Let's Encrypt is +// unsupported on the VPN path (HTTP-01 +// cannot reach a VPN-only client). +// vpn-requires-domain-suffix — SSL_CERT_DOMAIN_SUFFIX must be non-empty +// (managed Private DNS zone uses it). +// vpn-pool-overlap — VPN_CLIENT_ADDRESS_POOL overlaps the +// stamp VNet CIDR. Reads VNET_CIDR from +// env when present, defaults to +// '10.20.0.0/16' (matches vnet.bicep's +// default base address space). +// vpn-requires-tenant-id — AZURE_TENANT_ID must be set (non-empty) +// when the VPN gateway is enabled. +// Closes the gap behind the bicep +// `tenantId=''` default — even though +// template.env ships a populated value, +// an operator who blanks it gets a clear +// pre-deploy error instead of a confusing +// bicep failure. +// +// Returns an empty array when VPN_GATEWAY_ENABLED is anything other than +// the literal string 'true' (handles the unset / 'false' / 'FALSE' / +// boolean-false cases). +// =========================================================================== +export function validateVpnGatewayCombo({ edgeMode, tlsSource, env }) { + const enabled = String(env?.VPN_GATEWAY_ENABLED ?? "").toLowerCase() === "true"; + if (!enabled) return []; + + const errors = []; + + const em = String(edgeMode ?? "").toLowerCase(); + if (em !== "afd") errors.push("vpn-requires-afd"); + + // Accept both `akv` and `akv-selfsigned` (they collapse to the same + // overlay anyway; the only delta is the AKV issuer). The forbidden value + // is `letsencrypt`. + const ts = String(tlsSource ?? "").toLowerCase(); + if (ts !== "akv" && ts !== "akv-selfsigned") errors.push("vpn-requires-akv"); + + const suffix = env?.SSL_CERT_DOMAIN_SUFFIX; + if (suffix === undefined || suffix === null || String(suffix).trim() === "") { + errors.push("vpn-requires-domain-suffix"); + } + + const pool = env?.VPN_CLIENT_ADDRESS_POOL; + // VNET_CIDR is forward-compatible — not in template.env today; default + // mirrors vnet.bicep's `10.20.0.0/16` baseline. + const vnetCidr = env?.VNET_CIDR && String(env.VNET_CIDR).trim() !== "" + ? String(env.VNET_CIDR).trim() + : "10.20.0.0/16"; + if (pool && String(pool).trim() !== "") { + try { + if (cidrsOverlap(String(pool).trim(), vnetCidr)) { + errors.push("vpn-pool-overlap"); + } + } catch { + // Malformed CIDR → also a pool problem. + errors.push("vpn-pool-overlap"); + } + } + + // AZURE_TENANT_ID must be present when VPN is enabled. The bicep side + // declares `param tenantId string = ''` so a blanked value would silently + // flow through and surface as a confusing late error in the gateway + // module — fail-closed here instead. + const tenantId = env?.AZURE_TENANT_ID; + if (tenantId === undefined || tenantId === null || String(tenantId).trim() === "") { + errors.push("vpn-requires-tenant-id"); + } + + return errors; +} + +// IPv4 CIDR overlap helper. Returns true when the two prefixes share any +// address. Pure-JS, no dependencies; only handles IPv4 (the VPN gateway +// supports IPv6 client pools too, but the stamp VNet is IPv4-only — when +// IPv6 support arrives we extend here and add tests). +function cidrsOverlap(a, b) { + const [na, ma] = parseCidr(a); + const [nb, mb] = parseCidr(b); + // Two prefixes overlap iff one contains the other. Apply the SHORTER + // (less specific) mask to both networks — if they match, the longer + // prefix's network is contained in the shorter prefix. + const mask = ma < mb ? ma : mb; + return networkOf(na, mask) === networkOf(nb, mask); +} + +function parseCidr(cidr) { + const m = /^([0-9]{1,3}(?:\.[0-9]{1,3}){3})\/([0-9]{1,2})$/.exec(cidr); + if (!m) throw new Error(`invalid IPv4 CIDR: ${cidr}`); + const ip = m[1].split(".").map((o) => { + const n = Number(o); + if (!Number.isInteger(n) || n < 0 || n > 255) { + throw new Error(`invalid IPv4 octet in ${cidr}`); + } + return n; + }); + const mask = Number(m[2]); + if (!Number.isInteger(mask) || mask < 0 || mask > 32) { + throw new Error(`invalid IPv4 prefix length in ${cidr}`); + } + // Use unsigned right shift to keep the result in [0, 2^32-1]. + const num = ((ip[0] << 24) | (ip[1] << 16) | (ip[2] << 8) | ip[3]) >>> 0; + return [num, mask]; +} + +function networkOf(ipNum, prefix) { + if (prefix === 0) return 0; + // 32-bit left-aligned mask; >>> 0 to coerce back to unsigned. + const mask = (0xffffffff << (32 - prefix)) >>> 0; + return (ipNum & mask) >>> 0; } // Off-path stub-fill. Mutates `env` in place, setting any stubKey that diff --git a/deploy/scripts/new-env.mjs b/deploy/scripts/new-env.mjs index 805bdf33..f1e4c2c1 100644 --- a/deploy/scripts/new-env.mjs +++ b/deploy/scripts/new-env.mjs @@ -54,6 +54,7 @@ import { DEFAULT_EDGE_MODE as CONTRACT_DEFAULT_EDGE_MODE, DEFAULT_TLS_SOURCE as CONTRACT_DEFAULT_TLS_SOURCE, validateRequiredEnv, + validateVpnGatewayCombo, applyStubKeys, } from "./lib/overlay-contracts.mjs"; @@ -113,6 +114,79 @@ function unsupportedReason(edgeMode, tlsSource) { return hit ? hit.reason : null; } +// Normalise a y/n/yes/no/true/false answer to the literal "y" or "n". +// Used by every yes/no INPUT (foundryEnabled, vpnEnabled). +function normaliseYesNo(v) { + const s = String(v ?? "").toLowerCase(); + if (s === "y" || s === "yes" || s === "true") return "y"; + if (s === "n" || s === "no" || s === "false" || s === "") return "n"; + return s; +} + +// VPN gateway combo refusal at scaffold time. Mirrors the +// validateVpnGatewayCombo() error-code style used by deploy.mjs at +// pre-deploy time. Throws a single named error covering BOTH +// edge-mode and tls-source mismatches in one message — the operator +// shouldn't have to bounce the prompt twice to discover both. +// +// Called from interactive flow (after vpnEnabled is captured) AND from +// the non-interactive --vpn-enabled path so flags-only invocations +// (tests, CI) hit the same hard gate. Refusal is a hard error before any +// .env file is written, so operators never end up with a partial env +// file on disk. +function assertVpnGatewayCombo(edgeMode, tlsSource) { + const em = String(edgeMode ?? "").toLowerCase(); + const ts = String(tlsSource ?? "").toLowerCase(); + const reasons = []; + if (em !== "afd") { + reasons.push( + "EDGE_MODE must be 'afd' (got '" + (edgeMode ?? "") + "')", + ); + } + if (ts !== "akv") { + reasons.push( + "TLS_SOURCE must be 'akv' (got '" + (tlsSource ?? "") + "')", + ); + } + if (reasons.length === 0) return; + throw new Error( + "[vpn-incompatible-combo] VPN gateway requires EDGE_MODE=afd + TLS_SOURCE=akv. " + + reasons.join("; ") + ". " + + "Re-run scaffolder with --edge-mode afd --tls-source akv, or omit " + + "--vpn-enabled (defaults to no). See deploy/docs/vpn-p2s.md.", + ); +} + +// VPN client-address-pool overlap check. Reuses +// validateVpnGatewayCombo() so the overlap arithmetic stays in one place +// (cidrsOverlap is module-private to overlay-contracts.mjs, so we drive +// the public helper with a stub env that satisfies every other gate and +// only surfaces the pool check). Returns true| in +// the shape runDeclarativePrompt's `validate` field expects. +function validateVpnClientAddressPool(pool, vnetCidr) { + const env = { + VPN_GATEWAY_ENABLED: "true", + SSL_CERT_DOMAIN_SUFFIX: "scaffolder.local", + AZURE_TENANT_ID: "00000000-0000-0000-0000-000000000000", + VPN_CLIENT_ADDRESS_POOL: pool, + }; + if (vnetCidr) env.VNET_CIDR = vnetCidr; + // Drive with a known-valid edge/tls so only pool/tenant/etc errors can fire. + const codes = validateVpnGatewayCombo({ + edgeMode: "afd", + tlsSource: "akv", + env, + }); + if (codes.includes("vpn-pool-overlap")) { + const vnet = vnetCidr || "10.20.0.0/16"; + return ( + `VPN_CLIENT_ADDRESS_POOL '${pool}' overlaps the stamp VNet CIDR '${vnet}' ` + + `(or is malformed). Pick a non-overlapping IPv4 CIDR, e.g. 172.16.200.0/24.` + ); + } + return true; +} + // ─── Interactive prompt helpers ─────────────────────────────────────────── async function prompt(rl, question, defaultValue) { const suffix = defaultValue ? ` [${defaultValue}]` : ""; @@ -317,6 +391,67 @@ export const INPUTS = [ "akv-selfsigned": "AKV `Self` issuer; bicep auto-creates a self-signed cert (OSS private demo)", }), }, + // VPN gateway prompts are placed immediately after tlsSource so the + // [vpn-incompatible-combo] gate fires BEFORE any secret prompts — + // operators don't waste time entering secrets only to be refused. The + // cross-field combo check runs in gatherInputs() right after the + // vpnEnabled answer is captured. main() repeats the same check as + // defense-in-depth for fully non-interactive paths where this loop is + // bypassed. + { + argKey: "vpnEnabled", + flag: "--vpn-enabled", + metavar: "", + help: [ + "Provision an Azure VPN Gateway P2S (Microsoft Entra ID auth, OpenVPN)", + "as an additive ingress alongside AFD. Requires --edge-mode afd and", + "--tls-source akv. Adds ~$450/mo (~$280 VPN + ~$170 DNS Resolver) and 45+ min to the first deploy.", + "Default no.", + ], + cliChoices: ["y", "n", "yes", "no", "true", "false"], + nonInteractiveDefault: () => "n", + type: "menu", + prompt: "Enable Azure VPN Gateway P2S (additive VPN ingress)?", + default: "n", + choices: ["n", "y"], + choiceDescriptions: { + n: "No VPN gateway (default)", + y: "Provision VPN Gateway P2S (~$450/mo total, 45+ min first deploy; requires EDGE_MODE=afd + TLS_SOURCE=akv)", + }, + transform: (v) => normaliseYesNo(v), + }, + { + argKey: "vpnClientAddressPool", + flag: "--vpn-client-address-pool", + metavar: "", + help: [ + "VPN client address pool (IPv4 CIDR). Required when --vpn-enabled y.", + "MUST NOT overlap the stamp VNet CIDR (default 10.20.0.0/16).", + ], + prompt: "VPN client address pool (IPv4 CIDR; must not overlap VNet 10.20.0.0/16)", + default: "172.16.200.0/24", + promptIf: (ctx) => normaliseYesNo(ctx.vpnEnabled) === "y", + validate: (v) => validateVpnClientAddressPool(v, null), + }, + { + argKey: "sslCertDomainSuffix", + flag: "--ssl-cert-domain-suffix", + metavar: "", + help: [ + "DNS suffix the portal cert is issued for. Required when", + "--tls-source akv (cert subject = .). Must be", + "a domain your AKV cert issuer (e.g. OneCertV2-PublicCA) is", + "authorized to issue for. Also used as the managed Private DNS", + "zone name when --vpn-enabled y.", + ], + prompt: "SSL cert domain suffix (e.g. dev.pilotswarm.example.com — must be authorized for your AKV cert issuer)", + promptIf: (ctx) => ctx.tlsSource === "akv", + validate: (v) => + /^[a-z0-9]([a-z0-9.-]*[a-z0-9])?$/i.test(v) && v.includes(".") + ? true + : "must be a valid DNS suffix (lowercase, must contain at least one dot)", + transform: "lowercase", + }, { argKey: "host", flag: "--host", @@ -390,12 +525,7 @@ export const INPUTS = [ n: "No Foundry account (default; uses github-copilot + anthropic providers)", y: "Provision Foundry account + write azure-oai-key into KV", }, - transform: (v) => { - const s = String(v ?? "").toLowerCase(); - if (s === "y" || s === "yes" || s === "true") return "y"; - if (s === "n" || s === "no" || s === "false" || s === "") return "n"; - return s; - }, + transform: (v) => normaliseYesNo(v), }, ]; @@ -483,7 +613,7 @@ function usage() { // Derive deployment-target values from a small set of inputs, matching the enterprise path // serviceModel.json naming patterns. Pure function — no I/O. -export function deriveTargets({ name, subscription, location, regionShort, edgeMode, host, privateDnsZone, portalHostname, tlsSource, acmeEmail, foundryEnabled }) { +export function deriveTargets({ name, subscription, location, regionShort, edgeMode, host, privateDnsZone, portalHostname, tlsSource, acmeEmail, sslCertDomainSuffix, foundryEnabled, vpnEnabled, vpnClientAddressPool }) { const prefix = `ps${name}`; const globalPrefix = `${prefix}global`; const resolvedEdgeMode = edgeMode ?? DEFAULT_EDGE_MODE; @@ -497,7 +627,19 @@ export function deriveTargets({ name, subscription, location, regionShort, edgeM if (!derivedPortalHostname && resolvedEdgeMode === "private" && host && privateDnsZone) { derivedPortalHostname = `${host}.${privateDnsZone}`; } - const foundryOn = foundryEnabled === "y" || foundryEnabled === true; + // Normalise yes/no inputs the same way main() does (normaliseYesNo). + // The CLI accepts y|n|yes|no|true|false (and JS boolean true from + // programmatic callers), so a plain `=== "y"` check would silently + // disable the feature on `--foundry-enabled yes` / `--vpn-enabled true` + // and friends. Boolean `true` is normalised via String(v). + const foundryOn = normaliseYesNo(foundryEnabled) === "y"; + // VPN_GATEWAY_SKU and VPN_AAD_AUDIENCE flow through from template.env + // defaults (VpnGw2AZ and the Microsoft-registered Azure VPN Client app id + // c632b3df-... respectively); we don't prompt for them and don't + // override here. When VPN is disabled the template's + // VPN_GATEWAY_ENABLED=false default flows through unchanged — we still + // emit it explicitly to keep the rendered .env deterministic. + const vpnOn = normaliseYesNo(vpnEnabled) === "y"; return { SUBSCRIPTION_ID: subscription ?? "", LOCATION: location, @@ -530,6 +672,21 @@ export function deriveTargets({ name, subscription, location, regionShort, edgeM // through as their disabled values. FOUNDRY_ENABLED: foundryOn ? "true" : "false", FOUNDRY_DEPLOYMENTS_FILE: foundryOn ? `deploy/envs/local/${name}/foundry-deployments.json` : "", + // VPN gateway. Only the two prompted keys are threaded; + // VPN_GATEWAY_SKU and VPN_AAD_AUDIENCE remain at their template + // defaults (VpnGw2AZ / c632b3df-... respectively) and the operator + // overrides them by hand-editing the .env if needed (e.g. legacy + // 41b23e61-... audience, larger SKU). When VPN is disabled the pool + // override is omitted so the template's documented default + // (172.16.200.0/24) flows through unchanged. + VPN_GATEWAY_ENABLED: vpnOn ? "true" : "false", + ...(vpnOn && vpnClientAddressPool ? { VPN_CLIENT_ADDRESS_POOL: vpnClientAddressPool } : {}), + // SSL_CERT_DOMAIN_SUFFIX is userRequired by the afd-akv overlay (and + // by the VPN combo gate when VPN is enabled). The scaffolder prompts + // for it whenever tlsSource=akv; we override the template's empty + // default with the captured value. Other tlsSources don't consume it + // and the template's empty default flows through unchanged. + ...(sslCertDomainSuffix ? { SSL_CERT_DOMAIN_SUFFIX: sslCertDomainSuffix } : {}), }; } @@ -692,12 +849,34 @@ async function gatherInputs(args, existingSecrets, existingPortalConfig) { for (const i of INPUTS) { if (args[i.argKey] != null) { ctx[i.argKey] = args[i.argKey]; - continue; + } else { + // Custom imperative override wins; otherwise drive the declarative runner. + ctx[i.argKey] = i.interactive + ? await i.interactive(rl, ctx) + : await runDeclarativePrompt(rl, ctx, i); + } + + // Early VPN combo gate — fires immediately after the vpnEnabled + // value is resolved (whether from --vpn-enabled flag or from the + // interactive prompt) so the operator sees the + // [vpn-incompatible-combo] error BEFORE being asked for + // vpnClientAddressPool, secrets, or portal config. main() repeats + // this check (and the pool-overlap check) for fully + // non-interactive invocations where this loop is bypassed + // entirely. + if (i.argKey === "vpnEnabled" && normaliseYesNo(ctx.vpnEnabled) === "y") { + assertVpnGatewayCombo(ctx.edgeMode, ctx.tlsSource); + } + // Pool-overlap re-check immediately after the pool input is + // resolved. The declarative `validate` hook already loops on bad + // input in the interactive path; this catches the case where + // --vpn-client-address-pool was supplied via flag (runDeclarativePrompt + // is skipped when args[argKey] != null). + if (i.argKey === "vpnClientAddressPool" && normaliseYesNo(ctx.vpnEnabled) === "y") { + const pool = ctx.vpnClientAddressPool || "172.16.200.0/24"; + const r = validateVpnClientAddressPool(pool, null); + if (r !== true) throw new Error("[vpn-pool-overlap] " + r); } - // Custom imperative override wins; otherwise drive the declarative runner. - ctx[i.argKey] = i.interactive - ? await i.interactive(rl, ctx) - : await runDeclarativePrompt(rl, ctx, i); } // Defensive cross-field check — covers the case where the user passed @@ -865,6 +1044,21 @@ async function main() { // Final combo gate — covers the non-interactive path where one of // edge-mode / tls-source was defaulted rather than provided. assertSupportedCombo(inputs.edgeMode, inputs.tlsSource); + // VPN combo gate — refuse with a named error before any .env is + // written when --vpn-enabled y is paired with an incompatible + // edge/tls combo. Mirrors validateVpnGatewayCombo's error style so + // the scaffolder and deploy.mjs speak the same language. + if (normaliseYesNo(inputs.vpnEnabled) === "y") { + assertVpnGatewayCombo(inputs.edgeMode, inputs.tlsSource); + // Pool-overlap gate — declarative `validate` hook only runs in the + // interactive path, so we re-check here so flags-only invocations + // (tests, CI, scripted runs) hit the same hard refusal. + const pool = inputs.vpnClientAddressPool || "172.16.200.0/24"; + const poolResult = validateVpnClientAddressPool(pool, null); + if (poolResult !== true) { + throw new Error("[vpn-pool-overlap] " + poolResult); + } + } const targets = deriveTargets(inputs); @@ -876,7 +1070,7 @@ async function main() { // scaffolded .env carries valid `unused` sentinels for off-path keys, // matching what deploy.mjs would seed. applyStubKeys({ edgeMode: targets.EDGE_MODE, tlsSource: targets.TLS_SOURCE, env: targets }); - const missingRequired = validateRequiredEnv({ + const { missing: missingRequired, combo: comboErrors } = validateRequiredEnv({ edgeMode: targets.EDGE_MODE, tlsSource: targets.TLS_SOURCE, env: targets, @@ -889,6 +1083,9 @@ async function main() { `See deploy/scripts/lib/overlay-contracts.mjs for the per-overlay roster.`, ); } + for (const e of comboErrors) { + log("warn", `[${e.code}] ${e.message} ${e.hint}`); + } const dst = envFilePath(inputs.name); if (existsSync(dst) && !args.force) { @@ -944,6 +1141,49 @@ async function main() { const subForCmd = targets.SUBSCRIPTION_ID || ""; console.log(` ${stepNum++}. az login && az account set --subscription ${subForCmd}`); console.log(` ${stepNum}. npm run deploy -- all ${inputs.name}`); + + // Post-scaffold VPN reminder block. Surfaces the out-of-band + // requirements deploy.mjs cannot enforce: tenant Conditional + // Access policy, audience override, cost / first-deploy-time disclosure, + // and a docs pointer. Skipped silently when VPN is disabled — keeps the + // VPN=no path's stdout unchanged so existing tests (and operators in + // private mode) see no surprises. + if (targets.VPN_GATEWAY_ENABLED === "true") { + console.log(""); + console.log("─── VPN Gateway P2S — required out-of-band setup ─────────────────────"); + console.log("⚠ Conditional Access policy (tenant admin, REQUIRED before first connect):"); + console.log(" • Target app: c632b3df-fb67-4d84-bdcf-b95ad541b5c8 (Azure VPN Client, Microsoft-registered)"); + console.log(" • Assignment: a NAMED users group — do NOT target 'all users'"); + console.log(" • Grant: require MFA"); + console.log(" • Grant: do NOT require device compliance"); + console.log(" • Legacy override: set VPN_AAD_AUDIENCE=41b23e61-6c1e-4545-b367-cd054e0ed4b4"); + console.log(" in .env for tenants on older Azure VPN client builds"); + console.log(" (audience override is documented in template.env)."); + console.log(""); + console.log("💰 Cost: ~$450/month total"); + console.log(" • VpnGw2AZ SKU + Public IP + gateway hours: ~$280"); + console.log(" • Azure Private DNS Resolver inbound endpoint: ~$170"); + console.log(" (auto-provisioned with VPN so P2S clients can"); + console.log(" resolve the portal Private DNS Zone hostname)."); + console.log("⏱ First-deploy time: 45+ minutes (VPN gateway provisioning is the long pole)"); + console.log(""); + console.log("⬇ VPN client profile (after first deploy completes):"); + console.log(` • Preferred — helper: pwsh -File deploy/scripts/auth/Get-VpnClientProfile.ps1 \\`); + console.log(` -EnvName ${inputs.name}`); + console.log(" (see .github/skills/pilotswarm-vpn-client-profile/SKILL.md)"); + console.log(` • Azure portal: Resource group → ${targets.RESOURCE_GROUP}`); + console.log(" → → Point-to-site configuration"); + console.log(" → Download VPN client"); + console.log(` • az CLI fallback: az network vnet-gateway vpn-client generate \\`); + console.log(` --resource-group ${targets.RESOURCE_GROUP} \\`); + console.log(" --name \\"); + console.log(" --authentication-method EAPTLS"); + console.log(" Then import the .zip into the Azure VPN Client app."); + console.log(""); + console.log("📖 Full guidance: docs/deploying-to-aks.md → 'Optional: VPN Gateway P2S' section"); + console.log(" (also: pilotswarm-new-env-deploy skill VPN matrix entry)."); + console.log("──────────────────────────────────────────────────────────────────────"); + } } const invokedDirectly = diff --git a/deploy/scripts/test/appgw-waf-rules.test.mjs b/deploy/scripts/test/appgw-waf-rules.test.mjs new file mode 100644 index 00000000..4e570d03 --- /dev/null +++ b/deploy/scripts/test/appgw-waf-rules.test.mjs @@ -0,0 +1,328 @@ +// Unit tests for deploy/scripts/lib/appgw-waf-rules.mjs. +// +// Covers: +// * File path resolution (relative → absolute; missing → named error; +// unset → null; malformed JSON → named error). +// * Bicep merged-output shape — four cases: +// VPN on + operator rules empty → exactly the three auto-seeded +// rules at priorities 90/91/92. +// VPN off + operator rules non-empty → only operator rules (no seed). +// VPN on + operator rules non-empty → seed (90/91/92) followed by +// operator rules in array order. +// VPN off + operator rules empty → []. +// * Alias mapping: bicep `frontDoorId` output → `FRONT_DOOR_ID` env key +// via deploy-bicep.mjs `_internals.aliasFor()`. +// +// The JS helper mirrors the bicep `var` in application-gateway.bicep:82-142 +// exactly and is kept in lockstep by code review. The bicep is the runtime +// source of truth; this test guards the JS shadow. + +import { test } from "node:test"; +import assert from "node:assert/strict"; +import { writeFileSync, mkdtempSync, rmSync } from "node:fs"; +import { tmpdir } from "node:os"; +import { join } from "node:path"; + +import { + buildAutoSeedRules, + buildMergedCustomRules, + resolveAppgwWafCustomRulesFile, +} from "../lib/appgw-waf-rules.mjs"; +import { _internals } from "../lib/deploy-bicep.mjs"; + +const SAMPLE_OPS = [ + { name: "AllowOperator", priority: 100, ruleType: "MatchRule", action: "Allow", matchConditions: [] }, + { name: "BlockAnon", priority: 110, ruleType: "MatchRule", action: "Block", matchConditions: [] }, +]; + +const SAMPLE_ARGS = Object.freeze({ + frontDoorId: "00000000-1111-2222-3333-444444444444", + vpnClientAddressPool: "172.16.200.0/24", +}); + +// =========================================================================== +// Merged-shape cases +// =========================================================================== + +test("buildAutoSeedRules: vpnGatewayEnabled=false → []", () => { + assert.deepEqual( + buildAutoSeedRules({ vpnGatewayEnabled: false, ...SAMPLE_ARGS }), + [], + ); +}); + +test("buildAutoSeedRules: vpnGatewayEnabled=true → three rules at priorities 90/91/92", () => { + const rules = buildAutoSeedRules({ vpnGatewayEnabled: true, ...SAMPLE_ARGS }); + assert.equal(rules.length, 3); + assert.deepEqual(rules.map((r) => r.name), ["AllowAfd", "AllowVpn", "BlockOther"]); + assert.deepEqual(rules.map((r) => r.priority), [90, 91, 92]); + assert.deepEqual(rules.map((r) => r.action), ["Allow", "Allow", "Block"]); +}); + +test("buildAutoSeedRules: AllowAfd matches X-Azure-FDID against frontDoorId", () => { + const [allowAfd] = buildAutoSeedRules({ vpnGatewayEnabled: true, ...SAMPLE_ARGS }); + const mc = allowAfd.matchConditions[0]; + assert.equal(mc.matchVariables[0].variableName, "RequestHeaders"); + assert.equal(mc.matchVariables[0].selector, "X-Azure-FDID"); + assert.equal(mc.operator, "Equal"); + assert.deepEqual(mc.matchValues, [SAMPLE_ARGS.frontDoorId]); +}); + +test("buildAutoSeedRules: AllowVpn IPMatches vpnClientAddressPool", () => { + const [, allowVpn] = buildAutoSeedRules({ vpnGatewayEnabled: true, ...SAMPLE_ARGS }); + const mc = allowVpn.matchConditions[0]; + assert.equal(mc.matchVariables[0].variableName, "RemoteAddr"); + assert.equal(mc.operator, "IPMatch"); + assert.deepEqual(mc.matchValues, [SAMPLE_ARGS.vpnClientAddressPool]); +}); + +test("buildAutoSeedRules: BlockOther IPMatches 0.0.0.0/0", () => { + const [, , blockOther] = buildAutoSeedRules({ vpnGatewayEnabled: true, ...SAMPLE_ARGS }); + assert.deepEqual(blockOther.matchConditions[0].matchValues, ["0.0.0.0/0"]); +}); + +test("buildMergedCustomRules: VPN on + ops empty → seed only (3 rules)", () => { + const merged = buildMergedCustomRules({ + vpnGatewayEnabled: true, + ...SAMPLE_ARGS, + operatorRules: [], + }); + assert.equal(merged.length, 3); + assert.deepEqual(merged.map((r) => r.name), ["AllowAfd", "AllowVpn", "BlockOther"]); +}); + +test("buildMergedCustomRules: VPN off + ops non-empty → ops only", () => { + const merged = buildMergedCustomRules({ + vpnGatewayEnabled: false, + ...SAMPLE_ARGS, + operatorRules: SAMPLE_OPS, + }); + assert.deepEqual(merged, SAMPLE_OPS); +}); + +test("buildMergedCustomRules: VPN on + ops non-empty → seed then ops, in array order", () => { + const merged = buildMergedCustomRules({ + vpnGatewayEnabled: true, + ...SAMPLE_ARGS, + operatorRules: SAMPLE_OPS, + }); + assert.equal(merged.length, 5); + assert.deepEqual( + merged.map((r) => r.name), + ["AllowAfd", "AllowVpn", "BlockOther", "AllowOperator", "BlockAnon"], + ); +}); + +test("buildMergedCustomRules: VPN off + ops empty → []", () => { + assert.deepEqual( + buildMergedCustomRules({ + vpnGatewayEnabled: false, + ...SAMPLE_ARGS, + operatorRules: [], + }), + [], + ); +}); + +test("buildMergedCustomRules: tolerates undefined operatorRules", () => { + const merged = buildMergedCustomRules({ + vpnGatewayEnabled: true, + ...SAMPLE_ARGS, + operatorRules: undefined, + }); + assert.equal(merged.length, 3); +}); + +// =========================================================================== +// Bicep ↔ JS shape parity (lightweight string-shape guard) +// +// Asserts the bicep file still contains the three rule names + priorities +// AND the literal tokens that make each rule meaningful (header selector, +// match operator, action, IP address parameters, catch-all CIDR). Catches +// a divergent edit to either side that doesn't update both. Not a substitute +// for a full snapshot harness, but it makes the lockstep requirement testable +// at the level the JS helper actually relies on. +// =========================================================================== +test("bicep application-gateway.bicep autoSeed rules match JS helper names + priorities + tokens", async () => { + const { readFileSync } = await import("node:fs"); + const { fileURLToPath } = await import("node:url"); + const { dirname } = await import("node:path"); + const REPO_ROOT = join(dirname(fileURLToPath(import.meta.url)), "..", "..", ".."); + const bicepPath = join(REPO_ROOT, "deploy", "services", "base-infra", "bicep", "application-gateway.bicep"); + const raw = readFileSync(bicepPath, "utf8"); + // Extract the autoSeedRules block (between `var autoSeedRules = vpnGatewayEnabled ? [` and `] : []`). + const m = raw.match(/var autoSeedRules = vpnGatewayEnabled \? \[([\s\S]*?)\] : \[\]/); + assert.ok(m, "could not locate autoSeedRules block in application-gateway.bicep"); + const block = m[1]; + // Names appear in order. + const namesInOrder = ["AllowAfd", "AllowVpn", "BlockOther"]; + let lastIdx = -1; + for (const name of namesInOrder) { + const idx = block.indexOf(`name: '${name}'`); + assert.ok(idx > lastIdx, `bicep autoSeedRules block missing or out-of-order: ${name}`); + lastIdx = idx; + } + // Priorities 90/91/92 appear in order. + const prios = [...block.matchAll(/priority:\s+(\d+)/g)].map((mm) => Number(mm[1])); + assert.deepEqual(prios, [90, 91, 92], "bicep autoSeedRules priorities drifted from JS helper"); + + // Per-rule literal-token guards. Slice the block by name boundaries so a + // token from rule N can't satisfy the assertion for rule N+1. + function sliceForRule(name) { + const start = block.indexOf(`name: '${name}'`); + assert.ok(start >= 0, `rule ${name} not found in autoSeedRules`); + // Find the next rule's name marker (or end of block) to scope the slice. + let end = block.length; + for (const other of namesInOrder) { + if (other === name) continue; + const oi = block.indexOf(`name: '${other}'`, start + 1); + if (oi > start && oi < end) end = oi; + } + return block.slice(start, end); + } + + const allowAfd = sliceForRule("AllowAfd"); + assert.match(allowAfd, /action:\s*'Allow'/, "AllowAfd action must be 'Allow'"); + assert.ok( + allowAfd.includes("'X-Azure-FDID'"), + "AllowAfd must select on the X-Azure-FDID request header", + ); + assert.match(allowAfd, /operator:\s*'Equal'/, "AllowAfd operator must be 'Equal'"); + assert.ok( + /\bfrontDoorId\b/.test(allowAfd), + "AllowAfd must reference the frontDoorId parameter as the match value", + ); + + const allowVpn = sliceForRule("AllowVpn"); + assert.match(allowVpn, /action:\s*'Allow'/, "AllowVpn action must be 'Allow'"); + assert.match(allowVpn, /operator:\s*'IPMatch'/, "AllowVpn operator must be 'IPMatch'"); + assert.ok( + /\bRemoteAddr\b/.test(allowVpn), + "AllowVpn must match on the RemoteAddr variable", + ); + assert.ok( + /\bvpnClientAddressPool\b/.test(allowVpn), + "AllowVpn must reference the vpnClientAddressPool parameter as the match value", + ); + + const blockOther = sliceForRule("BlockOther"); + assert.match(blockOther, /action:\s*'Block'/, "BlockOther action must be 'Block'"); + assert.match(blockOther, /operator:\s*'IPMatch'/, "BlockOther operator must be 'IPMatch'"); + assert.ok( + blockOther.includes("'0.0.0.0/0'"), + "BlockOther must use the 0.0.0.0/0 catch-all CIDR as its match value", + ); +}); + +// =========================================================================== +// File path resolution +// =========================================================================== + +test("resolveAppgwWafCustomRulesFile: unset → null", () => { + assert.equal(resolveAppgwWafCustomRulesFile(undefined), null); + assert.equal(resolveAppgwWafCustomRulesFile(""), null); + assert.equal(resolveAppgwWafCustomRulesFile(" "), null); +}); + +test("resolveAppgwWafCustomRulesFile: missing file → named error mirroring AFD shape", () => { + assert.throws( + () => resolveAppgwWafCustomRulesFile("deploy/envs/local/__does_not_exist__/appgw.json"), + /APPGW_WAF_CUSTOM_RULES_FILE points to a missing file:.*Either unset it or create the JSON array file/s, + ); +}); + +test("resolveAppgwWafCustomRulesFile: absolute path is used verbatim", () => { + const dir = mkdtempSync(join(tmpdir(), "appgw-waf-")); + try { + const file = join(dir, "rules.json"); + writeFileSync(file, JSON.stringify(SAMPLE_OPS)); + const out = resolveAppgwWafCustomRulesFile(file); + assert.equal(out.absPath, file); + assert.deepEqual(out.rules, SAMPLE_OPS); + } finally { + rmSync(dir, { recursive: true, force: true }); + } +}); + +test("resolveAppgwWafCustomRulesFile: malformed JSON → named error", () => { + const dir = mkdtempSync(join(tmpdir(), "appgw-waf-")); + try { + const file = join(dir, "rules.json"); + writeFileSync(file, "{not json"); + assert.throws( + () => resolveAppgwWafCustomRulesFile(file), + /APPGW_WAF_CUSTOM_RULES_FILE is not valid JSON/, + ); + } finally { + rmSync(dir, { recursive: true, force: true }); + } +}); + +test("resolveAppgwWafCustomRulesFile: non-array JSON → named error", () => { + const dir = mkdtempSync(join(tmpdir(), "appgw-waf-")); + try { + const file = join(dir, "rules.json"); + writeFileSync(file, JSON.stringify({ not: "an array" })); + assert.throws( + () => resolveAppgwWafCustomRulesFile(file), + /must contain a JSON array of WAF custom rules/, + ); + } finally { + rmSync(dir, { recursive: true, force: true }); + } +}); + +// =========================================================================== +// Alias mapping: frontDoorId → FRONT_DOOR_ID via deploy-bicep.mjs aliasFor() +// (covers the global-infra → base-infra threading used by the AppGw WAF +// AllowAfd guard rule). +// =========================================================================== + +test("aliasFor: frontDoorId → FRONT_DOOR_ID (auto-derived, no explicit OUTPUT_ALIAS needed)", () => { + const { OUTPUT_ALIAS, aliasFor } = _internals; + // Not in the explicit override map … + assert.ok(!("frontDoorId" in OUTPUT_ALIAS), "frontDoorId should rely on the default camelCase rule"); + // … so it flows via the default camelCase → UPPER_SNAKE rule. + assert.equal(aliasFor("frontDoorId"), "FRONT_DOOR_ID"); +}); + +// =========================================================================== +// deploy-bicep.mjs wiring guard (Final-review S-2 follow-up) +// +// resolveAppgwWafCustomRulesFile() validates that APPGW_WAF_CUSTOM_RULES_FILE +// parses as JSON and is an array — but it only matters if the deploy path +// actually invokes it before shelling out to `az`. The deploy code does not +// expose a hook we can unit-test in isolation (deployBicep() shells out to +// `az` and touches the filesystem); guard the wiring with a source scan +// instead, matching the bicep ↔ JS shape parity test above. +// =========================================================================== +test("deploy-bicep.mjs invokes resolveAppgwWafCustomRulesFile in the APPGW_WAF_CUSTOM_RULES_FILE branch", async () => { + const { readFileSync } = await import("node:fs"); + const { fileURLToPath } = await import("node:url"); + const { dirname } = await import("node:path"); + const REPO_ROOT = join(dirname(fileURLToPath(import.meta.url)), "..", "..", ".."); + const src = readFileSync( + join(REPO_ROOT, "deploy", "scripts", "lib", "deploy-bicep.mjs"), + "utf8", + ); + // (1) The helper is imported. + assert.match( + src, + /import\s*\{\s*resolveAppgwWafCustomRulesFile\s*\}\s*from\s*["']\.\/appgw-waf-rules\.mjs["']/, + "deploy-bicep.mjs must import resolveAppgwWafCustomRulesFile from ./appgw-waf-rules.mjs", + ); + // (2) It is invoked inside the APPGW_WAF_CUSTOM_RULES_FILE branch — i.e., + // the helper call sits between the env-var guard and the + // `--parameters appgwWafCustomRules=@...` line that hands the file to az. + const branchMatch = src.match( + /if\s*\(\s*env\.APPGW_WAF_CUSTOM_RULES_FILE\s*\)\s*\{([\s\S]*?)appgwWafCustomRules=@/, + ); + assert.ok( + branchMatch, + "could not locate the APPGW_WAF_CUSTOM_RULES_FILE branch in deploy-bicep.mjs", + ); + assert.ok( + /resolveAppgwWafCustomRulesFile\s*\(/.test(branchMatch[1]), + "APPGW_WAF_CUSTOM_RULES_FILE branch must call resolveAppgwWafCustomRulesFile before invoking az", + ); +}); diff --git a/deploy/scripts/test/common.test.mjs b/deploy/scripts/test/common.test.mjs index bee9adc6..59430163 100644 --- a/deploy/scripts/test/common.test.mjs +++ b/deploy/scripts/test/common.test.mjs @@ -99,3 +99,62 @@ test("redactArgs masks --sp (azcopy / az login service principal)", () => { const out = redactArgs(["azcopy", "login", "--sp", "secret-value"]); assert.deepEqual(out, ["azcopy", "login", "--sp", "***"]); }); + + +// ── parseEnvFile: inline `# comment` stripping for unquoted values ── +// Regression for the VPN gateway deploy failure where VPN_GATEWAY_SKU +// carried its template.env documentation comment into the bicep param, +// triggering "value is not part of the allowed value(s)". + +import { mkdtempSync, writeFileSync, rmSync } from "node:fs"; +import { tmpdir } from "node:os"; +import { join } from "node:path"; +import { parseEnvFile } from "../lib/common.mjs"; + +function writeTmpEnv(body) { + const dir = mkdtempSync(join(tmpdir(), "parseenv-")); + const path = join(dir, ".env"); + writeFileSync(path, body, "utf8"); + return { dir, path }; +} + +test("parseEnvFile strips inline `# comment` after unquoted value", () => { + const { dir, path } = writeTmpEnv("VPN_GATEWAY_SKU=VpnGw1 # @allowed: VpnGw1 / VpnGw2\n"); + try { + const env = parseEnvFile(path); + assert.equal(env.VPN_GATEWAY_SKU, "VpnGw1"); + } finally { + rmSync(dir, { recursive: true, force: true }); + } +}); + +test("parseEnvFile preserves `#` inside quoted values", () => { + const { dir, path } = writeTmpEnv('TOKEN="abc#def"\n'); + try { + const env = parseEnvFile(path); + assert.equal(env.TOKEN, "abc#def"); + } finally { + rmSync(dir, { recursive: true, force: true }); + } +}); + +test("parseEnvFile preserves `#` in unquoted value when not preceded by whitespace", () => { + const { dir, path } = writeTmpEnv("URL=https://example.com/page#section\n"); + try { + const env = parseEnvFile(path); + assert.equal(env.URL, "https://example.com/page#section"); + } finally { + rmSync(dir, { recursive: true, force: true }); + } +}); + +test("parseEnvFile leaves full-line `#` comments alone (existing behavior)", () => { + const { dir, path } = writeTmpEnv("# this is a comment\nKEY=value\n"); + try { + const env = parseEnvFile(path); + assert.equal(env.KEY, "value"); + assert.equal(Object.keys(env).length, 1); + } finally { + rmSync(dir, { recursive: true, force: true }); + } +}); diff --git a/deploy/scripts/test/new-env.test.mjs b/deploy/scripts/test/new-env.test.mjs index d6aad519..bc3e9ac3 100644 --- a/deploy/scripts/test/new-env.test.mjs +++ b/deploy/scripts/test/new-env.test.mjs @@ -399,3 +399,500 @@ test("scaffolder accepts private + akv-selfsigned with HOST and PRIVATE_DNS_ZONE cleanup(); } }); + +// ─── VPN Gateway P2S scaffolder UX ──────────────────────────────────────── +// +// VPN questions are asked only after edge/tls have been selected. The +// declarative INPUTS pipeline drives the prompts; a hard combo gate in +// main() refuses with a named error before any .env is written when the +// requested edge/tls combo is incompatible with VPN. CIDR-overlap checks +// reuse validateVpnGatewayCombo (no duplicated arithmetic). +// +// VPN_GATEWAY_SKU and VPN_AAD_AUDIENCE are NOT prompted — they flow +// through unchanged from template.env defaults (VpnGw2AZ, c632b3df-...). +// The unsupported keys VPN_AAD_TENANT_ID, VPN_PRIVATE_DNS_MODE, +// PRIVATE_DNS_ZONE_ID, VPN_AAD_USERS_GROUP_NAME_HINT must NEVER be emitted. + +test("deriveTargets emits VPN_GATEWAY_ENABLED=true with custom pool when vpnEnabled=y", () => { + const t = deriveTargets({ + name: "foo", + location: "westus3", + regionShort: "wus3", + edgeMode: "afd", + tlsSource: "akv", + vpnEnabled: "y", + vpnClientAddressPool: "172.16.222.0/24", + }); + assert.equal(t.VPN_GATEWAY_ENABLED, "true"); + assert.equal(t.VPN_CLIENT_ADDRESS_POOL, "172.16.222.0/24"); + // SKU and AAD audience are NOT in deriveTargets' return — they flow + // from template.env so the operator gets the documented defaults. + assert.ok(!("VPN_GATEWAY_SKU" in t), "VPN_GATEWAY_SKU must come from template.env"); + assert.ok(!("VPN_AAD_AUDIENCE" in t), "VPN_AAD_AUDIENCE must come from template.env"); + // Dropped-from-spec keys must never be threaded. + for (const dropped of [ + "VPN_AAD_TENANT_ID", + "VPN_PRIVATE_DNS_MODE", + "PRIVATE_DNS_ZONE_ID", + "VPN_AAD_USERS_GROUP_NAME_HINT", + ]) { + assert.ok(!(dropped in t), `${dropped} was dropped from the spec — must not be emitted`); + } +}); + +test("deriveTargets emits VPN_GATEWAY_ENABLED=false and omits pool override when vpnEnabled=n", () => { + const t = deriveTargets({ + name: "foo", + location: "westus3", + regionShort: "wus3", + vpnEnabled: "n", + }); + assert.equal(t.VPN_GATEWAY_ENABLED, "false"); + assert.ok(!("VPN_CLIENT_ADDRESS_POOL" in t), + "pool override should be omitted when VPN is disabled (template default flows through)"); +}); + +test("scaffolder writes VPN keys at expected defaults on afd+akv with --vpn-enabled y", () => { + cleanup(); + try { + const r = runScript([ + ...FULL_ARGS(), + "--edge-mode", "afd", + "--tls-source", "akv", + "--vpn-enabled", "y", + ]); + assert.equal(r.status, 0, r.stderr || r.stdout); + const content = readFileSync(TEST_FILE, "utf8"); + // Two prompted keys land at their defaults. + assert.match(content, /^VPN_GATEWAY_ENABLED=true$/m); + assert.match(content, /^VPN_CLIENT_ADDRESS_POOL=172\.16\.200\.0\/24$/m); + // Two template-default keys flow through verbatim. + assert.match(content, /^VPN_GATEWAY_SKU=VpnGw2AZ\b/m); + assert.match(content, /^VPN_AAD_AUDIENCE=c632b3df-fb67-4d84-bdcf-b95ad541b5c8\b/m); + // Dropped-from-spec keys must NOT be present. + for (const dropped of [ + "VPN_AAD_TENANT_ID", + "VPN_PRIVATE_DNS_MODE", + "PRIVATE_DNS_ZONE_ID", + "VPN_AAD_USERS_GROUP_NAME_HINT", + ]) { + assert.doesNotMatch(content, new RegExp(`^${dropped}=`, "m"), + `${dropped} was dropped from the spec but appeared in scaffolded .env`); + } + } finally { + cleanup(); + } +}); + +test("scaffolder refuses VPN on afd+letsencrypt with a named [vpn-incompatible-combo] error", () => { + cleanup(); + try { + const r = runScript([ + ...FULL_ARGS(), + "--edge-mode", "afd", + "--tls-source", "letsencrypt", + "--vpn-enabled", "y", + ]); + assert.notEqual(r.status, 0); + assert.match(r.stderr, /\[vpn-incompatible-combo\]/); + assert.match(r.stderr, /TLS_SOURCE must be 'akv'/); + // Hard refusal: no .env written. + assert.ok(!existsSync(TEST_FILE), "scaffolder must not write .env when VPN combo is rejected"); + } finally { + cleanup(); + } +}); + +test("scaffolder refuses VPN on private+akv with a named [vpn-incompatible-combo] error", () => { + cleanup(); + try { + const r = runScript([ + ...FULL_ARGS(), + "--edge-mode", "private", + "--tls-source", "akv", + "--host", "portal", + "--private-dns-zone", "pilotswarm.internal", + "--vpn-enabled", "y", + ]); + assert.notEqual(r.status, 0); + assert.match(r.stderr, /\[vpn-incompatible-combo\]/); + assert.match(r.stderr, /EDGE_MODE must be 'afd'/); + assert.ok(!existsSync(TEST_FILE), "scaffolder must not write .env when VPN combo is rejected"); + } finally { + cleanup(); + } +}); + +test("scaffolder accepts a custom non-overlapping VPN_CLIENT_ADDRESS_POOL", () => { + cleanup(); + try { + const r = runScript([ + ...FULL_ARGS(), + "--edge-mode", "afd", + "--tls-source", "akv", + "--vpn-enabled", "y", + "--vpn-client-address-pool", "192.168.99.0/24", + ]); + assert.equal(r.status, 0, r.stderr || r.stdout); + const content = readFileSync(TEST_FILE, "utf8"); + assert.match(content, /^VPN_GATEWAY_ENABLED=true$/m); + assert.match(content, /^VPN_CLIENT_ADDRESS_POOL=192\.168\.99\.0\/24$/m); + } finally { + cleanup(); + } +}); + +test("scaffolder rejects an overlapping VPN_CLIENT_ADDRESS_POOL with a pool-overlap error", () => { + cleanup(); + try { + // Default VNet CIDR is 10.20.0.0/16; this /24 sits inside it. + const r = runScript([ + ...FULL_ARGS(), + "--edge-mode", "afd", + "--tls-source", "akv", + "--vpn-enabled", "y", + "--vpn-client-address-pool", "10.20.50.0/24", + ]); + assert.notEqual(r.status, 0); + const out = `${r.stdout}\n${r.stderr}`; + assert.match( + out, + /VPN_CLIENT_ADDRESS_POOL .*overlaps the stamp VNet CIDR/, + `expected pool-overlap error in scaffolder output:\n${out}`, + ); + assert.ok(!existsSync(TEST_FILE), "scaffolder must not write .env when pool overlaps VNet"); + } finally { + cleanup(); + } +}); + +test("scaffolder emits the post-scaffold VPN reminder block on success", () => { + cleanup(); + try { + const r = runScript([ + ...FULL_ARGS(), + "--edge-mode", "afd", + "--tls-source", "akv", + "--vpn-enabled", "y", + ]); + assert.equal(r.status, 0, r.stderr || r.stdout); + const out = `${r.stdout}\n${r.stderr}`; + // Reminder header is unmistakable. + assert.match(out, /VPN Gateway P2S — required out-of-band setup/); + // CA policy: target app id, named-users-group, MFA, do-not-require-compliance. + assert.match(out, /c632b3df-fb67-4d84-bdcf-b95ad541b5c8/); + assert.match(out, /Conditional Access/i); + assert.match(out, /NAMED users group/); + assert.match(out, /require MFA/); + assert.match(out, /do NOT require device compliance/); + // Legacy audience override. + assert.match(out, /41b23e61-6c1e-4545-b367-cd054e0ed4b4/); + // Cost + time disclosure. + assert.match(out, /\$450\/month/); + assert.match(out, /Private DNS Resolver/); + assert.match(out, /45\+ minutes/); + // Docs pointer (referenced by section name in deploying-to-aks.md). + assert.match(out, /docs\/deploying-to-aks\.md/); + assert.match(out, /Optional: VPN Gateway P2S/); + } finally { + cleanup(); + } +}); + +test("scaffolder VPN=no path emits no VPN reminder block (regression guard)", () => { + cleanup(); + try { + // Default for --vpn-enabled is n; we still pass it explicitly to make + // intent obvious to future readers. + const r = runScript([...FULL_ARGS(), "--vpn-enabled", "n"]); + assert.equal(r.status, 0, r.stderr || r.stdout); + const out = `${r.stdout}\n${r.stderr}`; + assert.doesNotMatch(out, /VPN Gateway P2S — required out-of-band setup/); + assert.doesNotMatch(out, /Conditional Access/); + const content = readFileSync(TEST_FILE, "utf8"); + // Template default carries through unchanged. + assert.match(content, /^VPN_GATEWAY_ENABLED=false$/m); + } finally { + cleanup(); + } +}); + +// ─── yes/true normalisation drift ──────────────────────────────────────── +// +// Regression guard: --vpn-enabled accepts y|yes|true (and JS boolean true +// from programmatic callers), but deriveTargets() previously only +// honoured literal "y" / boolean true, so `--vpn-enabled yes` and +// `--vpn-enabled true` silently scaffolded VPN_GATEWAY_ENABLED=false with +// no reminder. main() and deriveTargets() +// must agree, via normaliseYesNo(), on every yes/no input. + +for (const truthy of ["yes", "true"]) { + test(`scaffolder --vpn-enabled ${truthy} enables VPN and emits the reminder block`, () => { + cleanup(); + try { + const r = runScript([ + ...FULL_ARGS(), + "--edge-mode", "afd", + "--tls-source", "akv", + "--vpn-enabled", truthy, + ]); + assert.equal(r.status, 0, r.stderr || r.stdout); + const content = readFileSync(TEST_FILE, "utf8"); + assert.match(content, /^VPN_GATEWAY_ENABLED=true$/m, + `--vpn-enabled ${truthy} must scaffold VPN_GATEWAY_ENABLED=true`); + const out = `${r.stdout}\n${r.stderr}`; + assert.match(out, /VPN Gateway P2S — required out-of-band setup/, + `--vpn-enabled ${truthy} must emit the post-scaffold reminder block`); + } finally { + cleanup(); + } + }); +} + +test("deriveTargets honours boolean vpnEnabled=true (programmatic callers)", () => { + const t = deriveTargets({ + name: "foo", + location: "westus3", + regionShort: "wus3", + vpnEnabled: true, + vpnClientAddressPool: "172.16.222.0/24", + }); + assert.equal(t.VPN_GATEWAY_ENABLED, "true"); + assert.equal(t.VPN_CLIENT_ADDRESS_POOL, "172.16.222.0/24"); +}); + +test("deriveTargets honours string 'yes' / 'true' for vpnEnabled (CLI normalisation)", () => { + for (const v of ["yes", "true", "YES"]) { + const t = deriveTargets({ + name: "foo", + location: "westus3", + regionShort: "wus3", + vpnEnabled: v, + }); + assert.equal(t.VPN_GATEWAY_ENABLED, "true", + `vpnEnabled='${v}' must yield VPN_GATEWAY_ENABLED=true`); + } +}); + +test("deriveTargets honours boolean foundryEnabled=true and string 'yes' (audit fix)", () => { + // Sister-key normalisation audit: foundryEnabled exhibited the same + // drift as vpnEnabled. Both now route through normaliseYesNo(). + for (const v of [true, "yes", "true", "y", "YES"]) { + const t = deriveTargets({ + name: "foo", + location: "westus3", + regionShort: "wus3", + foundryEnabled: v, + }); + assert.equal(t.FOUNDRY_ENABLED, "true", + `foundryEnabled='${v}' must yield FOUNDRY_ENABLED=true`); + assert.equal(t.FOUNDRY_DEPLOYMENTS_FILE, "deploy/envs/local/foo/foundry-deployments.json"); + } +}); + +// ─── Reminder block: VPN client profile download ───────────────────────── + +test("VPN reminder block includes Azure portal download path for VPN client profile", () => { + cleanup(); + try { + const r = runScript([ + ...FULL_ARGS(), + "--edge-mode", "afd", + "--tls-source", "akv", + "--vpn-enabled", "y", + ]); + assert.equal(r.status, 0, r.stderr || r.stdout); + const out = `${r.stdout}\n${r.stderr}`; + // Portal navigation path: Resource group → → Point-to-site → Download. + assert.match(out, /VPN client profile/i); + assert.match(out, /Resource group/); + assert.match(out, /Point-to-site configuration/); + assert.match(out, /Download VPN client/); + // CLI alternative is documented for scriptability (the supported subcommand is + // `vpn-client generate --authentication-method EAPTLS`; the helper script in + // deploy/scripts/auth/Get-VpnClientProfile.ps1 is the preferred path). + assert.match(out, /Get-VpnClientProfile\.ps1/); + assert.match(out, /az network vnet-gateway vpn-client generate\b/); + assert.match(out, /--authentication-method EAPTLS/); + } finally { + cleanup(); + } +}); + +// ─── INPUTS prompt-order placement ─────────────────────────────────────── +// +// vpnEnabled / vpnClientAddressPool must come immediately after tlsSource +// so the [vpn-incompatible-combo] gate fires before any secret prompts. +// Previously they sat at the end of INPUTS, which let the operator type +// secrets only to be refused after. + +test("INPUTS places vpnEnabled / vpnClientAddressPool immediately after tlsSource", () => { + const order = INPUTS.map((i) => i.argKey); + const tlsIdx = order.indexOf("tlsSource"); + const vpnIdx = order.indexOf("vpnEnabled"); + const poolIdx = order.indexOf("vpnClientAddressPool"); + assert.ok(tlsIdx >= 0, "tlsSource must exist in INPUTS"); + assert.equal(vpnIdx, tlsIdx + 1, + "vpnEnabled must immediately follow tlsSource so the combo gate fires before secrets"); + assert.equal(poolIdx, tlsIdx + 2, + "vpnClientAddressPool must immediately follow vpnEnabled"); + // And both must come BEFORE foundryEnabled / host / acmeEmail (which + // are downstream of the early gate). + const downstreamKeys = ["host", "privateDnsZone", "portalHostname", "acmeEmail", "foundryEnabled"]; + for (const k of downstreamKeys) { + const idx = order.indexOf(k); + if (idx >= 0) { + assert.ok(idx > poolIdx, `${k} must come after vpnClientAddressPool, got ${idx} <= ${poolIdx}`); + } + } +}); + +// ─── Help/error wording — akv-only ─────────────────────────────────────── + +test("--vpn-enabled help text does not advertise akv-selfsigned as compatible", () => { + const vpn = INPUTS.find((i) => i.argKey === "vpnEnabled"); + assert.ok(vpn, "vpnEnabled INPUT must exist"); + const helpText = Array.isArray(vpn.help) ? vpn.help.join(" ") : String(vpn.help ?? ""); + assert.doesNotMatch(helpText, /akv-selfsigned/, + "spec mandates TLS_SOURCE=akv only; akv-selfsigned must not appear in VPN help text"); + const yDesc = vpn.choiceDescriptions?.y ?? ""; + assert.doesNotMatch(yDesc, /akv\*/, + "menu choice description must not use akv* (which suggested akv|akv-selfsigned)"); + assert.doesNotMatch(yDesc, /akv-selfsigned/); +}); + +test("[vpn-incompatible-combo] error wording does not allow akv-selfsigned", () => { + cleanup(); + try { + // akv-selfsigned forces edge-mode=private (per unsupportedReason), + // so we exercise the akv-selfsigned-incompatible path explicitly. + const r = runScript([ + ...FULL_ARGS(), + "--edge-mode", "private", + "--host", "portal", + "--private-dns-zone", "pilotswarm.internal", + "--tls-source", "akv-selfsigned", + "--vpn-enabled", "y", + ]); + assert.notEqual(r.status, 0); + assert.match(r.stderr, /\[vpn-incompatible-combo\]/); + // Must NOT advertise akv-selfsigned as a way to satisfy the gate. + assert.doesNotMatch(r.stderr, /akv-selfsigned\)/, + "error message must not say '(or akv-selfsigned)' — spec mandates akv only"); + assert.doesNotMatch(r.stderr, /'akv' or 'akv-selfsigned'/, + "error message must not list akv-selfsigned as an acceptable TLS_SOURCE for VPN"); + } finally { + cleanup(); + } +}); + +// ─── SSL_CERT_DOMAIN_SUFFIX prompt (akv + VPN) ──────────────────────────── +// +// SSL_CERT_DOMAIN_SUFFIX is userRequired on the afd-akv overlay (cert +// subject = .) AND is consumed by the VPN combo +// gate as the managed Private DNS zone name. The scaffolder prompts for +// it whenever tlsSource=akv so operators don't have to hand-edit the +// .env between scaffold and deploy. + +test("--ssl-cert-domain-suffix flows through to SSL_CERT_DOMAIN_SUFFIX in scaffolded .env", () => { + cleanup(); + try { + const r = runScript([ + ...FULL_ARGS(), + "--edge-mode", "afd", + "--tls-source", "akv", + "--ssl-cert-domain-suffix", "dev.example.com", + ]); + assert.equal(r.status, 0, r.stderr || r.stdout); + const content = readFileSync(TEST_FILE, "utf8"); + assert.match(content, /^SSL_CERT_DOMAIN_SUFFIX=dev\.example\.com$/m); + } finally { + cleanup(); + } +}); + +test("--ssl-cert-domain-suffix INPUT has lowercase transform (validates interactive UX)", () => { + const ssl = INPUTS.find((i) => i.argKey === "sslCertDomainSuffix"); + assert.ok(ssl, "sslCertDomainSuffix INPUT must exist"); + assert.equal(ssl.transform, "lowercase", + "interactive prompt should lowercase free-form input (matches host/privateDnsZone)"); +}); + +test("--ssl-cert-domain-suffix flows through on afd+akv+VPN combo", () => { + cleanup(); + try { + const r = runScript([ + ...FULL_ARGS(), + "--edge-mode", "afd", + "--tls-source", "akv", + "--vpn-enabled", "y", + "--ssl-cert-domain-suffix", "stamp.contoso.test", + ]); + assert.equal(r.status, 0, r.stderr || r.stdout); + const content = readFileSync(TEST_FILE, "utf8"); + assert.match(content, /^SSL_CERT_DOMAIN_SUFFIX=stamp\.contoso\.test$/m); + // VPN is also enabled — the suffix doubles as the managed Private + // DNS zone name. Both must coexist in the rendered .env. + assert.match(content, /^VPN_GATEWAY_ENABLED=true$/m); + } finally { + cleanup(); + } +}); + +test("scaffolder warns when tlsSource=akv but no --ssl-cert-domain-suffix is supplied", () => { + cleanup(); + try { + // Non-interactive run without the new flag — promptIf is gated on + // tlsSource=akv but readline is closed, so the value lands empty. + // Contract validation in main() must warn (not throw) so the + // operator gets a .env they can fix by hand if needed. deploy.mjs + // is the hard gate. + const r = runScript([ + ...FULL_ARGS(), + "--edge-mode", "afd", + "--tls-source", "akv", + ]); + assert.equal(r.status, 0, r.stderr || r.stdout); + const out = `${r.stdout}\n${r.stderr}`; + assert.match(out, /SSL_CERT_DOMAIN_SUFFIX/, + "scaffolder must warn that SSL_CERT_DOMAIN_SUFFIX is unset for tlsSource=akv"); + const content = readFileSync(TEST_FILE, "utf8"); + // Template default flows through unchanged (empty). + assert.match(content, /^SSL_CERT_DOMAIN_SUFFIX=$/m); + } finally { + cleanup(); + } +}); + +test("scaffolder does NOT prompt for ssl-cert-domain-suffix on tlsSource=letsencrypt", () => { + cleanup(); + try { + const r = runScript([ + ...FULL_ARGS(), + "--edge-mode", "afd", + "--tls-source", "letsencrypt", + "--acme-email", "ops@example.com", + ]); + assert.equal(r.status, 0, r.stderr || r.stdout); + const content = readFileSync(TEST_FILE, "utf8"); + // On afd-letsencrypt the overlay contract stubs SSL_CERT_DOMAIN_SUFFIX + // to "unused" — that's a deploy-time concern, not a scaffolder one. + // The key point is that the scaffolder did not prompt for it (which + // would have stalled the non-interactive run waiting for input). + assert.match(content, /^SSL_CERT_DOMAIN_SUFFIX=unused$/m); + // No warning about a missing suffix — letsencrypt doesn't consume it. + const out = `${r.stdout}\n${r.stderr}`; + assert.doesNotMatch(out, /requires SSL_CERT_DOMAIN_SUFFIX/); + } finally { + cleanup(); + } +}); + +test("--ssl-cert-domain-suffix INPUT is gated on tlsSource=akv (promptIf)", () => { + const ssl = INPUTS.find((i) => i.argKey === "sslCertDomainSuffix"); + assert.ok(ssl, "sslCertDomainSuffix INPUT must exist"); + assert.ok(typeof ssl.promptIf === "function", "sslCertDomainSuffix must be gated by promptIf"); + assert.equal(ssl.promptIf({ tlsSource: "akv" }), true); + assert.equal(ssl.promptIf({ tlsSource: "letsencrypt" }), false); + assert.equal(ssl.promptIf({ tlsSource: "akv-selfsigned" }), false); +}); diff --git a/deploy/scripts/test/overlay-contracts.test.mjs b/deploy/scripts/test/overlay-contracts.test.mjs index 224566f0..dafed204 100644 --- a/deploy/scripts/test/overlay-contracts.test.mjs +++ b/deploy/scripts/test/overlay-contracts.test.mjs @@ -16,6 +16,7 @@ import { resolveOverlayKey, getContract, validateRequiredEnv, + validateVpnGatewayCombo, applyStubKeys, EDGE_MODES, TLS_SOURCES, @@ -113,19 +114,21 @@ test("PORTAL_HOSTNAME is tracked as a bicep-output on all three overlays", () => test("validateRequiredEnv passes on a fully-populated afd-akv env", () => { const env = { SSL_CERT_DOMAIN_SUFFIX: "portal.example.com" }; - const missing = validateRequiredEnv({ edgeMode: "afd", tlsSource: "akv", env }); - assert.deepEqual(missing, []); + const result = validateRequiredEnv({ edgeMode: "afd", tlsSource: "akv", env }); + assert.deepEqual(result.missing, []); + assert.deepEqual(result.combo, []); }); test("validateRequiredEnv reports SSL_CERT_DOMAIN_SUFFIX missing on afd-akv", () => { const env = {}; - const missing = validateRequiredEnv({ edgeMode: "afd", tlsSource: "akv", env }); + const { missing, combo } = validateRequiredEnv({ edgeMode: "afd", tlsSource: "akv", env }); assert.ok(missing.includes("SSL_CERT_DOMAIN_SUFFIX")); + assert.deepEqual(combo, []); }); test("validateRequiredEnv reports ACME_EMAIL missing on afd-letsencrypt", () => { const env = {}; - const missing = validateRequiredEnv({ + const { missing } = validateRequiredEnv({ edgeMode: "afd", tlsSource: "letsencrypt", env, @@ -135,7 +138,7 @@ test("validateRequiredEnv reports ACME_EMAIL missing on afd-letsencrypt", () => test("validateRequiredEnv catches malformed ACME_EMAIL", () => { const env = { ACME_EMAIL: "not-an-email" }; - const missing = validateRequiredEnv({ + const { missing } = validateRequiredEnv({ edgeMode: "afd", tlsSource: "letsencrypt", env, @@ -145,7 +148,7 @@ test("validateRequiredEnv catches malformed ACME_EMAIL", () => { test("validateRequiredEnv requires HOST/PRIVATE_DNS_ZONE/AKS_VNET_ID for private-akv", () => { const env = {}; - const missing = validateRequiredEnv({ + const { missing } = validateRequiredEnv({ edgeMode: "private", tlsSource: "akv", env, @@ -211,3 +214,170 @@ test("bicep portal main.bicep tlsSource default matches DEFAULT_TLS_SOURCE", () `bicep tlsSource default '${m[1]}' must match overlay-contracts DEFAULT_TLS_SOURCE '${DEFAULT_TLS_SOURCE}'`, ); }); + +// === validateVpnGatewayCombo ================================================ + +const VPN_BASE_ENV = Object.freeze({ + VPN_GATEWAY_ENABLED: "true", + SSL_CERT_DOMAIN_SUFFIX: "portal.example.com", + VPN_CLIENT_ADDRESS_POOL: "172.16.200.0/24", + AZURE_TENANT_ID: "00000000-0000-0000-0000-000000000000", +}); + +test("validateVpnGatewayCombo: disabled (unset) → empty regardless of other VPN env", () => { + const env = { + EDGE_MODE: "private", + TLS_SOURCE: "letsencrypt", + VPN_CLIENT_ADDRESS_POOL: "10.20.50.0/24", // would overlap default VNet + }; + assert.deepEqual( + validateVpnGatewayCombo({ edgeMode: "private", tlsSource: "letsencrypt", env }), + [], + ); +}); + +test("validateVpnGatewayCombo: disabled (literal 'false') → empty", () => { + const env = { ...VPN_BASE_ENV, VPN_GATEWAY_ENABLED: "false" }; + assert.deepEqual( + validateVpnGatewayCombo({ edgeMode: "private", tlsSource: "letsencrypt", env }), + [], + ); +}); + +test("validateVpnGatewayCombo: valid combo (afd + akv + suffix + non-overlapping pool) → empty", () => { + const env = { ...VPN_BASE_ENV }; + assert.deepEqual( + validateVpnGatewayCombo({ edgeMode: "afd", tlsSource: "akv", env }), + [], + ); +}); + +test("validateVpnGatewayCombo: TLS_SOURCE=letsencrypt → vpn-requires-akv", () => { + const env = { ...VPN_BASE_ENV }; + const errs = validateVpnGatewayCombo({ edgeMode: "afd", tlsSource: "letsencrypt", env }); + assert.ok(errs.includes("vpn-requires-akv"), `got: ${errs.join(",")}`); +}); + +test("validateVpnGatewayCombo: EDGE_MODE=private → vpn-requires-afd", () => { + const env = { ...VPN_BASE_ENV }; + const errs = validateVpnGatewayCombo({ edgeMode: "private", tlsSource: "akv", env }); + assert.ok(errs.includes("vpn-requires-afd"), `got: ${errs.join(",")}`); +}); + +test("validateVpnGatewayCombo: missing SSL_CERT_DOMAIN_SUFFIX → vpn-requires-domain-suffix", () => { + const env = { ...VPN_BASE_ENV, SSL_CERT_DOMAIN_SUFFIX: "" }; + const errs = validateVpnGatewayCombo({ edgeMode: "afd", tlsSource: "akv", env }); + assert.ok(errs.includes("vpn-requires-domain-suffix"), `got: ${errs.join(",")}`); +}); + +test("validateVpnGatewayCombo: overlapping pool (10.20.50.0/24 inside default 10.20.0.0/16) → vpn-pool-overlap", () => { + const env = { ...VPN_BASE_ENV, VPN_CLIENT_ADDRESS_POOL: "10.20.50.0/24" }; + const errs = validateVpnGatewayCombo({ edgeMode: "afd", tlsSource: "akv", env }); + assert.ok(errs.includes("vpn-pool-overlap"), `got: ${errs.join(",")}`); +}); + +test("validateVpnGatewayCombo: non-overlapping pool (172.16.200.0/24) → no overlap error", () => { + const env = { ...VPN_BASE_ENV }; + const errs = validateVpnGatewayCombo({ edgeMode: "afd", tlsSource: "akv", env }); + assert.ok(!errs.includes("vpn-pool-overlap"), `got: ${errs.join(",")}`); +}); + +test("validateVpnGatewayCombo: malformed pool CIDR → vpn-pool-overlap (fail-closed)", () => { + const env = { ...VPN_BASE_ENV, VPN_CLIENT_ADDRESS_POOL: "not-a-cidr" }; + const errs = validateVpnGatewayCombo({ edgeMode: "afd", tlsSource: "akv", env }); + assert.ok(errs.includes("vpn-pool-overlap"), `got: ${errs.join(",")}`); +}); + +test("validateVpnGatewayCombo: respects VNET_CIDR override when supplied", () => { + // If operator overrides VNET_CIDR to 10.99.0.0/16, the default-pool + // 172.16.200.0/24 still doesn't overlap. Sanity: 10.99.50.0/24 would. + const env = { ...VPN_BASE_ENV, VNET_CIDR: "10.99.0.0/16", VPN_CLIENT_ADDRESS_POOL: "10.99.50.0/24" }; + const errs = validateVpnGatewayCombo({ edgeMode: "afd", tlsSource: "akv", env }); + assert.ok(errs.includes("vpn-pool-overlap"), `got: ${errs.join(",")}`); +}); + +test("validateRequiredEnv returns vpn combo errors in a separate channel from missing[]", () => { + // afd-akv overlay so the baseline required key (SSL_CERT_DOMAIN_SUFFIX) + // is exercised; with VPN enabled and TLS_SOURCE=letsencrypt we'd switch + // overlays, so use afd-akv + letsencrypt? No — overlay is resolved from + // (edgeMode,tlsSource). Use afd-akv overlay = (afd, akv); but then the + // VPN combo is satisfied. To get a vpn-* error through the validator + // entrypoint, flip EDGE_MODE to 'private' and use the private-akv + // overlay which requires HOST/PRIVATE_DNS_ZONE/AKS_VNET_ID anyway. + const env = { + VPN_GATEWAY_ENABLED: "true", + SSL_CERT_DOMAIN_SUFFIX: "portal.example.com", + AZURE_TENANT_ID: "00000000-0000-0000-0000-000000000000", + HOST: "h", + PRIVATE_DNS_ZONE: "z", + AKS_VNET_ID: "v", + }; + const { missing, combo } = validateRequiredEnv({ edgeMode: "private", tlsSource: "akv", env }); + // Combo errors live in `combo`, NOT in `missing` — that's the contract + // change being guarded here (regression: pre-Phase-2-followup the + // vpn-requires-afd code was pushed onto `missing` and rendered with the + // wrong error string + a misleading scaffolder hint). + assert.deepEqual(missing, [], `combo errors leaked into missing[]: ${missing.join(",")}`); + const codes = combo.map((c) => c.code); + assert.ok(codes.includes("vpn-requires-afd"), `got: ${codes.join(",")}`); + // Each combo entry is a {code, message, hint} object with non-empty + // strings — guards against accidental shape drift. + for (const e of combo) { + assert.equal(typeof e.code, "string"); + assert.ok(e.message && typeof e.message === "string", `empty message on ${e.code}`); + assert.ok(e.hint && typeof e.hint === "string", `empty hint on ${e.code}`); + // The hint MUST NOT direct operators at the scaffolder — re-running + // new-env.mjs would clobber operator edits, and the underlying problem + // isn't an unset key, it's a bad combination of values. + assert.ok( + !/new-env|deploy:new-env/i.test(e.hint), + `combo hint for ${e.code} must not point at the scaffolder: ${e.hint}`, + ); + } +}); + +test("validateRequiredEnv: vpn-requires-tenant-id surfaces a hint pointing at the env file", () => { + // IMPROVE-1: blanked AZURE_TENANT_ID with VPN enabled should fail-closed + // pre-deploy with a clear named error, instead of falling through to + // the bicep `param tenantId string = ''` default. + const env = { + VPN_GATEWAY_ENABLED: "true", + SSL_CERT_DOMAIN_SUFFIX: "portal.example.com", + AZURE_TENANT_ID: " ", // whitespace-only → treated as empty + VPN_CLIENT_ADDRESS_POOL: "172.16.200.0/24", + }; + const { combo } = validateRequiredEnv({ edgeMode: "afd", tlsSource: "akv", env }); + const tenantErr = combo.find((c) => c.code === "vpn-requires-tenant-id"); + assert.ok(tenantErr, `expected vpn-requires-tenant-id; got: ${combo.map((c) => c.code).join(",")}`); + assert.match(tenantErr.message, /AZURE_TENANT_ID/); + assert.match(tenantErr.hint, /\.env/); +}); + +test("validateVpnGatewayCombo: missing AZURE_TENANT_ID → vpn-requires-tenant-id", () => { + const env = { ...VPN_BASE_ENV, AZURE_TENANT_ID: "" }; + const errs = validateVpnGatewayCombo({ edgeMode: "afd", tlsSource: "akv", env }); + assert.ok(errs.includes("vpn-requires-tenant-id"), `got: ${errs.join(",")}`); +}); + +test("validateVpnGatewayCombo: whitespace-only AZURE_TENANT_ID → vpn-requires-tenant-id", () => { + const env = { ...VPN_BASE_ENV, AZURE_TENANT_ID: " " }; + const errs = validateVpnGatewayCombo({ edgeMode: "afd", tlsSource: "akv", env }); + assert.ok(errs.includes("vpn-requires-tenant-id"), `got: ${errs.join(",")}`); +}); + +test("VPN combo-error hints never reference the nonexistent deploy/docs/ tree", () => { + // describeVpnComboError is a private function — scan the source as the + // regression guard. Any reintroduction (in a hint or even a comment) of + // a `deploy/docs/` path would be a regression: that directory doesn't + // exist; the canonical operator doc is `docs/deploying-to-aks.md`. + // Final-review C-1 follow-up. + const src = readFileSync( + join(REPO_ROOT, "deploy", "scripts", "lib", "overlay-contracts.mjs"), + "utf8", + ); + assert.ok( + !src.includes("deploy/docs/"), + "overlay-contracts.mjs still contains a deploy/docs/ reference", + ); +}); + diff --git a/deploy/services/base-infra/bicep/application-gateway.bicep b/deploy/services/base-infra/bicep/application-gateway.bicep index adb25a5e..48ce40f3 100644 --- a/deploy/services/base-infra/bicep/application-gateway.bicep +++ b/deploy/services/base-infra/bicep/application-gateway.bicep @@ -44,6 +44,103 @@ param appGwExists bool = false @description('Resource ID of the Log Analytics workspace that receives Application Gateway diagnostic logs (access + firewall) and metrics. Empty string disables the diagnostic setting.') param logAnalyticsWorkspaceResourceId string = '' +@description('Operator-supplied WAF custom rules array (e.g. loaded from APPGW_WAF_CUSTOM_RULES_FILE via deploy-bicep.mjs, mirroring the AFD precedent). Expected to start at priority >= 100; priorities 90-92 are reserved for the auto-seeded VPN guard rules below.') +param appgwWafCustomRules array = [] + +@description('Whether the auto-seeded VPN guard rules (AllowAfd / AllowVpn / BlockOther at priorities 90-92) are constructed and prepended to the WAF custom-rules array. When false, only operator-supplied rules apply (or none, preserving prior behaviour for non-VPN stamps).') +param vpnGatewayEnabled bool = false + +@description('CIDR block assigned to connected VPN clients — used by the auto-seeded AllowVpn rule to allow-list tunnelled traffic. Ignored when vpnGatewayEnabled=false.') +param vpnClientAddressPool string = '' + +@description('AFD frontDoorId GUID — used by the auto-seeded AllowAfd rule to allow-list requests carrying the matching X-Azure-FDID header. Threaded from the global-infra frontDoorId output via the deploy-bicep.mjs alias pipeline. Ignored when vpnGatewayEnabled=false.') +param frontDoorId string = '' + +// --------------------------------------------------------------------------- +// WAF custom rules — hybrid AFD + VPN trusted-bypass. +// --------------------------------------------------------------------------- +// Priority bands: +// * 90-92 — reserved for the auto-seeded VPN guard rules below. +// * >= 100 — operator-supplied rules via APPGW_WAF_CUSTOM_RULES_FILE. +// +// Auto-seeded guard semantics when vpnGatewayEnabled=true: +// * AllowAfd (prio 90): requests with X-Azure-FDID == +// bypass the catch-all block — this is the AFD +// origin-authenticity check Azure documents for +// AFD -> origin pinning. +// * AllowVpn (prio 91): requests whose remote address lives in the VPN +// client pool bypass the catch-all block. +// * BlockOther(prio 92): catch-all explicit block. Belt-and-braces against +// accidentally exposing the AppGw private FE outside +// the AFD or VPN paths if downstream NSG/route +// changes ever shifted topology. +// +// When vpnGatewayEnabled=false AND appgwWafCustomRules is empty, mergedCustomRules +// is [] and the customRules.rules block is a no-op (matches the AFD WAF +// policy precedent in frontdoor-waf-policy.bicep:93-95 for empty arrays). +// --------------------------------------------------------------------------- +var autoSeedRules = vpnGatewayEnabled ? [ + { + name: 'AllowAfd' + priority: 90 + ruleType: 'MatchRule' + action: 'Allow' + matchConditions: [ + { + matchVariables: [ + { + variableName: 'RequestHeaders' + selector: 'X-Azure-FDID' + } + ] + operator: 'Equal' + matchValues: [ + frontDoorId + ] + } + ] + } + { + name: 'AllowVpn' + priority: 91 + ruleType: 'MatchRule' + action: 'Allow' + matchConditions: [ + { + matchVariables: [ + { + variableName: 'RemoteAddr' + } + ] + operator: 'IPMatch' + matchValues: [ + vpnClientAddressPool + ] + } + ] + } + { + name: 'BlockOther' + priority: 92 + ruleType: 'MatchRule' + action: 'Block' + matchConditions: [ + { + matchVariables: [ + { + variableName: 'RemoteAddr' + } + ] + operator: 'IPMatch' + matchValues: [ + '0.0.0.0/0' + ] + } + ] + } +] : [] +var mergedCustomRules = concat(autoSeedRules, appgwWafCustomRules) + // --------------------------------------------------------------------------- // WAF policy // --------------------------------------------------------------------------- @@ -109,6 +206,7 @@ resource wafPolicy 'Microsoft.Network/ApplicationGatewayWebApplicationFirewallPo } ] } + customRules: mergedCustomRules } } diff --git a/deploy/services/base-infra/bicep/base-infra.params.template.json b/deploy/services/base-infra/bicep/base-infra.params.template.json index a89d2f9e..c6894be2 100644 --- a/deploy/services/base-infra/bicep/base-infra.params.template.json +++ b/deploy/services/base-infra/bicep/base-infra.params.template.json @@ -43,6 +43,27 @@ }, "foundrySku": { "value": "${FOUNDRY_SKU}" + }, + "vpnGatewayEnabled": { + "value": ${VPN_GATEWAY_ENABLED} + }, + "vpnGatewaySku": { + "value": "${VPN_GATEWAY_SKU}" + }, + "vpnClientAddressPool": { + "value": "${VPN_CLIENT_ADDRESS_POOL}" + }, + "vpnAadAudience": { + "value": "${VPN_AAD_AUDIENCE}" + }, + "portalResourceName": { + "value": "${PORTAL_RESOURCE_NAME}" + }, + "frontDoorId": { + "value": "${FRONT_DOOR_ID}" + }, + "tenantId": { + "value": "${AZURE_TENANT_ID}" } } } diff --git a/deploy/services/base-infra/bicep/dns-resolver.bicep b/deploy/services/base-infra/bicep/dns-resolver.bicep new file mode 100644 index 00000000..5ed50e0d --- /dev/null +++ b/deploy/services/base-infra/bicep/dns-resolver.bicep @@ -0,0 +1,87 @@ +// ============================================================================== +// PilotSwarm BaseInfra — Azure Private DNS Resolver (inbound endpoint only). +// +// Co-provisioned with the VPN P2S gateway. Solves the "P2S clients can reach +// the VNet but can't resolve Private DNS Zone records" gap: +// +// 1. P2S clients receive an address from `vpnClientAddressPool` and can +// route to the VNet — but cannot query 168.63.129.16 (that magic IP is +// only reachable from inside Azure VMs). +// 2. The classic Microsoft.Network/virtualNetworkGateways resource has no +// DNS-push field of its own (verified against the live ARM schema for +// api-version 2024-01-01 through 2025-07-01). The supported path for +// advertising DNS servers to P2S clients is the parent VNet's +// `dhcpOptions.dnsServers` block, which the gateway pushes at connect +// time. +// 3. Therefore vnet.bicep's `dhcpOptions.dnsServers` MUST point at a DNS +// server that (a) sits on a regular VNet IP P2S clients can reach via +// the tunnel and (b) can answer Private DNS Zone queries (e.g. +// `dev.pilotswarm.azure.com`). +// 4. The Private DNS Resolver inbound endpoint is the supported, managed +// service for exactly this. It exposes a single VNet IP that, on +// receiving a DNS query, consults all Private DNS Zones linked to the +// VNet and falls back to internet resolution otherwise. +// +// Not deployed when VPN is disabled — there is nothing reaching for it. +// +// Cost: roughly $170/mo per inbound endpoint (two ENIs at ~$0.24/hr). Cost +// is fully gated behind the same `vpnGatewayEnabled` flag that gates the VPN +// gateway itself; default stamps incur zero cost. +// ============================================================================== + +@description('Azure region. Must match the parent VNet region.') +param location string + +@description('Naming prefix (resolver + inbound endpoint names derive from this).') +param resourceName string + +@description('Resource ID of the BaseInfra VNet this resolver attaches to. The resolver answers from every Private DNS Zone linked to this VNet.') +param vnetId string + +@description('Resource ID of the dedicated dns-resolver-inbound subnet (must be delegated to Microsoft.Network/dnsResolvers).') +param inboundSubnetId string + +@description('Static private IP for the inbound endpoint within the inbound subnet. Static (not Dynamic) so the VNet `dhcpOptions.dnsServers` value (set in main.bicep) is deterministic across redeploys and does not require a dependent reference cycle. Must sit in the inbound subnet range and not collide with Azure-reserved IPs (.0/.1/.2/.3/last).') +param inboundIpAddress string = '10.20.19.4' + +@description('Resource tags to apply.') +param tags object = {} + +// The resolver itself is a per-VNet anchor object — no compute, no cost on +// its own. The cost lives in the inbound endpoint (ENIs). +resource dnsResolver 'Microsoft.Network/dnsResolvers@2022-07-01' = { + name: '${resourceName}-dns-resolver' + location: location + tags: tags + properties: { + virtualNetwork: { + id: vnetId + } + } +} + +// Single inbound endpoint with one IP config. Static allocation so the IP is +// known at bicep-eval time (we set the same value on the VPN gateway). One +// endpoint is enough for the VPN P2S client population; horizontal scale-out +// would be a future-day concern for thousands of concurrent clients. +resource inboundEndpoint 'Microsoft.Network/dnsResolvers/inboundEndpoints@2022-07-01' = { + parent: dnsResolver + name: 'inbound' + location: location + tags: tags + properties: { + ipConfigurations: [ + { + subnet: { + id: inboundSubnetId + } + privateIpAllocationMethod: 'Static' + privateIpAddress: inboundIpAddress + } + ] + } +} + +output dnsResolverId string = dnsResolver.id +output inboundEndpointId string = inboundEndpoint.id +output inboundIpAddress string = inboundIpAddress diff --git a/deploy/services/base-infra/bicep/main.bicep b/deploy/services/base-infra/bicep/main.bicep index 0350b617..3021e3b6 100644 --- a/deploy/services/base-infra/bicep/main.bicep +++ b/deploy/services/base-infra/bicep/main.bicep @@ -91,6 +91,33 @@ param foundrySku string = 'S0' @description('Array of Foundry model deployments to provision. Each entry: { name, model: { format, name, version }, sku: { name, capacity } }. Threaded by the deploy orchestrator from a per-stamp JSON file (deploy/envs/local//foundry-deployments.json) via `--parameters foundryDeployments=@`. Empty array → account is provisioned with no deployments, useful for incremental opt-in. Ignored when foundryEnabled=false.') param foundryDeployments array = [] +// ----- VPN P2S ingress (additive, optional) --------------------------------- +// All defaults preserve byte-equivalent param shape for non-VPN stamps. + +@description('Whether to provision the Azure VPN Gateway (P2S, AAD-authenticated) as an additive ingress alongside AFD. False by default.') +param vpnGatewayEnabled bool = false + +@description('VPN Gateway SKU. See vpn-gateway.bicep for allowed values; only AZ variants on Generation2 are supported (VpnGw1AZ is Generation1-only and has been observed to silently drop OpenVPN+AAD control-channel handshakes on new deployments). Basic is excluded (no OpenVPN/AAD support).') +param vpnGatewaySku string = 'VpnGw2AZ' + +@description('VPN client address pool (must not overlap the VNet or any reachable on-prem range).') +param vpnClientAddressPool string = '172.16.200.0/24' + +@description('AAD application audience for VPN client tokens. Default is the public Azure VPN client app ID; override only if you have registered a custom Azure VPN app.') +param vpnAadAudience string = 'c632b3df-fb67-4d84-bdcf-b95ad541b5c8' + +@description('Operator-supplied AppGw WAF custom rules (priority >= 100). Threaded by deploy-bicep.mjs from APPGW_WAF_CUSTOM_RULES_FILE; empty by default. The auto-seeded AFD/VPN guard rules at priorities 90-92 are added by application-gateway.bicep when vpnGatewayEnabled=true.') +param appgwWafCustomRules array = [] + +@description('AFD frontDoorId GUID emitted by global-infra/bicep/main.bicep, threaded as FRONT_DOOR_ID via the deploy-bicep.mjs aliasFor() pipeline. Used by the AllowAfd WAF guard rule when VPN ingress is enabled.') +param frontDoorId string = '' + +@description('Microsoft Entra (AAD) tenant ID threaded from the deploy-time AZURE_TENANT_ID env var via the base-infra params template. Currently consumed only by the VPN gateway module; the keyvault/postgres modules keep their own subscription().tenantId defaults. Empty default keeps non-VPN stamps byte-equivalent at the param layer.') +param tenantId string = '' + +@description('Portal short hostname (e.g. pschkrawvpn-wus3-portal), threaded from the deploy-time PORTAL_RESOURCE_NAME env var. Used as the A-record label in the VPN-mode private DNS zone so VPN clients resolve the SAME hostname the AppGw listener serves (matching the AKV cert subject portalResourceName.sslCertificateDomainSuffix set in portal/bicep/main.bicep). Empty default keeps non-VPN stamps byte-equivalent.') +param portalResourceName string = '' + // ============================================================================== // Derived names // ============================================================================== @@ -156,6 +183,17 @@ module Vnet './vnet.bicep' = { params: { location: location resourceNamePrefix: resourceNamePrefix + vpnGatewayEnabled: vpnGatewayEnabled + // When VPN ingress is enabled, advertise the Private DNS Resolver inbound + // endpoint static IP via the VNet's dhcpOptions. P2S clients connecting + // through the VPN gateway inherit this DNS server list (the classic + // Microsoft.Network/virtualNetworkGateways resource has no DNS-push field + // of its own — see vpn-gateway.bicep). The IP is the static address + // hardcoded in dns-resolver.bicep `inboundIpAddress` default (10.20.19.4) + // and is reachable from P2S clients via the tunnel. + dnsServers: vpnGatewayEnabled ? [ + '10.20.19.4' + ] : [] } } @@ -209,6 +247,10 @@ module AppGateway './application-gateway.bicep' = if (edgeMode == 'afd') { userAssignedIdentityId: Uami.outputs.appGwIdentityResourceId appGwExists: ApplicationGatewayExistsCheck!.outputs.exists logAnalyticsWorkspaceResourceId: LogAnalytics.outputs.workspaceId + appgwWafCustomRules: appgwWafCustomRules + vpnGatewayEnabled: vpnGatewayEnabled + vpnClientAddressPool: vpnClientAddressPool + frontDoorId: frontDoorId } } @@ -259,6 +301,71 @@ module LogAnalytics '../../common/bicep/log-analytics.bicep' = { } } +// ============================================================================== +// VPN P2S Gateway (optional, additive ingress). Provisioned after VNet + +// LogAnalytics so the GatewaySubnet exists and diagnostic logs have a sink. +// Skipped entirely when vpnGatewayEnabled=false (default). +// ============================================================================== + +// Private DNS Resolver — provisioned in lockstep with the VPN gateway so P2S +// clients have a reachable DNS server (the Resolver inbound IP) pushed via +// vpnClientConfiguration.customDnsServers. Without this, P2S clients cannot +// resolve Private DNS Zone records (e.g. portal hostname) through the tunnel +// because 168.63.129.16 is only reachable from inside Azure VMs. +module DnsResolver './dns-resolver.bicep' = if (vpnGatewayEnabled) { + name: '${resourceNamePrefix}-dns-resolver-${dTime}' + params: { + location: location + resourceName: resourceNamePrefix + vnetId: Vnet.outputs.vnetId + inboundSubnetId: Vnet.outputs.dnsResolverInboundSubnetId + } +} + +module VpnGateway './vpn-gateway.bicep' = if (vpnGatewayEnabled) { + name: '${resourceNamePrefix}-vpngw-${dTime}' + params: { + location: location + resourceName: resourceNamePrefix + gatewaySubnetId: Vnet.outputs.gatewaySubnetId + vpnGatewaySku: vpnGatewaySku + vpnClientAddressPool: vpnClientAddressPool + // tenantId is threaded from AZURE_TENANT_ID via the base-infra params + // template (deploy-bicep.mjs render pipeline). This aligns the VPN + // gateway's Entra ID authentication with the rest of the stamp's tenant + // assumption. The keyvault/postgres modules continue to default to + // subscription().tenantId — same value today, but their bicep contract + // is unchanged. + tenantId: tenantId + vpnAadAudience: vpnAadAudience + logAnalyticsWorkspaceId: LogAnalytics.outputs.workspaceId + } +} + +// ============================================================================== +// Private DNS zone for portal resolution over the VPN tunnel (managed-only). +// Provisioned after AppGw so the AppGw private FE IP is available. +// Skipped when vpnGatewayEnabled=false or edgeMode != 'afd' (no AppGw to point to). +// ============================================================================== + +module PortalPrivateDns './private-dns-portal.bicep' = if (vpnGatewayEnabled && edgeMode == 'afd') { + name: '${resourceNamePrefix}-portal-pdns-${dTime}' + params: { + vpnGatewayEnabled: vpnGatewayEnabled + dnsZoneName: sslCertificateDomainSuffix + // Must match the AppGw HTTPS listener hostname (= portal AKV cert + // subject), composed in portal/bicep/main.bicep as + // `${resourceName}.${sslCertificateDomainSuffix}` where `resourceName` + // is the portal short name (`--portal`). Threaded + // here as PORTAL_RESOURCE_NAME from the deploy env. Fallback to bare + // resourceNamePrefix is intentionally preserved for older stamps that + // pre-date this thread (cert/DNS would mismatch, but build won't break). + recordName: empty(portalResourceName) ? resourceNamePrefix : portalResourceName + appGatewayPrivateIp: appGatewayPrivateIpAddress + vnetId: Vnet.outputs.vnetId + } +} + module Aks './aks.bicep' = { name: '${resourceNamePrefix}-aks-${dTime}' params: { @@ -536,3 +643,9 @@ output logAnalyticsWorkspaceName string = LogAnalytics.outputs.workspaceName // manifest-staging time (placeholder `__FOUNDRY_ENDPOINT__`). output foundryEndpoint string = foundryEnabled ? Foundry!.outputs.endpoint : '' output foundryAccountName string = foundryEnabled ? Foundry!.outputs.accountName : '' + +// ----- VPN P2S ingress outputs (empty strings when disabled) --------------- +output vpnGatewayId string = vpnGatewayEnabled ? VpnGateway!.outputs.vpnGatewayId : '' +output vpnGatewayPublicIp string = vpnGatewayEnabled ? VpnGateway!.outputs.vpnGatewayPublicIp : '' +output vpnClientAddressPool string = vpnGatewayEnabled ? vpnClientAddressPool : '' +output vpnPrivateDnsZoneId string = (vpnGatewayEnabled && edgeMode == 'afd') ? PortalPrivateDns!.outputs.dnsZoneId : '' diff --git a/deploy/services/base-infra/bicep/private-dns-portal.bicep b/deploy/services/base-infra/bicep/private-dns-portal.bicep new file mode 100644 index 00000000..d3ddbf4b --- /dev/null +++ b/deploy/services/base-infra/bicep/private-dns-portal.bicep @@ -0,0 +1,82 @@ +// ============================================================================== +// PilotSwarm BaseInfra — Private DNS zone for portal resolution over the +// P2S VPN tunnel (managed-only). +// +// When VPN ingress is enabled, off-network employees resolve the portal +// hostname (`.`) to the AppGw private frontend IP +// over the VPN tunnel. The zone is created in the BaseInfra RG and linked to +// the BaseInfra VNet; the Azure Private DNS Resolver (dns-resolver.bicep) +// answers queries from P2S clients via its inbound endpoint IP. That IP is +// advertised to P2S clients via the parent VNet's `dhcpOptions.dnsServers` +// — the classic Microsoft.Network/virtualNetworkGateways resource has no +// DNS-push field of its own, so VNet DHCP options are the supported path +// for P2S DNS distribution. (P2S clients cannot reach the Azure-magic DNS +// IP 168.63.129.16 — that IP only works from inside Azure VMs — which is +// precisely why the Resolver is co-provisioned with the VPN.) +// +// Cross-reference: deploy/services/portal/bicep/main.bicep:239-261 hosts a +// similar private-DNS pattern, but that module is `edgeMode='private'`-only +// and lives in the portal service (not reused here — different lifecycle, +// different RG semantics, different trigger condition). +// +// "Managed-only" — BYO Private DNS path was dropped during planning. If a +// future requirement to support customer-supplied zones surfaces, gate this +// module behind a `vpnPrivateDnsMode` discriminator in main.bicep rather than +// expanding this module. +// ============================================================================== + +@description('Whether to provision the private DNS zone + VNet link + A record. False emits no resources (output is empty string).') +param vpnGatewayEnabled bool + +@description('Private DNS zone name to provision (= SSL_CERT_DOMAIN_SUFFIX so the portal cert SAN resolves over the tunnel).') +param dnsZoneName string + +@description('Hostname label for the portal A record. MUST match the AppGw HTTPS listener hostname (= AKV cert subject), which portal/bicep/main.bicep composes as portalResourceName.sslCertificateDomainSuffix (e.g. pschkrawvpn-wus3-portal). Threaded from main.bicep portalResourceName (PORTAL_RESOURCE_NAME env var).') +param recordName string + +@description('AppGw private frontend IP address that VPN clients should resolve the portal hostname to.') +param appGatewayPrivateIp string + +@description('Resource ID of the BaseInfra VNet to link the private zone to (so VPN clients resolving via 168.63.129.16 hit this zone).') +param vnetId string + +@description('Resource tags to apply.') +param tags object = {} + +// Private DNS zones are global resources; their location must be 'global'. +resource portalVpnPrivateDnsZone 'Microsoft.Network/privateDnsZones@2024-06-01' = if (vpnGatewayEnabled) { + name: dnsZoneName + location: 'global' + tags: tags +} + +resource portalVpnPrivateDnsZoneVnetLink 'Microsoft.Network/privateDnsZones/virtualNetworkLinks@2024-06-01' = if (vpnGatewayEnabled) { + parent: portalVpnPrivateDnsZone + name: 'baseinfra-vnet-link' + location: 'global' + tags: tags + properties: { + registrationEnabled: false + virtualNetwork: { + id: vnetId + } + } +} + +resource portalVpnPrivateDnsARecord 'Microsoft.Network/privateDnsZones/A@2024-06-01' = if (vpnGatewayEnabled) { + parent: portalVpnPrivateDnsZone + name: recordName + properties: { + ttl: 300 + aRecords: [ + { + ipv4Address: appGatewayPrivateIp + } + ] + } +} + +// Empty string when disabled keeps the output shape stable for the parent +// module so existing stamps remain byte-equivalent at the output level. +output dnsZoneId string = vpnGatewayEnabled ? portalVpnPrivateDnsZone.id : '' +output dnsZoneName string = vpnGatewayEnabled ? portalVpnPrivateDnsZone.name : '' diff --git a/deploy/services/base-infra/bicep/vnet.bicep b/deploy/services/base-infra/bicep/vnet.bicep index 8243ed9a..57014c28 100644 --- a/deploy/services/base-infra/bicep/vnet.bicep +++ b/deploy/services/base-infra/bicep/vnet.bicep @@ -6,8 +6,7 @@ // 2. App Gateway subnet — hosts the WAF_v2 Standard_v2 ingress. // 3. App Gateway Private Link subnet — dedicated subnet for the Private // Link Service backing the AppGW private frontend. MUST have -// `privateLinkServiceNetworkPolicies: 'Disabled'` (Spec FR-001 / -// CodeResearch §1). +// `privateLinkServiceNetworkPolicies: 'Disabled'`. // ============================================================================== @description('Azure region.') @@ -28,10 +27,86 @@ param appGatewaySubnetPrefix string = '10.20.16.0/24' @description('Application Gateway Private Link subnet prefix (must be distinct from the App Gateway subnet).') param appGatewayPrivateLinkSubnetPrefix string = '10.20.17.0/24' +@description('Whether to provision a GatewaySubnet (required for the Azure VPN Gateway). False by default — VPN is an additive, optional ingress.') +param vpnGatewayEnabled bool = false + +@description('GatewaySubnet prefix. Azure requires this subnet to be literally named "GatewaySubnet"; the name is fixed and not configurable. /27 minimum for VpnGw1+ (RouteBased).') +param gatewaySubnetPrefix string = '10.20.18.0/27' + +@description('Azure Private DNS Resolver inbound endpoint subnet prefix. Co-provisioned with the VPN gateway: P2S clients are pushed the resolver inbound IP via `customDnsServers` so they can resolve Private DNS Zones (e.g. dev.pilotswarm.azure.com) through the tunnel. /28 minimum per Microsoft.Network/dnsResolvers contract. Subnet is dedicated and delegated to Microsoft.Network/dnsResolvers — no other resource may live in it.') +param dnsResolverInboundSubnetPrefix string = '10.20.19.0/28' + +@description('DNS server IPs to advertise via the VNet `dhcpOptions.dnsServers` block. P2S VPN clients connecting through the gateway inherit these (this is the supported mechanism — the classic `virtualNetworkGateways` resource has no DNS push field of its own). Leave empty for non-VPN stamps so the VNet uses Azure-provided DNS. When VPN is enabled, main.bicep passes the Private DNS Resolver inbound endpoint static IP so clients can resolve Private DNS Zone records through the tunnel.') +param dnsServers array = [] + var vnetName = '${resourceNamePrefix}-vnet' var aksSubnetName = 'aks-subnet' var appGatewaySubnetName = 'appgw-subnet' var appGatewayPrivateLinkSubnetName = 'appgw-pls-subnet' +// Azure requirement: VPN Gateway only attaches to a subnet named exactly +// "GatewaySubnet". Do not parameterise this name. +var gatewaySubnetName = 'GatewaySubnet' +var dnsResolverInboundSubnetName = 'dns-resolver-inbound-subnet' + +var baseSubnets = [ + { + name: aksSubnetName + properties: { + addressPrefix: aksSubnetPrefix + } + } + { + name: appGatewaySubnetName + properties: { + addressPrefix: appGatewaySubnetPrefix + } + } + { + name: appGatewayPrivateLinkSubnetName + properties: { + addressPrefix: appGatewayPrivateLinkSubnetPrefix + privateLinkServiceNetworkPolicies: 'Disabled' + } + } +] + +var gatewaySubnetEntry = [ + { + name: gatewaySubnetName + properties: { + addressPrefix: gatewaySubnetPrefix + } + } +] + +// Private DNS Resolver inbound endpoint subnet. Co-provisioned with the VPN +// gateway because P2S clients cannot reach the Azure-magic DNS IP +// (168.63.129.16) through the tunnel — that IP is only reachable from inside +// Azure VMs. The Resolver inbound endpoint sits on a regular VNet IP that IS +// reachable from P2S clients, and resolves Private DNS Zone records on their +// behalf. +// +// `delegations` is required: Microsoft.Network/dnsResolvers takes exclusive +// ownership of the subnet. No NSG should be applied (the resolver service +// manages its own ingress on UDP/53 + TCP/53). +var dnsResolverInboundSubnetEntry = [ + { + name: dnsResolverInboundSubnetName + properties: { + addressPrefix: dnsResolverInboundSubnetPrefix + delegations: [ + { + name: 'Microsoft.Network.dnsResolvers' + properties: { + serviceName: 'Microsoft.Network/dnsResolvers' + } + } + ] + } + } +] + +var allSubnets = vpnGatewayEnabled ? concat(baseSubnets, gatewaySubnetEntry, dnsResolverInboundSubnetEntry) : baseSubnets resource vnet 'Microsoft.Network/virtualNetworks@2024-01-01' = { name: vnetName @@ -42,27 +117,13 @@ resource vnet 'Microsoft.Network/virtualNetworks@2024-01-01' = { addressSpace ] } - subnets: [ - { - name: aksSubnetName - properties: { - addressPrefix: aksSubnetPrefix - } - } - { - name: appGatewaySubnetName - properties: { - addressPrefix: appGatewaySubnetPrefix - } - } - { - name: appGatewayPrivateLinkSubnetName - properties: { - addressPrefix: appGatewayPrivateLinkSubnetPrefix - privateLinkServiceNetworkPolicies: 'Disabled' - } - } - ] + // dhcpOptions only emitted when DNS servers are supplied. Empty array + // would set "no DNS" on the VNet, which is different from "use Azure + // default" — preserving omission keeps non-VPN stamps unchanged. + dhcpOptions: empty(dnsServers) ? null : { + dnsServers: dnsServers + } + subnets: allSubnets } } @@ -73,3 +134,10 @@ output aksSubnetName string = aksSubnetName output appGatewaySubnetId string = vnet.properties.subnets[1].id output appGatewaySubnetName string = appGatewaySubnetName output appGatewayPrivateLinkSubnetId string = vnet.properties.subnets[2].id +// GatewaySubnet ID only when VPN ingress is enabled — empty string keeps the +// output shape stable for non-VPN stamps. +output gatewaySubnetId string = vpnGatewayEnabled ? vnet.properties.subnets[3].id : '' +// DNS Resolver inbound subnet — only present when VPN is enabled (the +// resolver is exclusively here to give P2S clients a reachable DNS server). +output dnsResolverInboundSubnetId string = vpnGatewayEnabled ? vnet.properties.subnets[4].id : '' +output dnsResolverInboundSubnetName string = dnsResolverInboundSubnetName diff --git a/deploy/services/base-infra/bicep/vpn-gateway.bicep b/deploy/services/base-infra/bicep/vpn-gateway.bicep new file mode 100644 index 00000000..0505026b --- /dev/null +++ b/deploy/services/base-infra/bicep/vpn-gateway.bicep @@ -0,0 +1,196 @@ +// ============================================================================== +// PilotSwarm BaseInfra — Azure VPN Gateway (Point-to-Site, Microsoft Entra ID +// authentication, OpenVPN protocol). +// +// Provisioned conditionally from `main.bicep` when `vpnGatewayEnabled=true`. +// Coexists with the AFD edge mode as an additive, optional ingress — the +// "trusted-bypass" pattern for authenticated tenant users who hold a valid +// Entra ID token but are blocked at the public edge by operator-defined +// AFD WAF allow-lists (e.g. service-tag, IP-range, or header-based rules +// that gate AFD to a known managed-network population). +// +// Note: this module emits the gateway + its public IP + a diagnostic-settings +// sink. The orchestrator does NOT render a vpn-postdeploy hook; client-profile +// distribution to operators is documented out-of-band. +// ============================================================================== + +@description('Azure region. Must match the parent VNet region.') +param location string + +@description('Naming prefix for the gateway + its public IP (e.g. the BaseInfra resourceNamePrefix).') +param resourceName string + +@description('Resource ID of the GatewaySubnet inside the BaseInfra VNet (vnet.bicep emits this only when vpnGatewayEnabled=true).') +param gatewaySubnetId string + +@description('VPN Gateway SKU. VpnGw2AZ is the smallest Generation2 AZ SKU that reliably serves OpenVPN+AAD. VpnGw1AZ (Generation1) is excluded: Microsoft no longer accepts non-AZ SKUs for new gateways, AND the AZ Generation1 SKU has been observed to silently drop OpenVPN HardResetClientV2 packets ~5s after TCP accept, with no diagnostic event other than a P2SDiagnosticLog "Connection Initialized" + 5s-later "Disconnect Successful". Generation2 (VpnGw2AZ+) does not exhibit this. Basic SKU is also excluded — no OpenVPN/AAD support.') +@allowed([ + 'VpnGw2AZ' + 'VpnGw3AZ' + 'VpnGw4AZ' + 'VpnGw5AZ' +]) +param vpnGatewaySku string = 'VpnGw2AZ' + +@description('CIDR block from which VPN clients receive addresses once connected. Must not overlap the VNet address space or any on-prem range reachable via the VPN. /24 gives ~250 concurrent clients.') +param vpnClientAddressPool string = '172.16.200.0/24' + +@description('Microsoft Entra (AAD) tenant ID used for VPN client authentication. Threaded from the deploy-time AZURE_TENANT_ID by main.bicep.') +param tenantId string + +@description('AAD application audience (clientId) the VPN clients request a token for. Default is the public Azure VPN client app ID (c632b3df-...) which is the universally available choice and avoids per-tenant app-registration plumbing. Override only if you have registered a custom Azure VPN app (e.g. legacy a21fb3a6-... / 41b23e61-... values for older client builds).') +param vpnAadAudience string = 'c632b3df-fb67-4d84-bdcf-b95ad541b5c8' + +@description('Resource ID of the stamp Log Analytics workspace that receives gateway tunnel + RouteDiagnostic logs and metrics. Empty string disables the diagnostic setting.') +param logAnalyticsWorkspaceId string = '' + +// Note on P2S client DNS: the classic Microsoft.Network/virtualNetworkGateways +// resource has NO DNS-push property of its own (verified against the live +// ARM schema for api-version 2024-01-01 → 2025-07-01). DNS servers pushed to +// P2S clients come from the parent VNet's `dhcpOptions.dnsServers`. See +// vnet.bicep `dnsServers` param and main.bicep which threads the Private DNS +// Resolver inbound endpoint IP when VPN ingress is enabled. + +@description('Resource tags to apply to gateway resources.') +param tags object = {} + +// --------------------------------------------------------------------------- +// VPN Gateway Public IP frontend. +// +// Zone-redundancy: AZ VPN gateway SKUs (VpnGw2AZ / VpnGw3AZ / ...) +// REQUIRE their associated Public IP to be zone-redundant. Without +// `zones: ['1', '2', '3']` Azure rejects the gateway deployment with +// VmssVpnGatewayPublicIpsMustHaveZonesConfigured. Since the gateway SKU +// list is AZ-only (non-AZ SKUs are deprecated and Generation1 SKUs are +// excluded — see vpnGatewaySku @description), the PIP is unconditionally +// zone-redundant. +// +// SKU: Standard + Static is required for RouteBased gateways (Basic PIPs +// are not supported). +// --------------------------------------------------------------------------- +resource vpnGatewayPip 'Microsoft.Network/publicIPAddresses@2024-01-01' = { + name: '${resourceName}-vpngw-pip' + location: location + tags: tags + sku: { + name: 'Standard' + tier: 'Regional' + } + zones: [ + '1' + '2' + '3' + ] + properties: { + publicIPAllocationMethod: 'Static' + publicIPAddressVersion: 'IPv4' + } +} + +// --------------------------------------------------------------------------- +// Azure VPN Gateway (RouteBased, AAD-authenticated, OpenVPN). +// +// Key choices baked in here (do not parameterise without revisiting Spec): +// - gatewayType=Vpn, vpnType=RouteBased — AAD auth requires RouteBased. +// - vpnGatewayGeneration=Generation2 — Generation1 (VpnGw1AZ) has been +// observed to silently drop OpenVPN+AAD HardResetClientV2 packets ~5s +// after TCP accept; Generation2 (VpnGw2AZ+) is the supported floor. +// - activeActive=false, enableBgp=false — P2S-only deployment. +// - vpnClientProtocols=['OpenVPN'] — only protocol that supports AAD auth. +// - vpnAuthenticationTypes=['AAD'] — no certificate/RADIUS fallback. +// - aadTenant / aadIssuer use the standard v1 issuer format; switching to +// v2 (https://login.microsoftonline.com/${tenantId}/v2.0) requires the +// audience to also be migrated — current default audience is v1. +// --------------------------------------------------------------------------- +resource vpnGateway 'Microsoft.Network/virtualNetworkGateways@2024-01-01' = { + name: '${resourceName}-vpngw' + location: location + tags: tags + properties: { + gatewayType: 'Vpn' + vpnType: 'RouteBased' + vpnGatewayGeneration: 'Generation2' + activeActive: false + enableBgp: false + sku: { + name: vpnGatewaySku + tier: vpnGatewaySku + } + ipConfigurations: [ + { + name: 'vnetGatewayConfig' + properties: { + privateIPAllocationMethod: 'Dynamic' + subnet: { + id: gatewaySubnetId + } + publicIPAddress: { + id: vpnGatewayPip.id + } + } + } + ] + vpnClientConfiguration: { + vpnClientAddressPool: { + addressPrefixes: [ + vpnClientAddressPool + ] + } + vpnClientProtocols: [ + 'OpenVPN' + ] + vpnAuthenticationTypes: [ + 'AAD' + ] + // Public-cloud AAD endpoints. The default vpnAadAudience GUID + // (c632b3df-...) is registered against the public-cloud Azure VPN client + // app and is paired with login.microsoftonline.com + sts.windows.net. + // Sovereign clouds require a different audience GUID and different + // endpoint hosts; switching to environment().authentication.loginEndpoint + // here without also re-deriving the audience would break authentication. + // If sovereign-cloud support is needed, parameterise the full + // (audience, tenantUri, issuerUri) tuple together rather than fan out + // these strings to environment(). + #disable-next-line no-hardcoded-env-urls + aadTenant: 'https://login.microsoftonline.com/${tenantId}/' + aadAudience: vpnAadAudience + #disable-next-line no-hardcoded-env-urls + aadIssuer: 'https://sts.windows.net/${tenantId}/' + } + } +} + +// --------------------------------------------------------------------------- +// Diagnostic settings — ship gateway tunnel + IKE + RouteDiagnostic logs and +// platform metrics to the stamp Log Analytics workspace. Mirrors the AppGw +// diagnostic-settings pattern in application-gateway.bicep:440-458. +// --------------------------------------------------------------------------- +resource vpnGatewayDiagnostics 'Microsoft.Insights/diagnosticSettings@2021-05-01-preview' = if (!empty(logAnalyticsWorkspaceId)) { + name: 'vpngw-diagnostics' + scope: vpnGateway + properties: { + workspaceId: logAnalyticsWorkspaceId + logs: [ + { + categoryGroup: 'allLogs' + enabled: true + } + ] + metrics: [ + { + category: 'AllMetrics' + enabled: true + } + ] + } +} + +// --------------------------------------------------------------------------- +// Outputs — emitted for anyone scripting around the gateway. The orchestrator +// does not render a postdeploy block, but `vpnGatewayPublicIp` and +// `vpnGatewayId` are useful for ad-hoc verification / client-profile fetch. +// --------------------------------------------------------------------------- +output vpnGatewayId string = vpnGateway.id +output vpnGatewayName string = vpnGateway.name +output vpnGatewayPublicIp string = vpnGatewayPip.properties.ipAddress +output vpnGatewayPublicIpId string = vpnGatewayPip.id diff --git a/deploy/services/global-infra/bicep/frontdoor-profile.bicep b/deploy/services/global-infra/bicep/frontdoor-profile.bicep index 93376f22..4d00313d 100644 --- a/deploy/services/global-infra/bicep/frontdoor-profile.bicep +++ b/deploy/services/global-infra/bicep/frontdoor-profile.bicep @@ -99,6 +99,9 @@ output frontDoorEndpointName string = frontDoorEndpoint.name @description('Front Door endpoint resource ID.') output frontDoorEndpointId string = frontDoorEndpoint.id +@description('Front Door profile ID GUID (the value Azure compares against the X-Azure-FDID origin header to prove the request came from this AFD profile). Surfaced so per-stamp AppGw WAF custom rules can allow-list it when VPN ingress is enabled (hybrid AFD+VPN trusted-bypass pattern).') +output frontDoorId string = frontDoorProfile.properties.frontDoorId + // ============================================================================== // Diagnostic settings — ship Front Door access, health-probe, and WAF logs to // the global Log Analytics workspace so we can correlate edge requests with diff --git a/deploy/services/global-infra/bicep/main.bicep b/deploy/services/global-infra/bicep/main.bicep index 53e38a61..462f9d3a 100644 --- a/deploy/services/global-infra/bicep/main.bicep +++ b/deploy/services/global-infra/bicep/main.bicep @@ -141,3 +141,6 @@ output logAnalyticsWorkspaceId string = LogAnalytics.outputs.workspaceId @description('Global Log Analytics workspace name.') output logAnalyticsWorkspaceName string = LogAnalytics.outputs.workspaceName + +@description('Front Door profile ID GUID (the value Azure compares against the X-Azure-FDID origin header). Threaded into base-infra deploys (as FRONT_DOOR_ID via the default aliasFor() rule in deploy-bicep.mjs) so AppGw WAF custom rules can allow-list AFD when VPN ingress is enabled.') +output frontDoorId string = frontDoorProfile.outputs.frontDoorId diff --git a/docs/deploying-to-aks.md b/docs/deploying-to-aks.md index 9bf805a3..0d8ff736 100644 --- a/docs/deploying-to-aks.md +++ b/docs/deploying-to-aks.md @@ -35,13 +35,16 @@ containers. Public entry points: ### Topology Matrix -The IaC path supports a `(EDGE_MODE × TLS_SOURCE)` matrix. Four -combinations are supported; two are blocked: +The IaC path supports an `(EDGE_MODE × TLS_SOURCE)` matrix plus the +optional `VPN_GATEWAY_ENABLED` axis. Five combinations are supported; +the rest are blocked at preflight (see [Unsupported Combinations](#unsupported-combinations) +and the named diagnostic codes in [Optional: VPN Gateway P2S](#optional-vpn-gateway-p2s-hybrid-afd--vpn)): | `EDGE_MODE` | `TLS_SOURCE` | Edge ingress | Cert source | Notes | |-------------|------------------|-----------------------------------------------|--------------------------------------------------------|------------------------------------------------| | `afd` | `letsencrypt` | AFD → AppGw (Private Link) + AGIC | cert-manager + Let's Encrypt prod (HTTP-01) | OSS default. Zero CA setup. | | `afd` | `akv` | AFD → AppGw (Private Link) + AGIC | OneCertV2-PublicCA via AKV (registered automatically) | enterprise default. BYO public CA. | +| `afd` + VPN | `akv` | AFD → AppGw (Private Link) **and** Azure VPN Gateway P2S → same AppGw private FE | OneCertV2-PublicCA via AKV (shared with AFD path) | Hybrid trusted-bypass; opt-in via `VPN_GATEWAY_ENABLED=true`. AKV-only. See [Optional: VPN Gateway P2S](#optional-vpn-gateway-p2s-hybrid-afd--vpn). | | `private` | `akv` | AKS web-app-routing addon (NGINX) + ILB | OneCertV2-PrivateCA via AKV (registered automatically) | Enterprise / AME. No AFD, no AppGw, no AGIC. | | `private` | `akv-selfsigned` | AKS web-app-routing addon (NGINX) + ILB | AKV `Self` issuer (auto-generated, in-place) | No CA; private-VNet smoke tests. | @@ -169,6 +172,219 @@ zone is **not** publicly resolvable. `new-env.mjs` and `deploy.mjs` both refuse these combos at preflight. +### Optional: VPN Gateway P2S (hybrid AFD + VPN) + +The IaC path supports an optional, additive Azure VPN Gateway +(Point-to-Site, OpenVPN protocol, Microsoft Entra ID authentication) that +coexists with `EDGE_MODE=afd`. The VPN tunnel terminates inside the stamp +VNet and reaches the **same AppGw private listener** as the AFD path, with +the **same AKV cert** — so an allow-listed user (through AFD) and an +off-allow-list authenticated user (through VPN) both reach +`https://.` (e.g. +`pschkrawvpn-wus3-portal.dev.pilotswarm.azure.com`) and observe an identical +cert chain. The Private DNS A record is keyed on `PORTAL_RESOURCE_NAME` so +it matches the AppGw listener hostname and AKV cert subject; +`RESOURCE_PREFIX` alone is only a backwards-compat fallback when +`PORTAL_RESOURCE_NAME` is empty. This is the "trusted-bypass" pattern for tenant users with a +valid Entra ID token who would otherwise be blocked at the public edge by +operator-defined AFD WAF allow-lists (typically service-tag, IP-range, or +header-based rules that gate the public ingress to a known managed-network +population). VPN is **not** a replacement for any existing edge mode. + +#### Architecture + +``` + (allow-listed public user) + ────────────► AFD Premium + │ (Private Link) + ▼ + AppGw v2 (private FE, WAF_v2) + │ (single AKV cert) + ▼ + AKS portal pod + ▲ + │ + (off-allow-list authenticated user) + ────────────► Azure VPN Gateway P2S + Entra ID + MFA (GatewaySubnet, OpenVPN) +``` + +Both paths share one AppGw, one WAF policy, one cert. The AppGw WAF +custom-rules pipeline disambiguates them at L7 (see "WAF guard rules" +below). + +#### Required preconditions + +- `EDGE_MODE=afd` + `TLS_SOURCE=akv` — `validateVpnGatewayCombo()` in + `deploy/scripts/lib/overlay-contracts.mjs` requires AFD (code: + `vpn-requires-afd`) and an AKV-family `TLS_SOURCE` — `akv` or + `akv-selfsigned` (code: `vpn-requires-akv`); `letsencrypt` is + rejected because ACME HTTP-01 cannot reach a VPN-only client. The + `akv-selfsigned` variant is **also** rejected end-to-end on AFD stamps + by `UNSUPPORTED_COMBOS` in `deploy/scripts/deploy.mjs` (AFD rejects + self-signed origin certs at TLS validation), so the only effective + combo for the AFD+VPN trusted-bypass is `EDGE_MODE=afd` + + `TLS_SOURCE=akv`. `private` mode is rejected because the auto-seeded + WAF guards assume AFD as the public ingress. +- `SSL_CERT_DOMAIN_SUFFIX` must be set — the managed Private DNS zone + uses it (code: `vpn-requires-domain-suffix`). +- `VPN_CLIENT_ADDRESS_POOL` must not overlap the VNet `VNET_CIDR` + (default `10.20.0.0/16`); validated against both endpoints (code: + `vpn-pool-overlap`). Default pool is `172.16.200.0/24` (~250 concurrent + clients). +- `AZURE_TENANT_ID` must be set — VPN runs in the **same Entra ID tenant** + as the rest of the stamp (code: `vpn-requires-tenant-id`). +- Tenant admin access to author the Conditional Access policy below. + +#### Env vars + +| Var | Default | Notes | +|---|---|---| +| `VPN_GATEWAY_ENABLED` | `false` | Master switch. `true` provisions the gateway + GatewaySubnet + managed Private DNS zone + Azure Private DNS Resolver inbound endpoint (with its own `/28` subnet), and seeds the WAF guard rules. The Resolver IP is advertised to P2S clients via the parent VNet's `dhcpOptions.dnsServers` (the supported P2S DNS-push path) so the portal hostname resolves through the tunnel without hosts-file edits. | +| `VPN_GATEWAY_SKU` | `VpnGw2AZ` | `VpnGw2AZ`, `VpnGw3AZ`, `VpnGw4AZ`, or `VpnGw5AZ`. Generation2 AZ SKUs only. `VpnGw1AZ` (Generation1) is excluded — silently drops OpenVPN+AAD HardResetClientV2 packets ~5s after TCP accept with no diagnostic event. Non-AZ SKUs (`VpnGw1/2/3`) and Basic are also rejected. | +| `VPN_CLIENT_ADDRESS_POOL` | `172.16.200.0/24` | CIDR assigned to connected clients. MUST NOT overlap the VNet. | +| `VPN_AAD_AUDIENCE` | `c632b3df-fb67-4d84-bdcf-b95ad541b5c8` | Microsoft-registered Azure VPN Client app. Set to `41b23e61-6c1e-4545-b367-cd054e0ed4b4` only on tenants that must interop with older Azure VPN client builds. | +| `APPGW_WAF_CUSTOM_RULES_FILE` | unset | See [AppGw WAF custom rules](#appgw-waf-custom-rules-applies-to-any-afd-stamp) below — a general-purpose facility, not VPN-specific. | + +#### Conditional Access policy (tenant admin, required before first connect) + +Create one CA policy targeting the Azure VPN Client app: + +- **Target app**: `c632b3df-fb67-4d84-bdcf-b95ad541b5c8` (default), or + `41b23e61-6c1e-4545-b367-cd054e0ed4b4` if you set `VPN_AAD_AUDIENCE` to + the legacy override. +- **Assignment**: a NAMED users group. Do not target "all users" — every + Entra principal in your tenant would otherwise inherit VPN reachability + to the AppGw private FE. +- **Grant**: require **MFA**. +- **Grant**: do **NOT** require device compliance — the OpenVPN client + cannot satisfy that grant and connect attempts will fail with an opaque + AAD error. + +The post-scaffold reminder block in `new-env.mjs` re-prints these +requirements when `VPN_GATEWAY_ENABLED=true`. The full skill-side +reference lives in `pilotswarm-new-env-deploy` (`Step 4 → Optional: VPN +Gateway P2S`). + +#### Distributing the VPN client profile + +After the first deploy completes, hand operators the OpenVPN client +profile via one of these paths (in order of preference): + +- **Helper script (recommended)**: + `pwsh -File deploy/scripts/auth/Get-VpnClientProfile.ps1 -EnvName `. + Wraps the `az network vnet-gateway vpn-client generate` call, downloads + the signed zip, and extracts it under the gitignored + `deploy/envs/local//vpn-client/` folder. See the + `pilotswarm-vpn-client-profile` skill for full usage. +- **Azure portal**: `Resource group → → Point-to-site + configuration → Download VPN client`. +- **CLI**: `az network vnet-gateway vpn-client generate + --resource-group --name + --authentication-method EAPTLS`. + +All three emit the same `.zip` that imports directly into the Azure VPN +Client app (Windows / macOS / iOS / Android). The profile embeds the +AAD audience and the gateway public IP — re-issue it (`-Force` on the +helper script) if you rotate `VPN_AAD_AUDIENCE`. The XML carries no +per-user credentials; end users authenticate with their own Entra ID +at connect time. + +#### WAF guard rules (auto-seeded at priorities 90 / 91 / 92) + +When `VPN_GATEWAY_ENABLED=true`, base-infra bicep +(`deploy/services/base-infra/bicep/application-gateway.bicep`) prepends +three custom rules to the AppGw WAF policy's `customRules.rules` array: + +| Priority | Name | Action | Match | +|---|---|---|---| +| 90 | `AllowAfd` | `Allow` | `RequestHeaders[X-Azure-FDID] == ` (threaded from `global-infra`) | +| 91 | `AllowVpn` | `Allow` | `RemoteAddr ∈ VPN_CLIENT_ADDRESS_POOL` (`IPMatch`) | +| 92 | `BlockOther` | `Block` | `RemoteAddr ∈ 0.0.0.0/0` (catch-all) | + +Together: AFD-origin-authenticity check (90) + VPN-pool allow-list (91) + +catch-all block (92). The catch-all is belt-and-braces against future +NSG/route changes accidentally exposing the AppGw private FE outside +both paths. + +The 90–92 priority band is **reserved**. Operator-supplied rules from +`APPGW_WAF_CUSTOM_RULES_FILE` (next section) MUST start at priority ≥ 100; +the bicep concats them after the auto-seeded set. + +When `VPN_GATEWAY_ENABLED=false`, no auto-seeded rules are emitted — a +VPN-off stamp's AppGw WAF policy diff vs a no-VPN baseline is empty. + +#### AppGw WAF custom rules (applies to any AFD stamp) + +`APPGW_WAF_CUSTOM_RULES_FILE` is a **general-purpose** facility, parallel +to the AFD-side `WAF_CUSTOM_RULES_FILE`, and works on any +`EDGE_MODE=afd` stamp regardless of whether VPN is enabled. Point it at a +JSON array file containing AppGw WAF custom rule objects; the orchestrator +resolves the path (relative-to-repo-root or absolute), parses the JSON, +and threads it into the bicep deploy as `appgwWafCustomRules`. + +```bash +# Recommended location (gitignored under deploy/envs/local//): +APPGW_WAF_CUSTOM_RULES_FILE=deploy/envs/local//appgw-waf-custom-rules.json +``` + +```jsonc +// Example contents — operator priorities start at 100 to leave +// 90–92 free for the auto-seeded VPN guards. +[ + { + "name": "BlockExploitUA", + "priority": 100, + "ruleType": "MatchRule", + "action": "Block", + "matchConditions": [ + { + "matchVariables": [ + { "variableName": "RequestHeaders", "selector": "User-Agent" } + ], + "operator": "Contains", + "matchValues": ["sqlmap", "nikto"] + } + ] + } +] +``` + +The merge logic mirrors the bicep exactly in +`deploy/scripts/lib/appgw-waf-rules.mjs` so test cases can assert merged +rule shape without shelling out to `az`. Missing-file or non-JSON-array +inputs fail loudly at preflight with a single named diagnostic. + +#### Cost and time + +- **Cost**: ~$450/month total for a VPN-enabled stamp: + - ~$280/month for `VpnGw2AZ` (Public IP + gateway hours). Higher AZ SKUs + (`VpnGw3AZ` / `VpnGw4AZ` / `VpnGw5AZ`) scale linearly — see Azure VPN + Gateway pricing for current rates. `VpnGw1AZ` (Generation1) and non-AZ + SKUs (`VpnGw1/2/3`) are excluded — see the SKU notes above. + - ~$170/month for the Azure Private DNS Resolver inbound endpoint, which + is co-provisioned with the VPN gateway so P2S clients can resolve + Private DNS Zone records (e.g. the portal hostname) through the tunnel + without hosts-file edits. P2S clients cannot reach the Azure-magic + `168.63.129.16` resolver — that IP only works from inside Azure VMs — + so the Resolver inbound endpoint sits on a regular VNet IP. P2S clients + inherit this IP via the parent VNet's `dhcpOptions.dnsServers` block, + which the gateway pushes at connect time (the classic VPN gateway + resource has no dedicated DNS-push property — VNet DHCP options are the + supported path). +- **First-deploy time**: **45+ minutes**. Gateway provisioning is the + long pole; the rest of the stamp finishes well before the gateway + reports `Succeeded`. Subsequent param-change deploys are minutes, not + 45+. + +#### See also + +- `pilotswarm-new-env-deploy` skill — full step-by-step including the + `EDGE_MODE × TLS_SOURCE × VPN_GATEWAY_ENABLED` matrix, scaffolder + prompts, and the post-scaffold reminder. +- `pilotswarm-aks-deploy` skill — legacy bash deploy path; surfaces VPN + cost / time as operator context but does not orchestrate VPN itself. + ### Model Providers (LLM catalog) The IaC path mounts the worker's model catalog as a kustomize-generated diff --git a/docs/proposals/vpn-access-management.md b/docs/proposals/vpn-access-management.md new file mode 100644 index 00000000..3f08a449 --- /dev/null +++ b/docs/proposals/vpn-access-management.md @@ -0,0 +1,246 @@ +# Proposal: VPN access management via per-stamp custom audience app + +**Status:** Proposed (follow-up to PR #53 "VPN P2S Ingress") +**Author:** chkraw +**Date:** 2026-06-18 +**Related:** [`docs/deploying-to-aks.md`](../deploying-to-aks.md) → "Optional: VPN Gateway P2S", [`docs/portal-entra-app-roles.md`](../portal-entra-app-roles.md), `.github/skills/pilotswarm-portal-auth-assignments/`, `deploy/scripts/auth/Set-PortalAuthAssignments.ps1` + +## Problem + +The VPN P2S ingress shipped in PR #53 uses the **Microsoft-registered Azure VPN Client app** (`c632b3df-fb67-4d84-bdcf-b95ad541b5c8`) as the audience. Access control is therefore delegated entirely to a tenant-level **Conditional Access policy** that targets that app and restricts assignment to a named group + requires MFA. + +This works, but creates two operational gaps: + +1. **Resource owners (deployers) cannot manage who has VPN access without a tenant admin.** Creating or modifying CA policies requires `Conditional Access Administrator` / `Security Administrator` / `Global Administrator`. In locked-down corporate tenants (e.g. Microsoft), deployers do not have these roles. Every "add a new VPN user" request has to round-trip through central IT. +2. **Portal and VPN access drift apart.** The portal is gated by per-stamp Entra app-role assignments (`admin`/`user`), managed by the deployer via `Set-PortalAuthAssignments.ps1`. VPN access is gated by a tenant-admin-owned CA-targeted group. Granting a user "access to PilotSwarm" therefore requires two separate workflows owned by two different actors. There is no symmetry, no mirroring, and no obvious place for an agent or skill to "grant access" atomically. + +The user's exact phrasing: *"doesn't make sense to have either in isolation."* + +## Goal + +Bring VPN access management under the same model and tooling shape as portal access management: + +- **Per-stamp Entra app**, owned by the deployer. +- **Direct user/group assignments** on the service principal, gated by `appRoleAssignmentRequired=true`. +- **No tenant admin required** for ongoing add/remove operations. +- **CA policy becomes truly optional** — tenant admins can still layer MFA / device compliance / location on top, but it is no longer a *gating prerequisite* for the deployment to function. +- **Mirror semantics:** granting portal access also grants VPN access by default (configurable opt-out). + +## Background — verified Microsoft Learn pattern + +Azure VPN Gateway officially supports a **custom audience app** as a first-class alternative to the Microsoft-registered or manually-registered audience values. Source: + +| Doc | What it confirms | +|---|---| +| [Configure P2S VPN gateway for Microsoft Entra ID authentication: Microsoft-registered client](https://learn.microsoft.com/en-us/azure/vpn-gateway/point-to-site-entra-gateway) | Three supported audience types: Microsoft-registered, Manually-registered, **Custom (``)**. The Audience field on the gateway explicitly accepts a custom value. | +| [Scenario: Configure P2S access based on users and groups](https://learn.microsoft.com/en-us/azure/vpn-gateway/point-to-site-entra-users-access) | Documents the exact workflow: register custom app per gateway → expose API + scope → authorize the Azure VPN Client as a known client → `Assignment required = Yes` on the SP → assign users/groups directly. | +| [Create custom app ID for P2S VPN](https://learn.microsoft.com/en-us/azure/vpn-gateway/point-to-site-entra-register-custom-app) | Exact app config: single-tenant, "Expose an API" with a scope, "Add a client application" pointing at `c632b3df-fb67-4d84-bdcf-b95ad541b5c8` as the authorized client. The custom app's Application (client) ID becomes the gateway audience. | + +### Key facts from the docs + +- The Microsoft-registered Azure VPN Client app (`c632b3df-…`) has **global tenant consent** — no per-tenant admin consent is needed to use it as a client of the custom audience app. +- The custom app does **not** itself authenticate users — the Azure VPN Client desktop app is still the OIDC client. The custom app just provides the audience + scope + assignment gate. +- Each VPN gateway can support **exactly one** audience value at a time. +- **Nested groups are not supported** for assignments. Direct user or direct group only. (Same constraint as portal app-role assignments — uniform.) +- Initial app registration requires the **Cloud Application Administrator** role (or higher). This is the same prerequisite as `Setup-PortalAuth.ps1` — deployers who can register the portal app can register the VPN audience app. + +## Design + +### New scripts + +**`deploy/scripts/auth/Setup-VpnAuth.ps1`** — one-shot, per-stamp custom audience app registration. Companion to `Setup-PortalAuth.ps1`. + +Responsibilities: +- Register single-tenant Entra app (`PilotSwarm VPN - `). +- Add an exposed API scope (e.g. `p2s-vpn`). +- Add the Microsoft-registered Azure VPN Client (`c632b3df-…`, or the per-cloud equivalent) as an authorized client application against that scope. +- Create the service principal and set `appRoleAssignmentRequired=true`. +- Set the deploying user as owner. +- Write a per-stamp summary to `deploy/envs/local//vpn-app.json` (mirrors `entra-app.json`). + +Output the custom app's `clientId` so it can be threaded into `VPN_AAD_AUDIENCE` for the bicep layer. + +**`deploy/scripts/auth/Set-VpnAccess.ps1`** — ongoing add/remove/list. Companion to `Set-PortalAuthAssignments.ps1`. + +Responsibilities (identical shape to the portal-assignments script): +- `-EnvName ` auto-discovers the app from `deploy/envs/local//vpn-app.json`. +- `-AppId / -ObjectId` explicit override. +- `-VpnUsers ` to add (UPN, object id, or group display name). +- `-Remove` to remove the same identifiers. +- `-List` to show current assignments. +- Idempotent — already-assigned principals are no-ops. + +### Modified scripts + +**`deploy/scripts/auth/Set-PortalAuthAssignments.ps1`** — gains `-MirrorToVpn` flag. + +- **Default ON** when `VPN_GATEWAY_ENABLED=true` and a `vpn-app.json` exists for the stamp. +- On add: after adding to `admin` / `user` on portal, also adds the principal to the VPN custom app's assignments. +- On remove: removes from VPN assignments **only if** the principal has no remaining portal role (so demoting `admin → user` does not strip VPN). +- Opt-out via `-NoMirror`. + +### Modified bicep / orchestrator + +**`deploy/services/base-infra/bicep/vpn-gateway.bicep`** — already accepts `vpnAadAudience` as a parameter. No bicep change needed beyond what is already shipped. + +**`deploy/scripts/deploy.mjs` / `new-env.mjs` / env templates:** +- New env var `VPN_AAD_CLIENT_ID` (the custom app's `appId`) feeds `VPN_AAD_AUDIENCE` when set. +- When `VPN_AAD_CLIENT_ID` is unset, the deploy keeps the Microsoft-registered audience and the CA-required model (backwards compatible). +- `new-env.mjs` post-scaffold reminder block adapts: when custom audience is configured, the CA-policy guidance becomes "optional MFA layering" instead of "REQUIRED before first connect." + +### New skill + +**`.github/skills/pilotswarm-vpn-access/SKILL.md`** — parallel to `pilotswarm-portal-auth-assignments`. Documents: +- When to use (initial bring-up, ongoing add/remove, listing access). +- When NOT to use (legacy CA-policy-driven stamps, VPN disabled). +- Identifier formats (UPN, object id, group display name). +- Mirror behavior and how it interacts with portal access. +- Worked examples for common workflows. + +### Modified skill + +**`.github/skills/pilotswarm-portal-auth-assignments/SKILL.md`** — adds a "VPN mirror" section, cross-links to `pilotswarm-vpn-access`, documents the `-MirrorToVpn` / `-NoMirror` knobs. + +### Modified agent + +**`.github/agents/pilotswarm-npm-deployer.agent.md`** — gains a step: +- After `Setup-PortalAuth.ps1` (when present), if `VPN_GATEWAY_ENABLED=true` and no `vpn-app.json` exists, run `Setup-VpnAuth.ps1`. +- After `Set-PortalAuthAssignments.ps1`, the `-MirrorToVpn` default-ON behavior ensures VPN access is provisioned in lockstep. +- The existing CA-policy guidance is re-cast as **optional** when a custom audience is configured. + +## Mirror semantics — explicit rules + +| Portal op | VPN-app effect (default `-MirrorToVpn`) | +|---|---| +| Add user to `user` (no existing portal role) | Add user to VPN app | +| Add user to `admin` (no existing portal role) | Add user to VPN app | +| Add user to `admin` (already in `user`) | No change to VPN (already assigned) | +| Remove user from `user` (still in `admin`) | No change to VPN (still has portal role → still gets VPN) | +| Remove user from `admin` (still in `user`) | No change to VPN | +| Remove user from last portal role | Remove user from VPN app | +| `-List` on portal | Also lists VPN assignments (for parity visibility) | + +VPN-only access (someone needs VPN but no portal access) is fully supported — call `Set-VpnAccess.ps1` directly. Mirroring is one-directional (portal → VPN), not the reverse, to keep the rules simple and to preserve the "VPN-only contractor" use case. + +## Backwards compatibility + +| Stamp state | Behavior | +|---|---| +| `VPN_GATEWAY_ENABLED` unset / false | No change. Scripts and agent ignore VPN concerns. | +| `VPN_GATEWAY_ENABLED=true`, `VPN_AAD_CLIENT_ID` unset | No change. Deploy uses Microsoft-registered audience + CA-required model (current PR #53 behavior). | +| `VPN_GATEWAY_ENABLED=true`, `VPN_AAD_CLIENT_ID` set | New path. Custom audience app gates access. CA is optional. | +| Existing CA-required stamp wants to migrate | Run `Setup-VpnAuth.ps1` → re-run `deploy.mjs` to flip the gateway audience → re-issue VPN client profiles → users re-import. Document the migration. | + +## Out of scope + +- **Authoring Conditional Access policies programmatically.** Still tenant-admin only. The proposal explicitly does not try to bypass that — it sidesteps the CA-as-gating-mechanism dependency entirely. +- **Nested group support.** Microsoft does not support it for this scenario; do not pretend to. +- **Cross-cloud (Azure Government, Azure China) audience swap automation.** Document the per-cloud client IDs but do not auto-detect cloud in v1 — operators can pass the right value explicitly. +- **Multiple audiences per gateway.** Not supported by Azure. If a stamp needs distinct access groups (e.g. "admin-only VPN" + "user VPN"), the answer is multiple gateways, not multiple audiences on one gateway — out of scope for v1. + +## Open questions + +1. **Default for `-MirrorToVpn`** when both surfaces exist but the operator did not explicitly opt in. Current proposal: ON. Counter-arg: surprising side-effects. Recommend: ON with a one-line `Write-Host` notice on each mirror action so the operator sees what happened. +2. **What about portal-only contractors?** Today they get portal access via `Set-PortalAuthAssignments` and no VPN. Under default-ON mirror, they would also get VPN. Fix: a `-PortalOnly` flag on the portal-assignments script that suppresses mirroring for this specific call. +3. **Should `Setup-VpnAuth.ps1` be folded into `Setup-PortalAuth.ps1`** as a `-IncludeVpn` flag, instead of being a separate script? Pro: one entry point. Con: cohesion of two distinct app registrations, harder to migrate independently. Recommend: keep separate. +4. **Migration UX for existing PR #53 stamps.** Should `deploy.mjs` detect a stamp that was deployed with the 1P audience and offer to migrate, or should this be a manual operator step documented in the skill? Recommend: manual + documented in v1, automated migration as a v2 enhancement. + +## Implementation phases (preview for follow-up PAW) + +| Phase | Scope | +|---|---| +| 1 | `Setup-VpnAuth.ps1` + `Set-VpnAccess.ps1` + `vpn-app.json` schema | +| 2 | `Set-PortalAuthAssignments.ps1` `-MirrorToVpn` flag + tests | +| 3 | `deploy.mjs` / `new-env.mjs` env threading + scaffolder UX + tests | +| 4 | New skill, modified skill, modified agent — docs | +| 5 | Migration documentation for existing CA-required stamps | +| 6 | **iOS support via dual-auth (certificate)** — see next section | + +--- + +## iOS support requires certificate-based authentication (additive Phase 6) + +**Status:** Confirmed via web research 2026-06-18 — **there is no Azure VPN Client app on the iOS App Store**. Microsoft ships Azure VPN Client for Windows, macOS, and Android only. AAD-interactive authentication from iOS to Azure VPN Gateway is therefore **not possible at all** — not a CA policy issue, not a tenant issue, just absent product. + +This is not a single-user convenience; it is a platform limitation that affects every iOS user who needs P2S access to a PilotSwarm stamp. Any organization with an iPad/iPhone user population that needs portal access from outside corp/SAW will hit this immediately. + +### iOS-supported auth options on Azure VPN P2S + +| Path | iOS-native? | Works with our AAD-gated gateway? | +|---|---|---| +| Native iOS VPN with IKEv2 + **certificate auth** | YES (no app install) | Only if gateway has cert auth enabled alongside AAD | +| Third-party OpenVPN client + cert auth | App Store apps available (e.g. Passepartout) | Same — requires cert auth on gateway | +| Azure VPN Client (AAD) | **Not available on iOS** | N/A | + +### Dual-auth gateway pattern + +Azure VPN Gateway supports **multiple authentication methods simultaneously** in a single `vpnClientConfiguration` block — AAD config and `vpnClientRootCertificates[]` can coexist. Each end user picks which auth method to use when they import the profile. AAD users (Windows/macOS/Android) are completely unaffected; iOS users get a separate cert-auth profile. + +### Scope additions for Phase 6 + +| Piece | Notes | +|---|---| +| `vpn-gateway.bicep` — optional `vpnClientRootCertificates[]` block | Conditional on new `VPN_CERT_AUTH_ENABLED` flag. ~30 lines of bicep. | +| New env vars: `VPN_CERT_AUTH_ENABLED`, `VPN_ROOT_CERT_NAME` (AKV cert) or `VPN_ROOT_CERT_DATA` (raw base64) | Prefer AKV — we already have one for the AppGw cert. | +| `New-VpnRootCertificate.ps1` | Generates self-signed root, uploads to AKV, exports the public cert in the bicep-expected base64 form. Operators with a corp PKI root skip this. | +| `New-VpnClientCertificate.ps1 -UserName ` | Per-user client cert signed by the root, exported as password-protected PFX into `deploy/envs/local//vpn-client/clients/.pfx`. | +| `Get-VpnClientProfile.ps1 -AuthMethod cert -UserName ` | Extends the existing helper. Bundles the cert-auth profile zip + the per-user PFX in one drop. | +| **iOS sugar: `New-VpnIosMobileconfig.ps1 -UserName `** | Generates a signed Apple Configuration Profile (`.mobileconfig`) bundling the IKEv2 VPN config + client cert + CA root. End user taps once on the iPhone — VPN installed and ready. Without this, iOS cert VPN setup is a 15-step manual process and not appropriate for non-technical users. | +| `pilotswarm-vpn-client-profile` skill update | Document `-AuthMethod cert` and the iOS `.mobileconfig` path. | +| Cost | $0 additional Azure cost; PKI is operational overhead only. | + +### Critical access-control implications + +**Certificate-auth users completely bypass Entra ID.** This must be flagged loudly in every operator-facing surface: + +- No MFA challenge +- No Conditional Access enforcement +- No tenant audit trail of who connected when +- No revocation if the device is lost without re-issuing PKI (or maintaining a revocation list) + +This is by design and unavoidable for cert-based VPN — but it inverts the access-control story we built in Phase 1. **Cert auth should be the exception, not the default.** + +Required mitigations bundled with Phase 6: + +- **Short cert validity**: 30–90 day default, never indefinite +- **Per-device certificates**, not per-user — so loss of one device does not invalidate everyone with the same identity +- **Revocation list maintained in bicep** via `vpnClientRevokedCertificates`, with a `Revoke-VpnClientCertificate.ps1` helper to add a cert thumbprint and trigger a `deploy.mjs base-infra` step +- **AKV-stored root with HSM-protected private key** — losing the root would compromise every issued client cert +- **`pilotswarm-vpn-client-profile` skill warning block** rendered prominently whenever `-AuthMethod cert` is used + +### Phase 6 backwards compatibility + +| Stamp state | Behavior | +|---|---| +| `VPN_CERT_AUTH_ENABLED` unset / false | No change. AAD-only behavior from current PR + custom-audience proposal. | +| `VPN_CERT_AUTH_ENABLED=true`, `VPN_CERT_AUTH_ENABLED` env added | Bicep emits dual-auth gateway. Existing AAD profiles continue working untouched. | +| Add cert auth to a stamp deployed AAD-only | One `deploy.mjs base-infra` re-run (~minutes against an existing gateway, not 45+) flips the gateway into dual-auth mode. | + +### Why this is Phase 6 (last) and not Phase 1 + +- Phases 1–5 deliver the deployer-owned access-control story for the majority case (Windows/macOS/Android) which is what most PilotSwarm deployments hit. +- Phase 6 is genuinely additive — no changes to anything in Phases 1–5. +- The PKI operational story (cert issuance, rotation, revocation) is meaningful additional surface area that deserves its own focused PR with its own tests, rather than being bundled. +- The iOS mobileconfig polish in particular needs care — a malformed mobileconfig is a security incident, not a UX bug. + + +## Acceptance criteria + +- A deployer with **no tenant admin role** (only the same permissions they currently use for `Setup-PortalAuth.ps1`) can: + 1. Bring up a fresh VPN-enabled stamp with custom audience access control. + 2. Add and remove VPN users on demand without filing a tenant-admin ticket. +- Granting a user `admin` or `user` portal access by default also grants them VPN access (single command). +- An operator can explicitly grant VPN-only access via the dedicated script. +- Existing PR #53 stamps continue to work unchanged. +- CA-policy guidance is correctly re-cast as optional when custom audience is in play. + +## References + +- [Configure P2S VPN gateway for Microsoft Entra ID authentication: Microsoft-registered client](https://learn.microsoft.com/en-us/azure/vpn-gateway/point-to-site-entra-gateway) +- [Scenario: Configure P2S access based on users and groups](https://learn.microsoft.com/en-us/azure/vpn-gateway/point-to-site-entra-users-access) +- [Create custom app ID for P2S VPN Microsoft Entra ID authentication](https://learn.microsoft.com/en-us/azure/vpn-gateway/point-to-site-entra-register-custom-app) +- PR #53 — VPN P2S Ingress (Phases 1–4) +- `deploy/scripts/auth/Setup-PortalAuth.ps1` +- `deploy/scripts/auth/Set-PortalAuthAssignments.ps1` +- `.github/skills/pilotswarm-portal-app-reg/SKILL.md` +- `.github/skills/pilotswarm-portal-auth-assignments/SKILL.md` diff --git a/package.json b/package.json index 0eb55293..caba03ff 100644 --- a/package.json +++ b/package.json @@ -9,7 +9,7 @@ "scripts": { "deploy": "node deploy/scripts/deploy.mjs", "deploy:new-env": "node deploy/scripts/new-env.mjs", - "test:deploy-scripts": "node --test --experimental-test-module-mocks --no-warnings=ExperimentalWarning deploy/scripts/test/substitute-env.test.mjs deploy/scripts/test/alias-map.test.mjs deploy/scripts/test/all-mode.test.mjs deploy/scripts/test/approve-pe.test.mjs deploy/scripts/test/common.test.mjs deploy/scripts/test/local-env.test.mjs deploy/scripts/test/new-env.test.mjs deploy/scripts/test/overlay-contracts.test.mjs deploy/scripts/test/publish-manifests.test.mjs deploy/scripts/test/services-manifest.test.mjs deploy/scripts/test/bicep-outputs-cache.test.mjs deploy/scripts/test/deploy-marker.test.mjs deploy/scripts/test/dockerfile-lockfile.test.mjs deploy/scripts/test/force-module.test.mjs deploy/scripts/test/foundry-substitute.test.mjs deploy/scripts/test/spc-keys-hash.test.mjs deploy/scripts/test/render-params.test.mjs deploy/scripts/test/stage-manifests.test.mjs deploy/scripts/test/test-discovery.test.mjs deploy/scripts/test/validate-foundry-deployments.test.mjs deploy/scripts/test/compose-env.test.mjs", + "test:deploy-scripts": "node --test --experimental-test-module-mocks --no-warnings=ExperimentalWarning deploy/scripts/test/substitute-env.test.mjs deploy/scripts/test/alias-map.test.mjs deploy/scripts/test/all-mode.test.mjs deploy/scripts/test/approve-pe.test.mjs deploy/scripts/test/appgw-waf-rules.test.mjs deploy/scripts/test/common.test.mjs deploy/scripts/test/local-env.test.mjs deploy/scripts/test/new-env.test.mjs deploy/scripts/test/overlay-contracts.test.mjs deploy/scripts/test/publish-manifests.test.mjs deploy/scripts/test/services-manifest.test.mjs deploy/scripts/test/bicep-outputs-cache.test.mjs deploy/scripts/test/deploy-marker.test.mjs deploy/scripts/test/dockerfile-lockfile.test.mjs deploy/scripts/test/force-module.test.mjs deploy/scripts/test/foundry-substitute.test.mjs deploy/scripts/test/spc-keys-hash.test.mjs deploy/scripts/test/render-params.test.mjs deploy/scripts/test/stage-manifests.test.mjs deploy/scripts/test/test-discovery.test.mjs deploy/scripts/test/validate-foundry-deployments.test.mjs deploy/scripts/test/compose-env.test.mjs", "test:mcp-server": "npm test --workspace=pilotswarm-mcp-server", "test:mcp-server:integration": "npm run test:integration --workspace=pilotswarm-mcp-server", "test:mcp-server:integration:all": "npm run test:integration:all --workspace=pilotswarm-mcp-server",