feat(azure): delegate-domain support — per-service DNS records + cert provisioning#278
feat(azure): delegate-domain support — per-service DNS records + cert provisioning#278edwardrf wants to merge 7 commits into
Conversation
… provisioning Companion to defang/defang#2136. With that CLI change creating a public DNS zone for the project delegate domain and forwarding it to the CD task via the DOMAIN env var, this side adds the per-service plumbing inside the zone and the cert flow that runs in the CD task (cloud-side, surviving CLI Ctrl-C). Provider: - ProjectInputs.Domain (defang-azure:index:Project) carries the delegate domain through to the project component, mirroring defang-gcp's same input. SharedInfra.Domain propagates it to the per-service path. - provider/defangazure/azure/customdomain.go: per-ingress-service Pulumi-managed RecordSets in the existing CLI-created zone — a CNAME to the ContainerApp's stable env-default FQDN (NOT LatestRevisionFqdn, which moves with revisions), and an `asuid.<service>` TXT carrying the CA's customDomainVerificationId for ACA's hostname-ownership check. - containerapp.go keeps IgnoreChanges on configuration.ingress.customDomains because the binding itself is patched imperatively by aca.IssueCert (out-of-band from Pulumi) — the cycle CA → cert → asuid TXT → CA.verificationId can't be expressed in a single Pulumi resource graph. See the inline comments + defang/defangdomain.md for the analysis. CD task: - program/azure.go now reads `defang:domain` Pulumi config (set by the CD env var) and passes it through to defangazure.NewProject. After the Pulumi resources are created, an ApplyT chained off project.Endpoints iterates ingress services and calls aca.IssueCert (pkg/clouds/azure/aca in defang/src) to register the customDomain, issue the ManagedCertificate, bind SniEnabled, and wait for TLS. Putting this step inside the CD task means it survives the user disconnecting after `defang compose up` — production ACA-Job mode finishes the cert flow regardless of CLI lifetime. - cd/go.mod pins defang/src at the merged commit v0.0.0-20260601205120-0db5d59ad49b (defang/defang@0db5d59ad49b), which introduces pkg/clouds/azure/aca with the standalone IssueCert this CD code calls. Verified end-to-end on Azure westus3 with the nextjs sample: clean state → DNS zone (CLI) → records + cert + binding (CD) → HTTPS serves a DigiCert-issued cert for app.<delegate-domain> in ~0.6s. Total deploy including cert: 9m13s. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
📝 WalkthroughWalkthroughAdds optional delegate-domain support across schema, SDK inputs, shared infra, Container App custom-domain DNS (CNAME + asuid TXT), preserves live bindings, and threads domain into CD to request managed certificates per ingress-enabled service. ChangesDelegate Domain & Custom Domain Provisioning
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Suggested labels
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@cd/go.mod`:
- Line 17: The go.mod entry for github.com/DefangLabs/defang/src is pinned to a
pseudo-version (v0.0.0-20260601205120-0db5d59ad49b) that isn't present in the
fetched upstream history; replace that pseudo-version with a stable upstream ref
(an official tag or the real merge commit SHA) or update it to the final merge
SHA for github.com/DefangLabs/defang/src so module resolution is deterministic
and won't break when fetching.
In `@cd/program/azure.go`:
- Around line 102-118: provisionDelegateDomainCerts currently forwards the outer
ctx and silently proceeds when subscriptionID is empty; change it to first check
if subscriptionID == "" and log+return early, and for each service that needs
certs wrap the call in a bounded context like ctxTimeout, cancel :=
context.WithTimeout(ctx, 15*time.Minute); defer cancel() (or call cancel
immediately after the IssueCert call) and pass ctxTimeout into aca.IssueCert so
the ARM pollers can't run forever; reference function
provisionDelegateDomainCerts, parameters ctx, subscriptionID, resourceGroup,
loop variables name and svc, and call site aca.IssueCert to apply the new
timeout and ensure cancel is invoked.
In `@provider/defangazure/azure/customdomain.go`:
- Around line 26-35: Update the module wiring in examples/aws-go/go.mod so Go
can resolve the nested module
github.com/DefangLabs/pulumi-defang/sdk/v2/go/defang-aws: add a require for
github.com/DefangLabs/pulumi-defang/sdk/v2/go/defang-aws at the same version you
already require for github.com/DefangLabs/pulumi-defang/sdk/v2 (e.g., v2.1.1)
and add a corresponding replace pointing to the local path used in other
examples (follow the same require/replace pattern used by
examples/multi-cloud/go.mod and examples/config-interpolation/go.mod) so imports
in examples/aws-go/main.go can resolve the defang-aws package.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: a5c75d00-62d0-4d59-a659-e389d12c7805
⛔ Files ignored due to path filters (2)
cd/go.sumis excluded by!**/*.sumgo.sumis excluded by!**/*.sum
📒 Files selected for processing (11)
cd/go.modcd/program/azure.gocd/program/program.gogo.modprovider/cmd/pulumi-resource-defang-azure/schema.jsonprovider/defangazure/azure/azure.goprovider/defangazure/azure/containerapp.goprovider/defangazure/azure/customdomain.goprovider/defangazure/azure/customdomain_test.goprovider/defangazure/project.gosdk/v2/go/defang-azure/project.go
|
Depends on: DefangLabs/defang#2136 |
Two changes responding to coderabbit's review on PR #278: 1. cd/program/azure.go: bound the cert-provisioning context per service and guard empty subscriptionID before reaching ARM. aca.IssueCert already bounds its DNS/TXT/TLS polling loops internally (dnsWaitTimeout=30m, tokenDeadline=5m, tlsWaitTimeout=10m), but the ARM long-running steps (addHostnameDisabled, submitManagedCert, bindHostnameSniEnabled) run `poller.PollUntilDone(ctx, nil)` and inherit only the passed ctx — so a stuck PATCH (throttled, busy resource) would block the Pulumi run until the outer timeout fires. A 45m per-service timeout fits the documented worst-case sum and trips faster than a stuck poll. Separately, deployAzure forwards config.GetSubscriptionId(ctx) unconditionally; when Pulumi config doesn't carry it, the ARM SDK clients we'd construct fail with an opaque URL parse error. Surface that early with a clear log instead of letting it surface at the wrong layer. 2. examples/aws-go/go.mod: align with the other Go examples' replace + require pattern so the example actually resolves. The module previously required `sdk/v2 v2.1.1` (the legacy aggregate module), but main.go imports the nested `sdk/v2/go/defang-aws` module. `go build ./...` was failing on main with "missing go.sum entry for module providing package …/sdk/v2/go/defang-aws" — pre-existing, not caused by this PR. Fixed by adopting the same in-tree replace directive used by examples/multi-cloud, examples/config-interpolation, and cd/. Verified `go mod tidy` + `go build ./...` clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Pushed
Build, test, and lint all clean across Still draft pending defang/defang#2136 merge → re-pin of the pseudo-version. |
defang/defang#2136 merged as commit 3dfee366. Update the pseudo-version in cd/go.mod from the feature-branch SHA (0db5d59ad49b) to the real main SHA so module resolution is deterministic and the regen CI doesn't have to chase an unmerged ref. Same code surface — aca.IssueCert hasn't changed since this PR was opened, just lives at a stable main commit now. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
🧹 Nitpick comments (2)
examples/aws-go/go.mod (1)
11-11: ⚡ Quick winUpdate: the
yaml.v3replace target is valid/resolvable
go mod downloadforgo.yaml.in/yaml/v3@v3.0.1succeeds (andgopkg.in/yaml.v3@v3.0.1also succeeds), with both resolving to the same upstream (https://github.com/yaml/go-yaml). So this isn’t a module-resolution break; only consider switching togopkg.in/yaml.v3for consistency/readability.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@examples/aws-go/go.mod` at line 11, The replace directive in go.mod currently maps gopkg.in/yaml.v3 to go.yaml.in/yaml/v3 v3.0.1—update the replace target to the canonical module path gopkg.in/yaml.v3 v3.0.1 to improve readability/consistency; locate the replace line in examples/aws-go/go.mod (the mapping for gopkg.in/yaml.v3) and change the RHS from go.yaml.in/yaml/v3 v3.0.1 to gopkg.in/yaml.v3 v3.0.1.cd/go.mod (1)
31-31: ⚡ Quick winReconsider
go.yaml.in/yaml/v4 v4.0.0-rc.4incd/go.mod
cd/go.modpinsgo.yaml.in/yaml/v4 v4.0.0-rc.4, and the Go module proxy for this module lists onlyv4.0.0-rc.*versions (no non-RC v4 releases). If v4 RC-specific features aren’t required, consider switching back to stablegopkg.in/yaml.v3; otherwise keep the RC pin but document why v4 RC is needed.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@cd/go.mod` at line 31, The go.mod entry pins the release-candidate module "go.yaml.in/yaml/v4 v4.0.0-rc.4"; either revert to the stable v3 module or justify keeping the RC: replace the dependency with "gopkg.in/yaml.v3" if no v4 RC features are required (update any import paths/usages of yaml to match v3), or retain the "go.yaml.in/yaml/v4 v4.0.0-rc.4" pin but add a short comment in the repository (e.g., in README or a top-level docs/notes file) explaining which v4 RC features are required and why the RC is necessary; locate the module line "go.yaml.in/yaml/v4" in go.mod and update accordingly and ensure imports and tests compile after the change.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@cd/go.mod`:
- Line 31: The go.mod entry pins the release-candidate module
"go.yaml.in/yaml/v4 v4.0.0-rc.4"; either revert to the stable v3 module or
justify keeping the RC: replace the dependency with "gopkg.in/yaml.v3" if no v4
RC features are required (update any import paths/usages of yaml to match v3),
or retain the "go.yaml.in/yaml/v4 v4.0.0-rc.4" pin but add a short comment in
the repository (e.g., in README or a top-level docs/notes file) explaining which
v4 RC features are required and why the RC is necessary; locate the module line
"go.yaml.in/yaml/v4" in go.mod and update accordingly and ensure imports and
tests compile after the change.
In `@examples/aws-go/go.mod`:
- Line 11: The replace directive in go.mod currently maps gopkg.in/yaml.v3 to
go.yaml.in/yaml/v3 v3.0.1—update the replace target to the canonical module path
gopkg.in/yaml.v3 v3.0.1 to improve readability/consistency; locate the replace
line in examples/aws-go/go.mod (the mapping for gopkg.in/yaml.v3) and change the
RHS from go.yaml.in/yaml/v3 v3.0.1 to gopkg.in/yaml.v3 v3.0.1.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: ea3139dd-b1c0-423a-998e-bd10c579cc79
⛔ Files ignored due to path filters (2)
cd/go.sumis excluded by!**/*.sumexamples/aws-go/go.sumis excluded by!**/*.sum
📒 Files selected for processing (4)
cd/go.modcd/program/azure.gocd/program/program.goexamples/aws-go/go.mod
🚧 Files skipped from review as they are similar to previous changes (2)
- cd/program/program.go
- cd/program/azure.go
There was a problem hiding this comment.
🧹 Nitpick comments (1)
provider/defangazure/azure/containerapp.go (1)
370-409: 💤 Low valueConsider logging when skipping ARM read due to missing subscription.
When
subscriptionIDis empty (lines 373-375), the function silently returnsnil. This is correct behavior but can complicate debugging during previews or misconfigured runs. A debug-level log could help operators understand why custom domains weren't preserved.That said, the existing code comment at lines 354-369 documents this behavior thoroughly, so this is a minor suggestion.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@provider/defangazure/azure/containerapp.go` around lines 370 - 409, In readLiveCustomDomains, when subscriptionID == "" the ARM read is silently skipped; add a debug/info log inside the infra.ResourceGroup.Name.ApplyT closure before the early return to record that the custom-domain preserve read was skipped due to missing subscription (include serviceName and that subscriptionID is empty so previews/misconfigurations are evident); use the existing pulumi context logging API (e.g., ctx.Log.Debug/Info or the appropriate method available on pulumi.Context) and keep the subsequent nil return unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@provider/defangazure/azure/containerapp.go`:
- Around line 370-409: In readLiveCustomDomains, when subscriptionID == "" the
ARM read is silently skipped; add a debug/info log inside the
infra.ResourceGroup.Name.ApplyT closure before the early return to record that
the custom-domain preserve read was skipped due to missing subscription (include
serviceName and that subscriptionID is empty so previews/misconfigurations are
evident); use the existing pulumi context logging API (e.g., ctx.Log.Debug/Info
or the appropriate method available on pulumi.Context) and keep the subsequent
nil return unchanged.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 18def5d8-b46c-4c1b-8068-4e5eee61a109
⛔ Files ignored due to path filters (1)
go.sumis excluded by!**/*.sum
📒 Files selected for processing (2)
go.modprovider/defangazure/azure/containerapp.go
| github.com/pulumi/pulumi-azure-native-sdk/app/v3 v3.17.0 // indirect | ||
| github.com/pulumi/pulumi-azure-native-sdk/authorization/v3 v3.17.0 // indirect | ||
| github.com/pulumi/pulumi-azure-native-sdk/cognitiveservices/v3 v3.16.0 // indirect | ||
| github.com/pulumi/pulumi-azure-native-sdk/containerregistry/v3 v3.16.0 // indirect | ||
| github.com/pulumi/pulumi-azure-native-sdk/dbforpostgresql/v3 v3.16.0 // indirect | ||
| github.com/pulumi/pulumi-azure-native-sdk/dns/v3 v3.17.0 // indirect | ||
| github.com/pulumi/pulumi-azure-native-sdk/keyvault/v3 v3.17.0 // indirect | ||
| github.com/pulumi/pulumi-azure-native-sdk/managedidentity/v3 v3.16.0 // indirect | ||
| github.com/pulumi/pulumi-azure-native-sdk/network/v3 v3.16.0 // indirect | ||
| github.com/pulumi/pulumi-azure-native-sdk/privatedns/v3 v3.16.0 // indirect | ||
| github.com/pulumi/pulumi-azure-native-sdk/redisenterprise/v3 v3.16.0 // indirect | ||
| github.com/pulumi/pulumi-azure-native-sdk/resources/v3 v3.16.0 // indirect |
There was a problem hiding this comment.
some are v3.17.0 some are v3.16.0
| github.com/pulumi/pulumi-azure-native-sdk/cognitiveservices/v3 v3.16.0 | ||
| github.com/pulumi/pulumi-azure-native-sdk/containerregistry/v3 v3.16.0 | ||
| github.com/pulumi/pulumi-azure-native-sdk/dbforpostgresql/v3 v3.16.0 | ||
| github.com/pulumi/pulumi-azure-native-sdk/dns/v3 v3.17.0 | ||
| github.com/pulumi/pulumi-azure-native-sdk/keyvault/v3 v3.17.0 | ||
| github.com/pulumi/pulumi-azure-native-sdk/managedidentity/v3 v3.16.0 | ||
| github.com/pulumi/pulumi-azure-native-sdk/network/v3 v3.16.0 |
What
Companion to DefangLabs/defang#2136. With that CLI change creating a public Azure DNS zone for the project delegate domain and forwarding it to the CD task via the
DOMAINenv var, this PR adds the per-service plumbing inside that zone and the cert flow that runs in the CD task (cloud-side, surviving CLI Ctrl-C).Provider side
ProjectInputs.Domainon thedefang-azure:index:Projectcomponent carries the delegate domain through to the project component, mirroringdefang-gcp's same input.SharedInfra.Domainpropagates it to the per-service path.provider/defangazure/azure/customdomain.go— per-ingress-service Pulumi-managedRecordSets in the existing CLI-created zone:CNAME <service>→<containerApp.Name>.<env.DefaultDomain>(the stable env-default FQDN — NOTLatestRevisionFqdn, which moves on every revision).TXT asuid.<service>carrying the CA'scustomDomainVerificationIdfor ACA's hostname-ownership check.containerapp.gokeepsIgnoreChangesonconfiguration.ingress.customDomainsbecause the binding itself is patched imperatively byaca.IssueCert(out-of-band from Pulumi) — the cycle CA → cert → asuid TXT → CA.verificationId can't be expressed in a single Pulumi resource graph. See the inline comments +defang/defangdomain.mdfor the analysis.CD task side
cd/program/azure.gonow readsdefang:domainPulumi config (set by the CD env var) and passes it through todefangazure.NewProject. After the Pulumi resources are created, anApplyTchained offproject.Endpointsiterates ingress services and callsaca.IssueCert(frompkg/clouds/azure/acain defang/src) to register thecustomDomain, issue theManagedCertificate, bindSniEnabled, and wait for TLS.defang compose up— production ACA-Job mode finishes the cert flow regardless of CLI lifetime.cd/go.modpinsdefang/srcat the merged commitv0.0.0-20260601205120-0db5d59ad49b(defang/defang@0db5d59ad49b), which introducespkg/clouds/azure/acawith the standaloneIssueCertthis CD code calls. Before merging this PR, that pseudo-version should be re-pinned to whatever SHA #2136 lands at onmain(squash-merge will produce a new SHA; rebase preserves the current one).Blocked on
build_sdksjob for this PR will need to resolve the pseudo-version from the unmerged feature branch. The Go module proxy can fetch any reachable commit, so CI should still pass.Verification
E2E on Azure westus3 with the nextjs sample, fully fresh state:
Result:
https://app.nextjs-azurewestus3.edwardrf.defang.app/→ HTTP 200, valid DigiCert-issued cert, ~0.6s response. Total deploy including cert: 9m13s.🤖 Generated with Claude Code
Summary by CodeRabbit