fix(deploy): split wrangler config into two files to isolate staging/prod assets #88
Merged
…assets

Workers Assets shares uploaded asset content between two Workers in the same account when their compiled script etags match. A single multi-env wrangler.jsonc (top-level + env.staging block) produces byte-identical scripts for both Workers because the script source is the same and the binding values are not part of the script bytes. Result: a staging deploy that changes a file in dist/ silently overwrites what production serves at the same URL path, even though the production Worker has no new deploy.

Filed upstream at cloudflare/workers-sdk#13925 with the full reproduction (version IDs, asset etags, md5s, sequence of events).

Workaround until upstream lands a fix: deploy from two fully independent wrangler configs, and force a single-character divergence in the script bytes via wrangler's `define` so the two compiled scripts have distinct script etags. CF's asset deduplication appears to key on the script etag, so distinct etags give distinct asset namespaces.

Changes

- wrangler.jsonc: drop the env.staging block; this file now configures only the production agentnative-site Worker. Add `define` to substitute `__BUILD_ENV__` with the literal string "production".
- wrangler.staging.jsonc: new file. Fully independent config for the agentnative-site-staging Worker. `define` substitutes `__BUILD_ENV__` with "staging".
- src/worker/index.ts: declare `__BUILD_ENV__` and a `BUILD_ENV` constant with a `development` fallback for the bun test runner (where wrangler's define is not in effect). Pass through to applyHeaders.
- src/worker/headers.ts: accept a `buildEnv` option and set the `X-Build-Env` response header on every response. The header is diagnostic (visible in every fetch) and also forces the script bytes to actually USE the substituted value, which prevents tree-shaking from collapsing the two environments back to byte-identical output.
- tests/worker.test.ts: every applyHeaders call site updated with `buildEnv: 'development'`. Three new tests cover the X-Build-Env header for each env value.
- .github/workflows/deploy.yml: staging job uses `--config wrangler.staging.jsonc`. Production job is unchanged (`wrangler.jsonc` is the default).
- .github/workflows/ci.yml: image-pin-existence + equality guard now reads both configs via `--config`.
- scripts/hooks/pre-push: wrangler dry-run stage covers both configs.
- RELEASES.md: deploy table updated. New subsection "Why two separate wrangler configs (not a single multi-env file)" explains the workaround and links to workers-sdk#13925. Sandbox image release procedure now references the two filenames instead of env blocks.
- docker/sandbox/README.md: pin location updated.
- styles/config/vocabularies/site/accept.txt: legitimate technical terms (`configs`, `etag`, `etags`, `deduplication`, `namespaces`, `envs`, `CF's`) added so the new RELEASES.md prose passes Vale.

Verification

- 318 unit + regression tests pass.
- Both `bun x wrangler deploy --dry-run --config wrangler.jsonc` and `--config wrangler.staging.jsonc` validate end-to-end. Bindings list the correct per-env resources (anc-score-cache vs anc-score-cache-staging, etc.). The two outputs differ by a single gzip byte, evidence that the compiled scripts now have distinct content.
- prose-check: 0 blocking, 1108 warning after the vocab additions.
- Pre-push gate passes end-to-end.
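The `__BUILD_ENV__` plumbing described in the commit message above can be sketched roughly as follows. This is a minimal illustration, not the actual contents of `src/worker/index.ts` or `src/worker/headers.ts`: the names `__BUILD_ENV__`, `BUILD_ENV`, `applyHeaders`, `buildEnv`, and `X-Build-Env` come from the commit message, while the function body and the use of a plain header record (instead of a real `Response`) are assumptions made to keep the example self-contained.

```typescript
// Wrangler's `define` would replace this identifier with a string literal at
// build time. Outside a Wrangler build (for example under a test runner) the
// identifier is never defined at all.
declare const __BUILD_ENV__: string | undefined;

// `typeof` on an undefined global does not throw, so the fallback is safe.
const BUILD_ENV: string =
  typeof __BUILD_ENV__ !== "undefined" ? __BUILD_ENV__ : "development";

// Assumed shape: the real options type may carry more fields.
interface ApplyHeadersOptions {
  buildEnv: string;
}

// Assumption: headers are modeled as a plain record for illustration only.
// Returns a copy with the diagnostic header added; the input is not mutated.
function applyHeaders(
  headers: Record<string, string>,
  options: ApplyHeadersOptions,
): Record<string, string> {
  return { ...headers, "X-Build-Env": options.buildEnv };
}

const result = applyHeaders({ "Content-Type": "text/html" }, { buildEnv: BUILD_ENV });
console.log(result["X-Build-Env"]);
```

Run outside a Wrangler build, this logs `development`, because nothing substitutes `__BUILD_ENV__`. Reading the value into a response header is what keeps the substituted literal in the compiled output rather than letting the bundler drop it as unused.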
3 tasks
brettdavies added a commit that referenced this pull request on May 15, 2026
## Summary

Reverts PR #88. The split-config + `__BUILD_ENV__` substitution shipped as an asset-sharing fix, but the underlying observation it was meant to fix had a different cause.

What actually triggered the revert: PR #88's staging deploy failed with `DURABLE_OBJECT_ALREADY_HAS_APPLICATION` (the new config produced a different container-app name than the existing app's, and a DO can only bind to one app). During that failed deploy, anc.dev's custom-domain binding moved from `agentnative-site` (prod) to `agentnative-site-staging`. Restoring anc.dev to prod via the CF API immediately reverted the served content to PR #85's deployed baseline, which suggests the "asset overlay" symptoms we saw earlier in the day were actually anc.dev routing drift to staging, not cross-environment asset sharing.

Concretely, this revert restores:

- Single `wrangler.jsonc` with `env.staging` block. The container app naming returns to `agentnative-site-staging-sandbox-staging` (env-suffixed), which matches the existing CF resource so deploys succeed again.
- `src/worker/index.ts` and `src/worker/headers.ts` lose the `__BUILD_ENV__` / `X-Build-Env` plumbing. No diagnostic header on responses; tests stop asserting it.
- `deploy.yml`, `ci.yml`, `scripts/hooks/pre-push`, `RELEASES.md`, `docker/sandbox/README.md` all go back to the multi-env shape.
- `styles/config/vocabularies/site/accept.txt` loses the seven technical terms added for PR #88's prose (`configs`, `etag`, `etags`, `deduplication`, `namespaces`, `envs`, `CF's`). These were only added because PR #88's RELEASES.md prose needed them; without that prose, the additions are noise.

The upstream issue at [cloudflare/workers-sdk#13925](cloudflare/workers-sdk#13925) needs a follow-up correction comment noting the routing-drift alternative explanation. That's a follow-up, not part of this revert.

## Changelog

### Changed

- Roll back the two-config wrangler split. Production and staging deploys continue to use a single `wrangler.jsonc` with an `env.staging` block; deploys use `wrangler deploy` (prod) and `wrangler deploy --env staging` (staging) respectively.

## Type of Change

- [x] `revert`: Reverting a previous change

## Related Issues/Stories

- Story: Failed staging deploy for PR #88, plus the recurrence of the anc.dev routing-drift bug during that deploy. The routing-drift explanation is a better fit for the asset-overlay symptoms we saw earlier in the day than the asset-sharing theory PR #88 was built on.
- Issue: [cloudflare/workers-sdk#13925](cloudflare/workers-sdk#13925) (will be updated with a follow-up comment).
- Architecture: None.
- Related PRs: #88 (the PR being reverted).

## Files Modified

**Modified:**

- `wrangler.jsonc`: restore `env.staging` block, drop the `define` for `__BUILD_ENV__`.
- `.github/workflows/deploy.yml`: staging job back to `wrangler deploy --env staging`; production job back to `deploy --env=""`.
- `.github/workflows/ci.yml`: image-pin guard reads both pins via the multi-env shape again.
- `scripts/hooks/pre-push`: dry-runs go back to `--env=""` and `--env staging`.
- `src/worker/index.ts`: `__BUILD_ENV__` declaration and `BUILD_ENV` constant removed; `applyHeaders` call loses the `buildEnv` field.
- `src/worker/headers.ts`: `ApplyHeadersOptions` loses `buildEnv`; `X-Build-Env` header no longer emitted.
- `tests/worker.test.ts`: all `applyHeaders` call sites have `buildEnv` removed; the three `X-Build-Env` tests are removed.
- `RELEASES.md`: deploy table and sandbox-image-release procedure revert to the multi-env wording.
- `docker/sandbox/README.md`: pin location reverts to `env.staging.containers[0].image`.
- `styles/config/vocabularies/site/accept.txt`: seven added terms removed.

**Created:**

- None.

**Renamed:**

- None.

**Deleted:**

- `wrangler.staging.jsonc` (the standalone staging config introduced by PR #88).

## Testing

- [x] Unit tests added/updated
- [x] All tests passing

**Test Summary:**

- 315 unit and regression tests pass (back to the pre-PR-#88 count of 315, after the three `X-Build-Env` tests are removed by the revert).
- `bun x wrangler deploy --dry-run --env=""`: clean, prod bindings (anc-score-cache R2, rate-limit namespace 1001).
- `bun x wrangler deploy --dry-run --env staging`: clean, staging bindings (anc-score-cache-staging R2, rate-limit namespace 1002).
- Pre-push gate (lint, build, tests, both dry-runs, pack-README, banned-fonts, prose-check) passes end-to-end.

**Post-merge plan:**

Once the staging deploy on this revert lands cleanly (it should: the multi-env shape matches the existing CF resources), curl staging and prod and verify both still respond. Then file a follow-up comment on workers-sdk#13925 explaining the routing-drift alternative explanation and what we learned today.
Merged
3 tasks
brettdavies added a commit that referenced this pull request on May 15, 2026
…eritance from top-level (#90)

## Summary

Explicitly override two inheritable keys in `env.staging` so they stop silently inheriting destructive values from the top-level config: `routes` (which has been quietly stealing anc.dev's custom-domain binding away from production on every dev push since the 2026-04-30 v0.1 launch) and `triggers` (prophylactic; no current scheduled triggers, but the same trap shape).

The "routing-drift bug" we have been chasing for two weeks turns out to be documented Wrangler behavior. Per the [Inheritable keys list](https://developers.cloudflare.com/workers/wrangler/configuration/), `routes` is an inheritable key. The top-level config declares `routes: [{ pattern: "anc.dev", custom_domain: true }]`. `env.staging` had no `routes` field, so it silently inherited that array. Every `wrangler deploy --env staging` ran with `routes: [{anc.dev}]` in scope and re-attached anc.dev to `agentnative-site-staging`, transferring the custom-domain binding away from `agentnative-site`. The deployment log includes a `Deployed agentnative-site-staging triggers ... anc.dev (custom domain)` line on every staging deploy that nobody had read as "the prod custom domain just moved".

Explicit empty arrays break the inheritance without changing any other behavior.

Today's filed-then-retracted upstream issue at [cloudflare/workers-sdk#13925](cloudflare/workers-sdk#13925) (rewritten to describe this trap honestly) suggests Wrangler add a deploy-time warning when an env block inherits a `routes` array containing custom domains.

## Changelog

### Fixed

- Stop staging deploys from re-attaching `anc.dev` to the staging Worker on every dev push. The "routing-drift bug" tracked since 2026-04-30 was caused by `env.staging` silently inheriting the top-level `routes` array. Explicit `routes: []` override on `env.staging` makes the staging Worker's deployment stop asserting ownership of `anc.dev`.

### Changed

- Added explicit `triggers: { crons: [] }` override on `env.staging` as a prophylactic against the same inheritance pattern firing on a future scheduled-trigger addition.

## Type of Change

- [x] `fix`: Bug fix (non-breaking change which fixes an issue)

## Related Issues/Stories

- Story: Closes the chronic routing-drift bug. anc.dev should now stay on the production Worker across staging deploys.
- Issue: [cloudflare/workers-sdk#13925](cloudflare/workers-sdk#13925) (rewritten to document the actual inheritance trap).
- Architecture: None.
- Related PRs: #85 (manual routing-drift fix that this PR addresses structurally), #88 (asset-sharing fix that was reverted by #89 once the routes-inheritance explanation surfaced), #89 (the revert of #88).

## Files Modified

**Modified:**

- `wrangler.jsonc`: two explicit overrides added on `env.staging` (`routes: []` and `triggers: { crons: [] }`) with inline comments documenting why. No other config changed.

**Created:**

- None.

**Renamed:**

- None.

**Deleted:**

- None.

## Testing

- [x] Unit tests added/updated
- [x] All tests passing

**Test Summary:**

- 315 unit and regression tests pass (unchanged from pre-PR baseline).
- `bun x wrangler deploy --dry-run --env=""`: clean. Production-side bindings unchanged.
- `bun x wrangler deploy --dry-run --env staging`: clean. Staging-side bindings unchanged.
- The deployment log's `Deployed agentnative-site-staging triggers` section is now empty (dry-run output suppresses the section entirely when there are no triggers, which matches the expected post-fix behavior).
- Pre-push gate passes end-to-end.

**Audit of every inheritable key:**

| Key | Top-level | env.staging | Status |
|---|---|---|---|
| `name` | `agentnative-site` | `agentnative-site-staging` | Explicit override |
| `main` | `src/worker/index.ts` | inherited | Safe (same source by design) |
| `compatibility_date` | `2026-04-01` | inherited | Safe |
| `compatibility_flags` | `["nodejs_compat"]` | inherited | Safe |
| `account_id` | via env var | via env var | Same account |
| `workers_dev` | `false` | `true` | Explicit override |
| `routes` | `[{anc.dev}]` | `[]` | **Explicit override (this PR)** |
| `triggers` | not set | `{ crons: [] }` | **Explicit override (this PR, prophylactic)** |
| `observability` | `{enabled: true, ...}` | inherited | Safe (same intent) |
| `assets` | `{directory: "./dist", ...}` | inherited | Safe (per-Worker asset stores; the "asset overlay" symptoms were routing drift) |
| `send_metrics` | `false` | inherited | Safe (both opt out) |
| `migrations` | `[{tag: v1, ...}]` | explicitly declared | Already overridden |
| `preview_urls`, `route` (singular), `tsconfig`, `rules`, `build`, `no_bundle`, `find_additional_modules`, `base_dir`, `preserve_file_names`, `minify`, `keep_names`, `logpush`, `limits`, `placement` | not set | not set | n/a |

If any of the currently-unset inheritable keys gets set at the top level later, re-audit `env.staging` and add an explicit override if the inherited value would be destructive for staging.

**Post-merge plan:**

After this PR's staging deployment on dev, verify:

- Staging deploy log's `Deployed agentnative-site-staging triggers` section is empty (no `anc.dev (custom domain)` line).
- CF API: `anc.dev` custom-domain binding stays on `service: agentnative-site` after the deployment completes.
- `curl https://anc.dev/` returns 200 from the production Worker (no `X-Robots-Tag` header, content matches main's deployed assets).
- `curl https://agentnative-site-staging.brettdavies.workers.dev/` returns 200 with `X-Robots-Tag: noindex`.

If anc.dev binding moves to staging after this deployment, the fix is wrong and we need a different mechanism.
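The inheritance trap and the audit above can be modeled in a few lines. The sketch below is a simplified model of the behavior this PR describes, not Wrangler's actual resolution code: `resolveEnv`, `inheritedKeys`, and the `InheritableConfig` type are hypothetical names, and only a handful of the inheritable keys are represented. The config values are the ones quoted in this PR.

```typescript
type Route = { pattern: string; custom_domain?: boolean };

// Hypothetical subset of the inheritable keys audited in the table above.
interface InheritableConfig {
  name?: string;
  workers_dev?: boolean;
  routes?: Route[];
  triggers?: { crons: string[] };
}

// Simplified model: an env block inherits every key it does not declare.
// An explicit empty array counts as declared, so it replaces the parent value.
function resolveEnv(top: InheritableConfig, env: InheritableConfig): InheritableConfig {
  return { ...top, ...env };
}

// Audit helper: top-level keys the env block picks up without declaring them.
function inheritedKeys(top: InheritableConfig, env: InheritableConfig): string[] {
  return Object.keys(top).filter((key) => !(key in env));
}

const topLevel: InheritableConfig = {
  name: "agentnative-site",
  workers_dev: false,
  routes: [{ pattern: "anc.dev", custom_domain: true }],
};

// env.staging before this PR: no `routes` key at all.
const stagingBefore: InheritableConfig = {
  name: "agentnative-site-staging",
  workers_dev: true,
};

// env.staging after this PR: explicit empty overrides.
const stagingAfter: InheritableConfig = {
  ...stagingBefore,
  routes: [],
  triggers: { crons: [] },
};

console.log(resolveEnv(topLevel, stagingBefore).routes); // the anc.dev route leaks into staging
console.log(resolveEnv(topLevel, stagingAfter).routes); // empty: staging claims no routes
console.log(inheritedKeys(topLevel, stagingBefore)); // only `routes` is silently inherited
```

A helper like `inheritedKeys` is one way the re-audit could be automated: fail CI whenever the list of silently inherited keys changes, so that adding a new top-level key forces a deliberate decision for staging.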
3 tasks
brettdavies added a commit that referenced this pull request on May 15, 2026
## Summary

Third production release of the day. Bundles all post-PR-#85 dev work that survived today's incident, namely the three PRs that are net additive after the failed split-config experiment (PR #88) and its revert (PR #89) cancel out: PR #86 (explicit Wrangler env target + dual-env pre-push dry-run), PR #87 (corrected `cf` registry entry pointing at its npm distribution surface), and PR #90 (the actual fix for the 2-week routing-drift bug: explicit `routes: []` override on `env.staging` to break Wrangler's inheritable-keys inheritance, plus prophylactic `triggers: { crons: [] }` override for the same trap shape).

The headline is PR #90's wrangler.jsonc change. anc.dev is currently bound to the production Worker (after a manual rebind earlier today), and PR #90's staging-deploy verification confirmed the routing fix holds: the `Deployed agentnative-site-staging triggers` block no longer lists `anc.dev (custom domain)`, and the CF API confirms the binding stays on `agentnative-site` through staging deploys. Promoting that wrangler.jsonc state to main means the routing fix is in the committed source-of-truth, not just in dev's history.

User-facing effects on anc.dev after this deploy: the corrected `/registry-index.json` and `/score/cf` content from PR #87 reach production (today they live on staging only); the env-target explicitness from PR #86 removes the wrangler ambiguity warning from the next production deploy log. The Worker code itself, bindings, and DO migrations are unchanged.

## Changelog

### Fixed

- Stop staging deploys from re-attaching `anc.dev` to the staging Worker on every dev push. The routing-drift bug tracked since 2026-04-30 was caused by `env.staging` silently inheriting the top-level `routes` array. Explicit `routes: []` override on `env.staging` breaks the inheritance. anc.dev now stays on the production Worker across staging deploys.
- Correct the `cf` entry in `registry.yaml`. The previous entry claimed `repo: cloudflare/workers-sdk`, but `cf` is not in that repo and the npm package itself declares no repository. Replaced with a `url:` pointing at the npm distribution page. Side effect: the build's reverse-lookup map for `cloudflare/workers-sdk` now correctly resolves to `wrangler` instead of `cf`.

### Changed

- Pass `--env=""` explicitly on the production wrangler-action deploy command. Removes the "Multiple environments are defined" ambiguity warning from production deploy logs.
- Pre-push hook now runs `wrangler deploy --dry-run` for both the production and staging environments instead of one bare invocation. Catches binding mistakes in either environment before push.
- Added explicit `triggers: { crons: [] }` override on `env.staging` as a prophylactic against the same inheritance trap shape on scheduled triggers. Currently no scheduled triggers; the override forces a deliberate decision when adding any.

## Type of Change

- [x] `fix`: Bug fix (non-breaking change which fixes an issue)

The release is multi-typed (one fix headline plus two ride-along changes) but `fix` headlines because PR #90's routes-inheritance fix is the durable resolution of the 2-week routing-drift incident.

## Related Issues/Stories

- Story: Closes the production side of the routing-drift fix arc. PR #85 brought `agentnative-site` (named-prod) current and manually rebound anc.dev to prod, but the fix was not durable because the underlying wrangler.jsonc still inherited the prod route into `env.staging` on every staging deploy. PR #90 fixed the inheritance in source; this release ships that to main.
- Issue: [cloudflare/workers-sdk#13925](cloudflare/workers-sdk#13925), rewritten today as a docs/UX bug describing the inheritance trap and recommending a deploy-time warning when env blocks silently inherit a `routes` array containing custom domains.
- Architecture: `docs/solutions/integration-issues/wrangler-routes-inheritance-staging-custom-domain-drift-2026-05-15.md` (dev-only) for the full investigation writeup.
- Related PRs: #85, #86, #87, #88 (reverted), #89 (revert), #90.

## Files Modified

**Modified:**

- `wrangler.jsonc`: env.staging block now explicitly overrides `routes: []` and `triggers: { crons: [] }` to break Wrangler's inheritable-keys inheritance. Top-level production config unchanged (`routes: [{ pattern: "anc.dev", custom_domain: true }]`, `workers_dev: false`).
- `.github/workflows/deploy.yml`: production job's wrangler-action command is `deploy --env=""` (was bare `deploy`); staging job's command is unchanged (`deploy --env staging`).
- `scripts/hooks/pre-push`: replaces the single bare `wrangler deploy --dry-run` step with two dry-runs, one per environment (`--env=""` and `--env staging`).
- `registry.yaml`: `cf` entry replaces `repo: cloudflare/workers-sdk` with `url: https://www.npmjs.com/package/cf` and an inline comment explaining why.

**Created:**

- None.

**Renamed:**

- None.

**Deleted:**

- None.

## Testing

- [x] Unit tests added/updated
- [x] All tests passing

**Test Summary:**

- 315 unit and regression tests pass.
- `bun x wrangler deploy --dry-run --env=""`: clean, lists the production-side bindings (Sandbox DO, R2 bucket `anc-score-cache`, rate-limit namespace 1001, ASSETS). Container image pinned at `:30f61f1`.
- `bun x wrangler deploy --dry-run --env staging`: clean, lists the staging-side bindings (R2 `anc-score-cache-staging`, rate-limit namespace 1002). The `Deployed triggers` section in dry-run output is suppressed.
- Pre-push hook (lint, build, both wrangler dry-runs, pack-README, banned-fonts, prose-check) passes end-to-end.
- prose-check: 0 blocking.

**Post-merge verification plan** (after the production deploy on this PR's merge):

- Production deploy log's `Deployed agentnative-site triggers` section lists only `anc.dev (custom domain)`. The wrangler ambiguity warning is gone from the log thanks to `--env=""`.
- `curl https://anc.dev/registry-index.json` returns the corrected mapping (`cloudflare/workers-sdk` → `wrangler`, md5 should match the dev-side build, which is `f50579f244013d2b76e999a9502f4e46`).
- `curl -sI https://anc.dev/` returns 200 with no `X-Robots-Tag` header (production Worker).
- CF API confirms `anc.dev` Custom Domain record stays on `service: agentnative-site` after the deployment completes.
- A subsequent dev push (next staging deploy) leaves `anc.dev` on production. This is the durability test: pre-PR-#90, every staging deploy would have flipped the binding; post-PR-#90, it should not.
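The header checks in the verification plan above could be scripted along these lines. This is a hedged sketch, not part of the repository: `servedBy`, `verifyProbe`, and the `Probe` type are hypothetical names, and the sample probes are illustrative values rather than captured responses. The only facts taken from this PR are that the production Worker sends no `X-Robots-Tag` header and the staging Worker sends `X-Robots-Tag: noindex`.

```typescript
type ServedBy = "production" | "staging";

// Hypothetical record of one curl-style probe (status plus response headers).
interface Probe {
  url: string;
  status: number;
  headers: Record<string, string>;
}

// Infers which Worker answered, based on the presence of `X-Robots-Tag: noindex`.
function servedBy(headers: Record<string, string>): ServedBy {
  // Header names are case-insensitive, so normalize before comparing.
  const robots = Object.entries(headers).find(
    ([name]) => name.toLowerCase() === "x-robots-tag",
  );
  const isNoindex = robots !== undefined && robots[1].toLowerCase().includes("noindex");
  return isNoindex ? "staging" : "production";
}

// Returns a list of human-readable problems; an empty list means the probe passed.
function verifyProbe(probe: Probe, expected: ServedBy): string[] {
  const problems: string[] = [];
  if (probe.status !== 200) {
    problems.push(`${probe.url}: expected 200, got ${probe.status}`);
  }
  const actual = servedBy(probe.headers);
  if (actual !== expected) {
    problems.push(`${probe.url}: served by ${actual}, expected ${expected}`);
  }
  return problems;
}

// Illustrative probes only: what a healthy and a drifted anc.dev would look like.
const healthy: Probe = { url: "https://anc.dev/", status: 200, headers: {} };
const drifted: Probe = {
  url: "https://anc.dev/",
  status: 200,
  headers: { "x-robots-tag": "noindex" },
};

console.log(verifyProbe(healthy, "production")); // no problems reported
console.log(verifyProbe(drifted, "production")); // reports that staging answered
```

Running a check like this after every staging deploy would exercise the durability test from the last bullet, since a flipped binding shows up as `anc.dev` answering with the staging header.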
## Summary

Split the wrangler configuration into two fully independent files (`wrangler.jsonc` for production, `wrangler.staging.jsonc` for staging) and force a single-character byte divergence between the two compiled Worker scripts. Workaround for cloudflare/workers-sdk#13925.

The bug we hit: in a single-config multi-env shape (one wrangler.jsonc with a top-level block plus `env.staging`), staging deploys silently overwrite what production serves for any URL path whose asset content differs between the two builds. Cloudflare's asset deduplication appears to key on the compiled Worker script's etag; the multi-env shape produces byte-identical scripts (the bindings differ but they're not part of the script bytes), so the two Workers share an asset namespace.

This was caught during a post-release log audit on PR #85. anc.dev was observed serving the post-PR-#87 content for `/registry-index.json`, `/score/cf.html`, and `/score/cf.md`, despite production having no new deploy since PR #85's merge. Production-deployed assets had been silently overlaid by the subsequent staging deploy. Full reproduction with version IDs, etags, and content hashes is in the upstream issue.

## Changelog

### Changed

- …`wrangler.jsonc` for production on anc.dev, `wrangler.staging.jsonc` for staging on the workers.dev subdomain). The previous single-file multi-env shape (top-level plus `env.staging`) is gone. Adding new bindings or changing observability now requires editing both files; the trade-off is real asset isolation between environments.
- …`X-Build-Env` header set to `production`, `staging`, or `development` (the development value surfaces only when the Worker code is exercised by the bun test runner). The header is diagnostic and also the mechanism that keeps the two compiled Worker scripts at distinct etags: each config's `define` substitutes a different literal string into `src/worker/index.ts`'s `__BUILD_ENV__` reference.

### Fixed

### Documentation

- `RELEASES.md` deploy section documents the two-config layout and links to cloudflare/workers-sdk#13925.
- `RELEASES.md` sandbox image release procedure points at `wrangler.jsonc` and `wrangler.staging.jsonc` instead of the old `env.staging` block.
- `docker/sandbox/README.md` updated to match.

## Type of Change

- `fix`: Bug fix (non-breaking change which fixes an issue)

## Related Issues/Stories

- …`/registry-index.json` content.

## Files Modified

**Modified:**

- `wrangler.jsonc`: dropped the `env.staging` block, added a `define` for `__BUILD_ENV__: "production"`.
- `.github/workflows/deploy.yml`: staging job uses `--config wrangler.staging.jsonc`; production job retains the default `wrangler deploy`.
- `.github/workflows/ci.yml`: sandbox image-pin guard reads both configs via `--config`.
- `scripts/hooks/pre-push`: wrangler dry-run step now covers both configs.
- `src/worker/index.ts`: declared `__BUILD_ENV__` and the `BUILD_ENV` constant with a `development` fallback for tests.
- `src/worker/headers.ts`: `applyHeaders` accepts a `buildEnv` option and emits `X-Build-Env` on every response.
- `tests/worker.test.ts`: every `applyHeaders` call site updated with `buildEnv: 'development'`; three new tests cover the `X-Build-Env` header for each env value.
- `RELEASES.md`: deploy table updated, new subsection on the two-config layout, sandbox image release procedure rewritten.
- `docker/sandbox/README.md`: pin location updated.
- `styles/config/vocabularies/site/accept.txt`: legitimate technical terms (`configs`, `etag`, `etags`, `deduplication`, `namespaces`, `envs`, `CF's`) added so the new prose passes Vale at the error tier.

**Created:**

- `wrangler.staging.jsonc`: standalone staging config with all the same fields production needs, plus `define` setting `__BUILD_ENV__: "staging"`.

**Renamed:**

**Deleted:**

## Testing

**Test Summary:**

- …`X-Build-Env` cases).
- `bun x wrangler deploy --dry-run --config wrangler.jsonc`: clean, lists the production-side bindings (anc-score-cache R2, rate-limit namespace 1001, prod container app).
- `bun x wrangler deploy --dry-run --config wrangler.staging.jsonc`: clean, lists the staging-side bindings (anc-score-cache-staging R2, rate-limit namespace 1002, staging container app).
- …`define` substitution.

**Post-merge verification plan:**

After the staging deploy on this PR's merge to dev:

- `curl -sI https://agentnative-site-staging.brettdavies.workers.dev/` returns 200 with `X-Build-Env: staging` and `X-Robots-Tag: noindex`.

After the next release PR to main:

- `curl -sI https://anc.dev/` returns 200 with `X-Build-Env: production` and no `X-Robots-Tag` header.