fix(ci): pass CLOUDFLARE_API_TOKEN to deep-check wrangler dev steps #92
Merged
Both jobs in deep-check.yml (e2e + lhci) start the test web server via `wrangler dev --local`. In wrangler 4.x, dev mode authenticates to the Cloudflare managed registry to read the container image manifest from `registry.cloudflare.com/<acct>/anc-sandbox:<tag>` even under `--local`. Without CLOUDFLARE_API_TOKEN in env, wrangler exits with "Not logged in" before Playwright can connect, and the deep-check nightly fails 100% of the time.

The regression landed silently in PR #84 (U3-followup), which migrated the sandbox container image off Docker Hub. Docker Hub images are anonymously pullable; images on the CF managed registry are not. ci.yml didn't surface this because its tests don't invoke `wrangler dev` (only `bun test` and `wrangler --dry-run`, both of which work without dev-mode auth or already have the token plumbed).

Fix: pass the same CF_API_TOKEN + CF_ACCOUNT_ID secrets that deploy.yml uses, on both the e2e step and the lhci step. No new secret provisioning needed; the secrets exist in the repo already. Header comment updated to reflect the new dependency and explain why.

Verification path: post-merge, manually dispatch deep-check via `gh workflow run deep-check.yml` against the merge commit to confirm the auth fix lands. Cannot fully verify pre-merge because the failure mode only surfaces in the GHA runner environment (local dev has CLOUDFLARE_API_TOKEN in shell env, so local `bun run test:e2e` works fine).

Related: cloudflare/workers-sdk#13925 (today's other CF Workers gotcha), but mechanically unrelated.
brettdavies added a commit that referenced this pull request on May 15, 2026
Deferring the deep-check Dockerfile-EXPOSE fix to U6, where the image is being rebuilt anyway as part of replacing the Sandbox DO stub.

Captured in three places under U6:

- Files: explicit Modify line for docker/sandbox/Dockerfile with the EXPOSE directive requirement, including the port-3000 reservation note, the surfacing context (today's deep-check investigation), and the rationale for amortizing the image-rebuild ceremony into U6 rather than spinning a separate image release.
- Test scenarios: new Integration (CI) item asserting deep-check.yml goes green on the U6 merge SHA. Both jobs (e2e + lhci) use wrangler dev --local as their test webServer, so they prove the EXPOSE is correct end-to-end.
- Verification: deep-check passing becomes part of U6 acceptance alongside the existing two-phase-egress and anc_version invariants.

Background: PR #92 (today) added CLOUDFLARE_API_TOKEN to deep-check's e2e + lhci jobs, which resolved the first failure mode (Not logged in) but surfaced the second (container ports). The fix is a one-line Dockerfile change; the cost is the image rebuild + push + lockstep pin bump. Bundling with U6 is cheaper than a standalone image release for a stub container with no functional runtime yet.
brettdavies added a commit that referenced this pull request on May 15, 2026
## Summary

Implements plan U5 of the live-scoring rollout: wires `/api/score` into the Worker with content negotiation, response shape with the R11 triad, registry-fast-path (unmetered) for known tools, and the bot-defense / abuse-mitigation stack (Turnstile siteverify, signed HMAC session cookie, KV kill switch, dual rate-limits). The DO stub still returns `sandbox_stub_until_u6`; this PR exercises the route end-to-end as far as the DO boundary so U6 can drop in the real sandbox without further plumbing.

Also adds a regression test for the wrangler.jsonc inheritance fix (PR #90) so that the `env.staging.routes: []` and `env.staging.triggers: { crons: [] }` overrides cannot be silently removed without CI catching it. Driven by the recent compounded learning at `docs/solutions/integration-issues/wrangler-routes-inheritance-staging-custom-domain-drift-2026-05-15.md`.

## Changelog

### Added

- Add `/api/score` endpoint that returns registry-known tool scorecards (unmetered fast path pointing at the existing `/score/<slug>` page) and runs the full bot-defense pipeline for unknown tools, returning a `sandbox_stub_until_u6` envelope until U6 ships.
- Add `SCORE_KV` namespace binding for the operator-flippable `scoring_disabled` kill switch (flip with `wrangler kv key put --binding=SCORE_KV scoring_disabled true`).
- Add `SCORE_LIMITER_IP` secondary rate-limit binding (30 req / 60 s / IP) as a coarse fallback when a session cookie is swapped to dodge the primary session-keyed limiter.

### Changed

- Augment `dist/registry-index.json` entries with `version`, `anc_version`, and `scorecard_url` for every tool that has a committed scorecard, so the Worker can build the R11 triad and route registry hits to the existing per-tool page without fetching the scorecard payload.
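The q-value side of the content negotiation can be illustrated with a minimal sketch. It assumes a hypothetical `pickScoreFormat` function; the PR's own logic lives in `detectScorePreference` and `src/worker/score/content-negotiation.ts`, also handles URL suffixes, and may differ. Defaulting to JSON when nothing matches is an assumption.

```typescript
// Hypothetical sketch, not the PR's implementation.
// Assumption: JSON is the default when no served media type is acceptable.
type ScoreFormat = "json" | "markdown";

// Map a media type to a format this route serves, or undefined if unserved.
function formatFor(mediaType: string): ScoreFormat | undefined {
  if (mediaType === "application/json") return "json";
  if (mediaType === "text/markdown") return "markdown";
  return undefined;
}

function pickScoreFormat(accept: string | null): ScoreFormat {
  let best: ScoreFormat = "json";
  let bestQ = 0;
  for (const part of (accept ?? "").split(",")) {
    const pieces = part.split(";").map((s) => s.trim());
    const format = formatFor((pieces[0] ?? "").toLowerCase());
    if (format === undefined) continue; // wildcards and other types are ignored here
    let q = 1; // a missing q parameter means q=1
    for (const param of pieces.slice(1)) {
      const eq = param.indexOf("=");
      if (eq === -1 || param.slice(0, eq).trim().toLowerCase() !== "q") continue;
      const parsed = Number(param.slice(eq + 1));
      q = isNaN(parsed) ? 0 : Math.min(Math.max(parsed, 0), 1);
    }
    // Strictly greater: on a tie the earlier entry (or the JSON default) wins.
    if (q > bestQ) {
      best = format;
      bestQ = q;
    }
  }
  return best;
}

console.log(pickScoreFormat("text/markdown;q=0.1, application/json;q=0.9"));
```

With `Accept: text/markdown;q=0.1, application/json;q=0.9` this sketch selects JSON, because the higher q-value wins regardless of the order in which the types are listed.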
## Type of Change

- [x] `feat`: New feature (non-breaking change which adds functionality)

## Related Issues/Stories

- Story: n/a
- Issue: n/a
- Architecture: docs/plans/2026-04-28-002-feat-live-scoring-cf-sandbox-plan.md (plan U5)
- Related PRs: #78 (U1), #79 (U2), #80 (U4), #81 (U3), #84 (U3-fu), #90 (env.staging routes override), #92 (deep-check auth fix)

## Testing

- [x] Unit tests added/updated
- [x] Integration tests added/updated
- [x] Manual testing completed
- [x] All tests passing

**Test Summary:**

- Unit tests: 377 passing (315 baseline + 62 new across 4 files)
- Integration tests: covered by `tests/score-handler.test.ts` exercising the full pipeline end-to-end against stubbed bindings
- Coverage: every `ScoreError` discriminated-union variant has an HTTP-status mapping test; q-value content negotiation has the regression test required by the plan (`Accept: text/markdown;q=0.1, application/json;q=0.9` resolves to JSON, not markdown)

Locally verified:

- `bun test` (377/377 pass)
- `bunx biome check src/ tests/` (clean)
- `bunx tsc --noEmit` (31 pre-existing errors, none in U5 code; baseline before this PR was 39)
- `bun run build` (registry-index emits with enriched entries; ripgrep entry verified to carry `version: 15.1.0`, `anc_version: 0.3.0`, `scorecard_url: /score/ripgrep`)

## Files Modified

**Modified:**

- `src/build/build.mjs`, `src/build/registry-index.mjs`: reorder build pipeline so scorecards load before indexes emit; pass an enrichment map keyed by tool name into registry-index emission.
- `src/worker/index.ts`: add the `/api/score` path-prefix branch above the asset call; loosen Env to permit narrow asset-only test stubs.
- `src/worker/accept.ts`: add `detectScorePreference` for the JSON/markdown surface; keep `detectPreference` unchanged for site-side paths.
- `src/worker/score/registry-lookup.ts`: extend `RegistryEntry` with optional `version`, `anc_version`, `scorecard_url`.
- `wrangler.jsonc`: add `kv_namespaces: SCORE_KV` and `ratelimits: SCORE_LIMITER_IP` to both prod and staging.
- `src/worker-configuration.d.ts`: regenerated via `wrangler types`.
- `tests/worker.test.ts`: add a `/api/score routing` describe block with the plan-required q-value test.

**Created:**

- `src/worker/score/response-shape.ts`: `ScoreError` discriminated union (18 variants), `statusForError`, `shapeScoreSuccess`, `shapeScoreError` with R11 triad enforcement.
- `src/worker/score/content-negotiation.ts`: URL-suffix + q-value content detection for `/api/score(.json|.md)`.
- `src/worker/score/kill-switch.ts`: SCORE_KV `scoring_disabled` reader with 30 s in-memory cache.
- `src/worker/score/session.ts`: signed `__Host-anc-session` cookie issue / parse via Web Crypto HMAC-SHA256.
- `src/worker/score/turnstile.ts`: Turnstile siteverify wrapper.
- `src/worker/score/handler.ts`: orchestrates the full pipeline.
- `src/worker/spec-version.gen.ts`: placeholder; U8 wires the build-emit.
- `tests/score-response-shape.test.ts`: 27 tests covering every ScoreError variant + R11 triad enforcement.
- `tests/score-handler.test.ts`: 23 tests covering registry-fast-path, kill switch, Turnstile, rate-limits, DO passthrough, session cookie, service-misconfigured fail-fast, content negotiation.
- `tests/wrangler-config.test.ts`: 9 tests guarding the wrangler.jsonc inheritance fix.

**Renamed:**

- None.

**Deleted:**

- None.

## Key Features

- Registry-fast-path returns cache-hit-shaped JSON (no live sandbox spin) for any tool with a committed scorecard, including a GET surface (`GET /api/score?input=ripgrep`) for paste-and-share without form interaction.
- R11 response contract is structurally enforced: every success response includes `spec_version`, `anc_version`, `checker_url`; a missing `anc_version` returns 500 `incomplete_response_contract` rather than a silent partial.
- ScoreError exhaustiveness: adding a new variant without updating `statusForError` is a compile error.
- Operator kill switch: `wrangler kv key put --binding=SCORE_KV scoring_disabled true` propagates to all isolates within 30 s.
- Dual rate-limit: primary session-keyed limiter (cache-friendly: same-tool requests in a session don't burn budget); coarse per-IP fallback catches cookie-swappers.

## Benefits

- Unblocks U6 (real sandbox) and U7 (R2 cache). The route surface is stable; U6 swaps the DO stub for the real implementation behind the existing envelope.
- Cost ceiling defenses are in place at launch: Turnstile blocks headless / no-JS attackers; rate-limits cap distinct-tool requests per session and per IP; KV kill switch gives the operator a seconds-latency disable lever.

## Breaking Changes

- [x] No breaking changes
- [ ] Breaking changes described below:

## Deployment Notes

- [ ] No special deployment steps required
- [x] Deployment steps documented below:

Two operator-managed secrets must be set out of band before the route accepts POST traffic:

- `wrangler secret put TURNSTILE_SECRET` (and `--env staging`)
- `wrangler secret put SESSION_HMAC_SECRET` (and `--env staging`)

Until those are set, `/api/score` POST requests for non-registry inputs return 500 `service_misconfigured` (fail-fast by design). GET requests for registry-known tools work without either secret. Pull both values from the 1Password vault entry for the production Cloudflare account before promoting to prod.

KV namespaces are already created and the IDs committed in wrangler.jsonc. Secondary rate-limit namespaces use `1003` (prod) and `1004` (staging); no operator action needed for those.

## Screenshots/Recordings

n/a (server-side surface; UI form lands in U8).
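The 30 s in-memory cache in front of the `scoring_disabled` key can be sketched as follows. This is a minimal illustration, not the code in `src/worker/score/kill-switch.ts`: `createKillSwitch` and `KVLike` are hypothetical names, `KVLike` mirrors only the `get` call of a KV binding, and treating the string `true` as the disabled value is an assumption based on the `wrangler kv key put` command above.

```typescript
// Hypothetical sketch of a cached kill-switch reader; the real reader may differ.
interface KVLike {
  get(key: string): Promise<string | null>;
}

const CACHE_TTL_MS = 30 * 1000; // the 30 s cache window

function createKillSwitch(kv: KVLike, now: () => number = Date.now) {
  // Module-level state in a Worker is per isolate, so each isolate
  // would hold its own copy of this cache entry.
  let cached: { disabled: boolean; fetchedAt: number } | undefined;

  return async function isScoringDisabled(): Promise<boolean> {
    const t = now();
    if (cached !== undefined && t - cached.fetchedAt < CACHE_TTL_MS) {
      return cached.disabled; // still fresh: skip the KV read
    }
    const raw = await kv.get("scoring_disabled");
    // Assumption: only the literal string "true" disables scoring.
    cached = { disabled: raw === "true", fetchedAt: t };
    return cached.disabled;
  };
}
```

Injecting `now` keeps the expiry logic testable without real timers; in a Worker the default `Date.now` would be used.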
## Checklist

- [x] Code follows project conventions and style guidelines
- [x] Commit messages follow [Conventional Commits](https://www.conventionalcommits.org/)
- [x] Self-review of code completed
- [x] Tests added/updated and passing
- [x] No new warnings or errors introduced
- [x] Changes are backward compatible (or breaking changes documented)

## Additional Context

The DO stub at `src/worker/score/do.ts` still returns `{error: sandbox_stub_until_u6}`. The handler surfaces that as a 503 with the structured `sandbox_stub_until_u6` ScoreError so the route is honest about its state. U6 replaces the DO stub with the real `@cloudflare/sandbox`-extending class and the install + score flow.

`src/worker/spec-version.gen.ts` is a hand-edited placeholder for U5; U8 wires `src/build/build.mjs` to regenerate it from `src/data/spec/VERSION` and `content/principles/VERSION` at build time.
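The error-to-status mapping and its compile-time exhaustiveness check can be sketched with a reduced union. The variant names and their 503 / 500 statuses are taken from this page; the four-variant subset and the `kind` discriminant field are assumptions, since the real `ScoreError` union in `src/worker/score/response-shape.ts` has 18 variants and its shape is not shown here.

```typescript
// Hypothetical subset of the ScoreError union; `kind` as the discriminant is assumed.
type ScoreError =
  | { kind: "sandbox_stub_until_u6" }
  | { kind: "sandbox_unavailable" }
  | { kind: "service_misconfigured" }
  | { kind: "incomplete_response_contract" };

function statusForError(error: ScoreError): number {
  switch (error.kind) {
    case "sandbox_stub_until_u6":
    case "sandbox_unavailable":
      return 503;
    case "service_misconfigured":
    case "incomplete_response_contract":
      return 500;
    default: {
      // If a variant is added to ScoreError without a case above, `error`
      // is no longer of type `never` here and this assignment fails to compile.
      const unhandled: never = error;
      return unhandled;
    }
  }
}

console.log(statusForError({ kind: "sandbox_stub_until_u6" }));
```

The `never` assignment in the `default` branch is what turns a forgotten variant into a type error rather than a runtime fallthrough.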
brettdavies added a commit that referenced this pull request on May 25, 2026
First production cut of the live-scoring stack. Promotes everything on dev since PR #91 (the prior release): plan U5-U10 worker code (handler, sandbox DO, container, R2 cache, rate limits, telemetry, kill switch, homepage form, shareable result URLs, monitoring runbook), the docker sandbox image with anc v0.4.0 baked in, deploy split + routing-drift follow-ups (#92 CI token plumbing), the contributor surface (nav, footer Source row, intake template, README rewrite), the build SRP refactor, the SEO JSON-LD @graph fix, the cross-migration rollback rehearsal evidence and recipe correction, and the env.SCORE handler guard that converts a mid-rollback CF 1101 into a typed 503 sandbox_unavailable. Lockstep image bump: both top-level containers[0].image and env.staging.containers[0].image at :9aed5c3 (anc-cli v0.4.0, sha256:dae72c56afe2f332e8745c0517f1ed5d21993470de663409dfc9b3973cdfe4c1). The image cleared staging deploy on dev push 26384622721; soak was skipped per release-cut decision. Triple-diff verification clean (134 files changed; no guarded-path leaks; expected B-diff is the prod-pin bump only).
## Summary

Pass `CLOUDFLARE_API_TOKEN` + `CLOUDFLARE_ACCOUNT_ID` to the two `deep-check.yml` jobs (Playwright + Lighthouse CI) that spin up `wrangler dev` as the test web server. Without these env vars, wrangler 4.x exits with `Not logged in` before Playwright can connect, and the nightly fails 100% of the time.

The regression landed silently in PR #84 (U3-followup) when the sandbox container image moved off Docker Hub onto the Cloudflare managed registry. Anonymous pulls work for `docker.io/...`; the CF managed registry requires auth, and wrangler 4.x authenticates to read the container image manifest even under `--local`. ci.yml did not catch it because nothing in the ci.yml pipeline invokes `wrangler dev`. Today's first deep-check after PR #91 merged surfaced it as the Playwright + Lighthouse CI jobs failed simultaneously with the same `Not logged in` error from `wrangler dev`.

The same secrets `deploy.yml` already uses are passed here; no new provisioning needed. Header comment updated to document the new dependency.

## Changelog

### Fixed

- `deep-check.yml`'s `e2e` and `lhci` jobs no longer fail 100% with `Not logged in` from `wrangler dev`. Both jobs now have `CLOUDFLARE_API_TOKEN` and `CLOUDFLARE_ACCOUNT_ID` plumbed through, which wrangler 4.x needs in order to read the container image manifest from the CF managed registry even under `--local`.

## Type of Change

- `fix`: Bug fix (non-breaking change which fixes an issue)

## Related Issues/Stories

- 25905730897, today 07:20 UTC, scheduled trigger immediately after PR release: routes-inheritance fix + post-#85 promotion #91 merged to main) surfaced both jobs failing identically.

## Files Modified

**Modified:**

- `.github/workflows/deep-check.yml`: header comment under "Secrets" now documents the `CLOUDFLARE_API_TOKEN` / `CLOUDFLARE_ACCOUNT_ID` dependency and the reason. The `e2e` job's `End-to-end tests` step and the `lhci` job's `Lighthouse CI` step each gained an `env:` block setting `CLOUDFLARE_API_TOKEN` and `CLOUDFLARE_ACCOUNT_ID` from the existing repo secrets.

**Created:**

**Renamed:**

**Deleted:**

## Testing

**Test Summary:**

- `gh workflow run deep-check.yml --ref main` and confirm both jobs reach the test phase (no longer fail at `Not logged in`). Cannot fully verify pre-merge because the failure mode only surfaces in the GHA runner environment. Local `bun run test:e2e` works because the developer's shell already has `CLOUDFLARE_API_TOKEN` set.
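The difference between the GHA runner and a developer shell comes down to which variables are present when `wrangler dev` starts. A minimal pre-flight sketch of that check, assuming a hypothetical `missingWranglerEnv` helper that is not part of this PR; only the two variable names come from the PR itself:

```typescript
// Hypothetical pre-flight helper (not part of this PR): report which of the
// variables named in this PR are absent or blank in a given environment.
const REQUIRED_WRANGLER_ENV = ["CLOUDFLARE_API_TOKEN", "CLOUDFLARE_ACCOUNT_ID"] as const;

type EnvLike = Record<string, string | undefined>;

function missingWranglerEnv(env: EnvLike): string[] {
  return REQUIRED_WRANGLER_ENV.filter((name) => {
    const value = env[name];
    return value === undefined || value.trim() === "";
  });
}

// Illustrative environments: a runner without the secrets plumbed through,
// and a developer shell that already exports both values.
const runnerEnv: EnvLike = { CI: "true" };
const localEnv: EnvLike = {
  CLOUDFLARE_API_TOKEN: "example-token",
  CLOUDFLARE_ACCOUNT_ID: "example-account",
};

console.log(missingWranglerEnv(runnerEnv)); // lists both variable names
console.log(missingWranglerEnv(localEnv)); // lists nothing
```

A check like this could run before the web server starts so that a missing secret fails with an explicit message rather than surfacing later as `Not logged in`.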