fix(ci): pass CLOUDFLARE_API_TOKEN to deep-check wrangler dev steps by brettdavies · Pull Request #92 · brettdavies/agentnative-site

brettdavies · 2026-05-15T07:28:55Z

Summary

Pass CLOUDFLARE_API_TOKEN + CLOUDFLARE_ACCOUNT_ID to the two deep-check.yml jobs (Playwright + Lighthouse CI) that spin up wrangler dev as the test web server. Without these env vars, wrangler 4.x exits with Not logged in before Playwright can connect, and the nightly fails 100% of the time.

The regression landed silently in PR #84 (U3-followup) when the sandbox container image moved off Docker Hub onto the Cloudflare managed registry. Anonymous pulls work for docker.io/...; the CF managed registry requires auth, and wrangler 4.x authenticates to read the container image manifest even under --local. ci.yml did not catch it because nothing in the ci.yml pipeline invokes wrangler dev. Today's first deep-check after PR #91 merged surfaced it as the Playwright + Lighthouse CI jobs failed simultaneously with the same Not logged in error from wrangler dev.

The same secrets deploy.yml already uses are passed here; no new provisioning needed. Header comment updated to document the new dependency.

Changelog

Fixed

deep-check.yml's e2e and lhci jobs no longer fail 100% with Not logged in from wrangler dev. Both jobs now have CLOUDFLARE_API_TOKEN and CLOUDFLARE_ACCOUNT_ID plumbed through, which wrangler 4.x needs in order to read the container image manifest from the CF managed registry even under --local.

Type of Change

fix: Bug fix (non-breaking change which fixes an issue)

Related Issues/Stories

Story: deep-check has been failing on every scheduled run since PR #84, fix(live-scoring): migrate sandbox image off Docker Hub (U3-followup), landed on dev. The first deep-check that ran on main with the migrated container image (run 25905730897, today 07:20 UTC, scheduled trigger immediately after PR #91, release: routes-inheritance fix + post-#85 promotion, merged to main) surfaced both jobs failing identically.
Issue: None.
Architecture: None.
Related PRs: #84, fix(live-scoring): migrate sandbox image off Docker Hub (U3-followup), which introduced the regression by migrating off Docker Hub; #91, release: routes-inheritance fix + post-#85 promotion, the release that promoted the new image pin to main and triggered the first deep-check failure on main.

Files Modified

Modified:

.github/workflows/deep-check.yml: header comment under "Secrets" now documents the CLOUDFLARE_API_TOKEN / CLOUDFLARE_ACCOUNT_ID dependency and the reason. The e2e job's End-to-end tests step and the lhci job's Lighthouse CI step each gained an env: block setting CLOUDFLARE_API_TOKEN and CLOUDFLARE_ACCOUNT_ID from the existing repo secrets.

Created:

None.

Renamed:

None.

Deleted:

None.

Testing

All tests passing locally (no new tests)

Test Summary:

315 unit and regression tests pass locally.
Pre-push gate green.
The actual auth-fix verification path is post-merge: manually dispatch the deep-check workflow against the merge commit via gh workflow run deep-check.yml --ref main and confirm both jobs reach the test phase (no longer fail at Not logged in). Cannot fully verify pre-merge because the failure mode only surfaces in the GHA runner environment. Local bun run test:e2e works because the developer's shell already has CLOUDFLARE_API_TOKEN set.
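Because the failure only appears where the two variables are absent, a preflight check can make the gap visible before wrangler dev is launched. The following is a minimal sketch, not part of this PR: `findMissingCloudflareEnv` and `assertCloudflareEnv` are hypothetical helper names, and the environment is passed in as a plain object so the block stays self-contained.

```typescript
// Hypothetical preflight helper (an illustration, not code from this PR).
// The two variable names are the ones this PR passes to the deep-check jobs.
const REQUIRED_CLOUDFLARE_VARS = ["CLOUDFLARE_API_TOKEN", "CLOUDFLARE_ACCOUNT_ID"] as const;

type EnvLike = Record<string, string | undefined>;

/** Returns the names of required Cloudflare variables that are unset or blank. */
function findMissingCloudflareEnv(env: EnvLike): string[] {
  return REQUIRED_CLOUDFLARE_VARS.filter((name) => {
    const value = env[name];
    // A blank value is treated as missing, since an unset GitHub Actions
    // secret expands to an empty string rather than being undefined.
    return value === undefined || value.trim() === "";
  });
}

/** Throws a descriptive error so a missing credential is reported up front. */
function assertCloudflareEnv(env: EnvLike): void {
  const missing = findMissingCloudflareEnv(env);
  if (missing.length > 0) {
    throw new Error(`Missing Cloudflare credentials: ${missing.join(", ")}`);
  }
}
```

Under these assumptions, calling `assertCloudflareEnv(process.env)` early in a test setup file would report the missing names directly instead of leaving the reader to interpret Not logged in from wrangler dev.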

Both jobs in deep-check.yml (e2e + lhci) start the test web server via `wrangler dev --local`. In wrangler 4.x, dev mode authenticates to the Cloudflare managed registry to read the container image manifest from `registry.cloudflare.com/<acct>/anc-sandbox:<tag>` even under `--local`. Without CLOUDFLARE_API_TOKEN in env, wrangler exits with "Not logged in" before Playwright can connect, and the deep-check nightly fails 100% of the time.

The regression landed silently in PR #84 (U3-followup), which migrated the sandbox container image off Docker Hub. Docker Hub is anonymously pullable; the CF managed registry isn't. ci.yml didn't surface this because its tests don't invoke `wrangler dev` (only `bun test` and `wrangler --dry-run`, both of which work without dev-mode auth or already have the token plumbed).

Fix: pass the same CF_API_TOKEN + CF_ACCOUNT_ID secrets that deploy.yml uses, on both the e2e step and the lhci step. No new secret provisioning needed; the secrets exist in the repo already. Header comment updated to reflect the new dependency and explain why.

Verification path: post-merge, manually dispatch deep-check via `gh workflow run deep-check.yml` against the merge commit to confirm the auth fix lands. Cannot fully verify pre-merge because the failure mode only surfaces in the GHA runner environment (local dev has CLOUDFLARE_API_TOKEN in shell env, so local `bun run test:e2e` works fine).

Related: cloudflare/workers-sdk#13925 (today's other CF Workers gotcha), but mechanically unrelated.

Deferring the deep-check Dockerfile-EXPOSE fix to U6, where the image is being rebuilt anyway as part of replacing the Sandbox DO stub. Captured in three places under U6:

- Files: explicit Modify line for docker/sandbox/Dockerfile with the EXPOSE directive requirement, including the port-3000 reservation note, the surfacing context (today's deep-check investigation), and the rationale for amortizing the image-rebuild ceremony into U6 rather than spinning a separate image release.
- Test scenarios: new Integration (CI) item asserting deep-check.yml goes green on the U6 merge SHA. Both jobs (e2e + lhci) use wrangler dev --local as their test webServer, so they prove the EXPOSE is correct end-to-end.
- Verification: deep-check passing becomes part of U6 acceptance alongside the existing two-phase-egress and anc_version invariants.

Background: PR #92 (today) added CLOUDFLARE_API_TOKEN to deep-check's e2e + lhci jobs, which resolved the first failure mode (Not logged in) but surfaced the second (container ports). The fix is a one-line Dockerfile change; the cost is the image rebuild + push + lockstep pin bump. Bundling with U6 is cheaper than a standalone image release for a stub container with no functional runtime yet.

## Summary

Implements plan U5 of the live-scoring rollout: wires `/api/score` into the Worker with content negotiation, response shape with the R11 triad, registry-fast-path (unmetered) for known tools, and the bot-defense / abuse-mitigation stack (Turnstile siteverify, signed HMAC session cookie, KV kill switch, dual rate-limits). The DO stub still returns `sandbox_stub_until_u6`; this PR exercises the route end-to-end as far as the DO boundary so U6 can drop in the real sandbox without further plumbing.

Also adds a regression test for the wrangler.jsonc inheritance fix (PR #90) so that the `env.staging.routes: []` and `env.staging.triggers: { crons: [] }` overrides cannot be silently removed without CI catching it. Driven by the recent compounded learning at `docs/solutions/integration-issues/wrangler-routes-inheritance-staging-custom-domain-drift-2026-05-15.md`.

## Changelog

### Added

- Add `/api/score` endpoint that returns registry-known tool scorecards (unmetered fast path pointing at the existing `/score/<slug>` page) and runs the full bot-defense pipeline for unknown tools, returning a `sandbox_stub_until_u6` envelope until U6 ships.
- Add `SCORE_KV` namespace binding for the operator-flippable `scoring_disabled` kill switch (flip with `wrangler kv key put --binding=SCORE_KV scoring_disabled true`).
- Add `SCORE_LIMITER_IP` secondary rate-limit binding (30 req / 60 s / IP) as a coarse fallback when a session cookie is swapped to dodge the primary session-keyed limiter.

### Changed

- Augment `dist/registry-index.json` entries with `version`, `anc_version`, and `scorecard_url` for every tool that has a committed scorecard, so the Worker can build the R11 triad and route registry hits to the existing per-tool page without fetching the scorecard payload.
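The content negotiation named in the summary has one regression case called out by this PR: `Accept: text/markdown;q=0.1, application/json;q=0.9` must resolve to JSON. The sketch below shows one way q-values can decide that. It is an illustration under stated assumptions, not the PR's `detectScorePreference`: `pickScoreFormat` is a hypothetical name, the JSON fallback is an assumption, and the URL-suffix handling (`.json` / `.md`) is left out.

```typescript
type ScoreFormat = "json" | "markdown";

// Media types this sketch recognizes; the real handler may accept more.
const FORMAT_BY_MEDIA_TYPE = new Map<string, ScoreFormat>([
  ["application/json", "json"],
  ["text/markdown", "markdown"],
]);

/**
 * Picks the recognized format with the highest q-value in an Accept header.
 * Falls back to JSON when nothing recognized is acceptable (an assumption).
 */
function pickScoreFormat(accept: string | null): ScoreFormat {
  let best: ScoreFormat = "json";
  let bestQ = 0;
  for (const part of (accept ?? "").split(",")) {
    const pieces = part.split(";").map((s) => s.trim().toLowerCase());
    const format = FORMAT_BY_MEDIA_TYPE.get(pieces[0] ?? "");
    if (format === undefined) continue;
    // The weight defaults to 1 when no q parameter is present (RFC 9110).
    let q = 1;
    for (const param of pieces.slice(1)) {
      const eq = param.indexOf("=");
      if (eq === -1 || param.slice(0, eq).trim() !== "q") continue;
      const parsed = Number.parseFloat(param.slice(eq + 1));
      // Malformed weights count as 0; valid ones are clamped to [0, 1].
      q = Number.isNaN(parsed) ? 0 : Math.min(Math.max(parsed, 0), 1);
    }
    // Strictly greater: on a tie, the earlier entry in the header wins.
    if (q > bestQ) {
      best = format;
      bestQ = q;
    }
  }
  return best;
}
```

With this sketch, `pickScoreFormat("text/markdown;q=0.1, application/json;q=0.9")` returns `"json"`, while a bare `text/markdown` header returns `"markdown"`. A naive "first recognized type wins" parser would get the first case wrong, which is what the regression test guards against.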
## Type of Change

- [x] `feat`: New feature (non-breaking change which adds functionality)

## Related Issues/Stories

- Story: n/a
- Issue: n/a
- Architecture: docs/plans/2026-04-28-002-feat-live-scoring-cf-sandbox-plan.md (plan U5)
- Related PRs: #78 (U1), #79 (U2), #80 (U4), #81 (U3), #84 (U3-fu), #90 (env.staging routes override), #92 (deep-check auth fix)

## Testing

- [x] Unit tests added/updated
- [x] Integration tests added/updated
- [x] Manual testing completed
- [x] All tests passing

**Test Summary:**

- Unit tests: 377 passing (315 baseline + 62 new across 4 files)
- Integration tests: covered by `tests/score-handler.test.ts` exercising the full pipeline end-to-end against stubbed bindings
- Coverage: every `ScoreError` discriminated-union variant has an HTTP-status mapping test; q-value content negotiation has the regression test required by the plan (`Accept: text/markdown;q=0.1, application/json;q=0.9` resolves to JSON, not markdown)

Locally verified:

- `bun test` (377/377 pass)
- `bunx biome check src/ tests/` (clean)
- `bunx tsc --noEmit` (31 pre-existing errors, none in U5 code; baseline before this PR was 39)
- `bun run build` (registry-index emits with enriched entries; ripgrep entry verified to carry `version: 15.1.0`, `anc_version: 0.3.0`, `scorecard_url: /score/ripgrep`)

## Files Modified

**Modified:**

- `src/build/build.mjs`, `src/build/registry-index.mjs`: reorder build pipeline so scorecards load before indexes emit; pass an enrichment map keyed by tool name into registry-index emission.
- `src/worker/index.ts`: add the `/api/score` path-prefix branch above the asset call; loosen Env to permit narrow asset-only test stubs.
- `src/worker/accept.ts`: add `detectScorePreference` for the JSON/markdown surface; keep `detectPreference` unchanged for site-side paths.
- `src/worker/score/registry-lookup.ts`: extend `RegistryEntry` with optional `version`, `anc_version`, `scorecard_url`.
- `wrangler.jsonc`: add `kv_namespaces: SCORE_KV` and `ratelimits: SCORE_LIMITER_IP` to both prod and staging.
- `src/worker-configuration.d.ts`: regenerated via `wrangler types`.
- `tests/worker.test.ts`: add a `/api/score routing` describe block with the plan-required q-value test.

**Created:**

- `src/worker/score/response-shape.ts`: `ScoreError` discriminated union (18 variants), `statusForError`, `shapeScoreSuccess`, `shapeScoreError` with R11 triad enforcement.
- `src/worker/score/content-negotiation.ts`: URL-suffix + q-value content detection for `/api/score(.json|.md)`.
- `src/worker/score/kill-switch.ts`: SCORE_KV `scoring_disabled` reader with 30 s in-memory cache.
- `src/worker/score/session.ts`: signed `__Host-anc-session` cookie issue / parse via Web Crypto HMAC-SHA256.
- `src/worker/score/turnstile.ts`: Turnstile siteverify wrapper.
- `src/worker/score/handler.ts`: orchestrates the full pipeline.
- `src/worker/spec-version.gen.ts`: placeholder; U8 wires the build-emit.
- `tests/score-response-shape.test.ts`: 27 tests covering every ScoreError variant + R11 triad enforcement.
- `tests/score-handler.test.ts`: 23 tests covering registry-fast-path, kill switch, Turnstile, rate-limits, DO passthrough, session cookie, service-misconfigured fail-fast, content negotiation.
- `tests/wrangler-config.test.ts`: 9 tests guarding the wrangler.jsonc inheritance fix.

**Renamed:**

- None.

**Deleted:**

- None.

## Key Features

- Registry-fast-path returns cache-hit-shaped JSON (no live sandbox spin) for any tool with a committed scorecard, including a GET surface (`GET /api/score?input=ripgrep`) for paste-and-share without form interaction.
- R11 response contract is structurally enforced: every success response includes `spec_version`, `anc_version`, `checker_url`; a missing `anc_version` returns 500 `incomplete_response_contract` rather than a silent partial.
- ScoreError exhaustiveness: adding a new variant without updating `statusForError` is a compile error.
- Operator kill switch: `wrangler kv key put --binding=SCORE_KV scoring_disabled true` propagates to all isolates within 30 s.
- Dual rate-limit: primary session-keyed limiter (cache-friendly: same-tool requests in a session don't burn budget); coarse per-IP fallback catches cookie-swappers.

## Benefits

- Unblocks U6 (real sandbox) and U7 (R2 cache). The route surface is stable; U6 swaps the DO stub for the real implementation behind the existing envelope.
- Cost ceiling defenses are in place at launch: Turnstile blocks headless / no-JS attackers; rate-limits cap distinct-tool requests per session and per IP; KV kill switch gives the operator a seconds-latency disable lever.

## Breaking Changes

- [x] No breaking changes
- [ ] Breaking changes described below:

## Deployment Notes

- [ ] No special deployment steps required
- [x] Deployment steps documented below:

Two operator-managed secrets must be set out of band before the route accepts POST traffic:

- `wrangler secret put TURNSTILE_SECRET` (and `--env staging`)
- `wrangler secret put SESSION_HMAC_SECRET` (and `--env staging`)

Until those are set, `/api/score` POST requests for non-registry inputs return 500 `service_misconfigured` (fail-fast by design). GET requests for registry-known tools work without either secret. Pull both values from the 1Password vault entry for the production Cloudflare account before promoting to prod.

KV namespaces are already created and the IDs committed in wrangler.jsonc. Secondary rate-limit namespaces use `1003` (prod) and `1004` (staging); no operator action needed for those.

## Screenshots/Recordings

n/a (server-side surface; UI form lands in U8).
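The fail-fast behavior in the deployment notes (500 `service_misconfigured` until both secrets are set) can be sketched as a small guard that runs before any bot-defense step. This is a hedged illustration: `ScoreSecrets`, `SecretCheck`, and `checkScoreSecrets` are hypothetical names, and only the two secret names, the status code, and the error code come from the description above.

```typescript
// Hypothetical view of the two operator-managed secrets on the Worker env.
interface ScoreSecrets {
  TURNSTILE_SECRET?: string;
  SESSION_HMAC_SECRET?: string;
}

type SecretCheck =
  | { ok: true }
  | { ok: false; status: 500; error: "service_misconfigured"; missing: string[] };

/**
 * Fails fast when either secret is absent or empty. `missing` is intended
 * for operator logs in this sketch, not for the response body.
 */
function checkScoreSecrets(env: ScoreSecrets): SecretCheck {
  const missing: string[] = [];
  if (!env.TURNSTILE_SECRET) missing.push("TURNSTILE_SECRET");
  if (!env.SESSION_HMAC_SECRET) missing.push("SESSION_HMAC_SECRET");
  return missing.length === 0
    ? { ok: true }
    : { ok: false, status: 500, error: "service_misconfigured", missing };
}
```

In this sketch the guard would only be consulted on the POST path for non-registry inputs, which matches the note that GET requests for registry-known tools work without either secret.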
## Checklist

- [x] Code follows project conventions and style guidelines
- [x] Commit messages follow [Conventional Commits](https://www.conventionalcommits.org/)
- [x] Self-review of code completed
- [x] Tests added/updated and passing
- [x] No new warnings or errors introduced
- [x] Changes are backward compatible (or breaking changes documented)

## Additional Context

The DO stub at `src/worker/score/do.ts` still returns `{error: sandbox_stub_until_u6}`. The handler surfaces that as a 503 with the structured `sandbox_stub_until_u6` ScoreError so the route is honest about its state. U6 replaces the DO stub with the real `@cloudflare/sandbox`-extending class and the install + score flow.

`src/worker/spec-version.gen.ts` is a hand-edited placeholder for U5; U8 wires `src/build/build.mjs` to regenerate it from `src/data/spec/VERSION` and `content/principles/VERSION` at build time.
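The 503 mapping described here, together with the claim that a new `ScoreError` variant without a status mapping is a compile error, follows a common TypeScript pattern: an exhaustive `switch` with a `never` check. The sketch below uses a reduced three-variant union rather than the 18 variants in the PR, and the `kind` discriminant is an assumed field name; the three status codes are the ones stated in the description.

```typescript
// Reduced union for illustration; `kind` is an assumed discriminant name.
type ScoreError =
  | { kind: "sandbox_stub_until_u6" }
  | { kind: "service_misconfigured" }
  | { kind: "incomplete_response_contract" };

/** Maps each error variant to its HTTP status. */
function statusForError(error: ScoreError): number {
  switch (error.kind) {
    case "sandbox_stub_until_u6":
      return 503;
    case "service_misconfigured":
    case "incomplete_response_contract":
      return 500;
    default: {
      // If a variant is added to ScoreError without a case above, `error`
      // is no longer narrowed to `never` and this assignment stops compiling.
      const unreachable: never = error;
      throw new Error(`Unhandled ScoreError: ${JSON.stringify(unreachable)}`);
    }
  }
}
```

The `default` branch is unreachable at runtime while the union and the `switch` agree; its purpose is to turn a forgotten mapping into a type error at build time rather than a wrong status in production.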

First production cut of the live-scoring stack. Promotes everything on dev since PR #91 (the prior release): plan U5-U10 worker code (handler, sandbox DO, container, R2 cache, rate limits, telemetry, kill switch, homepage form, shareable result URLs, monitoring runbook), the docker sandbox image with anc v0.4.0 baked in, deploy split + routing-drift follow-ups (#92 CI token plumbing), the contributor surface (nav, footer Source row, intake template, README rewrite), the build SRP refactor, the SEO JSON-LD @graph fix, the cross-migration rollback rehearsal evidence and recipe correction, and the env.SCORE handler guard that converts a mid-rollback CF 1101 into a typed 503 sandbox_unavailable. Lockstep image bump: both top-level containers[0].image and env.staging.containers[0].image at :9aed5c3 (anc-cli v0.4.0, sha256:dae72c56afe2f332e8745c0517f1ed5d21993470de663409dfc9b3973cdfe4c1). The image cleared staging deploy on dev push 26384622721; soak was skipped per release-cut decision. Triple-diff verification clean (134 files changed; no guarded-path leaks; expected B-diff is the prod-pin bump only).

brettdavies merged commit 3004c80 into dev May 15, 2026
2 checks passed

brettdavies deleted the fix/deep-check-cf-auth branch May 15, 2026 07:30

brettdavies mentioned this pull request May 15, 2026

feat(worker): /api/score live-scoring handler (plan U5) #93

Merged

15 tasks

brettdavies merged 1 commit into dev from fix/deep-check-cf-auth
