
fix(ci): pass CLOUDFLARE_API_TOKEN to deep-check wrangler dev steps #92

Merged: brettdavies merged 1 commit into dev from fix/deep-check-cf-auth on May 15, 2026


@brettdavies
Owner

Summary

Pass CLOUDFLARE_API_TOKEN + CLOUDFLARE_ACCOUNT_ID to the two deep-check.yml jobs (Playwright + Lighthouse CI) that spin up wrangler dev as the test web server. Without these env vars, wrangler 4.x exits with Not logged in before Playwright can connect, and the nightly fails 100% of the time.

The regression landed silently in PR #84 (U3-followup), when the sandbox container image moved off Docker Hub onto the Cloudflare managed registry. Anonymous pulls work for docker.io/...; the CF managed registry requires auth, and wrangler 4.x authenticates to read the container image manifest even under --local. ci.yml did not catch it because nothing in the ci.yml pipeline invokes wrangler dev. Today's deep-check, the first since PR #91 merged, surfaced it: the Playwright and Lighthouse CI jobs failed simultaneously with the same Not logged in error from wrangler dev.

The same secrets deploy.yml already uses are passed here; no new provisioning needed. Header comment updated to document the new dependency.
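
One way to make this failure mode louder is a preflight check that runs before the test web server starts. The sketch below is illustrative only and is not part of this PR: `missingCloudflareEnv` and `assertCloudflareEnv` are hypothetical helpers, and the environment map is passed in explicitly rather than read from the process.

```typescript
// Hypothetical preflight helpers; not part of the PR. They only inspect the
// environment map they are given and never invoke wrangler.
const REQUIRED_VARS = ["CLOUDFLARE_API_TOKEN", "CLOUDFLARE_ACCOUNT_ID"] as const;

type EnvMap = Record<string, string | undefined>;

// Returns the names of required variables that are unset or blank.
function missingCloudflareEnv(env: EnvMap): string[] {
  return REQUIRED_VARS.filter((name) => {
    const value = env[name];
    return value === undefined || value.trim() === "";
  });
}

// Throws a descriptive error up front instead of letting `wrangler dev`
// exit later with "Not logged in".
function assertCloudflareEnv(env: EnvMap): void {
  const missing = missingCloudflareEnv(env);
  if (missing.length > 0) {
    throw new Error(`Missing env for wrangler dev: ${missing.join(", ")}`);
  }
}
```

A harness could call `assertCloudflareEnv` with its own environment object before launching the web server, so a missing secret is named in the failure message.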

Changelog

Fixed

  • deep-check.yml's e2e and lhci jobs no longer fail 100% of the time with Not logged in from wrangler dev. Both jobs now have CLOUDFLARE_API_TOKEN and CLOUDFLARE_ACCOUNT_ID plumbed through, which wrangler 4.x needs in order to read the container image manifest from the CF managed registry even under --local.

Type of Change

  • fix: Bug fix (non-breaking change which fixes an issue)

Related Issues/Stories

Files Modified

Modified:

  • .github/workflows/deep-check.yml: header comment under "Secrets" now documents the CLOUDFLARE_API_TOKEN / CLOUDFLARE_ACCOUNT_ID dependency and the reason. The e2e job's End-to-end tests step and the lhci job's Lighthouse CI step each gained an env: block setting CLOUDFLARE_API_TOKEN and CLOUDFLARE_ACCOUNT_ID from the existing repo secrets.

Created:

  • None.

Renamed:

  • None.

Deleted:

  • None.

Testing

  • All tests passing locally (no new tests)

Test Summary:

  • 315 unit and regression tests pass locally.
  • Pre-push gate green.
  • The actual auth-fix verification path is post-merge: manually dispatch the deep-check workflow against the merge commit via gh workflow run deep-check.yml --ref main and confirm both jobs reach the test phase (no longer fail at Not logged in). Cannot fully verify pre-merge because the failure mode only surfaces in the GHA runner environment. Local bun run test:e2e works because the developer's shell already has CLOUDFLARE_API_TOKEN set.

Both jobs in deep-check.yml (e2e + lhci) start the test web server via
`wrangler dev --local`. In wrangler 4.x, dev mode authenticates to the
Cloudflare managed registry to read the container image manifest from
`registry.cloudflare.com/<acct>/anc-sandbox:<tag>` even under
`--local`. Without CLOUDFLARE_API_TOKEN in env, wrangler exits with
"Not logged in" before Playwright can connect, and the deep-check
nightly fails 100% of the time.

The regression landed silently in PR #84 (U3-followup), which migrated
the sandbox container image off Docker Hub. Docker Hub allows anonymous
pulls; the CF managed registry doesn't. ci.yml didn't surface this
because its tests don't invoke `wrangler dev` (only `bun test` and
`wrangler --dry-run`, both of which work without dev-mode auth or
already have the token plumbed).

Fix: pass the same CF_API_TOKEN + CF_ACCOUNT_ID secrets that deploy.yml
uses, on both the e2e step and the lhci step. No new secret
provisioning needed; the secrets exist in the repo already. Header
comment updated to reflect the new dependency and explain why.

Verification path: post-merge, manually dispatch deep-check via
`gh workflow run deep-check.yml` against the merge commit to confirm
the auth fix lands. Cannot fully verify pre-merge because the failure
mode only surfaces in the GHA runner environment (local dev has
CLOUDFLARE_API_TOKEN in shell env, so local `bun run test:e2e`
works fine).

Related: cloudflare/workers-sdk#13925 (today's other CF Workers
gotcha), but mechanically unrelated.
@brettdavies brettdavies merged commit 3004c80 into dev May 15, 2026
2 checks passed
@brettdavies brettdavies deleted the fix/deep-check-cf-auth branch May 15, 2026 07:30
brettdavies added a commit that referenced this pull request May 15, 2026
Deferring the deep-check Dockerfile-EXPOSE fix to U6, where the
image is being rebuilt anyway as part of replacing the Sandbox DO
stub. Captured in three places under U6:

- Files: explicit Modify line for docker/sandbox/Dockerfile with the
  EXPOSE directive requirement, including the port-3000 reservation
  note, the surfacing context (today's deep-check investigation), and
  the rationale for amortizing the image-rebuild ceremony into U6
  rather than spinning a separate image release.

- Test scenarios: new Integration (CI) item asserting deep-check.yml
  goes green on the U6 merge SHA. Both jobs (e2e + lhci) use
  wrangler dev --local as their test webServer, so they prove the
  EXPOSE is correct end-to-end.

- Verification: deep-check passing becomes part of U6 acceptance
  alongside the existing two-phase-egress and anc_version invariants.

Background: PR #92 (today) added CLOUDFLARE_API_TOKEN to deep-check's
e2e + lhci jobs, which resolved the first failure mode (Not logged in)
but surfaced the second (container ports). The fix is a one-line
Dockerfile change; the cost is the image rebuild + push + lockstep
pin bump. Bundling with U6 is cheaper than a standalone image
release for a stub container with no functional runtime yet.
brettdavies added a commit that referenced this pull request May 15, 2026
## Summary

Implements plan U5 of the live-scoring rollout: wires `/api/score` into
the Worker with content negotiation, response shape with the R11 triad,
registry-fast-path (unmetered) for known tools, and the bot-defense /
abuse-mitigation stack (Turnstile siteverify, signed HMAC session
cookie, KV kill switch, dual rate-limits). The DO stub still returns
`sandbox_stub_until_u6`; this PR exercises the route end-to-end as far
as the DO boundary so U6 can drop in the real sandbox without further
plumbing.

Also adds a regression test for the wrangler.jsonc inheritance fix (PR
#90) so that the `env.staging.routes: []` and `env.staging.triggers: {
crons: [] }` overrides cannot be silently removed without CI catching
it. Driven by the recent compounded learning at
`docs/solutions/integration-issues/wrangler-routes-inheritance-staging-custom-domain-drift-2026-05-15.md`.
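
A guard of this kind could look roughly like the sketch below. It assumes the config has already been parsed into a plain object (JSONC parsing is not shown), and `findStagingInheritanceLeaks` is a hypothetical name; the actual assertions in `tests/wrangler-config.test.ts` may differ.

```typescript
// Only the parts of wrangler.jsonc this check cares about (assumed shape).
interface WranglerConfig {
  routes?: unknown[];
  triggers?: { crons?: string[] };
  env?: {
    staging?: {
      routes?: unknown[];
      triggers?: { crons?: string[] };
    };
  };
}

// Returns human-readable problems; an empty list means both overrides hold.
function findStagingInheritanceLeaks(config: WranglerConfig): string[] {
  const staging = config.env?.staging;
  if (!staging) return ["env.staging is missing"];

  const problems: string[] = [];
  // An absent key counts as a leak: the override must be explicit so that
  // staging cannot fall back to the top-level value.
  if (!Array.isArray(staging.routes) || staging.routes.length !== 0) {
    problems.push("env.staging.routes must be an explicit empty array");
  }
  const crons = staging.triggers?.crons;
  if (!Array.isArray(crons) || crons.length !== 0) {
    problems.push("env.staging.triggers.crons must be an explicit empty array");
  }
  return problems;
}
```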

## Changelog

### Added

- Add `/api/score` endpoint that returns registry-known tool scorecards
(unmetered fast path pointing at the existing `/score/<slug>` page) and
runs the full bot-defense pipeline for unknown tools, returning a
`sandbox_stub_until_u6` envelope until U6 ships.
- Add `SCORE_KV` namespace binding for the operator-flippable
`scoring_disabled` kill switch (flip with `wrangler kv key put
--binding=SCORE_KV scoring_disabled true`).
- Add `SCORE_LIMITER_IP` secondary rate-limit binding (30 req / 60 s /
IP) as a coarse fallback when a session cookie is swapped to dodge the
primary session-keyed limiter.
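
The two-tier limiter described above can be sketched as follows. This is a minimal illustration, not the handler's code: the `RateLimiter` interface mirrors the `limit({ key })` call of a Workers rate-limit binding, while `checkRateLimits`, the key prefixes, and the session-then-IP ordering are assumptions made for the example.

```typescript
// Minimal shape of a rate-limit binding: limit({ key }) resolves to { success }.
interface RateLimiter {
  limit(options: { key: string }): Promise<{ success: boolean }>;
}

type LimitDecision =
  | { allowed: true }
  | { allowed: false; limitedBy: "session" | "ip" };

// Hypothetical helper: consult the session-keyed limiter first, then the
// coarse per-IP limiter that still applies if the session cookie is swapped.
async function checkRateLimits(
  sessionLimiter: RateLimiter,
  ipLimiter: RateLimiter,
  sessionId: string,
  clientIp: string,
): Promise<LimitDecision> {
  const bySession = await sessionLimiter.limit({ key: `session:${sessionId}` });
  if (!bySession.success) return { allowed: false, limitedBy: "session" };

  const byIp = await ipLimiter.limit({ key: `ip:${clientIp}` });
  if (!byIp.success) return { allowed: false, limitedBy: "ip" };

  return { allowed: true };
}
```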

### Changed

- Augment `dist/registry-index.json` entries with `version`,
`anc_version`, and `scorecard_url` for every tool that has a committed
scorecard, so the Worker can build the R11 triad and route registry hits
to the existing per-tool page without fetching the scorecard payload.
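
The enrichment step can be pictured with the sketch below. It is an assumption-laden illustration rather than the code in `src/build/registry-index.mjs`: `enrichRegistryEntries` is a hypothetical name, and the example assumes the tool name doubles as the `/score/<slug>` slug.

```typescript
interface RegistryEntry {
  name: string;
  version?: string;
  anc_version?: string;
  scorecard_url?: string;
}

// Values taken from a committed scorecard, keyed by tool name.
interface ScorecardInfo {
  version: string;
  anc_version: string;
}

// Hypothetical helper: entries with a committed scorecard gain the three
// extra fields; entries without one are returned unchanged.
function enrichRegistryEntries(
  entries: RegistryEntry[],
  scorecards: Map<string, ScorecardInfo>,
): RegistryEntry[] {
  return entries.map((entry) => {
    const info = scorecards.get(entry.name);
    if (!info) return entry;
    return {
      ...entry,
      version: info.version,
      anc_version: info.anc_version,
      // Assumption: the slug equals the tool name.
      scorecard_url: `/score/${entry.name}`,
    };
  });
}
```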

## Type of Change

- [x] `feat`: New feature (non-breaking change which adds functionality)

## Related Issues/Stories

- Story: n/a
- Issue: n/a
- Architecture:
docs/plans/2026-04-28-002-feat-live-scoring-cf-sandbox-plan.md (plan U5)
- Related PRs: #78 (U1), #79 (U2), #80 (U4), #81 (U3), #84 (U3-fu), #90
(env.staging routes override), #92 (deep-check auth fix)

## Testing

- [x] Unit tests added/updated
- [x] Integration tests added/updated
- [x] Manual testing completed
- [x] All tests passing

**Test Summary:**

- Unit tests: 377 passing (315 baseline + 62 new across 4 files)
- Integration tests: covered by `tests/score-handler.test.ts` exercising
the full pipeline end-to-end against stubbed bindings
- Coverage: every `ScoreError` discriminated-union variant has an
HTTP-status mapping test; q-value content negotiation has the regression
test required by the plan (`Accept: text/markdown;q=0.1,
application/json;q=0.9` resolves to JSON, not markdown)
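
The q-value case named above can be illustrated with a small parser. This is a sketch under stated assumptions, not the code in `src/worker/score/content-negotiation.ts`: `parseAccept` and `preferredScoreFormat` are hypothetical names, wildcards such as `*/*` are ignored, ties go to the first listed type, and JSON is assumed as the fallback.

```typescript
type ScoreFormat = "json" | "markdown";

// Split an Accept header into (media type, q) pairs. q defaults to 1.
function parseAccept(header: string): Array<{ type: string; q: number }> {
  return header
    .split(",")
    .map((part) => {
      const [type, ...params] = part.trim().split(";").map((s) => s.trim());
      let q = 1;
      for (const param of params) {
        const [key, value] = param.split("=").map((s) => s.trim());
        if (key === "q" && value !== undefined) {
          const parsed = Number.parseFloat(value);
          if (!Number.isNaN(parsed)) q = parsed;
        }
      }
      return { type: (type ?? "").toLowerCase(), q };
    })
    .filter((entry) => entry.type !== "");
}

// Pick the supported format with the highest q; q=0 means "not acceptable".
function preferredScoreFormat(
  accept: string | null,
  fallback: ScoreFormat = "json",
): ScoreFormat {
  if (!accept) return fallback;
  let best: { format: ScoreFormat; q: number } | null = null;
  for (const { type, q } of parseAccept(accept)) {
    const format: ScoreFormat | null =
      type === "application/json" ? "json" : type === "text/markdown" ? "markdown" : null;
    if (format === null || q <= 0) continue;
    if (best === null || q > best.q) best = { format, q };
  }
  return best ? best.format : fallback;
}
```

With this sketch, `preferredScoreFormat("text/markdown;q=0.1, application/json;q=0.9")` selects `"json"` because 0.9 outranks 0.1, regardless of header order.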

Locally verified:

- `bun test` (377/377 pass)
- `bunx biome check src/ tests/` (clean)
- `bunx tsc --noEmit` (31 pre-existing errors, none in U5 code; baseline
before this PR was 39)
- `bun run build` (registry-index emits with enriched entries; ripgrep
entry verified to carry `version: 15.1.0`, `anc_version: 0.3.0`,
`scorecard_url: /score/ripgrep`)

## Files Modified

**Modified:**

- `src/build/build.mjs`, `src/build/registry-index.mjs`: reorder build
pipeline so scorecards load before indexes emit; pass an enrichment map
keyed by tool name into registry-index emission.
- `src/worker/index.ts`: add the `/api/score` path-prefix branch above
the asset call; loosen Env to permit narrow asset-only test stubs.
- `src/worker/accept.ts`: add `detectScorePreference` for the
JSON/markdown surface; keep `detectPreference` unchanged for site-side
paths.
- `src/worker/score/registry-lookup.ts`: extend `RegistryEntry` with
optional `version`, `anc_version`, `scorecard_url`.
- `wrangler.jsonc`: add `kv_namespaces: SCORE_KV` and `ratelimits:
SCORE_LIMITER_IP` to both prod and staging.
- `src/worker-configuration.d.ts`: regenerated via `wrangler types`.
- `tests/worker.test.ts`: add a `/api/score routing` describe block with
the plan-required q-value test.

**Created:**

- `src/worker/score/response-shape.ts`: `ScoreError` discriminated union
(18 variants), `statusForError`, `shapeScoreSuccess`, `shapeScoreError`
with R11 triad enforcement.
- `src/worker/score/content-negotiation.ts`: URL-suffix + q-value
content detection for `/api/score(.json|.md)`.
- `src/worker/score/kill-switch.ts`: SCORE_KV `scoring_disabled` reader
with 30 s in-memory cache.
- `src/worker/score/session.ts`: signed `__Host-anc-session` cookie
issue / parse via Web Crypto HMAC-SHA256.
- `src/worker/score/turnstile.ts`: Turnstile siteverify wrapper.
- `src/worker/score/handler.ts`: orchestrates the full pipeline.
- `src/worker/spec-version.gen.ts`: placeholder; U8 wires the
build-emit.
- `tests/score-response-shape.test.ts`: 27 tests covering every
ScoreError variant + R11 triad enforcement.
- `tests/score-handler.test.ts`: 23 tests covering registry-fast-path,
kill switch, Turnstile, rate-limits, DO passthrough, session cookie,
service-misconfigured fail-fast, content negotiation.
- `tests/wrangler-config.test.ts`: 9 tests guarding the wrangler.jsonc
inheritance fix.

**Renamed:**

- None.

**Deleted:**

- None.

## Key Features

- Registry-fast-path returns cache-hit-shaped JSON (no live sandbox
spin) for any tool with a committed scorecard, including a GET surface
(`GET /api/score?input=ripgrep`) for paste-and-share without form
interaction.
- R11 response contract is structurally enforced: every success response
includes `spec_version`, `anc_version`, `checker_url`; a missing
`anc_version` returns 500 `incomplete_response_contract` rather than a
silent partial.
- ScoreError exhaustiveness: adding a new variant without updating
`statusForError` is a compile error.
- Operator kill switch: `wrangler kv key put --binding=SCORE_KV
scoring_disabled true` propagates to all isolates within 30 s.
- Dual rate-limit: primary session-keyed limiter (cache-friendly:
same-tool requests in a session don't burn budget); coarse per-IP
fallback catches cookie-swappers.
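
The R11 contract check could be sketched as below. The field names and the `incomplete_response_contract` error come from the description above; the `enforceTriad` name, its signature, and the `missing` field in the error body are assumptions for illustration and may not match `src/worker/score/response-shape.ts`.

```typescript
interface ScoreTriad {
  spec_version: string;
  anc_version: string;
  checker_url: string;
}

type ShapedResponse =
  | { status: 200; body: ScoreTriad & Record<string, unknown> }
  | { status: 500; body: { error: "incomplete_response_contract"; missing: string[] } };

const TRIAD_KEYS = ["spec_version", "anc_version", "checker_url"] as const;

// Hypothetical helper: refuse to emit a success body unless all three triad
// fields are non-empty strings, so a partial response cannot slip through.
function enforceTriad(
  payload: Record<string, unknown>,
  triad: Partial<ScoreTriad>,
): ShapedResponse {
  const missing = TRIAD_KEYS.filter((key) => {
    const value = triad[key];
    return typeof value !== "string" || value === "";
  });
  if (missing.length > 0) {
    return { status: 500, body: { error: "incomplete_response_contract", missing } };
  }
  return {
    status: 200,
    body: {
      ...payload,
      spec_version: triad.spec_version as string,
      anc_version: triad.anc_version as string,
      checker_url: triad.checker_url as string,
    },
  };
}
```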

## Benefits

- Unblocks U6 (real sandbox) and U7 (R2 cache). The route surface is
stable; U6 swaps the DO stub for the real implementation behind the
existing envelope.
- Cost ceiling defenses are in place at launch: Turnstile blocks
headless / no-JS attackers; rate-limits cap distinct-tool requests per
session and per IP; KV kill switch gives the operator a seconds-latency
disable lever.
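
The kill-switch reader with its 30 s cache might look like the sketch below. It is not the code in `src/worker/score/kill-switch.ts`: `createKillSwitch` is a hypothetical factory, the `KvReader` interface covers only the `get` call used here, and treating only the literal string `true` as "disabled" is an assumption.

```typescript
// The only KV operation this sketch needs.
interface KvReader {
  get(key: string): Promise<string | null>;
}

// Hypothetical factory: returns a reader that hits KV at most once per TTL
// window per isolate. The clock is injectable so the cache can be tested.
function createKillSwitch(
  kv: KvReader,
  ttlMs: number = 30_000,
  now: () => number = Date.now,
): () => Promise<boolean> {
  let cached: { disabled: boolean; fetchedAt: number } | null = null;

  return async function isScoringDisabled(): Promise<boolean> {
    const current = now();
    const hit = cached;
    if (hit !== null && current - hit.fetchedAt < ttlMs) {
      return hit.disabled;
    }
    const raw = await kv.get("scoring_disabled");
    const fresh = { disabled: raw === "true", fetchedAt: current };
    cached = fresh;
    return fresh.disabled;
  };
}
```

Because each isolate refreshes its own copy after the TTL lapses, a flip in KV is picked up within roughly one TTL window, which matches the propagation bound described above.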

## Breaking Changes

- [x] No breaking changes
- [ ] Breaking changes described below:

## Deployment Notes

- [ ] No special deployment steps required
- [x] Deployment steps documented below:

Two operator-managed secrets must be set out of band before the route
accepts POST traffic:

- `wrangler secret put TURNSTILE_SECRET` (and `--env staging`)
- `wrangler secret put SESSION_HMAC_SECRET` (and `--env staging`)

Until those are set, `/api/score` POST requests for non-registry inputs
return 500 `service_misconfigured` (fail-fast by design). GET requests
for registry-known tools work without either secret. Pull both values
from the 1Password vault entry for the production Cloudflare account
before promoting to prod.
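
For context on what `SESSION_HMAC_SECRET` protects, here is a minimal sketch of signing and verifying a session value with Web Crypto HMAC-SHA256. The `<sessionId>.<hex signature>` layout, the helper names, and the absence of an expiry field are all assumptions for illustration; the real cookie format lives in `src/worker/score/session.ts` and is not shown in this PR text.

```typescript
const encoder = new TextEncoder();

function toHex(buffer: ArrayBuffer): string {
  return Array.from(new Uint8Array(buffer), (b) => b.toString(16).padStart(2, "0")).join("");
}

async function importHmacKey(secret: string): Promise<CryptoKey> {
  return crypto.subtle.importKey(
    "raw",
    encoder.encode(secret),
    { name: "HMAC", hash: "SHA-256" },
    false,
    ["sign", "verify"],
  );
}

// Produces "<sessionId>.<hex signature>" (assumed layout).
async function signSession(sessionId: string, secret: string): Promise<string> {
  const key = await importHmacKey(secret);
  const signature = await crypto.subtle.sign("HMAC", key, encoder.encode(sessionId));
  return `${sessionId}.${toHex(signature)}`;
}

// Returns the session id when the signature checks out, otherwise null.
async function verifySession(cookieValue: string, secret: string): Promise<string | null> {
  const dot = cookieValue.lastIndexOf(".");
  if (dot <= 0) return null;
  const sessionId = cookieValue.slice(0, dot);
  const sigHex = cookieValue.slice(dot + 1);
  if (!/^([0-9a-f]{2})+$/.test(sigHex)) return null;

  const sigBytes = new Uint8Array(sigHex.length / 2);
  for (let i = 0; i < sigBytes.length; i++) {
    sigBytes[i] = Number.parseInt(sigHex.slice(i * 2, i * 2 + 2), 16);
  }
  const key = await importHmacKey(secret);
  // subtle.verify performs the comparison itself; no manual string compare.
  const ok = await crypto.subtle.verify("HMAC", key, sigBytes, encoder.encode(sessionId));
  return ok ? sessionId : null;
}
```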

KV namespaces are already created and the IDs committed in
wrangler.jsonc. Secondary rate-limit namespaces use `1003` (prod) and
`1004` (staging); no operator action needed for those.

## Screenshots/Recordings

n/a (server-side surface; UI form lands in U8).

## Checklist

- [x] Code follows project conventions and style guidelines
- [x] Commit messages follow [Conventional
Commits](https://www.conventionalcommits.org/)
- [x] Self-review of code completed
- [x] Tests added/updated and passing
- [x] No new warnings or errors introduced
- [x] Changes are backward compatible (or breaking changes documented)

## Additional Context

The DO stub at `src/worker/score/do.ts` still returns `{error:
sandbox_stub_until_u6}`. The handler surfaces that as a 503 with the
structured `sandbox_stub_until_u6` ScoreError so the route is honest
about its state. U6 replaces the DO stub with the real
`@cloudflare/sandbox`-extending class and the install + score flow.
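
The status mapping and its compile-time exhaustiveness can be sketched with a `never` check. Only three of the 18 variants are shown, using the statuses stated in this description; the `kind` discriminant name and the `missing` payload are assumptions, and the real `statusForError` may be structured differently.

```typescript
// Illustrative subset of the union; the discriminant name `kind` is assumed.
type ScoreError =
  | { kind: "sandbox_stub_until_u6" }
  | { kind: "service_misconfigured" }
  | { kind: "incomplete_response_contract"; missing: string };

function statusForError(error: ScoreError): number {
  switch (error.kind) {
    case "sandbox_stub_until_u6":
      return 503;
    case "service_misconfigured":
      return 500;
    case "incomplete_response_contract":
      return 500;
    default: {
      // If a new variant is added to ScoreError without a case above,
      // `error` is no longer `never` here and this assignment fails to compile.
      const unreachable: never = error;
      throw new Error(`Unhandled ScoreError: ${JSON.stringify(unreachable)}`);
    }
  }
}
```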

`src/worker/spec-version.gen.ts` is a hand-edited placeholder for U5; U8
wires `src/build/build.mjs` to regenerate it from
`src/data/spec/VERSION` and `content/principles/VERSION` at build time.
brettdavies added a commit that referenced this pull request May 25, 2026
First production cut of the live-scoring stack. Promotes everything on
dev since PR #91 (the prior release): plan U5-U10 worker code (handler,
sandbox DO, container, R2 cache, rate limits, telemetry, kill switch,
homepage form, shareable result URLs, monitoring runbook), the docker
sandbox image with anc v0.4.0 baked in, deploy split + routing-drift
follow-ups (#92 CI token plumbing), the contributor surface (nav,
footer Source row, intake template, README rewrite), the build SRP
refactor, the SEO JSON-LD @graph fix, the cross-migration rollback
rehearsal evidence and recipe correction, and the env.SCORE handler
guard that converts a mid-rollback CF 1101 into a typed 503
sandbox_unavailable.

Lockstep image bump: both top-level containers[0].image and
env.staging.containers[0].image at :9aed5c3 (anc-cli v0.4.0,
sha256:dae72c56afe2f332e8745c0517f1ed5d21993470de663409dfc9b3973cdfe4c1).
The image cleared staging deploy on dev push 26384622721; soak was
skipped per release-cut decision.

Triple-diff verification clean (134 files changed; no guarded-path
leaks; expected B-diff is the prod-pin bump only).