TobyKThurston · TobyKThurston · Apr 25, 2026 · Apr 26, 2026
diff --git a/README.md b/README.md
@@ -0,0 +1,118 @@
+# vibe check
+
+> One snapshot, a swarm of agents, every bug at once.
+
+A vision-driven adversarial QA tool. Point it at a SaaS app and a swarm of GPT-4o-mini agents drives headless Chromium browsers in parallel. Each agent pursues a different attack intent (XSS, race conditions, validation bypass, crashes) and streams its findings into a live tree UI as the run forks.
+
+Built in 48 hours for the Vercel × Parallel Agents hackathon.
+
+---
+
+## The idea
+
+Manual QA is sequential: one tester, one path, one bug at a time. Real apps fail in branching ways. The same form can crash on empty input, accept negative numbers, double-submit, *and* reflect XSS, depending on which path you take.
+
+**vibe check** parallelizes the search. At every interesting page (a "fork point"), an LLM looks at the live screenshot + DOM, generates 2–5 adversarial intents tailored to what's actually on screen, and spawns an isolated Chromium context per intent. Each fork runs its own vision-loop agent: screenshot → reason → click/fill/eval → repeat, until a verdict (`bug` / `passed` / `tolerable`) is returned. The "control" branch that completes the happy flow chains into the next fork point, so attack surfaces compound rather than reset.
+
+The whole tree streams to the browser over SSE while it runs.
+
+---
+
+## How it works
+
+```
+ROOT (snapshot of app state)
+├── fp1.intent-A   xss-probe          → bug
+├── fp1.intent-B   control-normal     → passed ─┐
+├── fp1.intent-C   concurrency-stress → bug     │
+└── fp1.intent-D   input-fuzz         → bug     │
+                                                ▼
+                                ┌── fp2.intent-A
+                                ├── fp2.intent-B
+                                └── fp2.intent-C
+```
+
+1. **Discovery.** An LLM planner inspects the target and proposes 1–4 fork points (form pages, settings, billing, etc.). For known demo targets, a pinned catalog replaces discovery, so agents are not sent to hallucinated paths that return 404.
+2. **Intent generation.** At each fork point, GPT-4o-mini sees a real screenshot + compact DOM and produces 2–5 intents grounded in what's visible — not a fixed taxonomy.
+3. **Per-fork agent loop.** Each intent gets its own Playwright `BrowserContext`. The agent calls a tool-bound model that returns one of `click | fill | press | eval | done`, and the loop is capped at 5 steps. Polling CDP `Page.captureScreenshot` streams frames into the UI.
+4. **Verdict aggregation.** A small bug catalog (XSS, server-error, validation-bypass, broken-ui-state, duplicate-state, auth-bypass, data-leak, crash) is enforced via OpenAI tool-call schemas, so verdicts arrive as typed structured data.
+5. **Live tree.** React Flow renders forks as they're created. Frames update in place. Each terminal node carries its evidence + a one-click "Claude-fix prompt" tailored to the bug.
+
+---
+
+## Stack
+
+- **Next.js 16 (App Router) + React 19** — UI, API routes, Server Components
+- **Playwright + @sparticuz/chromium** — headless browser per fork
+- **OpenAI** — `gpt-4o-mini` vision + function-calling for the agent loop and intent planner
+- **Server-Sent Events** — frame and event streaming to the tree UI
+- **React Flow (`@xyflow/react`)** — fork-tree layout
+- **Zod** — runtime validation of LLM-returned actions
+- **Vercel Sandbox** — for sandboxed dev/preview targets
+- **TypeScript, strict mode**
+
+---
+
+## Engineering details worth a look
+
+- **`lib/fork-runner.ts`** — orchestrates the multi-fork-point chain, pinned catalogs for known demo hosts, and per-fork context isolation.
+- **`lib/agent.ts`** — the vision agent loop. The action schema is bound as tools, so the model returns a structured action rather than free-form text the executor can't interpret.
+- **`lib/runs.ts`** — in-memory pub/sub run store with a replay log, so SSE clients that connect late receive the full event history before the live tail. Stored on `globalThis` to survive Next.js dev hot-reloads.
+- **`lib/buggy-cart-server.ts`** — "Helix," a deliberately buggy multi-page SaaS (issues / billing / settings) with 9 planted bugs across 5 categories. Doubles as a smoke target and a demo backdrop.
+- **`components/fork-tree.tsx`** — live tree UI with capped row heights, inline bug evidence, and the Claude-fix prompt panel.
+- **`scripts/fork-proof/run.ts`** — headless smoke harness that exercises the full fork-runner without the UI.
+
+---
+
+## Run locally
+
+```bash
+pnpm install
+echo "OPENAI_API_KEY=sk-..." > .env.local
+pnpm dev
+```
+
+Open http://localhost:3000, paste any URL (or leave blank to attack the built-in Helix app), and hit **Start**.
+
+```bash
+pnpm fork-proof          # headless end-to-end smoke
+pnpm helix               # serve the buggy demo target standalone
+pnpm m1:smoke            # M1 sandbox smoke
+```
+
+---
+
+## Project layout
+
+```
+app/
+  page.tsx               landing + start CTA
+  runs/[id]/page.tsx     live run view
+  api/runs/route.ts      POST /api/runs (kicks off a run)
+  api/runs/[id]/...      SSE stream + status endpoints
+components/
+  fork-tree.tsx          live tree UI (React Flow)
+lib/
+  fork-runner.ts         multi-fork orchestrator
+  agent.ts               vision-loop adversarial agent
+  buggy-cart-server.ts   the Helix demo target
+  runs.ts                in-memory run store + pub/sub
+  events.ts              typed run-event protocol
+  chromium-launcher.ts   sparticuz/chromium wiring
+scripts/
+  fork-proof/run.ts      headless smoke
+  helix.ts               serve Helix standalone
+slides/                  demo deck
+```
+
+---
+
+## What's intentionally not here
+
+This is a hackathon prototype. The run store is in-memory; there's no auth, no persistence, and no rate limiting on the OpenAI calls, and the agent step cap is hardcoded. It's tuned for demos, not for production scale. The parts worth reading are the agent loop, the fork topology, and the streaming protocol, which are structured the way a production version would be.
+
+---
+
+## Credits
+
+Built by Toby Thurston for the Vercel × Parallel Agents hackathon, 2026.
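
Note on the README's description of `lib/runs.ts`: the replay-log behaviour (late subscribers get the full history, then the live tail) can be sketched as below. This is a minimal illustration, not the file's actual contents. The `RunStore` class, the `RunEvent` shape, and the `__runStore` key are hypothetical names chosen for the example.

```typescript
// Hypothetical event shape; the real protocol is described as living in lib/events.ts.
type RunEvent = { type: string; at: number }
type Listener = (e: RunEvent) => void

class RunStore {
  private log = new Map<string, RunEvent[]>()
  private listeners = new Map<string, Set<Listener>>()

  // Append to the replay log, then fan out to current subscribers.
  emit(runId: string, e: RunEvent): void {
    const events = this.log.get(runId) ?? []
    events.push(e)
    this.log.set(runId, events)
    this.listeners.get(runId)?.forEach((l) => l(e))
  }

  // Replay history first, then tail live events. Returns an unsubscribe function.
  subscribe(runId: string, l: Listener): () => void {
    const history = this.log.get(runId) ?? []
    history.slice().forEach((e) => l(e))
    const set = this.listeners.get(runId) ?? new Set<Listener>()
    set.add(l)
    this.listeners.set(runId, set)
    return () => {
      set.delete(l)
    }
  }
}

// Keep a single instance on globalThis so a dev hot-reload re-imports the
// module without dropping in-flight runs (assumed key name).
const g = globalThis as typeof globalThis & { __runStore?: RunStore }
const runStore = g.__runStore ?? (g.__runStore = new RunStore())
```

An SSE route handler would call `runStore.subscribe(runId, write)` on connect and invoke the returned function when the client disconnects.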
diff --git a/app/globals.css b/app/globals.css
@@ -95,25 +95,17 @@ kbd {
 
 .brand {
   display: inline-flex;
-  align-items: center;
+  align-items: baseline;
   gap: 0.55rem;
-  letter-spacing: -0.01em;
-}
-
-.brand-mark {
-  display: inline-flex;
-  align-items: center;
-  justify-content: center;
-  width: 22px;
-  height: 22px;
-  border-radius: 5px;
-  background: var(--accent);
-  color: #0a0b0d;
 }
 
 .brand strong {
-  font-weight: 600;
-  font-size: 0.92rem;
+  font-family: var(--font-serif), Georgia, serif;
+  font-style: italic;
+  font-weight: 500;
+  font-size: 1.25rem;
+  letter-spacing: -0.02em;
+  color: var(--ink);
 }
 
 .env-chip {

diff --git a/app/page.tsx b/app/page.tsx
@@ -46,17 +46,6 @@ export default function Home() {
 
       <nav className="top-nav">
         <div className="brand">
-          <span className="brand-mark" aria-hidden="true">
-            <svg viewBox="0 0 16 16" width="14" height="14">
-              <path
-                d="M2 8 L7 8 M7 8 L13 3 M7 8 L13 8 M7 8 L13 13"
-                stroke="currentColor"
-                strokeWidth="1.5"
-                fill="none"
-                strokeLinecap="round"
-              />
-            </svg>
-          </span>
           <strong>vibe check</strong>
         </div>
         <div className="env-chip">

diff --git a/components/fork-tree.tsx b/components/fork-tree.tsx
@@ -94,6 +94,12 @@ const STATUS_COLOR: Record<
 }
 
 function RootNodeView({ data }: NodeProps<Node<RootNode>>) {
+  const isLoading = data.cartSize === undefined
+  const cartLabel = isLoading
+    ? 'Preparing shared state…'
+    : data.cartSize === 0
+    ? 'Starting from an empty cart'
+    : `Cart starts with ${data.cartSize} item${data.cartSize === 1 ? '' : 's'}`
   return (
     <div
       style={{
@@ -117,8 +123,32 @@ function RootNodeView({ data }: NodeProps<Node<RootNode>>) {
       >
         Fork point · shared state
       </div>
-      <div style={{ fontSize: 15, marginTop: 6, fontWeight: 500, letterSpacing: '-0.01em' }}>
-        Cart has {data.cartSize ?? '—'} items
+      <div
+        style={{
+          fontSize: 15,
+          marginTop: 6,
+          fontWeight: 500,
+          letterSpacing: '-0.01em',
+          display: 'flex',
+          alignItems: 'center',
+          gap: 8,
+          color: isLoading ? '#9ea3ad' : '#ececee',
+        }}
+      >
+        {isLoading && (
+          <span
+            aria-hidden
+            style={{
+              width: 6,
+              height: 6,
+              borderRadius: '50%',
+              background: '#7aa7ff',
+              animation: 'pulse 1.4s ease-in-out infinite',
+              flexShrink: 0,
+            }}
+          />
+        )}
+        {cartLabel}
       </div>
       <div
         style={{
@@ -955,7 +985,7 @@ export function RunView({ runId }: { runId: string }) {
           aria-label="back to home"
           title="back to home"
         >
-          ◆ <strong>vibe check</strong>
+          <strong>vibe check</strong>
           <span className="tag">/ run {shortId(runId)}</span>
         </Link>
 

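
Note on the `RootNodeView` change above: the nested ternary that builds `cartLabel` could also live in a small pure helper, which makes the three states (loading, empty, populated) easy to unit-test. The sketch below assumes a hypothetical `formatCartLabel` name; the strings are copied from the diff.

```typescript
// Hypothetical helper; returns the same labels as the ternary in RootNodeView.
function formatCartLabel(cartSize: number | undefined): string {
  // undefined means the shared state snapshot has not arrived yet.
  if (cartSize === undefined) return 'Preparing shared state…'
  if (cartSize === 0) return 'Starting from an empty cart'
  // Pluralize "item" for every count except exactly 1.
  return `Cart starts with ${cartSize} item${cartSize === 1 ? '' : 's'}`
}
```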
diff --git a/lib/fork-runner.ts b/lib/fork-runner.ts
@@ -46,7 +46,50 @@ type ForkPoint = {
   /** id of the prior fork point this one chains from (parent will be that point's control). */
   chainsFrom?: string
   /** What server-side state to count toward "created N items" duplicate detection. */
-  countStateKey?: 'issues' | 'orders'
+  countStateKey?: 'issues' | 'orders' | 'tasks'
+}
+
+// Pinned fork-point catalogs for known demo targets. Used to bypass the
+// LLM discovery pass when we already know the right paths — avoids the
+// "agent navigated to a 404 then hallucinated #title" failure mode.
+const KNOWN_TARGETS: Record<string, ForkPoint[]> = {
+  'riverline-hackathon-test.vercel.app': [
+    {
+      id: 'fp-tasks-new',
+      index: 0,
+      title: 'Create task',
+      initialUrl: '/tasks/new',
+      context:
+        'Task creation form (Riverline). Likely fields include title, description, priority, assignee. Probe for reflected XSS in the title, server crashes on empty inputs, and concurrent-submit duplicate creates.',
+      countStateKey: 'tasks',
+    },
+    {
+      id: 'fp-billing',
+      index: 1,
+      title: 'Billing checkout',
+      initialUrl: '/billing',
+      context:
+        'Plan / billing form (Riverline). Likely fields include seats, coupon, email, name, card. Probe for negative seats, coupon abuse, missing-email crashes, and concurrent-submit duplicate orders.',
+      countStateKey: 'orders',
+      chainsFrom: 'fp-tasks-new',
+    },
+    {
+      id: 'fp-settings',
+      index: 2,
+      title: 'Profile settings',
+      initialUrl: '/settings',
+      context:
+        'Profile / settings page (Riverline). Likely fields include display name and avatar URL. Probe for javascript: URL schemes and reflected payloads in profile fields.',
+    },
+  ],
+}
+
+function pickKnownTargetForkPoints(serverUrl: string): ForkPoint[] | undefined {
+  try {
+    return KNOWN_TARGETS[new URL(serverUrl).hostname]
+  } catch {
+    return undefined
+  }
 }
 
 // ---------- Browser config ----------
@@ -328,7 +371,12 @@ async function evaluateVerdict(
   let itemsCreated = 0
   if (fp.countStateKey) {
     try {
-      const path = fp.countStateKey === 'issues' ? '/api/issues' : '/api/orders'
+      const path =
+        fp.countStateKey === 'issues'
+          ? '/api/issues'
+          : fp.countStateKey === 'tasks'
+            ? '/api/tasks'
+            : '/api/orders'
       const r = await page.evaluate(
         (u) => fetch(u).then((rr) => rr.json()).catch(() => ({})),
         serverUrl + path
@@ -621,13 +669,30 @@ async function runForkPoint(opts: {
   // Warm a context to the fork-point URL and snapshot its state.
   const warmCtx = await browser.newContext({ viewport: VIEWPORT })
   const warmPage = await warmCtx.newPage()
+  let warmStatus: number | undefined
   try {
-    await warmPage.goto(serverUrl + fp.initialUrl)
+    const resp = await warmPage.goto(serverUrl + fp.initialUrl)
+    warmStatus = resp?.status()
     await warmPage.waitForLoadState('domcontentloaded').catch(() => {})
   } catch (e) {
     console.log(`[runner ${fp.id}] warm goto failed:`, (e as Error).message)
   }
 
+  // If the fork-point URL doesn't actually exist on the target, bail out
+  // cleanly instead of generating intents against a 404 (which causes the
+  // agent to hallucinate selectors like #title that aren't on the page).
+  if (warmStatus !== undefined && warmStatus >= 400) {
+    console.log(`[runner ${fp.id}] warm goto returned HTTP ${warmStatus} — skipping fork point`)
+    await warmCtx.close().catch(() => {})
+    emit(runId, {
+      type: 'phase_complete',
+      phaseId: fp.id,
+      phaseIndex: fp.index,
+      at: Date.now(),
+    })
+    return { bugsFound: 0, controlForkId: undefined, intentsRun: 0 }
+  }
+
   // Generate intents from a screenshot + DOM (or fall back).
   let intents: GeneratedIntent[]
   if (useLLM) {
@@ -747,7 +812,11 @@ export async function runForkExperiment(
   // actually there (forms, mutations, inputs) and proposes 1-4 pages worth
   // probing. Falls back to the entry URL alone if discovery fails or no API key.
   let pointsToRun: ForkPoint[]
-  if (useLLM) {
+  const pinned = pickKnownTargetForkPoints(serverUrl)
+  if (pinned) {
+    console.log(`[runner] using pinned fork points for known target: ${new URL(serverUrl).hostname}`)
+    pointsToRun = pinned
+  } else if (useLLM) {
     let discovered: Awaited<ReturnType<typeof discoverForkPoints>> | undefined
     try {
       const reconCtx = await browser.newContext({ viewport: VIEWPORT })
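
Note on the pinned-target path in `lib/fork-runner.ts`: the two decisions this diff adds (use a pinned catalog when the hostname is known, and skip a fork point whose warm navigation returned an error status) can be exercised in isolation with a sketch like the one below. The `PinnedForkPoint` type and both function names are hypothetical simplifications, not the file's real definitions. One deliberate difference from the diff is the use of a `Map` instead of a plain object: a plain-object lookup such as `KNOWN_TARGETS[hostname]` can return an inherited property for a hostname like `constructor`, while `Map.get` cannot.

```typescript
// Hypothetical, trimmed-down shape; the real ForkPoint carries more fields.
type PinnedForkPoint = { id: string; initialUrl: string }

// A Map only matches keys that were explicitly inserted.
const PINNED = new Map<string, PinnedForkPoint[]>([
  [
    'riverline-hackathon-test.vercel.app',
    [
      { id: 'fp-tasks-new', initialUrl: '/tasks/new' },
      { id: 'fp-billing', initialUrl: '/billing' },
    ],
  ],
])

function pickPinned(serverUrl: string): PinnedForkPoint[] | undefined {
  try {
    return PINNED.get(new URL(serverUrl).hostname)
  } catch {
    // new URL() throws on strings that are not absolute URLs.
    return undefined
  }
}

// Mirrors the `warmStatus !== undefined && warmStatus >= 400` guard:
// an unknown status (navigation threw, or no response) does not skip.
function shouldSkipForkPoint(warmStatus: number | undefined): boolean {
  return warmStatus !== undefined && warmStatus >= 400
}
```

With this shape, the caller falls back to LLM discovery whenever `pickPinned` returns `undefined`, which matches the `if (pinned) … else if (useLLM)` ordering in the diff.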