118 changes: 118 additions & 0 deletions README.md
@@ -0,0 +1,118 @@
# vibe check

> One snapshot, a swarm of agents, every bug at once.

A vision-driven adversarial QA tool. Point it at a SaaS app and a swarm of GPT-4o-mini agents drives headless Chromium browsers in parallel, each agent pursuing a different attack intent (XSS, race conditions, validation bypass, crashes) and streaming its findings into a live tree UI as the run forks.

Built in 48 hours for the Vercel × Parallel Agents hackathon.

---

## The idea

Manual QA is usually sequential: one tester, one path, one bug at a time. Real apps fail in branching ways: the same form can crash on empty input, accept negative numbers, double-submit, *and* reflect XSS, depending on which path you take.

**vibe check** parallelizes the search. At every interesting page (a "fork point"), an LLM looks at the live screenshot + DOM, generates 2–5 adversarial intents tailored to what's actually on screen, and spawns an isolated Chromium context per intent. Each fork runs its own vision-loop agent: screenshot → reason → click/fill/eval → repeat, until a verdict (`bug` / `passed` / `tolerable`) is returned. The "control" branch that completes the happy flow chains into the next fork point, so attack surfaces compound rather than reset.
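
The fan-out and chaining described above can be sketched roughly as follows. This is a minimal illustration, not the code in `lib/fork-runner.ts`: the names `fanOut`, `runChainSketch`, `RunFork`, and `ForkResult` are hypothetical, the per-intent agent is replaced by an injected stub, and stopping the chain when no control branch passes is an assumption.

```typescript
// Hypothetical types: the real fork-runner's shapes are not shown in this README.
type Verdict = "bug" | "passed" | "tolerable";

interface ForkResult {
  intent: string;
  verdict: Verdict;
}

// Stand-in for one per-intent agent run. The real one drives an isolated browser context.
type RunFork = (forkPointId: string, intent: string) => Promise<ForkResult>;

// Run every intent at one fork point concurrently and report whether a
// control branch passed.
async function fanOut(
  forkPointId: string,
  intents: string[],
  runFork: RunFork,
): Promise<{ results: ForkResult[]; controlPassed: boolean }> {
  const results = await Promise.all(
    intents.map((intent) => runFork(forkPointId, intent)),
  );
  const controlPassed = results.some(
    (r) => r.intent === "control-normal" && r.verdict === "passed",
  );
  return { results, controlPassed };
}

// Walk fork points in order. Assumption: the chain stops once no control
// branch completes the happy flow, because there is no state to chain from.
async function runChainSketch(
  forkPoints: { id: string; intents: string[] }[],
  runFork: RunFork,
): Promise<ForkResult[]> {
  const all: ForkResult[] = [];
  for (const fp of forkPoints) {
    const { results, controlPassed } = await fanOut(fp.id, fp.intents, runFork);
    all.push(...results);
    if (!controlPassed) break;
  }
  return all;
}
```

Injecting `runFork` keeps the topology testable without a browser or an API key.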

The whole tree streams to the browser over SSE while it runs.
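
As a rough sketch of what one streamed event could look like on the wire, the function below serializes a `phase_complete` event as a Server-Sent Events frame. The field names mirror the `phase_complete` event emitted in `lib/fork-runner.ts`; the `toSseFrame` helper and the decision to send one JSON object per `data:` line are assumptions, not the project's confirmed protocol.

```typescript
// Event shape modeled on the `phase_complete` event in lib/fork-runner.ts.
interface PhaseCompleteEvent {
  type: "phase_complete";
  phaseId: string;
  phaseIndex: number;
  at: number;
}

// Hypothetical helper: serialize one event as an SSE frame. A frame is a
// `data:` line followed by a blank line. JSON.stringify never emits raw
// newlines, so the payload always fits on a single `data:` line.
function toSseFrame(event: PhaseCompleteEvent): string {
  return `data: ${JSON.stringify(event)}\n\n`;
}

const frame = toSseFrame({
  type: "phase_complete",
  phaseId: "fp-billing",
  phaseIndex: 1,
  at: 0,
});
```

On the client, an `EventSource` `message` handler would receive the JSON string in `event.data` and parse it back into the typed event.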

---

## How it works

```
ROOT (snapshot of app state)
├── fp1.intent-A  xss-probe           → bug
├── fp1.intent-B  control-normal      → passed ─┐
├── fp1.intent-C  concurrency-stress  → bug     │
└── fp1.intent-D  input-fuzz          → bug     │
                                                ┌── fp2.intent-A
                                                ├── fp2.intent-B
                                                └── fp2.intent-C

1. **Discovery.** An LLM planner inspects the target and proposes fork points (form pages, settings, billing, etc.). For known demo targets, a pinned catalog skips discovery, so a run never starts from a planner-invented path that returns a 404.
2. **Intent generation.** At each fork point, GPT-4o-mini sees a real screenshot + compact DOM and produces 2–5 intents grounded in what's visible — not a fixed taxonomy.
3. **Per-fork agent loop.** Each intent gets its own `BrowserContext`. The agent calls a tool-bound model that returns one of `click | fill | press | eval | done`, capped at 5 steps. CDP `captureScreenshot` polling streams frames into the UI.
4. **Verdict aggregation.** A small bug catalog (XSS, server-error, validation-bypass, broken-ui-state, duplicate-state, auth-bypass, data-leak, crash) is enforced via OpenAI tool-call schemas, so verdicts arrive as typed structured data.
5. **Live tree.** React Flow renders forks as they're created. Frames update in place. Each terminal node carries its evidence + a one-click "Claude-fix prompt" tailored to the bug.
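
Step 3 can be illustrated with a small sketch of action validation plus the step cap. The project uses Zod and OpenAI tool-call schemas for this; the hand-written validator below is a dependency-free substitute, and every field name (`selector`, `value`, `key`, `script`), the `parseAction` and `runLoop` helpers, and the choice to let malformed output consume a step are assumptions. The loop is synchronous only for brevity; a real loop would await the model and the browser.

```typescript
type Verdict = "bug" | "passed" | "tolerable";

// Hypothetical action shapes; field names are illustrative, not taken from lib/agent.ts.
type Action =
  | { kind: "click"; selector: string }
  | { kind: "fill"; selector: string; value: string }
  | { kind: "press"; key: string }
  | { kind: "eval"; script: string }
  | { kind: "done"; verdict: Verdict };

const MAX_STEPS = 5;

// Return a typed Action, or null when the model output matches no known shape.
function parseAction(raw: unknown): Action | null {
  if (typeof raw !== "object" || raw === null) return null;
  const rec = raw as Record<string, unknown>;
  const str = (key: string): string | null => {
    const v = rec[key];
    return typeof v === "string" ? v : null;
  };
  switch (rec["kind"]) {
    case "click": {
      const selector = str("selector");
      return selector === null ? null : { kind: "click", selector };
    }
    case "fill": {
      const selector = str("selector");
      const value = str("value");
      return selector === null || value === null
        ? null
        : { kind: "fill", selector, value };
    }
    case "press": {
      const key = str("key");
      return key === null ? null : { kind: "press", key };
    }
    case "eval": {
      const script = str("script");
      return script === null ? null : { kind: "eval", script };
    }
    case "done": {
      const v = str("verdict");
      return v === "bug" || v === "passed" || v === "tolerable"
        ? { kind: "done", verdict: v }
        : null;
    }
    default:
      return null;
  }
}

// Drive at most MAX_STEPS actions. `next` stands in for the model call and
// `execute` for the browser. Malformed output uses up a step but is never executed.
function runLoop(
  next: (step: number) => unknown,
  execute: (action: Action) => void,
): Verdict | "step-cap" {
  for (let step = 0; step < MAX_STEPS; step++) {
    const action = parseAction(next(step));
    if (action === null) continue;
    if (action.kind === "done") return action.verdict;
    execute(action);
  }
  return "step-cap";
}
```

Because the executor only ever receives a value of type `Action`, free-form model text cannot reach the browser.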

---

## Stack

- **Next.js 16 (App Router) + React 19** — UI, API routes, Server Components
- **Playwright + @sparticuz/chromium** — headless browser per fork
- **OpenAI** — `gpt-4o-mini` vision + function-calling for the agent loop and intent planner
- **Server-Sent Events** — frame and event streaming to the tree UI
- **React Flow (`@xyflow/react`)** — fork-tree layout
- **Zod** — runtime validation of LLM-returned actions
- **Vercel Sandbox** — for sandboxed dev/preview targets
- **TypeScript, strict mode**

---

## Engineering details worth a look

- **`lib/fork-runner.ts`** — orchestrates the multi-fork-point chain, pinned catalogs for known demo hosts, and per-fork context isolation.
- **`lib/agent.ts`** — the vision agent loop. Tool-bound action schema means the model can never return free-form text the executor doesn't understand.
- **`lib/runs.ts`** — in-memory pub/sub run store with a replay log, so SSE clients that connect late get the full event history before live tail. Stored on `globalThis` to survive Next.js dev hot-reloads.
- **`lib/buggy-cart-server.ts`** — "Helix," a deliberately-buggy multi-page SaaS (issues / billing / settings) with 9 planted bugs across 5 categories. Doubles as a smoke target and a demo backdrop.
- **`components/fork-tree.tsx`** — live tree UI with capped row heights, inline bug evidence, and the Claude-fix prompt panel.
- **`scripts/fork-proof/run.ts`** — headless smoke harness that exercises the full fork-runner without the UI.
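
The replay-then-tail behavior described for `lib/runs.ts` can be sketched as below. This is an illustration under assumptions, not the actual module: the `RunEvent` shape, the `__runStoreSketch` global key, and the `publish` / `subscribe` names are all hypothetical.

```typescript
// Hypothetical event and store shapes; lib/runs.ts itself is not shown here.
type RunEvent = { type: string; at: number };
type Listener = (event: RunEvent) => void;

interface Run {
  log: RunEvent[];
  listeners: Set<Listener>;
}

// Keep the map on globalThis so a module re-evaluation (for example a dev
// hot-reload) reuses the same store instead of starting empty.
const g = globalThis as typeof globalThis & {
  __runStoreSketch?: Map<string, Run>;
};
const runs: Map<string, Run> = g.__runStoreSketch ?? new Map<string, Run>();
g.__runStoreSketch = runs;

function getRun(runId: string): Run {
  let run = runs.get(runId);
  if (!run) {
    run = { log: [], listeners: new Set<Listener>() };
    runs.set(runId, run);
  }
  return run;
}

// Append to the replay log first, then fan out to live subscribers.
function publish(runId: string, event: RunEvent): void {
  const run = getRun(runId);
  run.log.push(event);
  run.listeners.forEach((listener) => listener(event));
}

// Replay the full history, then attach for the live tail.
// Returns an unsubscribe function.
function subscribe(runId: string, listener: Listener): () => void {
  const run = getRun(runId);
  run.log.forEach((event) => listener(event));
  run.listeners.add(listener);
  return () => {
    run.listeners.delete(listener);
  };
}
```

A late SSE client would call `subscribe` once and receive every earlier event in order before any new one, which is what lets the tree UI rebuild itself mid-run.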

---

## Run locally

```bash
pnpm install
echo "OPENAI_API_KEY=sk-..." > .env.local
pnpm dev
```

Open http://localhost:3000, paste any URL (or leave blank to attack the built-in Helix app), and hit **Start**.

```bash
pnpm fork-proof # headless end-to-end smoke
pnpm helix # serve the buggy demo target standalone
pnpm m1:smoke # M1 sandbox smoke
```

---

## Project layout

```
app/
  page.tsx                landing + start CTA
  runs/[id]/page.tsx      live run view
  api/runs/route.ts       POST /api/runs (kicks off a run)
  api/runs/[id]/...       SSE stream + status endpoints
components/
  fork-tree.tsx           live tree UI (React Flow)
lib/
  fork-runner.ts          multi-fork orchestrator
  agent.ts                vision-loop adversarial agent
  buggy-cart-server.ts    the Helix demo target
  runs.ts                 in-memory run store + pub/sub
  events.ts               typed run-event protocol
  chromium-launcher.ts    sparticuz/chromium wiring
scripts/
  fork-proof/run.ts       headless smoke
  helix.ts                serve Helix standalone
slides/                   demo deck
```

---

## What's intentionally not here

This is a hackathon prototype. The run store is in-memory; there's no auth, no persistence, no rate limiting on the OpenAI calls, and the agent step cap is hardcoded. It's tuned for demos, not for production scale. The interesting parts are the agent loop, the fork topology, and the streaming protocol — those are all production-shaped.

---

## Credits

Built by Toby Thurston for the Vercel × Parallel Agents hackathon, 2026.
22 changes: 7 additions & 15 deletions app/globals.css
@@ -95,25 +95,17 @@ kbd {

.brand {
display: inline-flex;
align-items: center;
align-items: baseline;
gap: 0.55rem;
letter-spacing: -0.01em;
}

.brand-mark {
display: inline-flex;
align-items: center;
justify-content: center;
width: 22px;
height: 22px;
border-radius: 5px;
background: var(--accent);
color: #0a0b0d;
}

.brand strong {
font-weight: 600;
font-size: 0.92rem;
font-family: var(--font-serif), Georgia, serif;
font-style: italic;
font-weight: 500;
font-size: 1.25rem;
letter-spacing: -0.02em;
color: var(--ink);
}

.env-chip {
11 changes: 0 additions & 11 deletions app/page.tsx
@@ -46,17 +46,6 @@ export default function Home() {

<nav className="top-nav">
<div className="brand">
<span className="brand-mark" aria-hidden="true">
<svg viewBox="0 0 16 16" width="14" height="14">
<path
d="M2 8 L7 8 M7 8 L13 3 M7 8 L13 8 M7 8 L13 13"
stroke="currentColor"
strokeWidth="1.5"
fill="none"
strokeLinecap="round"
/>
</svg>
</span>
<strong>vibe check</strong>
</div>
<div className="env-chip">
36 changes: 33 additions & 3 deletions components/fork-tree.tsx
@@ -94,6 +94,12 @@ const STATUS_COLOR: Record<
}

function RootNodeView({ data }: NodeProps<Node<RootNode>>) {
const isLoading = data.cartSize === undefined
const cartLabel = isLoading
? 'Preparing shared state…'
: data.cartSize === 0
? 'Starting from an empty cart'
: `Cart starts with ${data.cartSize} item${data.cartSize === 1 ? '' : 's'}`
return (
<div
style={{
@@ -117,8 +123,32 @@ function RootNodeView({ data }: NodeProps<Node<RootNode>>) {
>
Fork point · shared state
</div>
<div style={{ fontSize: 15, marginTop: 6, fontWeight: 500, letterSpacing: '-0.01em' }}>
Cart has {data.cartSize ?? '—'} items
<div
style={{
fontSize: 15,
marginTop: 6,
fontWeight: 500,
letterSpacing: '-0.01em',
display: 'flex',
alignItems: 'center',
gap: 8,
color: isLoading ? '#9ea3ad' : '#ececee',
}}
>
{isLoading && (
<span
aria-hidden
style={{
width: 6,
height: 6,
borderRadius: '50%',
background: '#7aa7ff',
animation: 'pulse 1.4s ease-in-out infinite',
flexShrink: 0,
}}
/>
)}
{cartLabel}
</div>
<div
style={{
@@ -955,7 +985,7 @@ export function RunView({ runId }: { runId: string }) {
aria-label="back to home"
title="back to home"
>
<strong>vibe check</strong>
<strong>vibe check</strong>
<span className="tag">/ run {shortId(runId)}</span>
</Link>

77 changes: 73 additions & 4 deletions lib/fork-runner.ts
@@ -46,7 +46,50 @@ type ForkPoint = {
/** id of the prior fork point this one chains from (parent will be that point's control). */
chainsFrom?: string
/** What server-side state to count toward "created N items" duplicate detection. */
countStateKey?: 'issues' | 'orders'
countStateKey?: 'issues' | 'orders' | 'tasks'
}

// Pinned fork-point catalogs for known demo targets. Used to bypass the
// LLM discovery pass when we already know the right paths — avoids the
// "agent navigated to a 404 then hallucinated #title" failure mode.
const KNOWN_TARGETS: Record<string, ForkPoint[]> = {
'riverline-hackathon-test.vercel.app': [
{
id: 'fp-tasks-new',
index: 0,
title: 'Create task',
initialUrl: '/tasks/new',
context:
'Task creation form (Riverline). Likely fields include title, description, priority, assignee. Probe for reflected XSS in the title, server crashes on empty inputs, and concurrent-submit duplicate creates.',
countStateKey: 'tasks',
},
{
id: 'fp-billing',
index: 1,
title: 'Billing checkout',
initialUrl: '/billing',
context:
'Plan / billing form (Riverline). Likely fields include seats, coupon, email, name, card. Probe for negative seats, coupon abuse, missing-email crashes, and concurrent-submit duplicate orders.',
countStateKey: 'orders',
chainsFrom: 'fp-tasks-new',
},
{
id: 'fp-settings',
index: 2,
title: 'Profile settings',
initialUrl: '/settings',
context:
'Profile / settings page (Riverline). Likely fields include display name and avatar URL. Probe for javascript: URL schemes and reflected payloads in profile fields.',
},
],
}

function pickKnownTargetForkPoints(serverUrl: string): ForkPoint[] | undefined {
try {
return KNOWN_TARGETS[new URL(serverUrl).hostname]
} catch {
return undefined
}
}

// ---------- Browser config ----------
@@ -328,7 +371,12 @@ async function evaluateVerdict(
let itemsCreated = 0
if (fp.countStateKey) {
try {
const path = fp.countStateKey === 'issues' ? '/api/issues' : '/api/orders'
const path =
fp.countStateKey === 'issues'
? '/api/issues'
: fp.countStateKey === 'tasks'
? '/api/tasks'
: '/api/orders'
const r = await page.evaluate(
(u) => fetch(u).then((rr) => rr.json()).catch(() => ({})),
serverUrl + path
@@ -621,13 +669,30 @@ async function runForkPoint(opts: {
// Warm a context to the fork-point URL and snapshot its state.
const warmCtx = await browser.newContext({ viewport: VIEWPORT })
const warmPage = await warmCtx.newPage()
let warmStatus: number | undefined
try {
await warmPage.goto(serverUrl + fp.initialUrl)
const resp = await warmPage.goto(serverUrl + fp.initialUrl)
warmStatus = resp?.status()
await warmPage.waitForLoadState('domcontentloaded').catch(() => {})
} catch (e) {
console.log(`[runner ${fp.id}] warm goto failed:`, (e as Error).message)
}

// If the fork-point URL doesn't actually exist on the target, bail out
// cleanly instead of generating intents against a 404 (which causes the
// agent to hallucinate selectors like #title that aren't on the page).
if (warmStatus !== undefined && warmStatus >= 400) {
console.log(`[runner ${fp.id}] warm goto returned HTTP ${warmStatus} — skipping fork point`)
await warmCtx.close().catch(() => {})
emit(runId, {
type: 'phase_complete',
phaseId: fp.id,
phaseIndex: fp.index,
at: Date.now(),
})
return { bugsFound: 0, controlForkId: undefined, intentsRun: 0 }
}

// Generate intents from a screenshot + DOM (or fall back).
let intents: GeneratedIntent[]
if (useLLM) {
@@ -747,7 +812,11 @@ export async function runForkExperiment(
// actually there (forms, mutations, inputs) and proposes 1-4 pages worth
// probing. Falls back to the entry URL alone if discovery fails or no API key.
let pointsToRun: ForkPoint[]
if (useLLM) {
const pinned = pickKnownTargetForkPoints(serverUrl)
if (pinned) {
console.log(`[runner] using pinned fork points for known target: ${new URL(serverUrl).hostname}`)
pointsToRun = pinned
} else if (useLLM) {
let discovered: Awaited<ReturnType<typeof discoverForkPoints>> | undefined
try {
const reconCtx = await browser.newContext({ viewport: VIEWPORT })