Performance snapshots + diffing across runs

## Context

Today, running `/crawl-sim https://example.com` twice produces two independent reports. The second run has no idea the first run existed. Users have no way to answer:

- "Did the refactor we just shipped actually improve our GPTBot score?"
- "Which findings regressed since last week?"
- "Is our llms.txt score trending up?"

Claude's memory system can't be the answer here because:
- It's per-user and per-session
- It doesn't survive a fresh Claude Code install, a machine change, or a memory wipe
- It's not shareable with teammates
- It's not available at all on other agent harnesses (Codex, Gemini, custom runners)

The answer needs to live **in the file system**, next to the audited project, independent of any particular agent platform.

## Principle

> **Every audit writes a snapshot. Every audit compares against the previous snapshot for the same URL.** No agent memory required.

## Proposal

### File layout

```
<cwd>/.crawl-sim/
├── snapshots/
│   ├── example_com.json            # latest snapshot for this URL
│   └── example_com.history.jsonl   # append-only log of all snapshots
└── README.md                        # auto-generated, explains the dir
```

The directory lives in the **user's working directory**, not the crawl-sim install location, so each project gets its own snapshot history.

### Snapshot schema

Each snapshot is the current `compute-score.sh` output plus run metadata such as `timestamp`, `version`, and an optional `gitCommit`:

```json
{
  "url": "https://example.com",
  "timestamp": "2026-04-11T22:41:00Z",
  "version": "0.1.0",
  "gitCommit": "bb62a6d (optional — recorded if the cwd is a git repo)",
  "overall": { "score": 88, "grade": "A-" },
  "bots": { ... },
  "categories": { ... },
  "findings": [ ... ]
}
```

URL → filename mapping: drop the scheme and the query string, then replace `/` and `.` with `_`. `https://example.com/blog/post?ref=x` → `example_com_blog_post.json`.
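
A minimal Python sketch of that mapping, for illustration only (the real implementation would presumably live in a shell script). The helper name `snapshot_filename` is hypothetical, and treating every character outside `[A-Za-z0-9-]` as a separator is an assumption that goes slightly beyond the rule stated above so that special characters are handled too.

```python
import re
from urllib.parse import urlsplit


def snapshot_filename(url: str) -> str:
    """Map a URL to a snapshot filename (hypothetical helper, not part of crawl-sim)."""
    parts = urlsplit(url)
    # Keep host and path only: this drops the scheme, query string, and fragment.
    # A trailing slash is stripped so "example.com" and "example.com/" agree.
    raw = parts.netloc + parts.path.rstrip("/")
    # Assumption: any run of characters outside [A-Za-z0-9-] becomes one "_".
    safe = re.sub(r"[^A-Za-z0-9-]+", "_", raw).strip("_")
    return f"{safe}.json"


print(snapshot_filename("https://example.com/blog/post?ref=x"))
# prints: example_com_blog_post.json
```

One caveat with any lossy mapping like this: distinct URLs can collide (for example `/a/b` and `/a_b`), so the `url` field inside the snapshot is worth checking before diffing.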

### Diffing

When a new audit runs and a previous snapshot exists for the same URL:

1. Load the old snapshot
2. Compute deltas per bot, per category, and overall
3. Match findings by title/category to detect resolved vs persisting vs new issues
4. Add a "since last run" block to the output
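
Steps 2 and 3 can be sketched as a pure function. This is a Python illustration of the logic, not the proposed `compare-snapshots.sh` itself. The schema above elides the shape of `bots`, `categories`, and `findings`, so the sketch assumes `bots` and `categories` are objects keyed by name with a `score` field, and that each finding carries `category` and `title`. The sample data is made up.

```python
def score_deltas(current: dict, previous: dict) -> dict:
    """Delta per name, for names present in both snapshots."""
    return {
        name: current[name]["score"] - previous[name]["score"]
        for name in current
        if name in previous
    }


def finding_key(finding: dict) -> tuple:
    # Assumption: a finding is identified by its (category, title) pair.
    return (finding["category"], finding["title"])


def compare_snapshots(current: dict, previous: dict) -> dict:
    cur = {finding_key(f) for f in current["findings"]}
    prev = {finding_key(f) for f in previous["findings"]}
    return {
        "since": previous["timestamp"],
        "overall": current["overall"]["score"] - previous["overall"]["score"],
        "bots": score_deltas(current["bots"], previous["bots"]),
        "categories": score_deltas(current["categories"], previous["categories"]),
        "findings": {
            "resolved": sorted(prev - cur),    # in previous only
            "persisting": sorted(prev & cur),  # in both
            "new": sorted(cur - prev),         # in current only
        },
    }


# Made-up sample snapshots, trimmed to the fields the diff reads.
previous = {
    "timestamp": "2026-04-04T22:41:00Z",
    "overall": {"score": 82},
    "bots": {"GPTBot": {"score": 70}},
    "categories": {"Structured Data": {"score": 99}},
    "findings": [
        {"category": "AI Readiness", "title": "No llms.txt"},
        {"category": "Accessibility", "title": "Missing alt text"},
    ],
}
current = {
    "timestamp": "2026-04-11T22:41:00Z",
    "overall": {"score": 88},
    "bots": {"GPTBot": {"score": 82}},
    "categories": {"Structured Data": {"score": 92}},
    "findings": [
        {"category": "Accessibility", "title": "Missing alt text"},
        {"category": "Structured Data", "title": "BreadcrumbList removed"},
    ],
}

diff = compare_snapshots(current, previous)
print(diff["overall"], diff["bots"], diff["categories"])
```

Matching on title means a reworded finding shows up as one resolved plus one new issue; a stable finding ID in `compute-score.sh` output would avoid that, if one is available.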

Example output addition:

```
Overall: 88/100 (A-)    [▲ +6 since 2026-04-04 (7 days ago)]

Per-bot change:
  Googlebot      95  A   [▲ +2]
  GPTBot         82  B+  [▲ +12]  ← biggest improvement
  ClaudeBot      82  B+  [▲ +12]
  PerplexityBot  79  B   [—]

Per-category change:
  Accessibility       96  A   [—]
  Content Visibility  81  B+  [▲ +14]  ← likely fix: SSR article cards
  Structured Data     92  A   [▼ -7]   ← regression: BreadcrumbList removed
  Technical Signals   90  A   [—]
  AI Readiness        65  C   [▲ +15]  ← new llms.txt file

Resolved since last run: 2 findings
Persisting: 3 findings
New: 1 finding (Structured Data regression)
```
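
The bracketed markers in that output could be rendered along these lines. This is a sketch only: `format_delta` and `since_label` are hypothetical names, and the timestamp format is assumed to match the `timestamp` field in the snapshot schema.

```python
from datetime import datetime

# Assumed timestamp format, taken from the schema example ("2026-04-11T22:41:00Z").
TS_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def format_delta(delta: int) -> str:
    """Render a score delta as an up, down, or unchanged marker."""
    if delta > 0:
        return f"[▲ +{delta}]"
    if delta < 0:
        return f"[▼ {delta}]"  # the minus sign comes from the number itself
    return "[—]"


def since_label(previous_ts: str, current_ts: str) -> str:
    """Describe how long ago the previous snapshot was taken."""
    prev = datetime.strptime(previous_ts, TS_FORMAT)
    cur = datetime.strptime(current_ts, TS_FORMAT)
    days = (cur - prev).days
    return f"since {prev.date().isoformat()} ({days} days ago)"


print(format_delta(12), format_delta(-7), format_delta(0))
print(since_label("2026-04-04T22:41:00Z", "2026-04-11T22:41:00Z"))
# prints: since 2026-04-04 (7 days ago)
```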

After showing the diff, overwrite the latest file with the new snapshot and append the same snapshot as one line to the `.history.jsonl` log.
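
A possible shape for that write step, sketched in Python. `save_snapshot` and its parameters are hypothetical. Writing the latest file through a temporary file plus rename is a design choice of this sketch (so an interrupted run cannot leave a half-written snapshot), not something the proposal requires.

```python
import json
import os
import tempfile
from pathlib import Path


def save_snapshot(snapshot: dict, snapshots_dir: Path, stem: str) -> None:
    """Append to <stem>.history.jsonl, then replace <stem>.json (hypothetical helper)."""
    snapshots_dir.mkdir(parents=True, exist_ok=True)

    # History: one compact JSON object per line, opened in append mode only.
    with open(snapshots_dir / f"{stem}.history.jsonl", "a", encoding="utf-8") as log:
        log.write(json.dumps(snapshot, separators=(",", ":")) + "\n")

    # Latest: write to a temp file first, then swap it into place.
    latest = snapshots_dir / f"{stem}.json"
    tmp = snapshots_dir / f"{stem}.json.tmp"
    tmp.write_text(json.dumps(snapshot, indent=2) + "\n", encoding="utf-8")
    os.replace(tmp, latest)


# Two runs against a throwaway directory: history grows, latest is replaced.
with tempfile.TemporaryDirectory() as tmp_dir:
    target = Path(tmp_dir) / ".crawl-sim" / "snapshots"
    save_snapshot({"url": "https://example.com", "overall": {"score": 82}}, target, "example_com")
    save_snapshot({"url": "https://example.com", "overall": {"score": 88}}, target, "example_com")
    history_lines = (target / "example_com.history.jsonl").read_text(encoding="utf-8").splitlines()
    latest_score = json.loads((target / "example_com.json").read_text(encoding="utf-8"))["overall"]["score"]

print(len(history_lines), latest_score)
# prints: 2 88
```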

### History file

`example_com.history.jsonl` is append-only. One JSON object per line, one per run. Useful for:
- Future trend charts
- Debugging "when did this regress?"
- Generating sparklines in reports

Unlimited retention for now; add pruning (e.g., keep last 100) if the file gets unwieldy.
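
To illustrate how the history file could feed trends and sparklines later, here is a small Python sketch. The function names and the sample scores are made up, and the sketch assumes each line carries the `timestamp` and `overall.score` fields from the snapshot schema.

```python
import json
import tempfile
from pathlib import Path

BARS = "▁▂▃▄▅▆▇█"


def score_history(history_path: Path) -> list:
    """Return [(timestamp, overall score), ...] in file order, oldest first."""
    points = []
    with open(history_path, encoding="utf-8") as log:
        for line in log:
            if not line.strip():
                continue  # tolerate blank lines
            snap = json.loads(line)
            points.append((snap["timestamp"], snap["overall"]["score"]))
    return points


def sparkline(scores: list) -> str:
    """Scale scores between their min and max and map each to a bar glyph."""
    if not scores:
        return ""
    lo, hi = min(scores), max(scores)
    span = (hi - lo) or 1  # avoid dividing by zero when all scores are equal
    return "".join(BARS[round((s - lo) / span * (len(BARS) - 1))] for s in scores)


# Made-up history with three runs.
rows = [("2026-03-28T22:41:00Z", 70), ("2026-04-04T22:41:00Z", 82), ("2026-04-11T22:41:00Z", 88)]
with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir) / "example_com.history.jsonl"
    path.write_text(
        "".join(json.dumps({"timestamp": ts, "overall": {"score": s}}) + "\n" for ts, s in rows),
        encoding="utf-8",
    )
    points = score_history(path)

print(sparkline([score for _, score in points]))
# prints: ▁▆█
```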

### New script

`scripts/compare-snapshots.sh <current.json> <previous.json>` — outputs a JSON diff object that the skill can include in the narrative. Pure function, easy to test.

### Orchestration changes

SKILL.md step 9 ("Interpret and produce output") grows a pre-step:

```
Step 8.5: Diff against previous snapshot
  - Look for .crawl-sim/snapshots/<url>.json in the cwd
  - If it exists, load it and run compare-snapshots.sh
  - Pass the diff to the interpretation step
  - At the end, write the new snapshot to the same location and append it to <url>.history.jsonl
```

### Flags

- `/crawl-sim <url> --no-snapshot` — skip writing the new snapshot (useful for exploratory runs)
- `/crawl-sim <url> --no-diff` — skip the comparison step
- `/crawl-sim <url> --baseline` — force overwrite the existing snapshot as the new reference point (resets deltas to zero)
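
How the three flags might gate the snapshot cycle, sketched in Python with `argparse` purely for illustration (the skill itself is driven from SKILL.md, not from a Python CLI). Two points are assumptions, since the proposal does not spell them out: `--baseline` skips the comparison because the run becomes the new reference point, and combining `--baseline` with `--no-snapshot` is rejected as contradictory.

```python
import argparse


def parse_flags(argv: list) -> argparse.Namespace:
    parser = argparse.ArgumentParser(prog="crawl-sim")
    parser.add_argument("url")
    parser.add_argument("--no-snapshot", action="store_true")
    parser.add_argument("--no-diff", action="store_true")
    parser.add_argument("--baseline", action="store_true")
    flags = parser.parse_args(argv)
    # Assumption: a baseline run must write a snapshot, so the pair is an error.
    if flags.baseline and flags.no_snapshot:
        parser.error("--baseline cannot be combined with --no-snapshot")
    return flags


def plan(flags: argparse.Namespace, previous_exists: bool) -> dict:
    """Decide which steps of the snapshot cycle run (hypothetical helper)."""
    return {
        # Diff only when there is something to compare against and nothing disables it.
        "diff": previous_exists and not flags.no_diff and not flags.baseline,
        # Writing covers both the latest file and the history append.
        "write_snapshot": not flags.no_snapshot,
    }


print(plan(parse_flags(["https://example.com"]), previous_exists=True))
print(plan(parse_flags(["https://example.com", "--baseline"]), previous_exists=True))
```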

## Future: git-tracking

Not scoping this now, but noting for later: an opt-in mode where `.crawl-sim/snapshots/` is **committed to the user's repo** so every PR's audit is version-controlled. This enables:

- PR checks that fail the build when the score regresses below a threshold (`jq -e '.overall.score >= 70'`)
- Historical review via `git log -p .crawl-sim/snapshots/`
- Team-shareable baselines without a central database

A follow-up issue can add a `--git` flag that ensures `.crawl-sim/` is tracked, adds a suggested `.gitignore` exemption, and outputs a suggested CI step. For now, file-based local snapshots are enough — leave it out of this issue.

## Acceptance criteria

- [ ] `scripts/compare-snapshots.sh <current.json> <previous.json>` exists, outputs a structured diff JSON
- [ ] `compute-score.sh` output is stable enough to diff cleanly across runs (no timestamps / random IDs in the comparable fields)
- [ ] SKILL.md orchestrates the snapshot load/diff/save cycle using `<cwd>/.crawl-sim/snapshots/`
- [ ] The output format includes per-bot, per-category, and per-finding deltas
- [ ] URL → filename mapping handles query strings, trailing slashes, and special chars safely
- [ ] `.history.jsonl` is append-only and never rewritten
- [ ] Flags: `--no-snapshot`, `--no-diff`, `--baseline`
- [ ] README documents the snapshot directory layout and mentions git-tracking as a future option
- [ ] The skill auto-creates `.crawl-sim/README.md` on first use to explain the directory

## Out of scope

- Git-tracking automation (future issue)
- Trend charts / visualizations (future, once history has enough data)
- Cloud sync of snapshots (never — the whole point is local-first and agent-agnostic)
- Pruning old history (add when it becomes a real problem)

## Why this is the best answer

- Works identically across agent platforms (Claude Code, Codex, custom runners)
- Survives memory wipes, machine changes, and reinstalls
- Shareable with teams via the user's existing version control
- Enables CI integration with no new infrastructure
- Keeps the "scripts as deterministic evidence" contract — the snapshot is just another script output
