Performance snapshots + diffing across runs #9

@BraedenBDev

Context

Today, running /crawl-sim https://example.com twice produces two independent reports. The second run has no idea the first run existed. Users have no way to answer:

  • "Did the refactor we just shipped actually improve our GPTBot score?"
  • "Which findings regressed since last week?"
  • "Is our llms.txt score trending up?"

Claude's memory system can't be the answer here because:

  • It's per-user and per-session
  • It doesn't survive a fresh Claude Code install, a machine change, or a memory wipe
  • It's not shareable with teammates
  • It's not available at all on other agent harnesses (Codex, Gemini, custom runners)

The answer needs to live in the file system, next to the audited project, independent of any particular agent platform.

Principle

Every audit writes a snapshot. Every audit compares against the previous snapshot for the same URL. No agent memory required.

Proposal

File layout

<cwd>/.crawl-sim/
├── snapshots/
│   ├── example_com.json            # latest snapshot for this URL
│   └── example_com.history.jsonl   # append-only log of all snapshots
└── README.md                        # auto-generated, explains the dir

The directory lives in the user's working directory, not the crawl-sim install location, so each project gets its own snapshot history.

Snapshot schema

Each snapshot is the current compute-score.sh output plus a few metadata fields (such as timestamp, version, and an optional gitCommit):

{
  "url": "https://example.com",
  "timestamp": "2026-04-11T22:41:00Z",
  "version": "0.1.0",
  "gitCommit": "bb62a6d (optional — recorded if the cwd is a git repo)",
  "overall": { "score": 88, "grade": "A-" },
  "bots": { ... },
  "categories": { ... },
  "findings": [ ... ]
}
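As a rough illustration of how the metadata could be layered onto the score output, here is a minimal Python sketch. The real tooling is shell-based, so this is only a model of the behavior. The function names `build_snapshot` and `current_git_commit` are hypothetical, and it assumes the score output is a JSON object whose keys do not collide with the metadata keys.

```python
import subprocess
from datetime import datetime, timezone
from typing import Optional


def current_git_commit(cwd: str = ".") -> Optional[str]:
    """Return the short HEAD hash, or None when cwd is not a git repo."""
    try:
        result = subprocess.run(
            ["git", "rev-parse", "--short", "HEAD"],
            cwd=cwd, capture_output=True, text=True, check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        # git missing, cwd missing, or not inside a repository
        return None
    return result.stdout.strip() or None


def build_snapshot(score_output: dict, url: str, version: str,
                   git_commit: Optional[str] = None,
                   now: Optional[datetime] = None) -> dict:
    """Wrap the score output (assumed to be a dict) with snapshot metadata."""
    now = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    snapshot = dict(score_output)
    snapshot.update({
        "url": url,
        "timestamp": now.strftime("%Y-%m-%dT%H:%M:%SZ"),
        "version": version,
    })
    if git_commit:
        # Optional field: only present when a commit hash is available
        snapshot["gitCommit"] = git_commit
    return snapshot
```

A caller would typically pass `git_commit=current_git_commit()` so the field is simply omitted outside a repository.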

URL → filename mapping: drop the scheme and the query string, then replace / and . with _. For example, https://example.com/blog/post?ref=x → example_com_blog_post.json.
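The mapping could look roughly like the Python sketch below. The name `snapshot_slug` is hypothetical. The sketch goes slightly beyond the stated rule by collapsing every run of characters outside letters, digits, and hyphens into one underscore, which is one possible way to cover the "special chars" acceptance criterion. Two URLs that differ only in punctuation would then share a file, so treat that as an assumption to revisit.

```python
import re
from urllib.parse import urlsplit


def snapshot_slug(url: str) -> str:
    """Map a URL to a filesystem-safe snapshot name (without extension)."""
    parts = urlsplit(url)
    # Scheme, query string, and fragment are dropped; host and path remain.
    raw = parts.netloc.lower() + parts.path
    # Collapse anything that is not a letter, digit, or hyphen into "_".
    slug = re.sub(r"[^A-Za-z0-9-]+", "_", raw).strip("_")
    if not slug:
        raise ValueError(f"cannot derive a snapshot name from {url!r}")
    return slug
```

Stripping leading and trailing underscores means `https://example.com` and `https://example.com/` resolve to the same snapshot.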

Diffing

When a new audit runs and a previous snapshot exists for the same URL:

  1. Load the old snapshot
  2. Compute deltas per bot, per category, and overall
  3. Match findings by title/category to detect resolved vs persisting vs new issues
  4. Add a "since last run" block to the output
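Steps 2 and 3 might be sketched as follows, in Python for readability even though the proposed script is shell. The shapes of `bots`, `categories`, and `findings` are elided in the schema above, so this sketch assumes `bots` and `categories` map a name to an object with a `score` field and that each finding carries `title` and `category`. All function names are hypothetical.

```python
def score_deltas(current: dict, previous: dict) -> dict:
    """Score change for every name present in both runs."""
    return {
        name: current[name]["score"] - previous[name]["score"]
        for name in current
        if name in previous
    }


def finding_key(finding: dict) -> tuple:
    """Identity used to match a finding across runs (assumed fields)."""
    return (str(finding.get("category", "")), str(finding.get("title", "")))


def compare_snapshots(current: dict, previous: dict) -> dict:
    """Return a diff of two snapshots; does not modify either input."""
    cur = {finding_key(f) for f in current.get("findings", [])}
    prev = {finding_key(f) for f in previous.get("findings", [])}
    return {
        "since": previous.get("timestamp"),
        "overall": current["overall"]["score"] - previous["overall"]["score"],
        "bots": score_deltas(current.get("bots", {}), previous.get("bots", {})),
        "categories": score_deltas(
            current.get("categories", {}), previous.get("categories", {})
        ),
        "findings": {
            "resolved": sorted(prev - cur),    # in the old run only
            "persisting": sorted(prev & cur),  # in both runs
            "new": sorted(cur - prev),         # in the new run only
        },
    }
```

Matching on title and category means a reworded finding title would show up as one resolved plus one new finding. A stable finding ID, if the score output has one, would be a sturdier key.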

Example output addition:

Overall: 88/100 (A-)    [▲ +6 since 2026-04-04 (7 days ago)]

Per-bot change:
  Googlebot      95  A   [▲ +2]
  GPTBot         82  B+  [▲ +12]  ← biggest improvement
  ClaudeBot      82  B+  [▲ +12]
  PerplexityBot  79  B   [—]

Per-category change:
  Accessibility       96  A   [—]
  Content Visibility  81  B+  [▲ +14]  ← likely fix: SSR article cards
  Structured Data     92  A   [▼ -7]   ← regression: BreadcrumbList removed
  Technical Signals   90  A   [—]
  AI Readiness        65  C   [▲ +15]  ← new llms.txt file

Resolved since last run: 2 findings
Persisting: 3 findings
New: 1 finding (Structured Data regression)
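The bracketed markers in this output could be produced by a small formatter along these lines. This is a sketch only: `format_delta` and `days_between` are hypothetical names, and the timestamp format is assumed to match the one in the snapshot schema.

```python
from datetime import datetime

TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%SZ"  # assumed from the schema example


def format_delta(delta: int) -> str:
    """Render a score change using the markers shown in the example output."""
    if delta > 0:
        return f"[▲ +{delta}]"
    if delta < 0:
        return f"[▼ {delta}]"
    return "[—]"


def days_between(previous_ts: str, current_ts: str) -> int:
    """Whole days elapsed between two snapshot timestamps."""
    previous = datetime.strptime(previous_ts, TIMESTAMP_FORMAT)
    current = datetime.strptime(current_ts, TIMESTAMP_FORMAT)
    return (current - previous).days
```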

After showing the diff, overwrite the latest file with the new snapshot and append the same snapshot as one line to the .history.jsonl log.
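A minimal sketch of that write step, assuming the file layout described earlier. The function name `save_snapshot` and its `slug` parameter (the URL-derived file name without extension) are hypothetical.

```python
import json
from pathlib import Path


def save_snapshot(snapshot: dict, slug: str, cwd: str = ".") -> Path:
    """Replace the latest snapshot and append one line to the history log."""
    snap_dir = Path(cwd) / ".crawl-sim" / "snapshots"
    snap_dir.mkdir(parents=True, exist_ok=True)

    latest = snap_dir / f"{slug}.json"
    history = snap_dir / f"{slug}.history.jsonl"

    # Latest file: replaced wholesale, pretty-printed for human readers.
    latest.write_text(json.dumps(snapshot, indent=2) + "\n", encoding="utf-8")

    # History log: opened in append mode, one compact JSON object per line.
    with history.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(snapshot, separators=(",", ":")) + "\n")
    return latest
```

Opening the log in append mode is what keeps earlier entries untouched. Nothing in this sketch ever reads or rewrites existing history lines.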

History file

example_com.history.jsonl is append-only. One JSON object per line, one per run. Useful for:

  • Future trend charts
  • Debugging "when did this regress?"
  • Generating sparklines in reports

Unlimited retention for now; add pruning (e.g., keep last 100) if the file gets unwieldy.
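To illustrate the "when did this regress?" use case, a history file could be scanned like this. The sketch is hypothetical: `first_regression` is not part of the proposal, and it assumes every line carries an `overall.score` field as in the snapshot schema.

```python
import json
from typing import Iterable, Optional


def first_regression(history_lines: Iterable[str]) -> Optional[dict]:
    """Return the first run whose overall score is lower than the run before it."""
    previous_score = None
    for line in history_lines:
        if not line.strip():
            continue  # tolerate blank lines
        run = json.loads(line)
        score = run["overall"]["score"]
        if previous_score is not None and score < previous_score:
            return run
        previous_score = score
    return None
```

Because the log is line-delimited, the function accepts an open file handle directly, for example `first_regression(open(path, encoding="utf-8"))`.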

New script

scripts/compare-snapshots.sh <current.json> <previous.json> — outputs a JSON diff object that the skill can include in the narrative. Pure function, easy to test.

Orchestration changes

SKILL.md step 9 ("Interpret and produce output") grows a pre-step:

Step 8.5: Diff against previous snapshot
  - Look for .crawl-sim/snapshots/<mapped-url>.json in the cwd
  - If it exists, load it and run compare-snapshots.sh
  - Pass the diff to the interpretation step
  - At the end, write the new snapshot to the same location and append it to <mapped-url>.history.jsonl

Flags

  • /crawl-sim <url> --no-snapshot — skip writing the new snapshot (useful for exploratory runs)
  • /crawl-sim <url> --no-diff — skip the comparison step
  • /crawl-sim <url> --baseline — force overwrite the existing snapshot as the new reference point (resets deltas to zero)
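One way to read the three flags is as switches on two decisions: whether to diff, and whether to write. The Python sketch below models that reading with `argparse`. The skill itself is driven by SKILL.md rather than a Python CLI, so `parse_args` and `plan` are purely illustrative, and treating --baseline as "skip the diff, still write" is an assumption about the intended semantics.

```python
import argparse


def parse_args(argv):
    """Parse the flags proposed above (illustrative only)."""
    parser = argparse.ArgumentParser(prog="crawl-sim")
    parser.add_argument("url")
    parser.add_argument("--no-snapshot", action="store_true",
                        help="skip writing the new snapshot")
    parser.add_argument("--no-diff", action="store_true",
                        help="skip the comparison step")
    parser.add_argument("--baseline", action="store_true",
                        help="overwrite the snapshot as the new reference point")
    return parser.parse_args(argv)


def plan(args) -> dict:
    """Translate flags into the two decisions the orchestration needs."""
    return {
        # A baseline run has nothing meaningful to compare against.
        "diff": not (args.no_diff or args.baseline),
        "write": not args.no_snapshot,
    }
```

Combining --baseline with --no-snapshot is contradictory under this reading. The proposal does not say how to resolve that, so the sketch leaves it undefined rather than guessing.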

Future: git-tracking

Not scoping this now, but noting for later: an opt-in mode where .crawl-sim/snapshots/ is committed to the user's repo so every PR's audit is version-controlled. This enables:

  • PR checks that fail the build when the score regresses or falls below a threshold (e.g. jq -e '.overall.score >= 70')
  • Historical review via git log -p .crawl-sim/snapshots/
  • Team-shareable baselines without a central database

A follow-up issue can add a --git flag that ensures .crawl-sim/ is tracked, adds a suggested .gitignore exemption, and outputs a suggested CI step. For now, file-based local snapshots are enough — leave it out of this issue.

Acceptance criteria

  • scripts/compare-snapshots.sh <current.json> <previous.json> exists, outputs a structured diff JSON
  • compute-score.sh output is stable enough to diff cleanly across runs (no timestamps / random IDs in the comparable fields)
  • SKILL.md orchestrates the snapshot load/diff/save cycle using <cwd>/.crawl-sim/snapshots/
  • The output format includes per-bot, per-category, and per-finding deltas
  • URL → filename mapping handles query strings, trailing slashes, and special chars safely
  • .history.jsonl is append-only and never rewritten
  • Flags: --no-snapshot, --no-diff, --baseline
  • README documents the snapshot directory layout and mentions git-tracking as a future option
  • The skill auto-creates .crawl-sim/README.md on first use to explain the directory

Out of scope

  • Git-tracking automation (future issue)
  • Trend charts / visualizations (future, once history has enough data)
  • Cloud sync of snapshots (never — the whole point is local-first and agent-agnostic)
  • Pruning old history (add when it becomes a real problem)

Why this is the best answer

  • Works identically across agent platforms (Claude Code, Codex, custom runners)
  • Survives memory wipes, machine changes, and reinstalls
  • Shareable with teams via the user's existing version control
  • Enables CI integration with no new infrastructure
  • Keeps the "scripts as deterministic evidence" contract — the snapshot is just another script output
