Follow-up from docs/issues/2026-04-12-accuracy-from-live-audit.md (M4 remainder) and docs/plans/2026-04-18-live-audit-follow-through.md.
Problem
docs/output-schemas.md exists, but the contracts are still only human-readable. There are no *.schema.json files and CI does not validate actual script outputs against machine-readable schemas. That means output drift can still land silently.
Goal
Make output contracts enforceable.
Scope
- Add JSON Schema files for the core script outputs, at minimum:
  - fetch-as-bot.sh
  - extract-meta.sh
  - extract-jsonld.sh
  - extract-links.sh
  - check-robots.sh
  - check-llmstxt.sh
  - check-sitemap.sh
  - diff-render.sh
  - compute-score.sh
  - build-report.sh
- Add CI validation that runs representative fixtures through those schemas.
- Document the schema directory and validation workflow in docs/output-schemas.md.
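The CI validation step could look roughly like the sketch below. It is illustrative only. The `schemas/<script>.schema.json` naming, the `fixtures/<script>.json` layout, and the `validate` and `check_repo` helpers are all assumptions, not existing crawl-sim code. The helper deliberately handles only the `type`, `required`, and `properties` keywords; a real job would more likely pass the same files to a full JSON Schema validator.

```python
import json
from pathlib import Path

# Script names come from the Scope list; the directory layout is an assumption.
CORE_SCRIPTS = [
    "fetch-as-bot", "extract-meta", "extract-jsonld", "extract-links",
    "check-robots", "check-llmstxt", "check-sitemap", "diff-render",
    "compute-score", "build-report",
]

# Maps JSON Schema type names to Python types (subset used by this sketch).
JSON_TYPES = {
    "object": dict, "array": list, "string": str, "null": type(None),
    "number": (int, float), "integer": int, "boolean": bool,
}


def validate(instance, schema, path="$"):
    """Return a list of error strings; handles only type/required/properties."""
    errors = []
    expected = schema.get("type")
    if expected is not None:
        py_type = JSON_TYPES[expected]
        # bool is a subclass of int in Python, so reject it for numeric types.
        bool_as_number = isinstance(instance, bool) and expected in ("number", "integer")
        if not isinstance(instance, py_type) or bool_as_number:
            return [f"{path}: expected {expected}"]
    if isinstance(instance, dict):
        for key in schema.get("required", []):
            if key not in instance:
                errors.append(f"{path}: missing required key '{key}'")
        for key, sub_schema in schema.get("properties", {}).items():
            if key in instance:
                errors.extend(validate(instance[key], sub_schema, f"{path}.{key}"))
    return errors


def check_repo(root):
    """Validate each script's fixture against its schema; report gaps as errors."""
    root = Path(root)
    problems = []
    for name in CORE_SCRIPTS:
        schema_file = root / "schemas" / f"{name}.schema.json"
        fixture_file = root / "fixtures" / f"{name}.json"  # hypothetical fixture path
        if not schema_file.exists():
            problems.append(f"{name}: no schema file")
            continue
        if not fixture_file.exists():
            problems.append(f"{name}: no fixture")
            continue
        schema = json.loads(schema_file.read_text())
        fixture = json.loads(fixture_file.read_text())
        problems += [f"{name}: {error}" for error in validate(fixture, schema)]
    return problems
```

A CI job could call `check_repo(".")` and fail the build whenever the returned list is non-empty, so that a missing schema and a drifting output are both surfaced by the same check.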
Acceptance criteria
- schemas/*.schema.json exists for the script outputs above
- docs/output-schemas.md points to the machine-readable schemas, not just prose examples
Why this matters
The live audit showed that crawl-sim's raw data can be correct while the interpretation layer drifts. Locking the JSON contracts reduces that drift surface area.