[#1075 sub-issue] /baseline --tests periodic pass — cross-session test-baseline snapshot

## Summary

Third sub-issue of three for umbrella #1075. Build a periodic-aggregation pass that produces a machine-readable test-baseline snapshot across sessions — which tests have been failing for how long, with cross-session trend visibility.

## Blocked by

#1098 (docs sweep — first variant). Build that first.

## Context

Pre-existing test failures get re-verified each pipeline by narrative comparison ("implementer says these 230 are pre-existing"). No machine-readable baseline aggregates across sessions, so cross-session trends are lost: a test that has been failing for 12 sessions looks the same as a test that started failing today.

This session's #1086 pipeline surfaced the gap directly: the STEP 1 baseline-failing-tests capture timed out at 120s and silently produced a 0-byte file (filed as #1094 by CIA). The fix-forward classification relied on manual `git stash` verification at reviewer stage. A periodic baseline pass would have a known-good snapshot to compare against — and would obviate the per-pipeline STEP 1 capture entirely.

## Implementation Approach

- Periodic command (e.g., `/baseline --tests`) that runs `pytest --tb=no -q` against the full suite with generous timeout.
- Persists results to a versioned artifact (e.g., `.claude/local/test_baseline.json`) with: test ID → first-failed-at-commit + last-seen-at-commit + current status + consecutive-failures count.
- Subsequent `/implement` pipelines read this artifact at STEP 1 instead of running their own baseline capture.
- Idempotent — re-running with no test changes produces an updated snapshot but no spurious diff.

## Test Scenarios

1. Initial run captures full pass/fail snapshot to artifact path.
2. Subsequent run with no changes produces identical artifact contents (idempotent).
3. `/implement` STEP 1 consumes this artifact instead of running its own baseline capture.
4. Tests fixed since last snapshot are correctly classified as "fixed".
5. New failures since last snapshot are correctly classified as "new" — implement gate blocks on these.
6. Cross-session signal: a test that has been failing for N sessions shows a consecutive-failures count of N.

## Acceptance Criteria

- [ ] `/baseline --tests` (or equivalent) runs the full pytest suite with generous timeout.
- [ ] Persists results to versioned artifact with test ID, first-failed-at-commit, last-seen-at-commit, status, consecutive-failures count.
- [ ] Subsequent `/implement` pipelines consume this artifact instead of running baseline capture.
- [ ] Idempotent on a clean repo.
- [ ] Structural test for the artifact path.
- [ ] Documented in `docs/COMMANDS.md` (or equivalent) with usage example.

## Relation to umbrella

- Sub-issue **#3** of #1075.
- Blocked by #1098 (build docs sweep first to establish pattern).
- Replaces the closed #1063 in spirit.
- Indirectly addresses #1094 (this session's STEP 1 timeout) by removing the per-pipeline baseline-capture step entirely.

**Plugin Version**: 3.50.0 (8035385)


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[#1075 sub-issue] /baseline --tests periodic pass — cross-session test-baseline snapshot #1100

Summary

Blocked by

Context

Implementation Approach

Test Scenarios

Acceptance Criteria

Relation to umbrella

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

[#1075 sub-issue] /baseline --tests periodic pass — cross-session test-baseline snapshot #1100

Description

Summary

Blocked by

Context

Implementation Approach

Test Scenarios

Acceptance Criteria

Relation to umbrella

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions