Parent
PRD: release-adoption control plane — #2 (slice 5, roadmap)
What to build
The value-proving track (ADR 0002): a Behavioral A/B benchmark comparing stock CC vs lobotomized-CC — same version/model/effort/prompt — scored on the behavioral axes the Lobotomy targets via an LLM judge (paired, randomized order), with a Correctness guardrail. Evidence, not a gate; it feeds the Behavioral A/B field of the Adoption record.
Hybrid home: ~/repos/bench exposes its run/judge/aggregate as library primitives (a leaf refactor in bench); the tweakcc-specific behavior-bait fixtures, the behavioral rubric, and the A/B driver live in tweakcc-maint. Driver seam mirrors the gate: the LLM judge sits behind a stubbable port so pairing/randomization/aggregation are tested without real model calls.
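A minimal sketch of the driver seam described above, assuming a pairwise judge interface. Every name here (`Judge`, `Pair`, `run_ab`, `ShortestWinsStub`) is hypothetical and does not reflect the actual bench or tweakcc-maint API; the point is only that order randomization and win aggregation can be exercised with a deterministic stub in place of the LLM judge.

```python
import random
from dataclasses import dataclass
from typing import Dict, Iterable, Protocol


class Judge(Protocol):
    """Hypothetical port: the real LLM judge and test stubs both satisfy it."""

    def prefer(self, prompt: str, first: str, second: str) -> str:
        """Return "first" or "second" for the preferred response."""
        ...


@dataclass(frozen=True)
class Pair:
    """One fixture run under both arms with the same prompt."""

    prompt: str
    stock: str
    lobotomized: str


def run_ab(pairs: Iterable[Pair], judge: Judge, rng: random.Random) -> Dict[str, int]:
    """Show each pair to the judge in randomized order and count wins per arm."""
    wins = {"stock": 0, "lobotomized": 0}
    for pair in pairs:
        arms = [("stock", pair.stock), ("lobotomized", pair.lobotomized)]
        rng.shuffle(arms)  # randomize presentation order to offset position bias
        verdict = judge.prefer(pair.prompt, arms[0][1], arms[1][1])
        # Map the positional verdict back to the arm that held that position.
        winner = arms[0][0] if verdict == "first" else arms[1][0]
        wins[winner] += 1
    return wins


class ShortestWinsStub:
    """Deterministic stand-in for the LLM judge: prefers the shorter response."""

    def prefer(self, prompt: str, first: str, second: str) -> str:
        return "first" if len(first) <= len(second) else "second"


if __name__ == "__main__":
    fixtures = [
        Pair("fix the bug", stock="long answer with preamble", lobotomized="patch"),
        Pair("rename var", stock="verbose explanation here", lobotomized="done"),
    ]
    print(run_ab(fixtures, ShortestWinsStub(), random.Random(0)))
```

Because the stub's verdict depends only on response content, the aggregated counts stay the same whichever order the shuffle produces, which is the property a test of the pairing and randomization logic would want to pin down.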
Tracking placeholder — needs its own design pass (and the bench refactor) before AFK-ready.
Acceptance criteria
Blocked by
None - roadmap, needs triage.
🤖 Generated with Claude Code