Roadmap: Behavioral A/B benchmark (stock vs lobotomized-CC) #11

## Parent

PRD: release-adoption control plane — #2 (slice 5, roadmap)

## What to build

The value-proving track (ADR 0002): a **Behavioral A/B benchmark** comparing **stock CC** against **lobotomized-CC** with version, model, effort, and prompt held constant. An LLM judge scores each pair (paired, randomized order) on the behavioral axes the **Lobotomy** targets, with a **Correctness guardrail** alongside. The result is evidence, not a gate; it feeds the Behavioral A/B field of the **Adoption record**.

Hybrid home: `~/repos/bench` exposes its run/judge/aggregate as **library primitives** (a leaf refactor in bench); the tweakcc-specific behavior-bait **fixtures**, the behavioral **rubric**, and the **A/B driver** live in `tweakcc-maint`. Driver seam mirrors the gate: the LLM judge sits behind a stubbable port so pairing/randomization/aggregation are tested without real model calls.
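
A minimal sketch of what the driver seam could look like, assuming a Python driver. Every name here (`JudgePort`, `StubJudge`, `Pair`, `judge_pair`, `aggregate`, and the `"verbosity"` axis) is hypothetical and not taken from `bench` or `tweakcc-maint`; the real design is still to be triaged. It shows the judge behind a port, presentation order randomized per comparison, and positional verdicts mapped back to arms before tallying, so the whole path runs without a model call.

```python
import random
from collections import Counter
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class Pair:
    """One fixture run through both arms (hypothetical shape)."""
    fixture_id: str
    stock_output: str
    lobotomized_output: str


class JudgePort(Protocol):
    """Port the driver depends on; a real LLM judge would implement it."""

    def prefer(self, axis: str, first: str, second: str) -> str:
        """Return "first", "second", or "tie"."""
        ...


class StubJudge:
    """Deterministic stand-in for tests: prefers the shorter response."""

    def prefer(self, axis: str, first: str, second: str) -> str:
        if len(first) == len(second):
            return "tie"
        return "first" if len(first) < len(second) else "second"


def judge_pair(judge: JudgePort, axis: str, pair: Pair, rng: random.Random) -> str:
    """Judge one pair in randomized order; return the winning arm or "tie"."""
    # Randomize which arm is shown first so position bias cannot favor one arm.
    stock_first = rng.random() < 0.5
    if stock_first:
        first, second = pair.stock_output, pair.lobotomized_output
    else:
        first, second = pair.lobotomized_output, pair.stock_output
    verdict = judge.prefer(axis, first, second)
    if verdict == "tie":
        return "tie"
    # Map the positional verdict back to the arm that occupied that position.
    first_won = verdict == "first"
    return "stock" if first_won == stock_first else "lobotomized"


def aggregate(judge: JudgePort, axes: list[str], pairs: list[Pair], seed: int = 0) -> dict[str, Counter]:
    """Tally wins per axis across all pairs, using a seeded RNG for repeatability."""
    rng = random.Random(seed)
    tallies: dict[str, Counter] = {axis: Counter() for axis in axes}
    for pair in pairs:
        for axis in axes:
            tallies[axis][judge_pair(judge, axis, pair, rng)] += 1
    return tallies
```

With a stub like this, a test can assert that the tallies do not change with the seed, which is the property the randomization is meant to protect. A Correctness guardrail would be a separate check and is not modeled here.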

Tracking placeholder — needs its own design pass (and the bench refactor) before AFK-ready.

## Acceptance criteria

- [ ] Triaged: scope the bench library refactor + the behavioral rubric/fixtures.

## Blocked by

None - roadmap, needs triage.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
