epic: non-SWE work flows without harness friction (umbrella) #943

## Mission

Operationalize PROJECT.md's *macro alignment with micro flexibility* and CLAUDE.md's *process applies WHEN doing software development*: distinguish SWE from non-SWE work at session start, route the harness accordingly, and let the user see and override what's happening.

**Single destination**: When the user is doing SWE work, the harness enforces. When they are not, it stays out of the way. The system measures the difference and recalibrates without human intervention.

## The disconnect being closed

PROJECT.md says macro alignment with micro flexibility; CLAUDE.md says process applies WHEN doing software development. The implementation today is the opposite: micro-rigid (every Write/Edit is gated on path) and macro-blind (no upfront SWE-vs-not detection). The evidence: 1,577 PreToolUse `deny` events were logged in Apr–May, and bypass primitives (`touch /tmp/skip_*`, `AUTONOMOUS_DEV_BYPASS=1`, `SKIP_AGENT_COMPLETENESS_GATE=1`, `--skip-review`) appear in 7+ sessions.

A 13-class intent classifier (`lib/intent_classifier.py`, 9 SWE classes from #971 + 4 non-SWE classes from #1023) and a per-session mode artifact (`/tmp/session_mode_<sid>.json` from Phase D #998) already exist. Phase E (#999) wired hooks to consult the recorded session mode, **but that wiring ships default-off**. Phase 2 (#961, PR #1037, still open) adds classifier-gated plan-critic + research skip. The plumbing is built; it needs to land.
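The default-off consultation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the shipped hook code: the artifact schema (a top-level `intent` key), the helper names, the `RELAXED_INTENTS` set, and the fail-closed fallback are all hypothetical. Only the artifact path pattern and the `INTENT_CLASSIFIER_ENFORCE` flag name come from this issue.

```python
import json
import os
from pathlib import Path

# Classes named in the M2 exit criterion. Treating them as "skip the gate"
# classes is an assumption made for this sketch.
RELAXED_INTENTS = {"doc", "config", "typo", "status_query", "conversation"}


def load_session_intent(session_id, tmp_dir="/tmp"):
    """Return the intent recorded for a session, or None if unavailable."""
    path = Path(tmp_dir) / f"session_mode_{session_id}.json"
    try:
        data = json.loads(path.read_text())
    except (OSError, json.JSONDecodeError):
        return None
    # The "intent" key is a hypothetical schema, not the real artifact layout.
    return data.get("intent") if isinstance(data, dict) else None


def should_enforce_gate(session_id, env=None, tmp_dir="/tmp"):
    """Decide whether a Write/Edit gate applies to this session."""
    env = os.environ if env is None else env
    # Default-off: without the flag, the recorded mode is ignored and the
    # gate always applies.
    if env.get("INTENT_CLASSIFIER_ENFORCE", "").lower() != "true":
        return True
    intent = load_session_intent(session_id, tmp_dir)
    # Missing or unreadable artifact: fail closed and keep enforcing.
    if intent is None:
        return True
    return intent not in RELAXED_INTENTS
```

Failing closed on a missing artifact is a design choice for the sketch: a crash in classification should never silently disable enforcement for `implement`-class work.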

## Milestones

### M0 — Framework reliability (PREREQUISITE)
*Without this, telemetry lies, classifier rollout is unsafe, and every long pipeline is a crash risk.*

- **#1041 — Durable pipeline state isolation (umbrella)** — closes #989, #1028, #1029, #1030, #1033, #1039
- #1017 — document hook deadlock protocol
- #1019 — audit & close silent agent/hook source-tree writes
- #1021 — investigate plugin symlink between autonomous-dev and consumer repos (canonical source-of-truth decision)
- #1036 — hook commands resolve to submodule path, not project root
- #1018 — `/bypass-hooks` slash command (planning happens in M0; implementation lands in M3)

**Exit criterion**: zero `/tmp/implement_pipeline_state.json` collisions across 14 days; crash-resume works without manual `record_agent_completion()` calls.
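One possible shape for avoiding collisions on a shared state file is sketched below. It is not the design chosen in #1041: the per-session file name, the `state_path` and `write_state` helpers, and the state contents are hypothetical. The sketch only illustrates the general technique of session-scoped paths combined with an atomic rename.

```python
import json
import os
import tempfile
from pathlib import Path


def state_path(session_id, base_dir="/tmp"):
    """Per-session state file, so concurrent pipelines never share a path."""
    # Hypothetical naming scheme; the real layout is decided in #1041.
    return Path(base_dir) / f"implement_pipeline_state_{session_id}.json"


def write_state(session_id, state, base_dir="/tmp"):
    """Write state atomically: temp file in the same directory, then rename."""
    target = state_path(session_id, base_dir)
    fd, tmp_name = tempfile.mkstemp(
        dir=str(target.parent), prefix=target.name, suffix=".tmp"
    )
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(state, handle)
            handle.flush()
            os.fsync(handle.fileno())
        # os.replace is atomic when source and target share a filesystem,
        # so a reader sees either the old file or the new one, never half.
        os.replace(tmp_name, target)
    except BaseException:
        os.unlink(tmp_name)
        raise
    return target


def read_state(session_id, base_dir="/tmp"):
    """Return the stored state, or None if this session has none yet."""
    try:
        return json.loads(state_path(session_id, base_dir).read_text())
    except FileNotFoundError:
        return None
```

Because a crash can only happen before or after the rename, a resumed session would read the last fully written state rather than a truncated file.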

### M1 — Hook telemetry observability
*Measure before tuning. Every M2/M3/M4 claim is unfalsifiable without this.*

- #1012 — per-hook timing wrapper (W0)
- #1022 — 1-day baseline capture + p50/p95/p99 publish

**Exit criterion**: top-5 slowest hooks + top-5 most-blocked gates published; baseline JSONL committed.
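The p50/p95/p99 publication could be computed from the baseline JSONL roughly as below. The record fields (`hook`, `duration_ms`) and function names are assumptions for illustration; the actual format written by the #1012 timing wrapper may differ. Percentiles use the nearest-rank method.

```python
import json
import math
from collections import defaultdict


def percentile(sorted_values, pct):
    """Nearest-rank percentile of a non-empty, ascending list."""
    rank = max(1, math.ceil(pct * len(sorted_values) / 100))
    return sorted_values[rank - 1]


def summarize_hook_timings(jsonl_lines, top_n=5):
    """Return the top_n slowest hooks ranked by p95, with p50/p95/p99 each."""
    durations = defaultdict(list)
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        # "hook" and "duration_ms" are hypothetical field names.
        durations[record["hook"]].append(float(record["duration_ms"]))

    summary = []
    for hook, values in durations.items():
        values.sort()
        summary.append({
            "hook": hook,
            "count": len(values),
            "p50": percentile(values, 50),
            "p95": percentile(values, 95),
            "p99": percentile(values, 99),
        })
    summary.sort(key=lambda row: row["p95"], reverse=True)
    return summary[:top_n]
```

Ranking by p95 rather than the mean keeps one-off outliers from hiding a hook that is consistently slow.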

### M2 — Intent-aware gating: turn it on safely
*This is the work. Most of the user's friction lives here.*

**Phase track (Phase 1 #971 + Phase D #998 + Phase E #999 already shipped):**
- #961 — Phase 2: classifier-gated plan-critic + research skip (PR #1037)
- #962 — Phase 3: impact-based test selection
- #963 — Phase 4: cross-issue research sharing
- #1024 — `AMBIGUOUS` → ask-once via `AskUserQuestion`

**Hard-floor classifier wrap:**
- #1014 — Stage 1 shadow mode (W2.1)
- #1015 — Stage 2 active enforcement (W2.2)

**New umbrellas:**
- **#1042 — Audit every hook for classifier-gate coverage** — closes #1038, #931, #934, #1031, #1002, #918, #913, #916
- **#1043 — Calibrate non-SWE intent classes against real session corpus**

**Exit criterion**: `INTENT_CLASSIFIER_ENFORCE=true` ships as the *default* for `doc/config/typo/status_query/conversation` plus the 4 non-SWE classes from #1023; ≥80% reduction in `decision=deny` events on non-SWE classes; no regression on `intent=implement / security_critical`.
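The ≥80% reduction check in the exit criterion could be evaluated along these lines. The event shape (dicts with `decision` and `intent` keys) mirrors the `decision=deny` and `intent=...` notation used above, but the exact telemetry schema and the function names are assumptions. Both event lists are assumed to cover windows of equal length.

```python
from collections import Counter


def deny_counts(events, intents):
    """Count decision=deny events per intent class, restricted to `intents`."""
    counts = Counter()
    for event in events:
        if event.get("decision") == "deny" and event.get("intent") in intents:
            counts[event["intent"]] += 1
    return counts


def meets_reduction_target(baseline_events, current_events, intents, target=0.80):
    """True if total deny events on `intents` dropped by at least `target`."""
    before = sum(deny_counts(baseline_events, intents).values())
    after = sum(deny_counts(current_events, intents).values())
    if before == 0:
        # An empty baseline cannot demonstrate a reduction, so it never passes.
        return False
    return (before - after) / before >= target
```

The same `deny_counts` helper could be pointed at `implement` and `security_critical` to check the no-regression half of the criterion.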

### M3 — User-gated overrides
*Stop silent bypasses. Make blocks visible and auditable.*

- #1018 — `/bypass-hooks <reason>` slash command
- #974 — `--interactive` / `--autonomous` mode flags + ask-on-second-block
- #975 — stable named-block registry (depends on #974)
- #976 — `/authorize <block_id>` scoped override (depends on #975)
- #1025 — diff-shape-aware completeness gate
- #1027 — mode-aware agent-completeness gate (light/fix should not require full-mode agents)
- #1040 — replace silent `SKIP_AGENT_COMPLETENESS_GATE=1` with `/authorize` (today's discovery)

**Exit criterion**: silent `.bypass` file marker deprecated; every override emits a structured audit event; deletion-only diffs no longer require all 9 pipeline agents.
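A structured audit event for an override might look like the sketch below. Every field name, the JSONL log format, and the `record_override` helper are hypothetical; the issue only requires that each override emits a structured event and, per `/bypass-hooks <reason>`, that a reason accompanies it.

```python
import json
import time
from pathlib import Path


def record_override(log_path, session_id, block_id, reason, now=None):
    """Append one override audit event as a JSON line and return it."""
    if not reason or not reason.strip():
        # A silent bypass is exactly what M3 removes, so a reason is required.
        raise ValueError("an override must carry a non-empty reason")
    event = {
        "event": "override",
        "session_id": session_id,
        "block_id": block_id,
        "reason": reason.strip(),
        "timestamp": time.time() if now is None else now,
    }
    with Path(log_path).open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(event, sort_keys=True) + "\n")
    return event
```

Appending one JSON object per line keeps the log greppable and lets the same aggregation tooling used for hook telemetry read it.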

### M4 — Self-tuning (closed-loop closer)
*The actuator. Without this, "gets better every week" stays aspirational.*

- **#964 — Phase 5: classifier telemetry aggregation + auto-rollback**
- #1026 — 3× recurrence policy: forces root-cause architecture review

**Exit criterion**: classifier skip-rule changes ship without human approval if the benchmark improves; they are reverted automatically if it regresses; the baseline is updated on success.
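The ship-or-revert rule in the exit criterion reduces to a small decision function. The sketch below assumes a single scalar benchmark score where higher is better, and it treats "no change" as a revert; both are assumptions, as are the `Decision` type and the `min_gain` threshold.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Decision:
    action: str      # "ship" or "revert"
    baseline: float  # baseline score to carry into the next evaluation


def evaluate_change(baseline_score, candidate_score, min_gain=0.0):
    """Ship and advance the baseline on improvement; otherwise revert."""
    if candidate_score > baseline_score + min_gain:
        # Success: the candidate becomes the new baseline.
        return Decision("ship", candidate_score)
    # Regression or no measurable gain: keep the old rules and baseline.
    return Decision("revert", baseline_score)
```

A non-zero `min_gain` would guard against shipping changes whose improvement is within benchmark noise.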

### M5 — Real-world use validation
*Prove it works for what the user actually wants it to do.*

- **#1044 — One-week real-world validation in non-harness repo (`realign` or `spektiv`)**

**Exit criterion**: ≥7 consecutive days of pipeline runs in a non-harness repo with zero manual recovery sessions and zero `SKIP_AGENT_COMPLETENESS_GATE=1`-class bypasses.

## Dependency graph

```
M0 (framework reliability) ──┐
                             ├──> M2 (intent-aware gating) ──> M3 (overrides) ──> M4 (self-tuning) ──> M5 (real use)
M1 (telemetry baseline)    ──┘                                      ▲                       ▲
                                                                    │                       │
                #1042 hook audit runs in M2 ────────────────────────┘                       │
                #1043 calibration runs in M2 ───────────────────────────────────────────────┘
```

M0 and M1 can run in parallel. Everything else is strictly sequenced.
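The ordering in the graph can be checked mechanically. The sketch below encodes the milestone dependencies as a mapping and groups them into waves with the standard-library `graphlib` module; the `DEPENDS_ON` name and the `execution_waves` helper are illustrative only.

```python
from graphlib import TopologicalSorter

# Each milestone maps to the milestones it depends on, as in the graph above.
DEPENDS_ON = {
    "M0": set(),
    "M1": set(),
    "M2": {"M0", "M1"},
    "M3": {"M2"},
    "M4": {"M3"},
    "M5": {"M4"},
}


def execution_waves(depends_on):
    """Group milestones into waves; members of one wave can run in parallel."""
    sorter = TopologicalSorter(depends_on)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


print(execution_waves(DEPENDS_ON))
# → [['M0', 'M1'], ['M2'], ['M3'], ['M4'], ['M5']]
```

Only the first wave contains more than one milestone, which matches the statement that M0 and M1 are the sole parallel pair.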

## Out of scope (parallel track, not blocking)

- #1013 / W1 — manifest reconciliation
- #1016 / W3 — lib audit + GenAI primitive consolidation
- #888 — plan-critic Tier-1 epic (separate concern)

These are internal cleanup, not on the friction critical path. They proceed in parallel.

## Original empirical analysis

**(Preserved from original issue body — still relevant evidence.)**

Empirical analysis of 132 archived sessions over 7 days showed hooks fire on every Claude Code invocation as if it might be `/implement`, regardless of actual user intent. Strategic, exploratory, and triage tasks accumulated hundreds of blocks per session.

Top 8 most-blocked sessions:

| Session | Prompt | Blocks |
|---|---|---:|
| `09a1e592` | (local-command-caveat session) | 243 |
| `e7fd9310` | "bug? from other claude" | 192 |
| `1bc3fea4` | "what open issues do we want to work on in order of priority" | 175 |
| `3c57a2ee` | (local-command-caveat session) | 159 |
| `44114e96` | `/implement #784` | 105 |
| `0be49a64` | `/implement --batch --issues 851,863,860` | 103 |
| `2502327a` | "what open issues do we have?" | 101 |
| `46261312` | "look at this claude session in realign. why do i need to run critic..." | 97 |

Sessions whose first prompt is *not* `/implement` accumulate hundreds of blocks. The classifier exists and works; it just isn't consulted at the friction-critical sites.

## Current status (2026-05-04)

| Milestone | Status |
|---|---|
| M0 | In planning — #1041 umbrella created today |
| M1 | Scoped — #1012, #1022 ready |
| M2 | Phase 2 (#961) PR #1037 open; #1042 + #1043 umbrellas created today |
| M3 | Scoped — 6 issues |
| M4 | Scoped — #964 + #1026 |
| M5 | Scoped — #1044 umbrella created today |

**Next concrete move**: land #1037 (Phase 2), then start M0 (#1041 + #1036).

