Original empirical analysis
(Preserved from original issue body — still relevant evidence.)
Empirical analysis of 132 archived sessions over 7 days showed hooks fire on every Claude Code invocation as if it might be /implement, regardless of actual user intent. Strategic, exploratory, and triage tasks accumulated hundreds of blocks per session.
Top 8 most-blocked sessions:
| Session | Prompt | Blocks |
|---|---|---|
| 09a1e592 | (local-command-caveat session) | 243 |
| e7fd9310 | "bug? from other claude" | 192 |
| 1bc3fea4 | "what open issues do we want to work on in order of priority" | 175 |
| 3c57a2ee | (local-command-caveat session) | 159 |
| 44114e96 | /implement #784 | 105 |
| 0be49a64 | /implement --batch --issues 851,863,860 | 103 |
| 2502327a | "what open issues do we have?" | 101 |
| 46261312 | "look at this claude session in realign. why do i need to run critic..." | 97 |
Sessions whose first prompt is not /implement accumulate hundreds of blocks. The classifier exists and works; it just isn't consulted at the friction-critical sites.
Mission
Operationalize PROJECT.md's macro alignment with micro flexibility and CLAUDE.md's process applies WHEN doing software development: distinguish SWE from non-SWE work at session start, route the harness accordingly, and let the user see and override what's happening.
Single destination: When the user is doing SWE work, the harness enforces. When they are not, it stays out of the way. The system measures the difference and recalibrates without human intervention.
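The routing rule above can be sketched as a small gate decision. This is a minimal illustration, not the harness's actual hook code. The `intent` key in the session-mode file, the two class sets, and the helper names are assumptions made for the example. Only the class names, the artifact path pattern, and the INTENT_CLASSIFIER_ENFORCE flag come from this issue.

```python
import json
import os
from pathlib import Path
from typing import Mapping, Optional

# Partial, illustrative class sets: only names mentioned in this issue.
# The real 13-class list lives in lib/intent_classifier.py and may differ.
NON_ENFORCED = {"doc", "config", "typo", "status_query", "conversation"}
ALWAYS_ENFORCED = {"implement", "security_critical"}


def enforce_flag_from_env(env: Optional[Mapping[str, str]] = None) -> bool:
    """Read INTENT_CLASSIFIER_ENFORCE; treated as off unless set to 'true'."""
    env = os.environ if env is None else env
    return env.get("INTENT_CLASSIFIER_ENFORCE", "false").lower() == "true"


def load_session_intent(session_id: str, tmp_dir: str = "/tmp") -> Optional[str]:
    """Read the intent from the per-session mode artifact, if present.

    The "intent" key is an assumed schema, not the artifact's real layout.
    """
    path = Path(tmp_dir) / f"session_mode_{session_id}.json"
    try:
        return json.loads(path.read_text()).get("intent")
    except (OSError, json.JSONDecodeError, AttributeError):
        return None  # missing, unreadable, or unexpected file shape


def should_enforce(intent: Optional[str], enforce_flag: bool) -> bool:
    """Decide whether the SWE gates apply to this session."""
    if intent in ALWAYS_ENFORCED:
        return True   # hard floor: never relaxed by the classifier
    if not enforce_flag:
        return True   # classifier routing off: gate everything, as today
    if intent in NON_ENFORCED:
        return False  # non-SWE work: stay out of the way
    return True       # unknown or missing intent: fail closed
```

The sketch fails closed: a missing artifact or an unrecognized class keeps the gates on, so relaxing enforcement always requires a positive classification.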
The disconnect being closed
PROJECT.md says macro alignment with micro flexibility; CLAUDE.md says process applies WHEN doing software development. Implementation today is the opposite: micro-rigid (every Write/Edit gated on path) and macro-blind (no upfront SWE-vs-not detection). 1,577 PreToolUse deny events Apr–May; bypass primitives (touch /tmp/skip_*, AUTONOMOUS_DEV_BYPASS=1, SKIP_AGENT_COMPLETENESS_GATE=1, --skip-review) appear in 7+ sessions.
A 13-class intent classifier (lib/intent_classifier.py, 9 SWE classes from #971 + 4 non-SWE classes from #1023) and per-session mode artifact (/tmp/session_mode_<sid>.json from Phase D #998) already exist. Phase E (#999) wired hooks to consult it, but it ships default-off. Phase 2 (#961, PR #1037) added classifier-gated plan-critic + research skip. The plumbing is built; it needs to land.
Milestones
M0 — Framework reliability (PREREQUISITE)
Without this, telemetry lies, classifier rollout is unsafe, every long pipeline is a crash risk.
/bypass-hooks slash command (planning happens in M0; implementation lands in M3)
Exit criterion: zero /tmp/implement_pipeline_state.json collisions across 14 days; crash-resume works without manual record_agent_completion() calls.
M1 — Hook telemetry observability
Measure before tuning. Every M2/M3/M4 claim is unfalsifiable without this.
Exit criterion: top-5 slowest hooks + top-5 most-blocked gates published; baseline JSONL committed.
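As a rough sketch of how the two top-5 lists could be derived from a JSONL baseline: the field names `hook`, `gate`, and `duration_ms` are hypothetical, since the telemetry schema is not given here. Only the `decision` field with value `deny` follows the wording used in this issue.

```python
import json
from collections import Counter, defaultdict
from statistics import mean
from typing import Iterable, List, Tuple


def summarize(
    lines: Iterable[str], top_n: int = 5
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, int]]]:
    """Return (slowest hooks by mean duration, most-blocked gates)."""
    durations = defaultdict(list)  # hook name -> list of durations in ms
    denies = Counter()             # gate name -> number of deny decisions
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue
        try:
            event = json.loads(raw)
        except json.JSONDecodeError:
            continue  # skip corrupt lines rather than abort the report
        if not isinstance(event, dict) or "hook" not in event:
            continue
        hook = event["hook"]
        if "duration_ms" in event:
            durations[hook].append(float(event["duration_ms"]))
        if event.get("decision") == "deny":
            # Fall back to the hook name when no gate is recorded.
            denies[event.get("gate", hook)] += 1
    ranked = sorted(((mean(v), k) for k, v in durations.items()), reverse=True)
    slowest = [(name, avg) for avg, name in ranked[:top_n]]
    return slowest, denies.most_common(top_n)
```

Called on the lines of a baseline file, `summarize(open(path))` returns both lists in one pass. Ranking by mean duration is one choice among several; a percentile may be more representative for hooks with rare slow runs.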
M2 — Intent-aware gating: turn it on safely
This is the work. Most of the user's friction lives here.
Phase track (Phase 1 #971 + Phase D #998 + Phase E #999 already shipped):
AMBIGUOUS → ask-once via AskUserQuestion
Hard-floor classifier wrap:
New umbrellas:
Exit criterion: INTENT_CLASSIFIER_ENFORCE=true ships as the default for doc/config/typo/status_query/conversation plus the 4 non-SWE classes from #1023; ≥80% reduction in decision=deny events on non-SWE classes; no regression on intent=implement / security_critical.
M3 — User-gated overrides
Stop silent bypasses. Make blocks visible and auditable.
/bypass-hooks <reason> slash command
--interactive/--autonomous mode flags + ask-on-second-block
/authorize <block_id> scoped override (depends on feat: stable named-block registry in unified_pre_tool.py + decision payload #975)
SKIP_AGENT_COMPLETENESS_GATE=1 with /authorize (today's discovery)
Exit criterion: silent .bypass file marker deprecated; every override emits a structured audit event; deletion-only diffs no longer require all 9 pipeline agents.
M4 — Self-tuning (closed-loop closer)
The actuator. Without this, "gets better every week" stays aspirational.
Exit criterion: classifier skip-rule changes ship without human approval if benchmark improves; revert if regresses; baseline updated on success.
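The exit criterion above amounts to a ship, revert, or hold decision per candidate rule change. A minimal sketch follows, assuming a single scalar benchmark score where higher is better. `TuningState`, `apply_candidate`, and the handling of a tie are illustrative choices, not part of the harness.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass(frozen=True)
class TuningState:
    baseline: float        # benchmark score of the currently shipped rules
    rules: Dict[str, bool]  # currently shipped skip rules (hypothetical shape)


def apply_candidate(
    state: TuningState,
    candidate: Dict[str, bool],
    benchmark: Callable[[Dict[str, bool]], float],
) -> Tuple[TuningState, str]:
    """Ship the candidate only if the benchmark strictly improves."""
    score = benchmark(candidate)
    if score > state.baseline:
        # Success: ship the new rules and move the baseline up.
        return TuningState(baseline=score, rules=candidate), "shipped"
    if score < state.baseline:
        # Regression: keep the previous rules and baseline.
        return state, "reverted"
    # Tie: assumed here to be a no-op, since nothing improved.
    return state, "unchanged"
```

Because the baseline only moves on a strict improvement, repeated runs cannot drift downward, which is the property the closed loop depends on.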
M5 — Real-world use validation
Prove it works for what the user actually wants it to do.
realign or spektiv)
Exit criterion: ≥7 consecutive days of pipeline runs in a non-harness repo with zero manual recovery sessions and zero SKIP_AGENT_COMPLETENESS_GATE=1-class bypasses.
Dependency graph
M0 and M1 can run in parallel. Everything else is strictly sequenced.
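One way to read the sequencing above is as staged batches. The edges below are an assumption for illustration: M2 is taken to wait on both M0 and M1, and M3 through M5 to follow one after another.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# Assumed edges, written as milestone -> set of prerequisites.
DEPENDS_ON: Dict[str, Set[str]] = {
    "M0": set(),
    "M1": set(),
    "M2": {"M0", "M1"},
    "M3": {"M2"},
    "M4": {"M3"},
    "M5": {"M4"},
}


def stages(graph: Dict[str, Set[str]]) -> List[List[str]]:
    """Group milestones into batches whose members can run in parallel."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # everything unblocked right now
        batches.append(ready)
        sorter.done(*ready)                 # unblock the next batch
    return batches
```

Under these assumed edges, `stages(DEPENDS_ON)` yields one batch containing M0 and M1, followed by single-milestone batches for M2 through M5.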
Out of scope (parallel track, not blocking)
These are internal cleanup, not on the friction critical path. They proceed in parallel.
Current status (2026-05-04)
Next concrete move: land #1037 (Phase 2), then start M0 (#1041 + #1036).