diff --git a/docs/effort-band-routing.md b/docs/effort-band-routing.md new file mode 100644 index 0000000..1a91ba6 --- /dev/null +++ b/docs/effort-band-routing.md @@ -0,0 +1,82 @@ +# Effort-band model routing (phase-aware bump-from-floor) + +Apply the upstream "effort-band" routing idea (v7.0) to our port **without +breaking** the tier model or `copilot-preset`: the model is derived from the +**task size and the pipeline phase**, floored by the agent's declared tier. + +## The insight + +Our three tiers (`small`/`balanced`/`deep`) already *are* three effort bands. The +improvement over one-model-per-agent: **derive the band from objective signals** +(the `task-size-classifier`, recorded by `/scope`) — and spend the effort where +it pays. The effort belongs in **spec/plan**: a complex task needs deep reasoning +to design and decompose; once the plan is solid, **implementation is routine**. +So the size raises the band **only while planning**, and the build runs at the +floor. + +## How it works + +`model-routing.json` gains an `effortBand`: + +``` +ladder: [small, balanced, deep] +sizeBand: { trivial: small, standard: balanced, complex: deep } +bumpStages: [needs-plan] # the planning phase: /scope -> /specs -> /plan +``` + +Resolution (pure `effectiveBand(floor, size, stage)` in `lib/shared.ts`): + +``` +effective = (stage in bumpStages) ? max_on_ladder(floor, sizeBand[size]) : floor +``` + +- **Planning** (`stage = needs-plan`): the size bumps the band — a `complex` plan + runs the architect + the five plan-review critics at `deep`. +- **Build/review** (`stage = plan-approved`): **no bump** — implementers and + reviewers run at their **floor**. Don't spend `deep` on mechanical + implementation once the plan is solid. +- **Never below the floor** — high-stakes agents (deep floor: security/domain/ + arch-review, architect, security-engineer, codebase-recon) always hold at deep. +- **No signal / trivial / unscoped → the floor** (backward-compatible with + today's static tiers). + +The `model-routing` extension reads the floor (`agentModel`) + size + stage +(plan-gate state), **logs** every dispatch to `.omp/state/model-routing.log`, and: + +- default **`advisory`** — warns when the dispatched tier ≠ the band tier; +- `DEV_TEAM_EFFORT_ROUTING=enforce` — blocks and names the model to use; +- `=off` — disables the band (floor tier only). + +`copilot-preset` is untouched — the band picks a *tier*; `modelRoles` (or the +Copilot remap) resolves the concrete model. + +## Decisions + +- **Effort into spec/plan, not the build.** A complex task bumps the *planning* + agents to deep; the build/review runs at the floor. This matches "put the + package on the plan — if the plan is solid, implementation is trivial," and it + avoids the failure mode of routing every lexical reviewer (naming, complexity, + a11y) on opus just because the task is complex. +- **Bump-from-floor, never below the agent's tier.** The tier is a quality floor; + the size can only raise it, and only while planning. Keeps the north star + (quality first) without over-spending. +- **Default advisory, not enforce.** The orchestrator applies the rule; the + extension audits and warns. `enforce` is opt-in so a not-yet-updated caller + isn't bricked; `off` is the escape hatch. + +## Pre-existing routing debt cleaned up + +- The orchestrator's **Resolution Procedure** described a Claude-Code hook + (`agent-model-resolve.sh`, `.claude/model-overrides.json`, `updatedInput`, + `deny`, a non-existent `docs/model-routing.md` + ADR 0004) the OMP extension + never implemented. Rewritten to reality + the new behavior. +- 4 skills + the agent-registry referenced the same stale + `hooks/agent-model-resolve.sh` — repointed to the `model-routing` extension. + +Left as-is (separate, larger debt): the `/model-routing-check` skill's exec block +still targets the old `.claude/` resolver; `/routing` is the accurate effort-band +diagnostic and the skill now points to it. + +## Verified +`ci-validate-json` 23/23 · extensions compile · unit suite green (incl. 11 +phase-aware `effectiveBand` cases) · no `agent-model-resolve.sh` references remain. diff --git a/plugins/dev-team/agents/orchestrator.md b/plugins/dev-team/agents/orchestrator.md index 5fc0ff9..861ac9a 100644 --- a/plugins/dev-team/agents/orchestrator.md +++ b/plugins/dev-team/agents/orchestrator.md @@ -26,19 +26,19 @@ thinking-level: medium - Task classification algorithm - Load balancing logic -## Resolution Procedure +## Resolution Procedure (floor tier + effort band) -Each agent's `model:` frontmatter encodes the tier alias (`haiku`, `sonnet`, `opus`) appropriate for its task. Tier-to-snapshot resolution is **enforced by OMP extension `model-routing` + `.omp/config.yml` `modelRoles`**. The LLM cannot bypass it. +Each agent's `model:` frontmatter declares its **floor tier** — `pi/smol` (small), `claude-sonnet-4-6` (balanced), or `claude-opus-4-8` (deep). Tier resolution is **native OMP**: `.omp/config.yml` `modelRoles` (and, with the copilot-preset plugin, the Copilot remap) turn the tier into a concrete model. The source of truth for tiers is `skill://dev-team-knowledge/model-routing.json`. -When the orchestrator (or any caller) spawns a subagent via the `task` tool with `model: `, the extension: +On top of the floor, the `model-routing` extension applies **phase-aware effort-band routing (bump-from-floor)**. The effort goes into **spec/plan**, not the build: the **task size** (recorded at `/scope` → plan-gate state; classifier `skill://dev-team-knowledge/task-size-classifier.md`) raises the band **only while planning** (`stage = needs-plan`, i.e. `/scope` → `/specs` → `/plan`): -1. Reads `skill://dev-team-knowledge/model-routing.json` — the single source of truth for tier → snapshot mapping. -2. Reads `.claude/model-overrides.json` if present (per-user, gitignored, populated by the `/init-dev-team` probe or by hand for restricted endpoints). -3. Walks the alias chain up to 3 hops along the `haiku → sonnet → opus` cascade. Each tier alias resolves to either another tier (bumped) or the literal `"unavailable"` sentinel (refusal). -4. On any bump, rewrites `tool_input.model` via `hookSpecificOutput.updatedInput` and appends one JSONL event to `.claude/metrics/model-routing.log`. -5. On exhaustion, cycle, missing routing.json, or malformed overrides, emits `permissionDecision: "deny"` with an actionable `permissionDecisionReason`. The dispatch never reaches the harness. +- During planning, target band: `trivial` → `small`, `standard` → `balanced`, `complex` → `deep`; effective = the **higher of the agent's floor and that target**. So a `complex` plan runs the architect and plan-review critics at `deep`. +- Once the plan is **approved** (`stage = plan-approved`, the build/review phase), there is **no bump** — implementers and reviewers run at their **floor**. A solid plan makes the build routine, so don't spend `deep` on mechanical implementation. +- An agent is **never routed below its floor** (high-stakes deep agents — security/domain/arch-review, architect, security-engineer, codebase-recon — always hold at deep). No signal / trivial / unscoped → the floor (static, backward-compatible). -For triage, run `/model-routing-check` — read-only diagnostic that prints the effective map, overrides, recent bumps, and probe applicability. See `docs/model-routing.md` for contract, fallback firing, hand-writing overrides, and Bedrock/Vertex/proxy troubleshooting. See [ADR 0004](../../../docs/adr/0004-pre-dispatch-model-resolution.md) for the design rationale (pre-dispatch vs. runtime retry; hook vs. orchestrator instruction). +So when you spawn a subagent via `task`, **pass the effort-band tier** = `(stage in bumpStages) ? max(floor, sizeBand[size]) : floor` (data in `model-routing.json` → `effortBand`). The extension logs every dispatch to `.omp/state/model-routing.log` and, by default (`advisory`), **warns** when the dispatched tier ≠ the band tier. `DEV_TEAM_EFFORT_ROUTING=enforce` upgrades the warning to a **block** naming the model to use; `=off` disables the band (floor tier only). + +For triage, run `/routing` or `/model-routing-check` (read-only): the tier map plus, per floor, the effective band for the current stage + task size. ### Tier guidance (informational) diff --git a/plugins/dev-team/commands/help.md b/plugins/dev-team/commands/help.md index b815817..ed865a4 100644 --- a/plugins/dev-team/commands/help.md +++ b/plugins/dev-team/commands/help.md @@ -4,9 +4,10 @@ - **Pipeline** (enforced order): `/scope` (pre-analysis) → `/specs` → `/plan` → `/plan-approve` → `/build` → `/code-review` → `/review-approve` → `/pr` -- **Plan gate**: `/scope [--trivial]`, `/trivial`, `/plan-approve [path]`, - `/plan-reset` — source edits are blocked until the task is scoped and (if - non-trivial) a plan is approved. +- **Plan gate**: `/scope [--trivial | --complex]`, `/trivial`, + `/plan-approve [path]`, `/plan-reset` — source edits are blocked until the task + is scoped and (if non-trivial) a plan is approved. The scope size also drives + effort-band model routing (`/routing`). - **Verify**: `/impl-verify` (strict build + tests, bounded verdict) - **Review**: `/code-review` (`/review`), `/review-agent`, `/review-approve` - **Guardrails**: `/careful on|off`, `/freeze `, `/unfreeze`, diff --git a/plugins/dev-team/config.snippet.yml b/plugins/dev-team/config.snippet.yml index 7ffaf63..fd161ff 100644 --- a/plugins/dev-team/config.snippet.yml +++ b/plugins/dev-team/config.snippet.yml @@ -84,6 +84,24 @@ eval: # .csproj/.sln -> dotnet, Cargo.toml -> rust, go.mod -> go, pyproject.toml -> # python, package.json -> node (dotnet wins a mixed repo). +# --------------------------------------------------------------------------- +# EFFORT-BAND MODEL ROUTING (model-routing extension) +# --------------------------------------------------------------------------- +# Phase-aware bump-from-floor: the effort goes into spec/plan, not the build. The +# TASK SIZE (set by /scope) raises the band ONLY while planning (stage +# needs-plan): target trivial->small, standard->balanced, complex->deep, and +# effective = max(agent floor, target). Once the plan is approved, build/review +# run at the floor (no bump) — a solid plan makes the build routine. An agent is +# never routed BELOW its tier (high-stakes deep agents hold). No signal -> floor. +# Configured in skills/dev-team-knowledge/model-routing.json -> effortBand; +# copilot-preset unaffected (the band picks a tier; modelRoles resolves it). +# +# Enforcement is set there (default "advisory") and overridable per-run by env: +# DEV_TEAM_EFFORT_ROUTING=advisory # log + warn on mismatch (default) +# DEV_TEAM_EFFORT_ROUTING=enforce # block a mis-banded dispatch +# DEV_TEAM_EFFORT_ROUTING=off # disable the band (pure static tiers) +# Inspect with /routing or /model-routing-check. + # --------------------------------------------------------------------------- # TASK / SUBAGENT GRAPH # --------------------------------------------------------------------------- diff --git a/plugins/dev-team/extensions/lib/shared.ts b/plugins/dev-team/extensions/lib/shared.ts index 1982833..006e904 100644 --- a/plugins/dev-team/extensions/lib/shared.ts +++ b/plugins/dev-team/extensions/lib/shared.ts @@ -148,8 +148,45 @@ export interface RoutingTier { rationale: string; } +export interface EffortBandConfig { + ladder: string[]; + // Task size -> target band (e.g. { trivial: "small", standard: "balanced", complex: "deep" }). + sizeBand: Record; + // Pipeline stages where the size bump applies. The effort goes into spec/plan + // (`needs-plan`); once the plan is approved, implementation/review run at the + // floor — a solid plan makes the build routine. Default: ["needs-plan"]. + bumpStages?: string[]; + enforcement?: string; +} + export interface RoutingConfig { tiers: Record; + effortBand?: EffortBandConfig; +} + +// Resolve the effective band from the agent's FLOOR tier, the task size, and the +// pipeline stage. Phase-aware bump-from-floor: the size raises the band ONLY +// during the planning phase(s) (`bumpStages`, default ["needs-plan"]) — put the +// effort into spec/plan; post-plan build/review runs at the floor. An agent is +// never routed BELOW its tier (high-stakes deep agents hold). Pure, no I/O. +// - a floor outside the ladder (pinned/default) -> unchanged +// - stage not in bumpStages (build, trivial, unscoped) -> the floor +// - else -> the higher of (floor, sizeBand[size]) on the ladder +export function effectiveBand( + floor: string, + size: string | undefined, + stage: string | undefined, + cfg: EffortBandConfig | undefined, +): string { + if (!cfg || !Array.isArray(cfg.ladder)) return floor; + const fi = cfg.ladder.indexOf(floor); + if (fi === -1) return floor; // pinned/default — not on the ladder + const bumpStages = cfg.bumpStages ?? ["needs-plan"]; + if (!stage || !bumpStages.includes(stage)) return floor; // no bump outside planning + const target = size ? cfg.sizeBand?.[size] : undefined; + const ti = target ? cfg.ladder.indexOf(target) : -1; + if (ti === -1) return floor; + return cfg.ladder[Math.max(fi, ti)]; } export function loadRouting(): RoutingConfig | null { diff --git a/plugins/dev-team/extensions/model-routing.ts b/plugins/dev-team/extensions/model-routing.ts index e0ff059..a799ffc 100644 --- a/plugins/dev-team/extensions/model-routing.ts +++ b/plugins/dev-team/extensions/model-routing.ts @@ -1,69 +1,119 @@ -// model-routing.ts — dispatch tier logger + routing diagnostic. +// model-routing.ts — effort-band dispatch routing + observability + diagnostic. // -// In OMP the actual tier -> model mapping is native (.omp/config.yml modelRoles -// + each agent's `model:` frontmatter). This extension adds an observability -// layer the config alone doesn't give you: it records every subagent dispatch -// with its resolved tier to .omp/state/model-routing.log, and registers a -// `/routing` command that prints the tier table. +// Base model resolution in OMP is native (.omp/config.yml modelRoles + each +// agent's `model:` frontmatter). This extension adds **effort-band routing** on +// top: the model is derived from the TASK SIZE (trivial/standard/complex, from +// the task-size-classifier, recorded by /scope in plan-gate state), not only +// from the agent's static tier. The agent's declared tier is the BASE band; the +// task size shifts it along the ladder [small, balanced, deep]. This is more +// deterministic than one-model-per-agent and ties spend to the work. // -// All tiers are cloud. The small tier (`pi/smol`) is a CHEAP-cloud, high-volume -// tier (default modelRoles.smol = claude-haiku-4-5; cheaper still via the -// copilot-preset plugin). Source of truth: -// skills/dev-team-knowledge/model-routing.json. +// It also logs every dispatch (observability) and registers /routing. +// +// Enforcement (config `effortBand.enforcement`, overridable by env +// DEV_TEAM_EFFORT_ROUTING): +// off — no band; pure static tiers (just log). +// advisory — (default) log + warn when the dispatched tier != the band tier. +// enforce — block the dispatch and tell the caller the model to use. +// +// copilot-preset is unaffected: the band picks a TIER, then modelRoles resolves +// the concrete model (Anthropic id or github-copilot/*). import type { ExtensionAPI } from "@oh-my-pi/pi-coding-agent"; import { + type EffortBandConfig, agentModel, appendJSONL, + effectiveBand, loadRouting, nowISO, + readState, statePath, } from "./lib/shared.ts"; const SMALL_TIER_MODELS = new Set(["pi/smol", "smol"]); function tierOf(model: string | null): string { - if (model === null) return "default"; + if (model === null || model === "") return "default"; if (SMALL_TIER_MODELS.has(model)) return "small"; if (model.includes("opus")) return "deep"; if (model.includes("sonnet")) return "balanced"; + if (model === "small" || model === "balanced" || model === "deep") return model; return "pinned"; } +function taskState(cwd: string): { size?: string; stage?: string } { + return readState<{ size?: string; stage?: string }>(cwd, "plan-gate.json", {}); +} + export default function modelRouting(pi: ExtensionAPI) { pi.setLabel("model-routing"); - // Log each subagent dispatch with its resolved tier (observability only — - // never blocks). pi.on("tool_call", async (event, ctx) => { if (event.toolName !== "task") return; - const agent = String((event.input as Record).agent ?? ""); + const input = event.input as Record; + const agent = String(input.agent ?? ""); if (!agent) return; - const model = agentModel(agent); + + const routing = loadRouting(); + const band: EffortBandConfig | undefined = routing?.effortBand; + const mode = (process.env.DEV_TEAM_EFFORT_ROUTING ?? band?.enforcement ?? "advisory").toLowerCase(); + + const floor = tierOf(agentModel(agent)); // the agent's declared tier = floor + const { size, stage } = taskState(ctx.cwd); + const dispatched = tierOf(input.model == null ? null : String(input.model)) === "default" + ? floor // no explicit model -> the agent's floor tier is used + : tierOf(String(input.model)); + const eff = mode === "off" ? floor : effectiveBand(floor, size, stage, band); + appendJSONL(statePath(ctx.cwd, "model-routing.log"), { ts: nowISO(), agent, - model, - tier: tierOf(model), + floor, + stage: stage ?? null, + size: size ?? null, + effective: eff, + dispatched, + mode, }); + + if (mode === "off" || eff === dispatched || floor === "default" || floor === "pinned") { + return; + } + + const wantModel = routing?.tiers?.[eff]?.frontmatter ?? eff; + const msg = + `effort-band: ${agent} floor=${floor}, stage=${stage ?? "—"}, size=${size ?? "—"} -> dispatch at "${eff}" ` + + `(model: ${wantModel}), but got "${dispatched}".`; + if (mode === "enforce") { + return { + block: true, + reason: `${msg}\nRe-dispatch with model: ${wantModel}. (Set DEV_TEAM_EFFORT_ROUTING=advisory to warn instead, or =off to disable.)`, + }; + } + ctx.ui.notify(msg, "warn"); // advisory }); - // Diagnostic: print the tier -> frontmatter map (complements the - // /model-routing-check skill, which also resolves against your config). pi.registerCommand("routing", { - description: "Show the dev-team tier -> model map", + description: "Show the dev-team tier map + effort-band (current task size and effective bands)", handler: async (_args, ctx) => { const routing = loadRouting(); if (!routing) { - ctx.ui.notify( - "no model-routing.json found in skills/dev-team-knowledge", - "error", - ); + ctx.ui.notify("no model-routing.json found in skills/dev-team-knowledge", "error"); return; } const lines = Object.entries(routing.tiers).map( ([tier, t]) => `${tier}: ${t.frontmatter} (${t.intent})`, ); + const band = routing.effortBand; + if (band) { + const { size, stage } = taskState(ctx.cwd); + const mode = (process.env.DEV_TEAM_EFFORT_ROUTING ?? band.enforcement ?? "advisory").toLowerCase(); + lines.push("", `effort-band [${mode}] — stage: ${stage ?? "—"}, size: ${size ?? "—"}`); + for (const b of band.ladder) { + lines.push(` floor ${b} -> ${effectiveBand(b, size, stage, band)}`); + } + } ctx.ui.notify(lines.join("\n"), "info"); }, }); diff --git a/plugins/dev-team/extensions/plan-gate.ts b/plugins/dev-team/extensions/plan-gate.ts index f37c2f0..ba4093b 100644 --- a/plugins/dev-team/extensions/plan-gate.ts +++ b/plugins/dev-team/extensions/plan-gate.ts @@ -40,8 +40,11 @@ export function isGatedSource(p: string): boolean { // trivial — scoped trivial; source edits allowed // plan-approved — a plan was approved; source edits allowed export type Stage = "needs-plan" | "trivial" | "plan-approved"; +// Task size from the pre-analysis, also consumed by model-routing (effort-band). +export type Size = "trivial" | "standard" | "complex"; interface PlanGateState { stage?: Stage; + size?: Size; planPath?: string; at?: string; } @@ -58,28 +61,30 @@ export function gateDecision(stage: Stage | undefined): Decision { export default function planGate(pi: ExtensionAPI) { pi.setLabel("plan-gate"); - const set = (cwd: string, stage: Stage, planPath?: string) => + const set = (cwd: string, stage: Stage, size: Size, planPath?: string) => writeState(cwd, "plan-gate.json", { stage, + size, planPath, at: new Date().toISOString(), } satisfies PlanGateState); pi.registerCommand("scope", { description: - "Pre-analysis: classify the task trivial vs needs-a-plan. Usage: /scope [--trivial]", + "Pre-analysis: classify the task per the task-size-classifier. Usage: /scope [--trivial | --complex]", handler: async (args, ctx) => { - const trivial = /(^|\s)--trivial(\s|$)/.test(` ${args ?? ""} `); - if (trivial) { - set(ctx.cwd, "trivial"); + const a = ` ${args ?? ""} `; + if (/(^|\s)--trivial(\s|$)/.test(a)) { + set(ctx.cwd, "trivial", "trivial"); ctx.ui.notify( - "scoped TRIVIAL — source edits unlocked for this task. /plan-reset to re-arm.", + "scoped TRIVIAL — fast path: source edits unlocked; agents route at their floor band (cheap agents stay cheap). /plan-reset to re-arm.", "warn", ); } else { - set(ctx.cwd, "needs-plan"); + const size: Size = /(^|\s)--complex(\s|$)/.test(a) ? "complex" : "standard"; + set(ctx.cwd, "needs-plan", size); ctx.ui.notify( - "scoped NON-TRIVIAL — draft a plan with /plan, get human sign-off, then /plan-approve to unlock implementation.", + `scoped NON-TRIVIAL (${size}) — draft a plan with /plan, get human sign-off, then /plan-approve to unlock implementation.`, "info", ); } @@ -90,7 +95,7 @@ export default function planGate(pi: ExtensionAPI) { description: "Shortcut for /scope --trivial: mark the current task trivial and unlock source edits", handler: async (_args, ctx) => { - set(ctx.cwd, "trivial"); + set(ctx.cwd, "trivial", "trivial"); ctx.ui.notify( "scoped TRIVIAL — source edits unlocked for this task. /plan-reset to re-arm.", "warn", @@ -111,7 +116,7 @@ export default function planGate(pi: ExtensionAPI) { return; } const planPath = (args ?? "").trim() || st.planPath; - set(ctx.cwd, "plan-approved", planPath); + set(ctx.cwd, "plan-approved", st.size ?? "standard", planPath); ctx.ui.notify( `plan approved${planPath ? ` (${planPath})` : ""} — implementation unlocked`, "info", diff --git a/plugins/dev-team/scripts/extensions.test.ts b/plugins/dev-team/scripts/extensions.test.ts index e72d772..3eb5f2d 100644 --- a/plugins/dev-team/scripts/extensions.test.ts +++ b/plugins/dev-team/scripts/extensions.test.ts @@ -10,9 +10,15 @@ import { tail, type VerifyState, } from "../extensions/lib/impl-verify-core.ts"; -import { globToRegExp, matchesAny } from "../extensions/lib/shared.ts"; +import { effectiveBand, globToRegExp, matchesAny } from "../extensions/lib/shared.ts"; import { gateDecision, isGatedSource } from "../extensions/plan-gate.ts"; +const BAND = { + ladder: ["small", "balanced", "deep"], + sizeBand: { trivial: "small", standard: "balanced", complex: "deep" }, + bumpStages: ["needs-plan"], +}; + let failures = 0; function check(name: string, cond: boolean, extra?: unknown): void { if (cond) { @@ -93,6 +99,23 @@ check("plan-gate: needs-plan -> need-plan", gateDecision("needs-plan") === "need check("plan-gate: trivial -> allow", gateDecision("trivial") === "allow"); check("plan-gate: plan-approved -> allow", gateDecision("plan-approved") === "allow"); +// --- effort-band model routing (phase-aware bump-from-floor) ------------- +// During planning (needs-plan): effective = max(floor, sizeBand[size]). +check("band: planning, small floor, complex -> deep", effectiveBand("small", "complex", "needs-plan", BAND) === "deep"); +check("band: planning, small floor, standard -> balanced", effectiveBand("small", "standard", "needs-plan", BAND) === "balanced"); +check("band: planning, small floor, trivial -> small", effectiveBand("small", "trivial", "needs-plan", BAND) === "small"); +check("band: planning never below floor (balanced/trivial)", effectiveBand("balanced", "trivial", "needs-plan", BAND) === "balanced"); +check("band: planning, deep floor holds", effectiveBand("deep", "standard", "needs-plan", BAND) === "deep"); +// Build (plan-approved): NO bump — everyone at floor (a solid plan makes the build routine). +check("band: build, small floor, complex -> small (no bump)", effectiveBand("small", "complex", "plan-approved", BAND) === "small"); +check("band: build, balanced floor, complex -> balanced (no bump)", effectiveBand("balanced", "complex", "plan-approved", BAND) === "balanced"); +// Trivial stage / unscoped: no bump -> floor. +check("band: trivial stage -> floor", effectiveBand("small", "complex", "trivial", BAND) === "small"); +check("band: unscoped (no stage) -> floor", effectiveBand("balanced", "complex", undefined, BAND) === "balanced"); +// Off-ladder + no config. +check("band: off-ladder floor (pinned) unchanged", effectiveBand("pinned", "complex", "needs-plan", BAND) === "pinned"); +check("band: no config -> floor", effectiveBand("small", "complex", "needs-plan", undefined) === "small"); + if (failures) { console.error(`\n${failures} check(s) failed`); process.exit(1); diff --git a/plugins/dev-team/skills/code-review/SKILL.md b/plugins/dev-team/skills/code-review/SKILL.md index 01faf13..c374edf 100644 --- a/plugins/dev-team/skills/code-review/SKILL.md +++ b/plugins/dev-team/skills/code-review/SKILL.md @@ -21,7 +21,7 @@ allowed-tools: >- # Code Review -Role: orchestrator. Route work to review agents; do not review code yourself. Pass each agent's tier alias (from its `model:` frontmatter) when dispatching — the PreToolUse hook `hooks/agent-model-resolve.sh` resolves it to the active snapshot per the Resolution Procedure in `agents/orchestrator.md`. +Role: orchestrator. Route work to review agents; do not review code yourself. Pass each agent's tier alias (from its `model:` frontmatter) when dispatching — OMP resolves the tier natively (`modelRoles`) and the `model-routing` extension applies effort-band routing per the Resolution Procedure in `agents/orchestrator.md`. Output templates and JSON schemas: [`code-review/output-format.md`](code-review/output-format.md). Example report: [`code-review/examples/sample-report.md`](code-review/examples/sample-report.md). @@ -29,7 +29,7 @@ Output templates and JSON schemas: [`code-review/output-format.md`](code-review/ 1. **Do not review code yourself.** Delegate all semantic analysis to review agents. 2. **Minimize context per agent.** Pass only what each agent's `Context needs` field requires. -3. **Route to the right model tier.** Each agent's `model:` frontmatter declares its tier alias (`haiku`/`sonnet`/`opus`); the PreToolUse hook `hooks/agent-model-resolve.sh` resolves it to the active snapshot per `agents/orchestrator.md` → Resolution Procedure. Do not override the frontmatter value. +3. **Route to the right model tier.** Each agent's `model:` frontmatter declares its tier alias (`haiku`/`sonnet`/`opus`); OMP resolves the tier natively (`modelRoles`) and the `model-routing` extension applies effort-band routing per `agents/orchestrator.md` → Resolution Procedure. Do not override the frontmatter value. 4. **Run deterministic gates first.** Lint, type-check, secret scan are cheaper than AI. Stop if they fail. 5. **Return structured results.** Aggregate agent JSON; do not add your own findings. 6. **Be concise.** Tables and JSON, no preambles, no filler. @@ -168,7 +168,7 @@ Spawn agents as parallel subagents in a single message using the `task` tool. - `full-file` → complete files - `project-structure` → full files + directory tree - When reviewing full repository (clean auto-scope, `--all`, or `--path`), always pass full files. -- **Model**: pass each agent's declared tier alias (`haiku`/`sonnet`/`opus`) from its `model:` frontmatter. The PreToolUse hook `hooks/agent-model-resolve.sh` resolves the tier to the active snapshot per `agents/orchestrator.md` → Resolution Procedure. +- **Model**: pass each agent's declared tier alias (`haiku`/`sonnet`/`opus`) from its `model:` frontmatter. The PreToolUse hook the `model-routing` extension resolves the tier to the active snapshot per `agents/orchestrator.md` → Resolution Procedure. - **Static analysis context**: if step 2b produced findings, inject into every agent's prompt using the format in `skill://static-analysis-integration`: "These issues were detected by static analysis. Do not re-report them. Focus on semantic concerns." - **Per-agent output**: `{"agentName": "", "status": "pass|warn|fail", "issues": [], "summary": "..."}` (full schema in `output-format.md`). diff --git a/plugins/dev-team/skills/dev-team-knowledge/agent-registry.md b/plugins/dev-team/skills/dev-team-knowledge/agent-registry.md index db3ebc0..04f612d 100644 --- a/plugins/dev-team/skills/dev-team-knowledge/agent-registry.md +++ b/plugins/dev-team/skills/dev-team-knowledge/agent-registry.md @@ -22,7 +22,7 @@ This file contains the complete registry tables. CLAUDE.md references this file ## Review Agents -Spawned by the orchestrator during Phase 3 inline checkpoints and full `/code-review` runs. Each agent declares a tier alias in its `model:` frontmatter; the PreToolUse hook `hooks/agent-model-resolve.sh` resolves it to the active snapshot per the **Resolution Procedure** in `agents/orchestrator.md`. +Spawned by the orchestrator during Phase 3 inline checkpoints and full `/code-review` runs. Each agent declares a tier alias in its `model:` frontmatter; OMP resolves the tier natively (`modelRoles`) and the `model-routing` extension applies effort-band routing per the **Resolution Procedure** in `agents/orchestrator.md`. | Agent | File | Model Tier | What It Checks | |-------|------|------------|----------------| diff --git a/plugins/dev-team/skills/dev-team-knowledge/model-routing.json b/plugins/dev-team/skills/dev-team-knowledge/model-routing.json index 0409ab5..f56c7ee 100644 --- a/plugins/dev-team/skills/dev-team-knowledge/model-routing.json +++ b/plugins/dev-team/skills/dev-team-knowledge/model-routing.json @@ -18,5 +18,17 @@ "rationale": "Cross-file reasoning, high-stakes design synthesis, threat modeling, broad reconnaissance (security-review, domain-review, arch-review, architect, security-engineer, codebase-recon)." } }, + "effortBand": { + "$comment": "Effort-band routing (phase-aware bump-from-floor): the model is derived from the TASK SIZE (task-size-classifier, recorded by /scope in plan-gate state) and the PIPELINE STAGE, floored by the agent's declared tier. Put the effort into spec/plan: the size raises the band ONLY while planning (stage in bumpStages, default needs-plan); once the plan is approved, build/review run at the floor (a solid plan makes the build routine). effective = (stage in bumpStages) ? max(floor, sizeBand[size]) : floor. The agent tier is a FLOOR — never routed below it (high-stakes deep agents hold). No signal -> floor (static, retro-compatible). copilot-preset is unaffected — the band picks a tier, then modelRoles resolves the model.", + "ladder": ["small", "balanced", "deep"], + "sizeBand": { + "trivial": "small", + "standard": "balanced", + "complex": "deep" + }, + "bumpStages": ["needs-plan"], + "enforcement": "advisory", + "$enforcementNote": "advisory: the extension logs + warns on a dispatch whose tier != the effort-band tier (the orchestrator applies the rule). Set env DEV_TEAM_EFFORT_ROUTING=enforce to upgrade the warning to a block, or =off to disable the band entirely (pure static tiers, agent tier only)." + }, "notes": "Keep the high-volume `small` tier on the cheapest capable model to limit token spend; reserve balanced/deep for work that needs them. Pair with the copilot-preset and token-diet plugins for further cost control." } diff --git a/plugins/dev-team/skills/dev-team-knowledge/task-size-classifier.md b/plugins/dev-team/skills/dev-team-knowledge/task-size-classifier.md index 28771cf..cf31d36 100644 --- a/plugins/dev-team/skills/dev-team-knowledge/task-size-classifier.md +++ b/plugins/dev-team/skills/dev-team-knowledge/task-size-classifier.md @@ -62,6 +62,18 @@ rework than the saved ceremony. Expected saving on small tasks (upstream measurement): **~65% fewer turns, ~45% lower cost** vs the full pipeline. +## Drives effort-band model routing + +The size is also the **effort signal for model routing** (phase-aware +bump-from-floor). `/scope` records the size and stage in plan-gate state; the +`model-routing` extension raises the band **only during planning** +(`stage = needs-plan`): target `trivial`→small, `standard`→balanced, +`complex`→deep, and each agent routes at **max(its floor tier, that target)** — +never below its declared tier. **Once the plan is approved, there is no bump** — +the build/review runs at the floor (a solid plan makes implementation routine). +So a complex task spends `deep` on spec/plan, not on mechanical build. See +`model-routing.json` → `effortBand`; inspect with `/routing`. + ## Decision logging Record each classification in `memory/decisions.md`: the signal values, the diff --git a/plugins/dev-team/skills/model-routing-check/SKILL.md b/plugins/dev-team/skills/model-routing-check/SKILL.md index 69401f5..39e5292 100644 --- a/plugins/dev-team/skills/model-routing-check/SKILL.md +++ b/plugins/dev-team/skills/model-routing-check/SKILL.md @@ -44,6 +44,11 @@ Four sections, in order: probe-supported; Bedrock, Vertex, and any other host is probe-skipped (manual override file recommended). +> **Effort-band routing** (task-size → band shift) is a separate, OMP-native +> concern handled by the `model-routing` extension. For the effective band per +> base at the current task size, run **`/routing`** — it reads +> `model-routing.json` → `effortBand` and the plan-gate size directly. + ## How to fix common findings - **Bumps appearing in the log** — the resolver is silently rerouting a diff --git a/plugins/dev-team/skills/review-agent/SKILL.md b/plugins/dev-team/skills/review-agent/SKILL.md index 135715a..7829eaa 100644 --- a/plugins/dev-team/skills/review-agent/SKILL.md +++ b/plugins/dev-team/skills/review-agent/SKILL.md @@ -21,7 +21,7 @@ You have been invoked with the `/review-agent` skill. Run a single named review This command is executed under orchestrator direction. Pass the named agent's tier alias (from its `model:` frontmatter) when dispatching — -the PreToolUse hook `hooks/agent-model-resolve.sh` resolves it to the +the PreToolUse hook the `model-routing` extension resolves it to the active snapshot per the Resolution Procedure in `.claude/agents/orchestrator.md`. ## Worker constraints diff --git a/plugins/dev-team/skills/test-design/SKILL.md b/plugins/dev-team/skills/test-design/SKILL.md index a99cd03..a3a26a8 100644 --- a/plugins/dev-team/skills/test-design/SKILL.md +++ b/plugins/dev-team/skills/test-design/SKILL.md @@ -20,7 +20,7 @@ does not review files itself — it coordinates. This command is executed under orchestrator direction. Dispatch each agent with its tier alias (from its `model:` frontmatter); the PreToolUse hook -`hooks/agent-model-resolve.sh` resolves it to the active snapshot per the +the `model-routing` extension resolves it to the active snapshot per the Resolution Procedure in `agents/orchestrator.md`. ## Orchestrator constraints