Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
82 changes: 82 additions & 0 deletions docs/effort-band-routing.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,82 @@
# Effort-band model routing (phase-aware bump-from-floor)

Apply the upstream "effort-band" routing idea (v7.0) to our port **without
breaking** the tier model or `copilot-preset`: the model is derived from the
**task size and the pipeline phase**, floored by the agent's declared tier.

## The insight

Our three tiers (`small`/`balanced`/`deep`) already *are* three effort bands. The
improvement over one-model-per-agent: **derive the band from objective signals**
(the `task-size-classifier`, recorded by `/scope`) — and spend the effort where
it pays. The effort belongs in **spec/plan**: a complex task needs deep reasoning
to design and decompose; once the plan is solid, **implementation is routine**.
So the size raises the band **only while planning**, and the build runs at the
floor.

## How it works

`model-routing.json` gains an `effortBand`:

```
ladder: [small, balanced, deep]
sizeBand: { trivial: small, standard: balanced, complex: deep }
bumpStages: [needs-plan] # the planning phase: /scope -> /specs -> /plan
```

Resolution (pure `effectiveBand(floor, size, stage)` in `lib/shared.ts`):

```
effective = (stage in bumpStages) ? max_on_ladder(floor, sizeBand[size]) : floor
```

- **Planning** (`stage = needs-plan`): the size bumps the band — a `complex` plan
runs the architect + the five plan-review critics at `deep`.
- **Build/review** (`stage = plan-approved`): **no bump** — implementers and
reviewers run at their **floor**. Don't spend `deep` on mechanical
implementation once the plan is solid.
- **Never below the floor** — high-stakes agents (deep floor: security/domain/
arch-review, architect, security-engineer, codebase-recon) always hold at deep.
- **No signal / trivial / unscoped → the floor** (backward-compatible with
today's static tiers).

The `model-routing` extension reads the floor (`agentModel`) + size + stage
(plan-gate state), **logs** every dispatch to `.omp/state/model-routing.log`, and:

- default **`advisory`** — warns when the dispatched tier ≠ the band tier;
- `DEV_TEAM_EFFORT_ROUTING=enforce` — blocks and names the model to use;
- `=off` — disables the band (floor tier only).

`copilot-preset` is untouched — the band picks a *tier*; `modelRoles` (or the
Copilot remap) resolves the concrete model.

## Decisions

- **Effort into spec/plan, not the build.** A complex task bumps the *planning*
agents to deep; the build/review runs at the floor. This matches "put the
package on the plan — if the plan is solid, implementation is trivial," and it
avoids the failure mode of routing every lexical reviewer (naming, complexity,
a11y) on opus just because the task is complex.
- **Bump-from-floor, never below the agent's tier.** The tier is a quality floor;
the size can only raise it, and only while planning. Keeps the north star
(quality first) without over-spending.
- **Default advisory, not enforce.** The orchestrator applies the rule; the
extension audits and warns. `enforce` is opt-in so a not-yet-updated caller
isn't bricked; `off` is the escape hatch.

## Pre-existing routing debt cleaned up

- The orchestrator's **Resolution Procedure** described a Claude-Code hook
(`agent-model-resolve.sh`, `.claude/model-overrides.json`, `updatedInput`,
`deny`, a non-existent `docs/model-routing.md` + ADR 0004) the OMP extension
never implemented. Rewritten to reality + the new behavior.
- 4 skills + the agent-registry referenced the same stale
`hooks/agent-model-resolve.sh` — repointed to the `model-routing` extension.

Left as-is (separate, larger debt): the `/model-routing-check` skill's exec block
still targets the old `.claude/` resolver; `/routing` is the accurate effort-band
diagnostic and the skill now points to it.

## Verified
`ci-validate-json` 23/23 · extensions compile · unit suite green (incl. 11
phase-aware `effectiveBand` cases) · no `agent-model-resolve.sh` references remain.
18 changes: 9 additions & 9 deletions plugins/dev-team/agents/orchestrator.md
Original file line number Diff line number Diff line change
Expand Up @@ -26,19 +26,19 @@ thinking-level: medium
- Task classification algorithm
- Load balancing logic

## Resolution Procedure
## Resolution Procedure (floor tier + effort band)

Each agent's `model:` frontmatter encodes the tier alias (`haiku`, `sonnet`, `opus`) appropriate for its task. Tier-to-snapshot resolution is **enforced by OMP extension `model-routing` + `.omp/config.yml` `modelRoles`**. The LLM cannot bypass it.
Each agent's `model:` frontmatter declares its **floor tier** — `pi/smol` (small), `claude-sonnet-4-6` (balanced), or `claude-opus-4-8` (deep). Tier resolution is **native OMP**: `.omp/config.yml` `modelRoles` (and, with the copilot-preset plugin, the Copilot remap) turn the tier into a concrete model. The source of truth for tiers is `skill://dev-team-knowledge/model-routing.json`.

When the orchestrator (or any caller) spawns a subagent via the `task` tool with `model: <tier>`, the extension:
On top of the floor, the `model-routing` extension applies **phase-aware effort-band routing (bump-from-floor)**. The effort goes into **spec/plan**, not the build: the **task size** (recorded at `/scope` → plan-gate state; classifier `skill://dev-team-knowledge/task-size-classifier.md`) raises the band **only while planning** (`stage = needs-plan`, i.e. `/scope` → `/specs` → `/plan`):

1. Reads `skill://dev-team-knowledge/model-routing.json` — the single source of truth for tier → snapshot mapping.
2. Reads `.claude/model-overrides.json` if present (per-user, gitignored, populated by the `/init-dev-team` probe or by hand for restricted endpoints).
3. Walks the alias chain up to 3 hops along the `haiku → sonnet → opus` cascade. Each tier alias resolves to either another tier (bumped) or the literal `"unavailable"` sentinel (refusal).
4. On any bump, rewrites `tool_input.model` via `hookSpecificOutput.updatedInput` and appends one JSONL event to `.claude/metrics/model-routing.log`.
5. On exhaustion, cycle, missing routing.json, or malformed overrides, emits `permissionDecision: "deny"` with an actionable `permissionDecisionReason`. The dispatch never reaches the harness.
- During planning, target band: `trivial` → `small`, `standard` → `balanced`, `complex` → `deep`; effective = the **higher of the agent's floor and that target**. So a `complex` plan runs the architect and plan-review critics at `deep`.
- Once the plan is **approved** (`stage = plan-approved`, the build/review phase), there is **no bump** — implementers and reviewers run at their **floor**. A solid plan makes the build routine, so don't spend `deep` on mechanical implementation.
- An agent is **never routed below its floor** (high-stakes deep agents — security/domain/arch-review, architect, security-engineer, codebase-recon — always hold at deep). No signal / trivial / unscoped → the floor (static, backward-compatible).

For triage, run `/model-routing-check` — read-only diagnostic that prints the effective map, overrides, recent bumps, and probe applicability. See `docs/model-routing.md` for contract, fallback firing, hand-writing overrides, and Bedrock/Vertex/proxy troubleshooting. See [ADR 0004](../../../docs/adr/0004-pre-dispatch-model-resolution.md) for the design rationale (pre-dispatch vs. runtime retry; hook vs. orchestrator instruction).
So when you spawn a subagent via `task`, **pass the effort-band tier** = `(stage in bumpStages) ? max(floor, sizeBand[size]) : floor` (data in `model-routing.json` → `effortBand`). The extension logs every dispatch to `.omp/state/model-routing.log` and, by default (`advisory`), **warns** when the dispatched tier ≠ the band tier. `DEV_TEAM_EFFORT_ROUTING=enforce` upgrades the warning to a **block** naming the model to use; `=off` disables the band (floor tier only).

For triage, run `/routing` or `/model-routing-check` (read-only): the tier map plus, per floor, the effective band for the current stage + task size.

### Tier guidance (informational)

Expand Down
7 changes: 4 additions & 3 deletions plugins/dev-team/commands/help.md
Original file line number Diff line number Diff line change
Expand Up @@ -4,9 +4,10 @@

- **Pipeline** (enforced order): `/scope` (pre-analysis) → `/specs` → `/plan` →
`/plan-approve` → `/build` → `/code-review` → `/review-approve` → `/pr`
- **Plan gate**: `/scope [--trivial]`, `/trivial`, `/plan-approve [path]`,
`/plan-reset` — source edits are blocked until the task is scoped and (if
non-trivial) a plan is approved.
- **Plan gate**: `/scope [--trivial | --complex]`, `/trivial`,
`/plan-approve [path]`, `/plan-reset` — source edits are blocked until the task
is scoped and (if non-trivial) a plan is approved. The scope size also drives
effort-band model routing (`/routing`).
- **Verify**: `/impl-verify` (strict build + tests, bounded verdict)
- **Review**: `/code-review` (`/review`), `/review-agent`, `/review-approve`
- **Guardrails**: `/careful on|off`, `/freeze <glob>`, `/unfreeze`,
Expand Down
18 changes: 18 additions & 0 deletions plugins/dev-team/config.snippet.yml
Original file line number Diff line number Diff line change
Expand Up @@ -84,6 +84,24 @@ eval:
# .csproj/.sln -> dotnet, Cargo.toml -> rust, go.mod -> go, pyproject.toml ->
# python, package.json -> node (dotnet wins a mixed repo).

# ---------------------------------------------------------------------------
# EFFORT-BAND MODEL ROUTING (model-routing extension)
# ---------------------------------------------------------------------------
# Phase-aware bump-from-floor: the effort goes into spec/plan, not the build. The
# TASK SIZE (set by /scope) raises the band ONLY while planning (stage
# needs-plan): target trivial->small, standard->balanced, complex->deep, and
# effective = max(agent floor, target). Once the plan is approved, build/review
# run at the floor (no bump) — a solid plan makes the build routine. An agent is
# never routed BELOW its tier (high-stakes deep agents hold). No signal -> floor.
# Configured in skills/dev-team-knowledge/model-routing.json -> effortBand;
# copilot-preset unaffected (the band picks a tier; modelRoles resolves it).
#
# Enforcement is set there (default "advisory") and overridable per-run by env:
# DEV_TEAM_EFFORT_ROUTING=advisory # log + warn on mismatch (default)
# DEV_TEAM_EFFORT_ROUTING=enforce # block a mis-banded dispatch
# DEV_TEAM_EFFORT_ROUTING=off # disable the band (pure static tiers)
# Inspect with /routing or /model-routing-check.

# ---------------------------------------------------------------------------
# TASK / SUBAGENT GRAPH
# ---------------------------------------------------------------------------
Expand Down
37 changes: 37 additions & 0 deletions plugins/dev-team/extensions/lib/shared.ts
Original file line number Diff line number Diff line change
Expand Up @@ -148,8 +148,45 @@ export interface RoutingTier {
rationale: string;
}

export interface EffortBandConfig {
ladder: string[];
// Task size -> target band (e.g. { trivial: "small", standard: "balanced", complex: "deep" }).
sizeBand: Record<string, string>;
// Pipeline stages where the size bump applies. The effort goes into spec/plan
// (`needs-plan`); once the plan is approved, implementation/review run at the
// floor — a solid plan makes the build routine. Default: ["needs-plan"].
bumpStages?: string[];
enforcement?: string;
}

export interface RoutingConfig {
tiers: Record<string, RoutingTier>;
effortBand?: EffortBandConfig;
}

// Resolve the effective band from the agent's FLOOR tier, the task size, and the
// pipeline stage. Phase-aware bump-from-floor: the size raises the band ONLY
// during the planning phase(s) (`bumpStages`, default ["needs-plan"]) — put the
// effort into spec/plan; post-plan build/review runs at the floor. An agent is
// never routed BELOW its tier (high-stakes deep agents hold). Pure, no I/O.
// - a floor outside the ladder (pinned/default) -> unchanged
// - stage not in bumpStages (build, trivial, unscoped) -> the floor
// - else -> the higher of (floor, sizeBand[size]) on the ladder
export function effectiveBand(
floor: string,
size: string | undefined,
stage: string | undefined,
cfg: EffortBandConfig | undefined,
): string {
if (!cfg || !Array.isArray(cfg.ladder)) return floor;
const fi = cfg.ladder.indexOf(floor);
if (fi === -1) return floor; // pinned/default — not on the ladder
const bumpStages = cfg.bumpStages ?? ["needs-plan"];
if (!stage || !bumpStages.includes(stage)) return floor; // no bump outside planning
const target = size ? cfg.sizeBand?.[size] : undefined;
const ti = target ? cfg.ladder.indexOf(target) : -1;
if (ti === -1) return floor;
return cfg.ladder[Math.max(fi, ti)];
}

export function loadRouting(): RoutingConfig | null {
Expand Down
98 changes: 74 additions & 24 deletions plugins/dev-team/extensions/model-routing.ts
Original file line number Diff line number Diff line change
@@ -1,69 +1,119 @@
// model-routing.ts — dispatch tier logger + routing diagnostic.
// model-routing.ts — effort-band dispatch routing + observability + diagnostic.
//
// In OMP the actual tier -> model mapping is native (.omp/config.yml modelRoles
// + each agent's `model:` frontmatter). This extension adds an observability
// layer the config alone doesn't give you: it records every subagent dispatch
// with its resolved tier to .omp/state/model-routing.log, and registers a
// `/routing` command that prints the tier table.
// Base model resolution in OMP is native (.omp/config.yml modelRoles + each
// agent's `model:` frontmatter). This extension adds **effort-band routing** on
// top: the model is derived from the TASK SIZE (trivial/standard/complex, from
// the task-size-classifier, recorded by /scope in plan-gate state), not only
// from the agent's static tier. The agent's declared tier is the BASE band; the
// task size shifts it along the ladder [small, balanced, deep]. This is more
// deterministic than one-model-per-agent and ties spend to the work.
//
// All tiers are cloud. The small tier (`pi/smol`) is a CHEAP-cloud, high-volume
// tier (default modelRoles.smol = claude-haiku-4-5; cheaper still via the
// copilot-preset plugin). Source of truth:
// skills/dev-team-knowledge/model-routing.json.
// It also logs every dispatch (observability) and registers /routing.
//
// Enforcement (config `effortBand.enforcement`, overridable by env
// DEV_TEAM_EFFORT_ROUTING):
// off — no band; pure static tiers (just log).
// advisory — (default) log + warn when the dispatched tier != the band tier.
// enforce — block the dispatch and tell the caller the model to use.
//
// copilot-preset is unaffected: the band picks a TIER, then modelRoles resolves
// the concrete model (Anthropic id or github-copilot/*).

import type { ExtensionAPI } from "@oh-my-pi/pi-coding-agent";
import {
type EffortBandConfig,
agentModel,
appendJSONL,
effectiveBand,
loadRouting,
nowISO,
readState,
statePath,
} from "./lib/shared.ts";

const SMALL_TIER_MODELS = new Set(["pi/smol", "smol"]);

function tierOf(model: string | null): string {
if (model === null) return "default";
if (model === null || model === "") return "default";
if (SMALL_TIER_MODELS.has(model)) return "small";
if (model.includes("opus")) return "deep";
if (model.includes("sonnet")) return "balanced";
if (model === "small" || model === "balanced" || model === "deep") return model;
return "pinned";
}

function taskState(cwd: string): { size?: string; stage?: string } {
return readState<{ size?: string; stage?: string }>(cwd, "plan-gate.json", {});
}

export default function modelRouting(pi: ExtensionAPI) {
pi.setLabel("model-routing");

// Log each subagent dispatch with its resolved tier (observability only —
// never blocks).
pi.on("tool_call", async (event, ctx) => {
if (event.toolName !== "task") return;
const agent = String((event.input as Record<string, unknown>).agent ?? "");
const input = event.input as Record<string, unknown>;
const agent = String(input.agent ?? "");
if (!agent) return;
const model = agentModel(agent);

const routing = loadRouting();
const band: EffortBandConfig | undefined = routing?.effortBand;
const mode = (process.env.DEV_TEAM_EFFORT_ROUTING ?? band?.enforcement ?? "advisory").toLowerCase();

const floor = tierOf(agentModel(agent)); // the agent's declared tier = floor
const { size, stage } = taskState(ctx.cwd);
const dispatched = tierOf(input.model == null ? null : String(input.model)) === "default"
? floor // no explicit model -> the agent's floor tier is used
: tierOf(String(input.model));
const eff = mode === "off" ? floor : effectiveBand(floor, size, stage, band);

appendJSONL(statePath(ctx.cwd, "model-routing.log"), {
ts: nowISO(),
agent,
model,
tier: tierOf(model),
floor,
stage: stage ?? null,
size: size ?? null,
effective: eff,
dispatched,
mode,
});

if (mode === "off" || eff === dispatched || floor === "default" || floor === "pinned") {
return;
}

const wantModel = routing?.tiers?.[eff]?.frontmatter ?? eff;
const msg =
`effort-band: ${agent} floor=${floor}, stage=${stage ?? "—"}, size=${size ?? "—"} -> dispatch at "${eff}" ` +
`(model: ${wantModel}), but got "${dispatched}".`;
if (mode === "enforce") {
return {
block: true,
reason: `${msg}\nRe-dispatch with model: ${wantModel}. (Set DEV_TEAM_EFFORT_ROUTING=advisory to warn instead, or =off to disable.)`,
};
}
ctx.ui.notify(msg, "warn"); // advisory
});

// Diagnostic: print the tier -> frontmatter map (complements the
// /model-routing-check skill, which also resolves against your config).
pi.registerCommand("routing", {
description: "Show the dev-team tier -> model map",
description: "Show the dev-team tier map + effort-band (current task size and effective bands)",
handler: async (_args, ctx) => {
const routing = loadRouting();
if (!routing) {
ctx.ui.notify(
"no model-routing.json found in skills/dev-team-knowledge",
"error",
);
ctx.ui.notify("no model-routing.json found in skills/dev-team-knowledge", "error");
return;
}
const lines = Object.entries(routing.tiers).map(
([tier, t]) => `${tier}: ${t.frontmatter} (${t.intent})`,
);
const band = routing.effortBand;
if (band) {
const { size, stage } = taskState(ctx.cwd);
const mode = (process.env.DEV_TEAM_EFFORT_ROUTING ?? band.enforcement ?? "advisory").toLowerCase();
lines.push("", `effort-band [${mode}] — stage: ${stage ?? "—"}, size: ${size ?? "—"}`);
for (const b of band.ladder) {
lines.push(` floor ${b} -> ${effectiveBand(b, size, stage, band)}`);
}
}
ctx.ui.notify(lines.join("\n"), "info");
},
});
Expand Down
Loading
Loading