Long-Running Goals

Planr makes long-running, autonomous goal runs durable. A loop driver — Codex /goal, Claude Code /goal or /loop, an automation, or a human re-dispatching a skill — supplies continuation pressure: "do not stop until the goal holds." Planr supplies everything such a run loses between sessions: the plan, the task map, picks, evidence logs, review gates, approvals, and recovery.

This is complementary, not competing. /goal stays the orchestrator; Planr is the state layer underneath it. Without a native loop primitive you lose only automatic re-prompting — never state, evidence, or recovery.

Division Of Labor

| Concern | Owner |
| --- | --- |
| Continuation pressure, re-prompting, session autonomy | loop driver (/goal, automation, human) |
| Scope, acceptance criteria, verification contract | Planr plan (planr plan new/check) |
| Task state, dependencies, what is next | Planr map (planr map, planr pick) |
| Stop condition that survives compaction | Planr context (--tag goal-contract) |
| Proof the work happened and runs | Planr logs (planr log, planr done) |
| Maker/checker separation | Planr reviews (planr review) + subagent roles |
| Recovery after session loss or host switch | Planr map + stored contract, from zero chat context |

The Workflow

1. Prep — $planr-goal (once, interactive)

$planr-goal Add CSV export to the reports page, should work in the browser

The skill compiles the goal and stops — no implementation:

  • creates and checks a feature-scoped plan (planr plan new -> plan check; strict, empty sections fail),
  • builds the map and links execution order (planr map build is idempotent; planr link add ... --type blocks),
  • stores the goal contract durably in Planr:
planr context add "GOAL CONTRACT pl-csv-export: DONE when every in-scope map item is closed with log evidence, all reviews closed with verdict complete, no open approvals in scope, and a live browser verification log exists for the export flow. Iteration budget: 10." --tag goal-contract
  • prints the exact starter command for your host.

2. Execute — the loop driver runs $planr-loop

/goal Use $planr-loop on plan pl-csv-export. The loop contract is stored in planr context (tag: goal-contract). Continue until the contract holds or the iteration budget is exhausted. You are operating autonomously: the user is not watching, so never end a turn on a plan, a question, or a promise — proceed until the contract holds or you are blocked on input only the user can provide.

The autonomy clause matters on long runs: deep into a session, frontier models occasionally end a turn with a statement of intent instead of the corresponding action, or pause to ask permission they already have. Stating the operating mode up front prevents both.

Each iteration follows the $planr-loop protocol:

1. planr plan audit <plan-id>   one-call contract verdict; holds -> exit
2. $planr-work     pick exactly one ready item, implement, finish with planr done --review
3. live verify     run the platform verification, log it with planr log add --kind verification --cmd
4. $planr-review   independent audit; complete -> review close --close-target,
                   findings -> fix items become the next ready items
5. repeat

plan audit replaces the hand-rolled final audit: it checks items_settled, reviews_complete, approvals_clear, and verification_logged clause by clause with evidence, includes the stored goal contract, and answers holds: true/false in one command.
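The control flow of the loop and the audit verdict can be sketched in Python. This is an illustration only: the `audit` and `iterate` callables, the `run_loop` helper, and the dictionary shape are hypothetical stand-ins, not Planr's actual output format. Only the four clause names are taken from the text above.

```python
from typing import Callable, Dict

# Clause names as listed for plan audit; the dict shape itself is an assumption.
CLAUSES = ("items_settled", "reviews_complete", "approvals_clear", "verification_logged")


def contract_holds(verdict: Dict[str, bool]) -> bool:
    """The contract holds only if every clause holds; a missing clause counts as failing."""
    return all(verdict.get(clause, False) for clause in CLAUSES)


def run_loop(audit: Callable[[], Dict[str, bool]],
             iterate: Callable[[], None],
             budget: int) -> str:
    """Audit first, then run one work/verify/review iteration, until the
    contract holds or the iteration budget is spent."""
    for _ in range(budget):
        if contract_holds(audit()):
            return "holds"
        iterate()
    # One last audit so work done in the final iteration is not missed.
    return "holds" if contract_holds(audit()) else "budget_exhausted"
```

Auditing before working is what makes the loop safe to restart: a fresh session whose contract already holds exits without touching anything.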

The per-item path is three commands since v1.1.6:

planr pick --json --plan <plan-id>                           # flat work packet, leased only from the goal's plan
planr done <item-id> --summary "..." --cmd "..." --review --next
planr review close <review-id> --verdict complete --reviewer <id> --close-target

--plan keeps the lease inside the goal contract: when several plans share the board (a parallel feature, leftovers from an aborted prep run), a plan-scoped goal run never picks work outside its own plan. A pick that finds nothing in scope never widens silently: it reports reason: "nothing_ready" when nothing is ready at all, or reason: "ready_items_excluded_by_filter" with the excluded items, the cause per item, and the exact repair pick commands when ready work exists outside the filter.
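The never-widen rule can be sketched as follows. The item shape (`id`, `plan`), the `pick` function, and the returned dictionary are hypothetical; only the two reason strings come from the text above, and the real packet also carries the cause per item and the repair commands.

```python
from typing import Dict, List, Optional


def pick(ready_items: List[Dict[str, str]], plan: Optional[str]) -> Dict[str, object]:
    """Pick one ready item inside the plan filter, or explain why nothing was picked."""
    in_scope = [i for i in ready_items if plan is None or i["plan"] == plan]
    if in_scope:
        return {"picked": in_scope[0]["id"]}
    if not ready_items:
        return {"picked": None, "reason": "nothing_ready"}
    # Ready work exists, but only outside the filter: report it instead of widening.
    return {
        "picked": None,
        "reason": "ready_items_excluded_by_filter",
        "excluded": [i["id"] for i in ready_items],
    }
```

The point of the two distinct reasons is that the loop can tell "the plan is finished or blocked" apart from "the filter is hiding work", and only the second calls for a different pick command.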

done/close/review close responses and the pick packet include a remaining snapshot (counts with explicit zeros for every status, settled, total), so the orchestrator evaluates the stop condition straight from the completion output — no extra map status round-trip. The same responses list what each settlement unlocked, so the loop sees its next work without re-reading the map. --next never hands a worker its own freshly created review, so maker and checker stay separate even in compact loops. The review verdict records review_mode (single_agent or independent) automatically from worker identity — no ceremony note needed. The contract's "all reviews closed" clause audits review items that exist; an item closed with plain done satisfies the contract without a review gate, so low-signal reviews can be skipped without blocking plan audit.
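As a rough sketch of how an orchestrator might read the remaining snapshot, assuming a hypothetical shape with `counts`, `settled`, and `total` keys (the status names below are placeholders, not Planr's):

```python
from typing import Dict, Union

# Hypothetical snapshot shape: per-status counts plus settled and total.
Snapshot = Dict[str, Union[int, Dict[str, int]]]


def all_items_settled(remaining: Snapshot) -> bool:
    """True when every item in scope is settled."""
    return remaining["settled"] == remaining["total"]


def open_item_count(remaining: Snapshot) -> int:
    """Number of items still to settle."""
    return remaining["total"] - remaining["settled"]


# Explicit zeros mean a status key is always present, so no default is needed.
example: Snapshot = {
    "counts": {"ready": 1, "in_progress": 0, "blocked": 0, "done": 4},
    "settled": 4,
    "total": 5,
}
```

This covers only the item clause of the stop condition; reviews, approvals, and the verification log are still judged by plan audit.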

3. Finish

When the contract holds, the loop exits through $planr-summary: an evidence-backed account of what shipped, which commands proved it, and what (if anything) stayed blocked.

Recovery

The defining property of a long-running goal: the session will die before the goal does. With Planr that costs nothing. Start a new session — same host or a different one — with the same starter line:

/goal Use $planr-loop on plan pl-csv-export. The loop contract is stored in planr context (tag: goal-contract).

Iteration 1 reads the map and the stored contract: items already settled stay settled, open reviews stay open, the next ready item is picked. No chat history needed. planr recover sweep handles stale picks from interrupted workers.

Per-Host Setup

Codex with /goal

The recommended combination. Install the plugin and provision the subagent roles once:

codex plugin marketplace add instructa/planr
codex plugin add planr@planr
planr project init "My Product" --client codex   # writes .codex/agents/planr-worker.toml + planr-reviewer.toml

Then:

$planr-goal <your goal>          # prep: plan, map, contract, starter command
/goal Use $planr-loop on plan <plan-id>. The loop contract is stored in planr context (tag: goal-contract).

The /goal PM dispatches with one-line prompts: "spawn the planr_worker agent for item <id>" and "spawn the planr_reviewer agent for item <id>". The role files preload $planr-work and $planr-review, so each dispatch stays one line. Codex Automations work the same way: set the automation prompt to the starter line. The provisioned worker role pins a cheaper effort tier; see Cost Tiering.

Claude Code

Same shape via the plugin (/plugin install planr@planr), which registers the planr-worker and planr-reviewer subagents automatically:

/planr:planr-goal <your goal>
/goal Use $planr-loop on plan <plan-id>. The loop contract is stored in planr context (tag: goal-contract).

/loop works for fixed-cadence runs instead of goal-conditioned ones. The registered worker subagent pins a cheaper model tier; see Cost Tiering.

Cursor and hosts without a loop primitive

Identical protocol; the human (or a background agent) is the re-dispatcher:

Use $planr-goal: <your goal>
Use $planr-loop on plan <plan-id>. The loop contract is stored in planr context (tag: goal-contract).

$planr-loop iterates within the session under its own budget. If the session ends before the contract holds, dispatch the same line again — recovery is identical to the /goal case.

Plain MCP clients

Any MCP-capable agent uses the same flow over planr mcp. Every session starts with map state, so the loop is resumable by construction.

Cost Tiering

A goal run has three roles with different intelligence needs, so they should not all run on the same model tier:

  • Driver (the /goal session): decomposition, dispatch decisions, conflict resolution, final synthesis. Run it on the strongest model you have. This is never configured in Planr files; it is simply the model of the main session.
  • Worker: bounded implementation. planr pick --json is a complete handoff packet (one item, scope, evidence format, stop after review request), so the worker runs safely on a cheaper tier.
  • Reviewer: the truth gate. It inherits the driver's model on purpose — make workers cheap, not the verdict.

Where each host configures the worker tier (the shipped role files carry these defaults):

| Host | Driver | Worker | Configured in |
| --- | --- | --- | --- |
| Codex | session default (e.g. gpt-5.5 at high) | model = "gpt-5.5", model_reasoning_effort = "medium" | .codex/agents/planr-worker.toml |
| Claude Code | session model (e.g. fable at high via /model + /effort) | model: opus, effort: medium | planr-worker.md frontmatter |
| Cursor | chat model of the driving session | chosen per dispatch in the host's subagent tooling | no Planr files; pick a cheaper model when dispatching the worker task |

The defaults use aliases and generic names so they track model generations; pin a full model id (e.g. claude-opus-4-8) only if you need determinism, and use model: sonnet as the budget alternative. The role files are user-owned copies — planr project init provisions them once and never overwrites local edits — so changing the tier is editing one line.

Two traps to verify once per setup:

  • Claude Code: the CLAUDE_CODE_SUBAGENT_MODEL environment variable silently overrides every subagent's model: frontmatter. Make sure it is unset, then dispatch the worker on a trivial item and confirm the subagent's messages in the session log (~/.claude/projects/<project>/*.jsonl) carry the worker model, not the driver's.
  • Codex: some versions ignore custom agent files on spawn (openai/codex#26868) — the child then inherits the parent model. Spawn planr_worker on a trivial item and confirm the child metadata shows the pinned model and effort with a non-null agent_path.

Both failure modes are silent (the run still works, just at driver prices), which is why the smoke test is worth the two minutes.
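The first half of the Claude Code check (is the override set at all?) is easy to script. A minimal sketch: the helper name is hypothetical, and treating an empty value as unset is an assumption rather than documented behavior.

```python
import os
from typing import Mapping, Optional


def subagent_model_override(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the overriding model if CLAUDE_CODE_SUBAGENT_MODEL is set to a
    non-empty value, else None. Empty-means-unset is an assumption."""
    value = env.get("CLAUDE_CODE_SUBAGENT_MODEL", "").strip()
    return value or None


if __name__ == "__main__":
    override = subagent_model_override()
    if override:
        print(f"CLAUDE_CODE_SUBAGENT_MODEL={override} would override the worker's model: frontmatter")
    else:
        print("no subagent model override set")
```

This does not replace the dispatch test: it only rules out the environment variable, not a host that ignores the role file.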

Coming From Other Goal Tools

If you already run goal workflows with other tools, the concepts map directly:

| Elsewhere | In Planr |
| --- | --- |
| Goal charter file (goal.md) | product/build plan (planr plan new, rich scope + verification) |
| Board/state file (state.yaml) | the map (planr map show, authoritative item state) |
| One active task | planr pick (single owner, heartbeat, stale recovery) |
| Task receipts | planr log / planr done (files, commands, results) |
| Goal oracle / completion proof | goal contract + live verification log |
| Scout/Judge/Worker roles | worker/reviewer subagents + $planr-status for honest reads |
| Final audit before done | $planr-review with review close --verdict complete |

Using such tools for intake or visualization alongside Planr is fine — keep one rule: the Planr map stays the single source of truth for item status, links, picks, reviews, approvals, and completion.

Rules That Keep Goal Runs Honest

  • Never weaken a stored goal contract mid-run; scope changes go through $planr-plan and the user.
  • "Done" means the feature ran (live verification log), not that it compiles.
  • The maker never closes its own review; single-agent hosts record review_mode honestly (single_agent, not independent).
  • Two iterations without map-state movement -> stop and report instead of grinding.
  • Destructive or out-of-repo side effects always go behind planr approval request.
  • Lessons that should outlive the iteration (a confirmed approach, a correction, a dead end) go into planr context add "..." --tag lesson — the next iteration or a fresh session recovers them with planr context list --tag lesson, not from chat history.
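The stall rule ("two iterations without map-state movement") can be sketched as a check over map-state fingerprints. The function name and the idea of a fingerprint are hypothetical: any hashable summary of the map, for example a tuple of per-status counts, would do. The first entry is the state before the first iteration, and each later entry is the state after an iteration.

```python
from typing import Hashable, Iterable


def should_stop_for_stall(states: Iterable[Hashable], limit: int = 2) -> bool:
    """True once `limit` consecutive iterations ended with the map state unchanged."""
    unchanged = 0
    previous: object = object()  # sentinel that equals no real state
    for state in states:
        if state == previous:
            unchanged += 1
            if unchanged >= limit:
                return True
        else:
            unchanged = 0
        previous = state
    return False
```

Any movement resets the counter, so a run that alternates between progress and a single quiet iteration is never stopped.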

See also: Skills, Operating Model, Task Graph Model, Codex, Claude Code, Cursor.