diff --git a/AGENTS.md b/AGENTS.md index ea82a62..12af3d4 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -69,6 +69,7 @@ README.md public overview and development notes .agents/skills/ shared skills, committed .claude/skills symlink to .agents/skills for claude compatibility bin/ helper scripts, committed; read each script's header before first use + fm-completeness-check.sh formal completeness gate: proves a task's done/teardown/merge claim consistent with the directives (Z3-backed via fm-completeness.py + fm-completeness.rules.json). Wired into fm-teardown.sh and fm-merge-local.sh; FAILS OPEN when the solver tooling is absent, so it never wedges the lifecycle and the existing bash checks remain the hard guarantee. Off-switch: FM_COMPLETENESS_GATE=0; enforce-when-broken: FM_COMPLETENESS_STRICT=1 config/crew-harness crewmate harness override; LOCAL, gitignored; absent or "default" = same as firstmate data/ personal fleet records; LOCAL, gitignored as a whole backlog.md task queue, dependencies, history @@ -113,6 +114,10 @@ Otherwise it prints one line per problem or capability fact; handle each: - `FLEET_SYNC: : skipped: ` - bootstrap continued; investigate only if the dirty, diverged, or offline clone blocks work. - `TASKS_AXI: available` - an optional capability fact, not a problem; record it silently and use section 10 for backlog mutations. It prints only after the `tasks-axi` compatibility probe passes for version 0.1.1 or newer; absence or incompatibility only falls back to hand-editing and never blocks work. +- `COMPLETENESS_GATE: available` - an optional capability fact, not a problem; record it silently and never surface it to the captain. + Bootstrap prints this only when `python3` can import `z3`. + When present, the formal completeness gate (`bin/fm-completeness-check.sh`, wired into teardown and the local merge) actively proves each done/teardown/merge claim against the directives; when `z3` is absent the gate fails open and firstmate relies on the existing bash safety checks exactly as before. + It is never a missing tool to install: `fm-bootstrap.sh install z3` opts in, and its absence never blocks work. Bootstrap's fleet refresh is bounded by `FM_FLEET_SYNC_BOOTSTRAP_TIMEOUT` seconds, default 20; a timeout is reported as a `FLEET_SYNC` skip and does not block startup. @@ -339,12 +344,12 @@ A ship task's path from `done` to landed on `main` is set by the project's `mode - **no-mistakes** - the stages below as written: no-mistakes validation pipeline -> PR -> captain merge. - **direct-PR** - no pipeline. The crewmate pushes and opens the PR itself (its brief says so) and reports `done: PR `. Skip the Validate step and go straight to PR ready (run `fm-pr-check`, relay the PR). Teardown uses the normal pushed-branch check. -- **local-only** - no remote, no PR. The crewmate stops at `done: ready in branch fm/`. Review the diff with `bin/fm-review-diff.sh `, relay a one-paragraph summary to the captain, and on approval run `bin/fm-merge-local.sh ` to fast-forward local `main` (it refuses anything but a clean fast-forward - if it does, have the crewmate rebase). No `fm-pr-check`. Then teardown, whose safety check requires the branch already merged into local `main`, OR the work pushed to any remote (a fork counts - relevant for upstream-contribution PRs on a local-only-registered project). +- **local-only** - no remote, no PR. The crewmate stops at `done: ready in branch fm/`. Review the diff with `bin/fm-review-diff.sh `, relay a one-paragraph summary to the captain, and on approval run `FM_CAPTAIN_APPROVED=granted bin/fm-merge-local.sh ` to fast-forward local `main` (it refuses anything but a clean fast-forward - if it does, have the crewmate rebase). The `FM_CAPTAIN_APPROVED` assertion is the completeness gate's directive-#2 check: the merge is firstmate exercising the captain's merge authority, so the approval must be asserted explicitly (`granted`, or `not_required` under yolo); without it the gate blocks the merge when the solver tooling is present, and fails open (proceeds) when it is absent. No `fm-pr-check`. Then teardown, whose safety check requires the branch already merged into local `main`, OR the work pushed to any remote (a fork counts - relevant for upstream-contribution PRs on a local-only-registered project). When reviewing any crewmate branch diff, use `bin/fm-review-diff.sh ` rather than `git diff ...branch` directly. Pooled clones keep their local default refs frozen at clone time and can lag `origin`; the helper always compares against the authoritative base. -**yolo (orthogonal).** With `yolo=off` (default) every approval is the captain's: ask-user findings, PR merges, the local-only merge. With `yolo=on`, firstmate makes those calls itself without asking - resolve ask-user findings on your judgment, and run `gh-axi pr merge` / `bin/fm-merge-local.sh` once the work is green/approved - EXCEPT anything destructive, irreversible, or security-sensitive, which still escalates to the captain. Never merge a red PR even under yolo. After any merge you perform without asking the captain, post a one-line "merged after checks passed" FYI so the captain keeps a trail. +**yolo (orthogonal).** With `yolo=off` (default) every approval is the captain's: ask-user findings, PR merges, the local-only merge. With `yolo=on`, firstmate makes those calls itself without asking - resolve ask-user findings on your judgment, and run `gh-axi pr merge` / `FM_CAPTAIN_APPROVED=not_required bin/fm-merge-local.sh` once the work is green/approved - EXCEPT anything destructive, irreversible, or security-sensitive, which still escalates to the captain. Never merge a red PR even under yolo. After any merge you perform without asking the captain, post a one-line "merged after checks passed" FYI so the captain keeps a trail. ### Validate diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index 1820a95..40d4de9 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -38,7 +38,7 @@ See the [no-mistakes quick start](https://kunchenguid.github.io/no-mistakes/star Everything personal to one captain's fleet (`data/`, `state/`, `config/`, `projects/`, `.no-mistakes/`) is gitignored; never commit it. The root `.tasks.toml` is tracked `tasks-axi` config for `data/backlog.md`; compatible `tasks-axi` uses it for routine backlog mutations. It does not make `data/` tracked. -- Helper scripts in `bin/` are plain bash. +- Helper scripts in `bin/` are plain bash, with one exception: `fm-completeness.py` is the optional Z3-backed completeness engine driven by `fm-completeness-check.sh`. Each starts with a usage header comment; keep it accurate when you change behavior. Test scripts and helpers in `tests/` are plain bash too. `shellcheck bin/*.sh tests/*.sh` must pass, and CI enforces it. diff --git a/README.md b/README.md index 1fd3f94..45b5adb 100644 --- a/README.md +++ b/README.md @@ -142,6 +142,8 @@ firstmate works from any terminal - outside tmux, crewmates land in a detached ` - **Local clones stay fresh** - bootstrap and PR-based teardown refresh remote-backed project clones with clean default-branch fast-forwards when the clone is on the default branch and has no local work, and prune local branches whose remote is gone and that no worktree still needs. - **Self-updates stay safe** - `/updatefirstmate` fast-forwards the running firstmate repo and registered secondmate homes from `origin`, then re-reads updated instructions and nudges updated secondmates without touching project clones. The update is fast-forward only: dirty, diverged, offline, and off-default targets are reported and left untouched. +- **Lifecycle claims are provable** - an optional formal completeness gate (`bin/fm-completeness-check.sh`, Z3-backed) wires into teardown and the local merge to prove each done/teardown/merge claim consistent with the directives before the irreversible step. + Bootstrap reports it as available only when `python3` can import `z3`; when the solver is absent the gate fails open and the existing bash safety checks remain the hard guarantee. - **Restart-proof** - all state lives in tmux, status files, local markdown under `data/`, `data/secondmates.md`, and persistent secondmate homes. Kill the first mate session anytime; the next one reconciles and carries on. @@ -161,7 +163,7 @@ The first mate drives these; you rarely need to, but they work by hand too. | `fm-home-seed.sh` | Lease/provision a secondmate home transactionally, clone projects, initialize gates, and maintain `data/secondmates.md` | | `fm-spawn.sh` | Spawn one task, several `id=repo` pairs, or a persistent secondmate with `--secondmate` | | `fm-project-mode.sh` | Resolve a project's delivery mode and `+yolo` flag from `data/projects.md` | -| `fm-merge-local.sh` | Fast-forward a `local-only` project's local default branch after approval | +| `fm-merge-local.sh` | Fast-forward a `local-only` project's local default branch after the captain's approval is asserted via `FM_CAPTAIN_APPROVED` | | `fm-review-diff.sh` | Review a crewmate branch against the authoritative base, with optional `--stat` output | | `fm-watch-arm.sh` | Verified per-home watcher re-arm; reports `started`, `healthy`, or `FAILED`; `--restart` relaunches only this home's watcher | | `fm-watch.sh` | Singleton-safe one-shot watcher; blocks until supervision work is due, queues it durably, then exits with one reason line | @@ -173,6 +175,7 @@ The first mate drives these; you rarely need to, but they work by hand too. | `fm-pr-check.sh` | Record a PR-ready task and arm the watcher's merge poll | | `fm-promote.sh` | Promote a scout task in place so it becomes a protected ship task | | `fm-teardown.sh` | Return the worktree or retire/release a secondmate home; protects ship work, requires scout reports, checks child work, and prints the backlog reminder | +| `fm-completeness-check.sh` | Formal completeness gate wired into teardown and the local merge: proves a done/teardown/merge claim consistent with the directives (Z3-backed via `fm-completeness.py` + `fm-completeness.rules.json`); fails open when the solver is absent | | `fm-harness.sh` | Detect the running harness; resolve the effective crewmate harness | | `fm-lock.sh` | Per-home firstmate session lock | @@ -227,6 +230,11 @@ FM_WATCHER_STALE_GRACE=300 # defaults to FM_GUARD_GRACE; seconds a live watche FM_SIGNAL_GRACE=30 # seconds to coalesce nearby status and turn-end signals into one wake FM_FLEET_SYNC_BOOTSTRAP_TIMEOUT=20 # seconds allowed for bootstrap's best-effort clone refresh FM_FLEET_PRUNE=1 # set to 0 to skip pruning local branches whose upstream is gone +# formal completeness gate (bin/fm-completeness-check.sh); optional, fails open without z3 +FM_COMPLETENESS_GATE=1 # set to 0 to skip the gate entirely (always exits 0) +FM_COMPLETENESS_STRICT=0 # set to 1 to refuse instead of fail open when the solver tooling is absent/broken +FM_COMPLETENESS_RULES= # optional override for the rules file (default bin/fm-completeness.rules.json) +FM_CAPTAIN_APPROVED= # assert the captain's merge approval at the merge gate: granted|yes|1|true, or not_required under yolo FM_BUSY_REGEX='esc (to )?interrupt|Working\.\.\.' # busy-pane signatures, shared by watcher and tmux helper FM_COMPOSER_IDLE_RE= # optional empty-composer regex, applied after dim-ghost and border stripping FM_SEND_RETRIES=3 # fm-send Enter-retry attempts after typing the line once @@ -271,6 +279,7 @@ tests/fm-spawn-batch.test.sh # fm-spawn.sh batch (id=repo) argument tests/fm-bootstrap.test.sh # bootstrap dependency and feature-probe tests tests/fm-update.test.sh # fast-forward-only self-update, reread, nudge, dedup, and skip-safety tests tests/fm-teardown.test.sh # fm-teardown.sh safety and reminder checks: local-only fork-remote allow, truly-unpushed refuse, merged-to-main allow, no-mistakes regression, tasks-axi reminder, --force override +tests/fm-completeness.test.sh # completeness gate: off-switch, fail-open, strict enforcement, argument parsing, and (when z3 imports) the SAT/UNSAT invariant matrix plus --id derivation [ "$(readlink CLAUDE.md)" = "AGENTS.md" ] [ "$(readlink .claude/skills)" = "../.agents/skills" ] FM_HEARTBEAT=2 FM_POLL=1 bin/fm-watch-arm.sh # watcher re-arm smoke test (prints arm status, then "heartbeat") diff --git a/bin/__pycache__/fm-completeness.cpython-313.pyc b/bin/__pycache__/fm-completeness.cpython-313.pyc new file mode 100644 index 0000000..0eb96d7 Binary files /dev/null and b/bin/__pycache__/fm-completeness.cpython-313.pyc differ diff --git a/bin/fm-bootstrap.sh b/bin/fm-bootstrap.sh index a01b385..937993f 100755 --- a/bin/fm-bootstrap.sh +++ b/bin/fm-bootstrap.sh @@ -5,7 +5,12 @@ # Silent = all good. # Lines: "MISSING: (install: )", "NEEDS_GH_AUTH", # "CREW_HARNESS_OVERRIDE: ", "FLEET_SYNC: : skipped: ", -# "TASKS_AXI: available". +# "TASKS_AXI: available", "COMPLETENESS_GATE: available". +# z3 (z3-solver) is an OPTIONAL capability that powers the formal +# completeness gate (bin/fm-completeness-check.sh); reported only when +# python3 can import z3. It is never a MISSING line and never prompts an +# install - the gate fails open without it. Opt in with +# "fm-bootstrap.sh install z3". # treehouse is also MISSING when its installed version lacks # "treehouse get --lease" support. # tasks-axi is an OPTIONAL backlog-management capability reported only @@ -69,6 +74,7 @@ install_cmd() { tmux|node|gh) echo "brew install $1 # or the platform's package manager" ;; treehouse) echo "curl -fsSL https://kunchenguid.github.io/treehouse/install.sh | sh" ;; no-mistakes) echo "curl -fsSL https://raw.githubusercontent.com/kunchenguid/no-mistakes/main/docs/install.sh | sh" ;; + z3) echo "python3 -m pip install z3-solver" ;; gh-axi|chrome-devtools-axi|lavish-axi) echo "npm install -g $1 && $1 setup hooks" ;; *) return 1 ;; esac @@ -103,5 +109,8 @@ crew= [ -f "$CONFIG/crew-harness" ] && crew=$(tr -d '[:space:]' < "$CONFIG/crew-harness" || true) [ -n "$crew" ] && [ "$crew" != "default" ] && echo "CREW_HARNESS_OVERRIDE: $crew" fm_tasks_axi_compatible && echo "TASKS_AXI: available" +if command -v python3 >/dev/null 2>&1 && python3 -c 'import z3' >/dev/null 2>&1; then + echo "COMPLETENESS_GATE: available" +fi fleet_sync exit 0 diff --git a/bin/fm-completeness-check.sh b/bin/fm-completeness-check.sh new file mode 100755 index 0000000..eabdd0c --- /dev/null +++ b/bin/fm-completeness-check.sh @@ -0,0 +1,225 @@ +#!/usr/bin/env bash +# Formal completeness gate for a task-lifecycle decision (AGENTS.md sections 2, 7). +# +# Derives the observed facts for a task (or takes them explicitly), then asks the +# Z3-backed engine (fm-completeness.py + fm-completeness.rules.json) to PROVE the +# completion claim consistent with firstmate's invariants. Hard rules gate; soft +# rules score. Prints a one-line verdict; on a blocked claim it names the violated +# rule and exits non-zero. +# +# Usage: +# fm-completeness-check.sh --gate teardown --id +# fm-completeness-check.sh --gate merge --id # approval via $FM_CAPTAIN_APPROVED +# fm-completeness-check.sh --kind ship --landed none --worktree holds_unlanded_work [...] +# +# Flags (explicit facts override anything derived): +# --gate what decision this guards (drives fact derivation) +# --id derive facts from state/.meta + data// + git +# --kind --landed --report --worktree --captain-approval +# --mode strict (default) gates on invariants; graded also scores soft rules +# --meta set a soft-rule metadata key true (repeatable) +# +# Exit: 0 = SAT (proceed), 2 = UNSAT (blocked), 0 = tooling unavailable (FAIL-OPEN, +# warns and defers to the caller's own checks) unless FM_COMPLETENESS_STRICT=1. +# Set FM_COMPLETENESS_GATE=0 to skip the gate entirely (still exits 0). +set -eu + +SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)" +FM_ROOT="${FM_ROOT_OVERRIDE:-$(cd "$SCRIPT_DIR/.." && pwd)}" +FM_HOME="${FM_HOME:-${FM_ROOT_OVERRIDE:-$FM_ROOT}}" +STATE="${FM_STATE_OVERRIDE:-$FM_HOME/state}" +DATA="${FM_DATA_OVERRIDE:-$FM_HOME/data}" +ENGINE="$SCRIPT_DIR/fm-completeness.py" + +GATE="" +ID="" +MODE="strict" +KIND="" +LANDED="" +REPORT="" +WORKTREE="" +APPROVAL="" +META_KEYS=() + +while [ $# -gt 0 ]; do + case "$1" in + --gate) GATE="$2"; shift 2 ;; + --id) ID="$2"; shift 2 ;; + --mode) MODE="$2"; shift 2 ;; + --kind) KIND="$2"; shift 2 ;; + --landed) LANDED="$2"; shift 2 ;; + --report) REPORT="$2"; shift 2 ;; + --worktree) WORKTREE="$2"; shift 2 ;; + --captain-approval) APPROVAL="$2"; shift 2 ;; + --meta) META_KEYS+=("$2"); shift 2 ;; + *) echo "fm-completeness-check: unknown argument '$1'" >&2; exit 64 ;; + esac +done + +# Honor the global off-switch. +if [ "${FM_COMPLETENESS_GATE:-1}" = "0" ]; then + exit 0 +fi + +fail_open() { + # Tooling missing or broken: never wedge the lifecycle. Warn and defer to the + # caller's own (bash) safety checks, unless the operator demands strictness. + if [ "${FM_COMPLETENESS_STRICT:-0}" = "1" ]; then + echo "completeness gate: $1 (FM_COMPLETENESS_STRICT=1 -> refusing)" >&2 + exit 3 + fi + echo "completeness gate: $1; skipping formal check (set FM_COMPLETENESS_STRICT=1 to enforce)" >&2 + exit 0 +} + +command -v python3 >/dev/null 2>&1 || fail_open "python3 not found" +[ -f "$ENGINE" ] || fail_open "engine $ENGINE missing" + +git_unlanded_facts() { + # Mirror fm-teardown.sh's notion of "landed" so the gate never diverges from + # the script it guards. Sets LANDED and WORKTREE for a ship task. + local wt=$1 proj=$2 mode=$3 dirty unpushed default unmerged + if [ ! -d "$wt" ]; then + # No worktree on disk means there is nothing to discard, exactly as + # fm-teardown.sh skips its unlanded check when [ ! -d "$WT" ]. Resolve to a + # non-blocking state: clean worktree and a landed value that clears + # SHIP_REQUIRES_LANDED, rather than landed=none which would false-block. + if [ "$mode" = "local-only" ]; then + LANDED="${LANDED:-local_merged}" + else + LANDED="${LANDED:-pushed}" + fi + WORKTREE="${WORKTREE:-clean}" + return 0 + fi + dirty=$(git -C "$wt" status --porcelain 2>/dev/null | grep -vE '^\?\? \.claude/' | head -1 || true) + unpushed=$(git -C "$wt" log --oneline HEAD --not --remotes -- 2>/dev/null | head -1 || true) + if [ "$mode" = "local-only" ]; then + default=$(git -C "$proj" symbolic-ref --quiet --short refs/remotes/origin/HEAD 2>/dev/null | sed 's#^origin/##' || true) + [ -n "$default" ] || default=main + unmerged=$(git -C "$wt" log --oneline HEAD --not "$default" -- 2>/dev/null | head -1 || true) + if [ -z "$unmerged" ]; then + LANDED="${LANDED:-local_merged}" + elif [ -z "$unpushed" ]; then + LANDED="${LANDED:-pushed}" + else + LANDED="${LANDED:-none}" + fi + if [ -n "$dirty" ] || { [ -n "$unmerged" ] && [ -n "$unpushed" ]; }; then + WORKTREE="${WORKTREE:-holds_unlanded_work}" + else + WORKTREE="${WORKTREE:-clean}" + fi + else + if [ -z "$unpushed" ]; then LANDED="${LANDED:-pushed}"; else LANDED="${LANDED:-none}"; fi + if [ -n "$dirty" ] || [ -n "$unpushed" ]; then + WORKTREE="${WORKTREE:-holds_unlanded_work}" + else + WORKTREE="${WORKTREE:-clean}" + fi + fi +} + +# Derive facts from a task id when one is given (explicit flags still win). +if [ -n "$ID" ]; then + META="$STATE/$ID.meta" + if [ -f "$META" ]; then + [ -n "$KIND" ] || KIND=$(grep '^kind=' "$META" | cut -d= -f2- || true) + meta_mode=$(grep '^mode=' "$META" | cut -d= -f2- || true) + wt=$(grep '^worktree=' "$META" | cut -d= -f2- || true) + proj=$(grep '^project=' "$META" | cut -d= -f2- || true) + else + meta_mode=""; wt=""; proj="" + fi + [ -n "$KIND" ] || KIND=ship + [ -n "$meta_mode" ] || meta_mode=no-mistakes + + if [ -z "$REPORT" ]; then + if [ -f "$DATA/$ID/report.md" ]; then REPORT=present; else REPORT=absent; fi + fi + + case "$GATE" in + merge) + [ -n "$KIND" ] || KIND=ship + LANDED="${LANDED:-merged}" + WORKTREE="${WORKTREE:-clean}" + ;; + *) + if [ "$KIND" = scout ]; then + # Scout carve-out: the report governs, the worktree is scratch by contract. + LANDED="${LANDED:-none}"; WORKTREE="${WORKTREE:-clean}"; APPROVAL="${APPROVAL:-not_required}" + elif [ "$KIND" = ship ]; then + git_unlanded_facts "$wt" "$proj" "$meta_mode" + APPROVAL="${APPROVAL:-not_required}" + else + LANDED="${LANDED:-none}"; WORKTREE="${WORKTREE:-clean}"; APPROVAL="${APPROVAL:-not_required}" + fi + ;; + esac +fi + +# Approval at a merge gate is an explicit assertion the caller must make +# (directive #2): $FM_CAPTAIN_APPROVED in {granted,yes,1} -> granted; +# {not_required} -> not_required; anything else / unset -> pending (blocks). +if [ "$GATE" = "merge" ] && [ -z "$APPROVAL" ]; then + case "${FM_CAPTAIN_APPROVED:-}" in + granted|yes|1|true) APPROVAL=granted ;; + not_required) APPROVAL=not_required ;; + *) APPROVAL=pending ;; + esac +fi + +# Defaults for any axis still unset. +[ -n "$KIND" ] || KIND=ship +[ -n "$LANDED" ] || LANDED=none +[ -n "$REPORT" ] || REPORT=absent +[ -n "$WORKTREE" ] || WORKTREE=clean +[ -n "$APPROVAL" ] || APPROVAL=not_required + +case "$ID" in *\"*|*\\*) echo "fm-completeness-check: refusing id with quote/backslash" >&2; exit 64 ;; esac + +# Build the metadata object from repeated --meta keys. +meta_json="{}" +if [ "${#META_KEYS[@]}" -gt 0 ]; then + meta_json="{" + sep="" + for key in "${META_KEYS[@]}"; do + meta_json="$meta_json$sep\"$key\": true" + sep=", " + done + meta_json="$meta_json}" +fi + +facts=$(printf '{"name": "%s", "mode": "%s", "kind": "%s", "landed": "%s", "report": "%s", "worktree": "%s", "captain_approval": "%s", "metadata": %s}' \ + "${ID:-task}" "$MODE" "$KIND" "$LANDED" "$REPORT" "$WORKTREE" "$APPROVAL" "$meta_json") + +errfile=$(mktemp "${TMPDIR:-/tmp}/fm-completeness.XXXXXX") +set +e +out=$(printf '%s' "$facts" | python3 "$ENGINE" 2>"$errfile") +rc=$? +err=$(cat "$errfile" 2>/dev/null || true) +rm -f "$errfile" +set -e + +if [ "$rc" = "3" ]; then + fail_open "engine error: ${err:-unknown}" +fi + +label="${ID:-task}${GATE:+ ($GATE)}" +if [ "$rc" = "0" ]; then + if [ "$MODE" = "graded" ]; then + compliance=$(printf '%s' "$out" | sed -n 's/.*"compliance": \([0-9.]*\).*/\1/p') + echo "completeness gate: SAT - $label clears every invariant (compliance ${compliance:-1.0})" + else + echo "completeness gate: SAT - $label clears every invariant" + fi + exit 0 +fi + +reason=$(printf '%s' "$out" | sed -n 's/.*"reason": "\(.*\)", "violated_rules.*/\1/p') +violated=$(printf '%s' "$out" | sed -n 's/.*"violated_rules": \[\(.*\)\], "counterexample.*/\1/p') +echo "completeness gate: BLOCKED - $label is provably premature" >&2 +[ -n "$violated" ] && echo " violated: $violated" >&2 +[ -n "$reason" ] && echo " reason: $reason" >&2 +[ -z "$reason" ] && echo " $out" >&2 +exit 2 diff --git a/bin/fm-completeness.py b/bin/fm-completeness.py new file mode 100755 index 0000000..e3cdafd --- /dev/null +++ b/bin/fm-completeness.py @@ -0,0 +1,201 @@ +#!/usr/bin/env python3 +"""Formal completeness gate for firstmate task-lifecycle decisions. + +Firstmate's "is this task done / safe to tear down / clear to merge?" calls are +really invariants (AGENTS.md prime directives #2 and #3). This turns them into a +formal rule set checked by Z3: firstmate proposes a completion claim with the +observed facts, and the solver PROVES it either SAT (consistent with every +invariant) or UNSAT (provably premature, with the violated rule named). Hard +rules gate; soft rules score (never block). + +The only dependency is `z3-solver` (public PyPI) — there is no other library to +install. The rules are DATA, not code: they load from fm-completeness.rules.json +(or $FM_COMPLETENESS_RULES). This module is a small self-contained shim that +compiles that data into a Z3 model and runs the check. + +Encoding: each axis is a Z3 EnumSort variable. A hard rule compiles to +`Implies(when, require AND not-forbid)`. A concrete claim is verified by asking, +per rule, whether (facts AND rule) is satisfiable — UNSAT means that rule is +violated by the facts. `prove_consistency` checks the whole hard rule set is +satisfiable over free axes (a contradiction there is a bug in the directives, +not the task). Soft rules score deterministically over the concrete metadata. + +I/O: reads one JSON object of facts on stdin, e.g. + {"name": "fix-x", "kind": "ship", "landed": "none", + "worktree": "holds_unlanded_work", "mode": "strict", + "metadata": {"backlog_recorded": true}} +`mode` is "strict" (default, hard invariants only) or "graded" (also scores soft +rules). All other top-level keys except name/mode/metadata are axis values. + +Prints a JSON result on stdout. Exit 0 if SAT, 2 if UNSAT, 3 on a usage/tooling +error (so the bash wrapper can fail-open and defer to its own checks). +""" +from __future__ import annotations + +import json +import os +import sys + +EXIT_SAT = 0 +EXIT_UNSAT = 2 +EXIT_ERROR = 3 + + +def _default_rules_path() -> str: + env = os.environ.get("FM_COMPLETENESS_RULES") + if env: + return env + return os.path.join(os.path.dirname(os.path.abspath(__file__)), + "fm-completeness.rules.json") + + +def _build_axes(z3, axes_spec): + """Compile each axis into a Z3 EnumSort plus its value-constant map.""" + consts = {} # axis -> {value_str: z3 const} + variables = {} # axis -> z3 Const (the proposed value of that axis) + for axis, values in axes_spec.items(): + values = list(values) + _sort, members = z3.EnumSort(axis, values) + consts[axis] = dict(zip(values, members)) + variables[axis] = z3.Const(axis + "__proposed", _sort) + return consts, variables + + +def _rule_constraint(z3, rule, consts, variables): + """Compile one hard rule to Implies(when, require AND not-forbid).""" + when = rule.get("when", {}) + require = rule.get("require", {}) + forbid = rule.get("forbid", {}) + + antecedent = [variables[k] == consts[k][v] for k, v in when.items()] + consequent = [variables[k] == consts[k][v] for k, v in require.items()] + consequent += [variables[k] != consts[k][v] for k, v in forbid.items()] + + body = z3.And(*consequent) if consequent else z3.BoolVal(True) + if antecedent: + return z3.Implies(z3.And(*antecedent), body) + return body + + +def _verify_hard(z3, spec, fact_axes, consts, variables): + """Return the list of hard rules whose constraint is violated by the facts. + + A rule is violated iff (facts AND rule) is UNSAT — i.e. no world with these + facts can also satisfy the rule. Checking per rule names every violation, + not just one minimal unsat core. + """ + fact_asserts = [variables[k] == consts[k][v] for k, v in fact_axes.items()] + violated = [] + for rule in spec.get("hard_rules", []): + solver = z3.Solver() + for assertion in fact_asserts: + solver.add(assertion) + solver.add(_rule_constraint(z3, rule, consts, variables)) + if solver.check() == z3.unsat: + violated.append(rule) + return violated + + +def _score_soft(spec, fact_axes, metadata): + """Deterministic weighted compliance over concrete metadata (never gates).""" + total = 0 + satisfied = 0 + unmet = [] + for rule in spec.get("soft_rules", []): + when = rule.get("when", {}) + if any(fact_axes.get(k) != v for k, v in when.items()): + continue # rule does not apply to this proposal + weight = int(rule.get("weight", 1)) + total += weight + key = rule.get("require_meta") + if key is not None and metadata.get(key): + satisfied += weight + else: + unmet.append(rule["name"]) + compliance = 1.0 if total == 0 else satisfied / total + return compliance, unmet + + +def prove_consistency(z3, spec, consts, variables) -> bool: + """True iff the whole hard rule set is satisfiable over free axis values.""" + solver = z3.Solver() + for rule in spec.get("hard_rules", []): + solver.add(_rule_constraint(z3, rule, consts, variables)) + return solver.check() == z3.sat + + +def check(spec: dict, facts: dict) -> dict: + import z3 # public PyPI z3-solver; absence -> ImportError -> caller fails open + + mode = facts.get("mode", "strict") + metadata = facts.get("metadata", {}) or {} + declared = spec.get("axes", {}) + fact_axes = {k: v for k, v in facts.items() + if k not in ("name", "mode", "metadata") and k in declared} + + # A fact whose value the data file never declared is a typo, not a pass. + for axis, value in fact_axes.items(): + if value not in declared[axis]: + raise ValueError( + "fact %s=%r is not a declared value of axis %s (%s)" + % (axis, value, axis, ", ".join(declared[axis]))) + + consts, variables = _build_axes(z3, declared) + + violated = _verify_hard(z3, spec, fact_axes, consts, variables) + sat = not violated + + if mode == "graded": + compliance, unmet = _score_soft(spec, fact_axes, metadata) + else: + compliance, unmet = 1.0, [] + + if sat: + reason = "consistent with every invariant" + else: + reason = "; ".join("[%s] %s" % (r["name"], r["reason"]) for r in violated) + + return { + "sat": bool(sat), + "reason": reason, + "violated_rules": [r["name"] for r in violated], + "counterexample": None, + "compliance": compliance, + "unmet_soft": unmet, + "mode": mode, + } + + +def main() -> int: + try: + raw = sys.stdin.read() + facts = json.loads(raw) if raw.strip() else {} + except (json.JSONDecodeError, ValueError) as exc: + print(json.dumps({"error": "bad facts JSON: %s" % exc}), file=sys.stderr) + return EXIT_ERROR + + rules_path = _default_rules_path() + try: + with open(rules_path, encoding="utf-8") as handle: + spec = json.load(handle) + except OSError as exc: + print(json.dumps({"error": "cannot read rules file %s: %s" + % (rules_path, exc)}), file=sys.stderr) + return EXIT_ERROR + + try: + out = check(spec, facts) + except ImportError as exc: + print(json.dumps({"error": "z3-solver not importable: %s" % exc}), + file=sys.stderr) + return EXIT_ERROR + except (ValueError, KeyError) as exc: + print(json.dumps({"error": str(exc)}), file=sys.stderr) + return EXIT_ERROR + + print(json.dumps(out)) + return EXIT_SAT if out["sat"] else EXIT_UNSAT + + +if __name__ == "__main__": + sys.exit(main()) diff --git a/bin/fm-completeness.rules.json b/bin/fm-completeness.rules.json new file mode 100644 index 0000000..f394232 --- /dev/null +++ b/bin/fm-completeness.rules.json @@ -0,0 +1,57 @@ +{ + "name": "firstmate_task_completeness", + "_comment": "Firstmate's task-completion invariants as DATA. Hard rules gate (a violation is UNSAT and blocks the lifecycle action); soft rules score into a compliance number and never block. Axes are the facts firstmate reads before declaring done. Rule semantics: a rule with `when` applies only when every listed axis matches; `require` demands an axis equal a value; `forbid` rejects an axis equal a value; soft `require_meta` demands a truthy metadata key. See AGENTS.md sections 1, 2, 7.", + "axes": { + "kind": ["ship", "scout", "secondmate"], + "landed": ["merged", "pushed", "local_merged", "none"], + "report": ["present", "absent"], + "worktree": ["clean", "holds_unlanded_work"], + "captain_approval": ["granted", "not_required", "pending"] + }, + "hard_rules": [ + { + "name": "SCOUT_REQUIRES_REPORT", + "when": {"kind": "scout"}, + "require": {"report": "present"}, + "reason": "scout task declared done with no report.md; the report is the deliverable (directive #3 scout carve-out)" + }, + { + "name": "SHIP_REQUIRES_LANDED", + "when": {"kind": "ship"}, + "forbid": {"landed": "none"}, + "reason": "ship task declared done but the work is not landed (not merged, pushed to a remote/fork, or merged into local main) (directive #3)" + }, + { + "name": "NO_UNLANDED_AT_TEARDOWN", + "forbid": {"worktree": "holds_unlanded_work"}, + "reason": "worktree still holds unlanded work; teardown would discard it (directive #3)" + }, + { + "name": "MERGE_NEEDS_CAPTAIN_WORD", + "when": {"landed": "merged"}, + "forbid": {"captain_approval": "pending"}, + "reason": "merge performed without the captain's explicit approval (directive #2)" + } + ], + "soft_rules": [ + { + "name": "PREFER_EVIDENCE_CITED", + "require_meta": "evidence_cited", + "weight": 3, + "reason": "deliverable does not cite file:line evidence" + }, + { + "name": "PREFER_AGENTS_MD_UPDATED", + "when": {"kind": "ship"}, + "require_meta": "agents_md_touched", + "weight": 2, + "reason": "ship task recorded no durable project knowledge in AGENTS.md" + }, + { + "name": "PREFER_BACKLOG_RECORDED", + "require_meta": "backlog_recorded", + "weight": 1, + "reason": "completion not yet recorded in the backlog" + } + ] +} diff --git a/bin/fm-merge-local.sh b/bin/fm-merge-local.sh index 0baf4e5..5bf1f4b 100755 --- a/bin/fm-merge-local.sh +++ b/bin/fm-merge-local.sh @@ -40,6 +40,22 @@ default_branch() { return 1 } +# Formal completeness gate (directive #2): the local merge IS firstmate exercising +# the captain's merge authority, so the captain's word must be asserted explicitly +# via $FM_CAPTAIN_APPROVED (granted | yes | 1 | true; or not_required under yolo). +# The gate FAILS OPEN when the solver tooling is absent, so existing setups keep +# working until they install it; once installed, an unasserted approval is blocked. +set +e +gate_out=$("$FM_ROOT/bin/fm-completeness-check.sh" --gate merge --id "$ID" 2>&1) +gate_rc=$? +set -e +if [ "$gate_rc" = 2 ] || [ "$gate_rc" = 3 ]; then + printf '%s\n' "$gate_out" >&2 + echo "REFUSED: completeness gate blocked the local merge of $ID." >&2 + echo "Assert the captain's approval explicitly, e.g. FM_CAPTAIN_APPROVED=granted bin/fm-merge-local.sh $ID" >&2 + exit 1 +fi + BRANCH="fm/$ID" git -C "$PROJ" rev-parse --verify --quiet "refs/heads/$BRANCH" >/dev/null || { echo "error: branch $BRANCH does not exist in $PROJ" >&2; exit 1; } diff --git a/bin/fm-teardown.sh b/bin/fm-teardown.sh index ddd0a6d..d90b585 100755 --- a/bin/fm-teardown.sh +++ b/bin/fm-teardown.sh @@ -415,6 +415,23 @@ if [ "$KIND" = secondmate ] && [ "$FORCE" = "--force" ]; then cleanup_firstmate_home_children "$HOME_PATH" fi +# Formal completeness gate (AGENTS.md section 7): prove the "done" claim consistent +# with the directives before the irreversible teardown. It FAILS OPEN when the +# solver tooling is absent, so the battle-tested bash checks below remain the hard +# guarantee; when present it gives a provable, named refusal first. Skipped under +# --force (the explicit discard path) and for secondmates (governed elsewhere). +if [ "$FORCE" != "--force" ] && { [ "$KIND" = ship ] || [ "$KIND" = scout ]; }; then + set +e + gate_out=$("$FM_ROOT/bin/fm-completeness-check.sh" --gate teardown --id "$ID" 2>&1) + gate_rc=$? + set -e + if [ "$gate_rc" = 2 ] || [ "$gate_rc" = 3 ]; then + printf '%s\n' "$gate_out" >&2 + echo "REFUSED: completeness gate blocked teardown of $ID." >&2 + exit 1 + fi +fi + if [ -d "$WT" ] && [ "$FORCE" != "--force" ]; then if [ "$KIND" = secondmate ]; then : diff --git a/tests/fm-completeness.test.sh b/tests/fm-completeness.test.sh new file mode 100755 index 0000000..e90258b --- /dev/null +++ b/tests/fm-completeness.test.sh @@ -0,0 +1,118 @@ +#!/usr/bin/env bash +# Tests for bin/fm-completeness-check.sh (the formal completeness gate). +# +# Two tiers: +# - Tooling-agnostic (always run): the off-switch, fail-open behavior, strict +# enforcement, argument parsing. These must hold even where z3 is not +# installed (e.g. CI), because the gate is designed to FAIL OPEN and never +# wedge the lifecycle. +# - Solver-dependent (run only when z3 imports): the actual SAT/UNSAT verdicts +# for the invariant matrix. +set -u + +ROOT="$(cd "$(dirname "${BASH_SOURCE[0]}")/.." && pwd)" +GATE="$ROOT/bin/fm-completeness-check.sh" +TMP_ROOT= + +fail() { printf 'not ok - %s\n' "$1" >&2; exit 1; } +pass() { printf 'ok - %s\n' "$1"; } +skip() { printf 'ok - %s # SKIP\n' "$1"; } + +cleanup() { [ -n "${TMP_ROOT:-}" ] && rm -rf "$TMP_ROOT"; } +trap cleanup EXIT +TMP_ROOT=$(mktemp -d "${TMPDIR:-/tmp}/fm-completeness-tests.XXXXXX") + +# rc_of runs the gate with the given args and echoes its exit code (never aborts). +rc_of() { + set +e + "$GATE" "$@" >/dev/null 2>&1 + local rc=$? + set -e + printf '%s' "$rc" +} + +# --- Tooling-agnostic tier --------------------------------------------------- + +# Off-switch always yields 0 regardless of facts. +if [ "$(FM_COMPLETENESS_GATE=0 "$GATE" --kind scout --report absent >/dev/null 2>&1; echo $?)" = "0" ]; then + pass "off-switch FM_COMPLETENESS_GATE=0 exits 0" +else + fail "off-switch did not exit 0" +fi + +# Fail-open: a broken rules path is non-fatal by default. +if [ "$(FM_COMPLETENESS_RULES=/no/such.json "$GATE" --kind scout --report present >/dev/null 2>&1; echo $?)" = "0" ]; then + pass "fail-open on missing rules file exits 0" +else + fail "missing rules file did not fail open" +fi + +# Strict mode turns the same breakage into a hard error (exit 3). +if [ "$(FM_COMPLETENESS_STRICT=1 FM_COMPLETENESS_RULES=/no/such.json "$GATE" --kind scout --report present >/dev/null 2>&1; echo $?)" = "3" ]; then + pass "FM_COMPLETENESS_STRICT=1 enforces (exit 3) on tooling breakage" +else + fail "strict mode did not enforce on tooling breakage" +fi + +# Unknown argument is a usage error. +if [ "$(rc_of --bogus x)" = "64" ]; then + pass "unknown argument exits 64" +else + fail "unknown argument did not exit 64" +fi + +# --- Solver-dependent tier --------------------------------------------------- + +if ! python3 -c "import z3" >/dev/null 2>&1; then + skip "solver matrix (z3 not importable)" + skip "solver --id derivation (z3 not importable)" + exit 0 +fi + +assert_rc() { + local want=$1 desc=$2; shift 2 + local got; got=$(rc_of "$@") + if [ "$got" = "$want" ]; then pass "$desc"; else fail "$desc (rc=$got want=$want)"; fi +} + +# Hard invariants gate (exit 2 = UNSAT/blocked, 0 = SAT/clear). +assert_rc 2 "scout without report is blocked" --kind scout --report absent +assert_rc 0 "scout with report clears" --kind scout --report present +assert_rc 2 "ship not landed is blocked" --kind ship --landed none --worktree holds_unlanded_work +assert_rc 0 "ship pushed + clean clears" --kind ship --landed pushed --worktree clean +assert_rc 2 "merge without approval is blocked" --gate merge --kind ship --landed merged --captain-approval pending +assert_rc 0 "merge with approval clears" --gate merge --kind ship --landed merged --captain-approval granted + +# Merge gate reads approval from the environment (directive #2 explicit assert). +if [ "$(FM_CAPTAIN_APPROVED=granted "$GATE" --gate merge --kind ship --landed merged >/dev/null 2>&1; echo $?)" = "0" ]; then + pass "merge gate honors FM_CAPTAIN_APPROVED=granted" +else + fail "merge gate did not honor FM_CAPTAIN_APPROVED=granted" +fi +if [ "$("$GATE" --gate merge --kind ship --landed merged >/dev/null 2>&1; echo $?)" = "2" ]; then + pass "merge gate blocks when approval unset (defaults pending)" +else + fail "merge gate did not block with approval unset" +fi + +# Graded mode never blocks on a soft-only miss, but reports it. +assert_rc 0 "graded ship is SAT despite unmet soft rules" \ + --mode graded --kind ship --landed merged --captain-approval granted --worktree clean + +# --id derivation from a synthetic meta + report. +HOME_DIR="$TMP_ROOT/home" +mkdir -p "$HOME_DIR/state" "$HOME_DIR/data/scout-x" +printf 'kind=scout\nmode=no-mistakes\nworktree=%s\nproject=%s\n' "$TMP_ROOT/wt" "$TMP_ROOT/proj" > "$HOME_DIR/state/scout-x.meta" +# No report yet -> blocked. +if [ "$(FM_HOME="$HOME_DIR" "$GATE" --gate teardown --id scout-x >/dev/null 2>&1; echo $?)" = "2" ]; then + pass "--id scout derivation blocks with no report" +else + fail "--id scout derivation did not block without report" +fi +# Write the report -> clears. +printf '# report\n' > "$HOME_DIR/data/scout-x/report.md" +if [ "$(FM_HOME="$HOME_DIR" "$GATE" --gate teardown --id scout-x >/dev/null 2>&1; echo $?)" = "0" ]; then + pass "--id scout derivation clears once report exists" +else + fail "--id scout derivation did not clear with report present" +fi