From a43f916e43c86d3e61934469f2f5c51684b610cc Mon Sep 17 00:00:00 2001 From: Tom Harper Date: Wed, 24 Jun 2026 21:48:59 -0700 Subject: [PATCH 1/6] feat: formal completeness gate for teardown and merge Z3-backed neurosymbolic check (rules as data) that proves a task's done/teardown/merge claim consistent with prime directives #2 and #3. Hard rules gate; soft rules score. Wired into fm-teardown.sh and fm-merge-local.sh; fails open when the solver is absent so the existing bash checks remain the hard guarantee. --- AGENTS.md | 3 +- bin/fm-completeness-check.sh | 215 +++++++++++++++++++++++++++++++++ bin/fm-completeness.py | 166 +++++++++++++++++++++++++ bin/fm-completeness.rules.json | 57 +++++++++ bin/fm-merge-local.sh | 16 +++ bin/fm-teardown.sh | 17 +++ tests/fm-completeness.test.sh | 118 ++++++++++++++++++ 7 files changed, 591 insertions(+), 1 deletion(-) create mode 100755 bin/fm-completeness-check.sh create mode 100755 bin/fm-completeness.py create mode 100644 bin/fm-completeness.rules.json create mode 100755 tests/fm-completeness.test.sh diff --git a/AGENTS.md b/AGENTS.md index ea82a62..fa55764 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -69,6 +69,7 @@ README.md public overview and development notes .agents/skills/ shared skills, committed .claude/skills symlink to .agents/skills for claude compatibility bin/ helper scripts, committed; read each script's header before first use + fm-completeness-check.sh formal completeness gate: proves a task's done/teardown/merge claim consistent with the directives (Z3-backed via fm-completeness.py + fm-completeness.rules.json). Wired into fm-teardown.sh and fm-merge-local.sh; FAILS OPEN when the solver tooling is absent, so it never wedges the lifecycle and the existing bash checks remain the hard guarantee. Off-switch: FM_COMPLETENESS_GATE=0; enforce-when-broken: FM_COMPLETENESS_STRICT=1 config/crew-harness crewmate harness override; LOCAL, gitignored; absent or "default" = same as firstmate data/ personal fleet records; LOCAL, gitignored as a whole backlog.md task queue, dependencies, history @@ -339,7 +340,7 @@ A ship task's path from `done` to landed on `main` is set by the project's `mode - **no-mistakes** - the stages below as written: no-mistakes validation pipeline -> PR -> captain merge. - **direct-PR** - no pipeline. The crewmate pushes and opens the PR itself (its brief says so) and reports `done: PR `. Skip the Validate step and go straight to PR ready (run `fm-pr-check`, relay the PR). Teardown uses the normal pushed-branch check. -- **local-only** - no remote, no PR. The crewmate stops at `done: ready in branch fm/`. Review the diff with `bin/fm-review-diff.sh `, relay a one-paragraph summary to the captain, and on approval run `bin/fm-merge-local.sh ` to fast-forward local `main` (it refuses anything but a clean fast-forward - if it does, have the crewmate rebase). No `fm-pr-check`. Then teardown, whose safety check requires the branch already merged into local `main`, OR the work pushed to any remote (a fork counts - relevant for upstream-contribution PRs on a local-only-registered project). +- **local-only** - no remote, no PR. The crewmate stops at `done: ready in branch fm/`. Review the diff with `bin/fm-review-diff.sh `, relay a one-paragraph summary to the captain, and on approval run `FM_CAPTAIN_APPROVED=granted bin/fm-merge-local.sh ` to fast-forward local `main` (it refuses anything but a clean fast-forward - if it does, have the crewmate rebase). The `FM_CAPTAIN_APPROVED` assertion is the completeness gate's directive-#2 check: the merge is firstmate exercising the captain's merge authority, so the approval must be asserted explicitly (`granted`, or `not_required` under yolo); without it the gate blocks the merge when the solver tooling is present, and fails open (proceeds) when it is absent. No `fm-pr-check`. Then teardown, whose safety check requires the branch already merged into local `main`, OR the work pushed to any remote (a fork counts - relevant for upstream-contribution PRs on a local-only-registered project). When reviewing any crewmate branch diff, use `bin/fm-review-diff.sh ` rather than `git diff ...branch` directly. Pooled clones keep their local default refs frozen at clone time and can lag `origin`; the helper always compares against the authoritative base. diff --git a/bin/fm-completeness-check.sh b/bin/fm-completeness-check.sh new file mode 100755 index 0000000..21e459a --- /dev/null +++ b/bin/fm-completeness-check.sh @@ -0,0 +1,215 @@ +#!/usr/bin/env bash +# Formal completeness gate for a task-lifecycle decision (AGENTS.md sections 2, 7). +# +# Derives the observed facts for a task (or takes them explicitly), then asks the +# Z3-backed engine (fm-completeness.py + fm-completeness.rules.json) to PROVE the +# completion claim consistent with firstmate's invariants. Hard rules gate; soft +# rules score. Prints a one-line verdict; on a blocked claim it names the violated +# rule and exits non-zero. +# +# Usage: +# fm-completeness-check.sh --gate teardown --id +# fm-completeness-check.sh --gate merge --id # approval via $FM_CAPTAIN_APPROVED +# fm-completeness-check.sh --kind ship --landed none --worktree holds_unlanded_work [...] +# +# Flags (explicit facts override anything derived): +# --gate what decision this guards (drives fact derivation) +# --id derive facts from state/.meta + data// + git +# --kind --landed --report --worktree --captain-approval +# --mode strict (default) gates on invariants; graded also scores soft rules +# --meta set a soft-rule metadata key true (repeatable) +# +# Exit: 0 = SAT (proceed), 2 = UNSAT (blocked), 0 = tooling unavailable (FAIL-OPEN, +# warns and defers to the caller's own checks) unless FM_COMPLETENESS_STRICT=1. +# Set FM_COMPLETENESS_GATE=0 to skip the gate entirely (still exits 0). +set -eu + +SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)" +FM_ROOT="${FM_ROOT_OVERRIDE:-$(cd "$SCRIPT_DIR/.." && pwd)}" +FM_HOME="${FM_HOME:-${FM_ROOT_OVERRIDE:-$FM_ROOT}}" +STATE="${FM_STATE_OVERRIDE:-$FM_HOME/state}" +DATA="${FM_DATA_OVERRIDE:-$FM_HOME/data}" +ENGINE="$SCRIPT_DIR/fm-completeness.py" + +GATE="" +ID="" +MODE="strict" +KIND="" +LANDED="" +REPORT="" +WORKTREE="" +APPROVAL="" +META_KEYS=() + +while [ $# -gt 0 ]; do + case "$1" in + --gate) GATE="$2"; shift 2 ;; + --id) ID="$2"; shift 2 ;; + --mode) MODE="$2"; shift 2 ;; + --kind) KIND="$2"; shift 2 ;; + --landed) LANDED="$2"; shift 2 ;; + --report) REPORT="$2"; shift 2 ;; + --worktree) WORKTREE="$2"; shift 2 ;; + --captain-approval) APPROVAL="$2"; shift 2 ;; + --meta) META_KEYS+=("$2"); shift 2 ;; + *) echo "fm-completeness-check: unknown argument '$1'" >&2; exit 64 ;; + esac +done + +# Honor the global off-switch. +if [ "${FM_COMPLETENESS_GATE:-1}" = "0" ]; then + exit 0 +fi + +fail_open() { + # Tooling missing or broken: never wedge the lifecycle. Warn and defer to the + # caller's own (bash) safety checks, unless the operator demands strictness. + if [ "${FM_COMPLETENESS_STRICT:-0}" = "1" ]; then + echo "completeness gate: $1 (FM_COMPLETENESS_STRICT=1 -> refusing)" >&2 + exit 3 + fi + echo "completeness gate: $1; skipping formal check (set FM_COMPLETENESS_STRICT=1 to enforce)" >&2 + exit 0 +} + +command -v python3 >/dev/null 2>&1 || fail_open "python3 not found" +[ -f "$ENGINE" ] || fail_open "engine $ENGINE missing" + +git_unlanded_facts() { + # Mirror fm-teardown.sh's notion of "landed" so the gate never diverges from + # the script it guards. Sets LANDED and WORKTREE for a ship task. + local wt=$1 proj=$2 mode=$3 dirty unpushed default unmerged + if [ ! -d "$wt" ]; then + LANDED="${LANDED:-none}"; WORKTREE="${WORKTREE:-clean}"; return 0 + fi + dirty=$(git -C "$wt" status --porcelain 2>/dev/null | grep -vE '^\?\? \.claude/' | head -1 || true) + unpushed=$(git -C "$wt" log --oneline HEAD --not --remotes -- 2>/dev/null | head -1 || true) + if [ "$mode" = "local-only" ]; then + default=$(git -C "$proj" symbolic-ref --quiet --short refs/remotes/origin/HEAD 2>/dev/null | sed 's#^origin/##' || true) + [ -n "$default" ] || default=main + unmerged=$(git -C "$wt" log --oneline HEAD --not "$default" -- 2>/dev/null | head -1 || true) + if [ -z "$unmerged" ]; then + LANDED="${LANDED:-local_merged}" + elif [ -z "$unpushed" ]; then + LANDED="${LANDED:-pushed}" + else + LANDED="${LANDED:-none}" + fi + if [ -n "$dirty" ] || { [ -n "$unmerged" ] && [ -n "$unpushed" ]; }; then + WORKTREE="${WORKTREE:-holds_unlanded_work}" + else + WORKTREE="${WORKTREE:-clean}" + fi + else + if [ -z "$unpushed" ]; then LANDED="${LANDED:-pushed}"; else LANDED="${LANDED:-none}"; fi + if [ -n "$dirty" ] || [ -n "$unpushed" ]; then + WORKTREE="${WORKTREE:-holds_unlanded_work}" + else + WORKTREE="${WORKTREE:-clean}" + fi + fi +} + +# Derive facts from a task id when one is given (explicit flags still win). +if [ -n "$ID" ]; then + META="$STATE/$ID.meta" + if [ -f "$META" ]; then + [ -n "$KIND" ] || KIND=$(grep '^kind=' "$META" | cut -d= -f2- || true) + meta_mode=$(grep '^mode=' "$META" | cut -d= -f2- || true) + wt=$(grep '^worktree=' "$META" | cut -d= -f2- || true) + proj=$(grep '^project=' "$META" | cut -d= -f2- || true) + else + meta_mode=""; wt=""; proj="" + fi + [ -n "$KIND" ] || KIND=ship + [ -n "$meta_mode" ] || meta_mode=no-mistakes + + if [ -z "$REPORT" ]; then + if [ -f "$DATA/$ID/report.md" ]; then REPORT=present; else REPORT=absent; fi + fi + + case "$GATE" in + merge) + [ -n "$KIND" ] || KIND=ship + LANDED="${LANDED:-merged}" + WORKTREE="${WORKTREE:-clean}" + ;; + *) + if [ "$KIND" = scout ]; then + # Scout carve-out: the report governs, the worktree is scratch by contract. + LANDED="${LANDED:-none}"; WORKTREE="${WORKTREE:-clean}"; APPROVAL="${APPROVAL:-not_required}" + elif [ "$KIND" = ship ]; then + git_unlanded_facts "$wt" "$proj" "$meta_mode" + APPROVAL="${APPROVAL:-not_required}" + else + LANDED="${LANDED:-none}"; WORKTREE="${WORKTREE:-clean}"; APPROVAL="${APPROVAL:-not_required}" + fi + ;; + esac +fi + +# Approval at a merge gate is an explicit assertion the caller must make +# (directive #2): $FM_CAPTAIN_APPROVED in {granted,yes,1} -> granted; +# {not_required} -> not_required; anything else / unset -> pending (blocks). +if [ "$GATE" = "merge" ] && [ -z "$APPROVAL" ]; then + case "${FM_CAPTAIN_APPROVED:-}" in + granted|yes|1|true) APPROVAL=granted ;; + not_required) APPROVAL=not_required ;; + *) APPROVAL=pending ;; + esac +fi + +# Defaults for any axis still unset. +[ -n "$KIND" ] || KIND=ship +[ -n "$LANDED" ] || LANDED=none +[ -n "$REPORT" ] || REPORT=absent +[ -n "$WORKTREE" ] || WORKTREE=clean +[ -n "$APPROVAL" ] || APPROVAL=not_required + +case "$ID" in *\"*|*\\*) echo "fm-completeness-check: refusing id with quote/backslash" >&2; exit 64 ;; esac + +# Build the metadata object from repeated --meta keys. +meta_json="{}" +if [ "${#META_KEYS[@]}" -gt 0 ]; then + meta_json="{" + sep="" + for key in "${META_KEYS[@]}"; do + meta_json="$meta_json$sep\"$key\": true" + sep=", " + done + meta_json="$meta_json}" +fi + +facts=$(printf '{"name": "%s", "mode": "%s", "kind": "%s", "landed": "%s", "report": "%s", "worktree": "%s", "captain_approval": "%s", "metadata": %s}' \ + "${ID:-task}" "$MODE" "$KIND" "$LANDED" "$REPORT" "$WORKTREE" "$APPROVAL" "$meta_json") + +errfile=$(mktemp "${TMPDIR:-/tmp}/fm-completeness.XXXXXX") +set +e +out=$(printf '%s' "$facts" | python3 "$ENGINE" 2>"$errfile") +rc=$? +err=$(cat "$errfile" 2>/dev/null || true) +rm -f "$errfile" +set -e + +if [ "$rc" = "3" ]; then + fail_open "engine error: ${err:-unknown}" +fi + +label="${ID:-task}${GATE:+ ($GATE)}" +if [ "$rc" = "0" ]; then + if [ "$MODE" = "graded" ]; then + compliance=$(printf '%s' "$out" | sed -n 's/.*"compliance": \([0-9.]*\).*/\1/p') + echo "completeness gate: SAT - $label clears every invariant (compliance ${compliance:-1.0})" + else + echo "completeness gate: SAT - $label clears every invariant" + fi + exit 0 +fi + +reason=$(printf '%s' "$out" | sed -n 's/.*"reason": "\(.*\)", "violated_rules.*/\1/p') +violated=$(printf '%s' "$out" | sed -n 's/.*"violated_rules": \[\(.*\)\], "counterexample.*/\1/p') +echo "completeness gate: BLOCKED - $label is provably premature" >&2 +[ -n "$violated" ] && echo " violated: $violated" >&2 +[ -n "$reason" ] && echo " reason: $reason" >&2 +[ -z "$reason" ] && echo " $out" >&2 +exit 2 diff --git a/bin/fm-completeness.py b/bin/fm-completeness.py new file mode 100755 index 0000000..628d88f --- /dev/null +++ b/bin/fm-completeness.py @@ -0,0 +1,166 @@ +#!/usr/bin/env python3 +"""Formal completeness gate for firstmate task-lifecycle decisions. + +Firstmate's "is this task done / safe to tear down / clear to merge?" calls are +really invariants (AGENTS.md prime directives #2 and #3). This turns them into a +formal rule set checked by the Z3-backed neurosymbolic-evaluator: firstmate +proposes a completion claim with the observed facts, and the solver PROVES it +either SAT (consistent with every invariant) or UNSAT (provably premature, with +the violated rule named). Hard rules gate; soft rules score (never block). + +The rules are DATA, not code: they load from fm-completeness.rules.json (or +$FM_COMPLETENESS_RULES). This module only translates that data into the +evaluator's custom-rule callbacks and runs the check. + +I/O: reads one JSON object of facts on stdin, e.g. + {"name": "fix-x", "kind": "ship", "landed": "none", + "worktree": "holds_unlanded_work", "mode": "strict", + "metadata": {"backlog_recorded": true}} +`mode` is "strict" (default, hard invariants only) or "graded" (also scores soft +rules). All other top-level keys except name/mode/metadata are axis values. + +Prints a JSON result on stdout. Exit 0 if SAT, 2 if UNSAT, 3 on a usage/tooling +error (so the bash wrapper can fail-open and defer to its own checks). +""" +from __future__ import annotations + +import json +import os +import sys + +EXIT_SAT = 0 +EXIT_UNSAT = 2 +EXIT_ERROR = 3 + + +def _default_rules_path() -> str: + env = os.environ.get("FM_COMPLETENESS_RULES") + if env: + return env + return os.path.join(os.path.dirname(os.path.abspath(__file__)), + "fm-completeness.rules.json") + + +def _applies(axes: dict, when: dict) -> bool: + """A rule with a `when` clause applies only when every condition matches.""" + return all(axes.get(k) == v for k, v in when.items()) + + +def _make_hard_check(rule: dict): + when = rule.get("when", {}) + require = rule.get("require", {}) + forbid = rule.get("forbid", {}) + reason = rule["reason"] + + def check(pred, _vocab): + axes = pred.axes + if not _applies(axes, when): + return None + for key, val in require.items(): + if axes.get(key) != val: + return reason + for key, val in forbid.items(): + if axes.get(key) == val: + return reason + return None + + return check + + +def _make_soft_check(rule: dict): + when = rule.get("when", {}) + meta_key = rule.get("require_meta") + reason = rule["reason"] + + def check(pred, _vocab): + if not _applies(pred.axes, when): + return None + if meta_key is not None and pred.metadata.get(meta_key): + return None + return reason + + return check + + +def build_vocabulary(spec: dict): + """Translate the rules data file into a verified Vocabulary.""" + from neurosymbolic_evaluator import Vocabulary + + vocab = Vocabulary(spec.get("name", "firstmate_task_completeness")) + for axis, values in spec.get("axes", {}).items(): + vocab.add_axis(axis, list(values)) + for rule in spec.get("hard_rules", []): + vocab.add_custom_rule(rule["name"], _make_hard_check(rule)) + for rule in spec.get("soft_rules", []): + vocab.add_custom_soft_rule( + rule["name"], _make_soft_check(rule), weight=int(rule.get("weight", 1))) + return vocab + + +def check(spec: dict, facts: dict) -> dict: + vocab = build_vocabulary(spec) + + name = facts.get("name", "task") + mode = facts.get("mode", "strict") + metadata = facts.get("metadata", {}) or {} + declared_axes = set(spec.get("axes", {})) + axis_values = {k: v for k, v in facts.items() + if k not in ("name", "mode", "metadata") and k in declared_axes} + + # Reject axis values the data file never declared — a typo'd fact must surface + # as a tooling error, not silently pass the gate. + for axis, value in axis_values.items(): + if value not in spec["axes"][axis]: + raise ValueError( + "fact %s=%r is not a declared value of axis %s (%s)" + % (axis, value, axis, ", ".join(spec["axes"][axis]))) + + if mode == "graded": + result = vocab.verify_extension_graded(name, metadata=metadata, **axis_values) + else: + result = vocab.verify_extension(name, metadata=metadata, **axis_values) + + return { + "sat": bool(result.sat), + "reason": result.reason, + "violated_rules": list(result.violated_rules), + "counterexample": result.counterexample, + "compliance": getattr(result, "compliance", 1.0), + "unmet_soft": list(getattr(result, "unmet_soft", [])), + "mode": mode, + } + + +def main() -> int: + try: + raw = sys.stdin.read() + facts = json.loads(raw) if raw.strip() else {} + except (json.JSONDecodeError, ValueError) as exc: + print(json.dumps({"error": "bad facts JSON: %s" % exc}), file=sys.stderr) + return EXIT_ERROR + + rules_path = _default_rules_path() + try: + with open(rules_path, encoding="utf-8") as handle: + spec = json.load(handle) + except OSError as exc: + print(json.dumps({"error": "cannot read rules file %s: %s" + % (rules_path, exc)}), file=sys.stderr) + return EXIT_ERROR + + try: + out = check(spec, facts) + except ImportError as exc: + print(json.dumps({"error": "neurosymbolic-evaluator not importable: %s" + % exc}), file=sys.stderr) + return EXIT_ERROR + except (ValueError, KeyError) as exc: + print(json.dumps({"error": str(exc)}), file=sys.stderr) + return EXIT_ERROR + + print(json.dumps(out)) + return EXIT_SAT if out["sat"] else EXIT_UNSAT + + +if __name__ == "__main__": + sys.exit(main()) diff --git a/bin/fm-completeness.rules.json b/bin/fm-completeness.rules.json new file mode 100644 index 0000000..f394232 --- /dev/null +++ b/bin/fm-completeness.rules.json @@ -0,0 +1,57 @@ +{ + "name": "firstmate_task_completeness", + "_comment": "Firstmate's task-completion invariants as DATA. Hard rules gate (a violation is UNSAT and blocks the lifecycle action); soft rules score into a compliance number and never block. Axes are the facts firstmate reads before declaring done. Rule semantics: a rule with `when` applies only when every listed axis matches; `require` demands an axis equal a value; `forbid` rejects an axis equal a value; soft `require_meta` demands a truthy metadata key. See AGENTS.md sections 1, 2, 7.", + "axes": { + "kind": ["ship", "scout", "secondmate"], + "landed": ["merged", "pushed", "local_merged", "none"], + "report": ["present", "absent"], + "worktree": ["clean", "holds_unlanded_work"], + "captain_approval": ["granted", "not_required", "pending"] + }, + "hard_rules": [ + { + "name": "SCOUT_REQUIRES_REPORT", + "when": {"kind": "scout"}, + "require": {"report": "present"}, + "reason": "scout task declared done with no report.md; the report is the deliverable (directive #3 scout carve-out)" + }, + { + "name": "SHIP_REQUIRES_LANDED", + "when": {"kind": "ship"}, + "forbid": {"landed": "none"}, + "reason": "ship task declared done but the work is not landed (not merged, pushed to a remote/fork, or merged into local main) (directive #3)" + }, + { + "name": "NO_UNLANDED_AT_TEARDOWN", + "forbid": {"worktree": "holds_unlanded_work"}, + "reason": "worktree still holds unlanded work; teardown would discard it (directive #3)" + }, + { + "name": "MERGE_NEEDS_CAPTAIN_WORD", + "when": {"landed": "merged"}, + "forbid": {"captain_approval": "pending"}, + "reason": "merge performed without the captain's explicit approval (directive #2)" + } + ], + "soft_rules": [ + { + "name": "PREFER_EVIDENCE_CITED", + "require_meta": "evidence_cited", + "weight": 3, + "reason": "deliverable does not cite file:line evidence" + }, + { + "name": "PREFER_AGENTS_MD_UPDATED", + "when": {"kind": "ship"}, + "require_meta": "agents_md_touched", + "weight": 2, + "reason": "ship task recorded no durable project knowledge in AGENTS.md" + }, + { + "name": "PREFER_BACKLOG_RECORDED", + "require_meta": "backlog_recorded", + "weight": 1, + "reason": "completion not yet recorded in the backlog" + } + ] +} diff --git a/bin/fm-merge-local.sh b/bin/fm-merge-local.sh index 0baf4e5..1088f4c 100755 --- a/bin/fm-merge-local.sh +++ b/bin/fm-merge-local.sh @@ -40,6 +40,22 @@ default_branch() { return 1 } +# Formal completeness gate (directive #2): the local merge IS firstmate exercising +# the captain's merge authority, so the captain's word must be asserted explicitly +# via $FM_CAPTAIN_APPROVED (granted | yes | 1 | true; or not_required under yolo). +# The gate FAILS OPEN when the solver tooling is absent, so existing setups keep +# working until they install it; once installed, an unasserted approval is blocked. +set +e +gate_out=$("$FM_ROOT/bin/fm-completeness-check.sh" --gate merge --id "$ID" 2>&1) +gate_rc=$? +set -e +if [ "$gate_rc" = 2 ]; then + printf '%s\n' "$gate_out" >&2 + echo "REFUSED: completeness gate blocked the local merge of $ID." >&2 + echo "Assert the captain's approval explicitly, e.g. FM_CAPTAIN_APPROVED=granted bin/fm-merge-local.sh $ID" >&2 + exit 1 +fi + BRANCH="fm/$ID" git -C "$PROJ" rev-parse --verify --quiet "refs/heads/$BRANCH" >/dev/null || { echo "error: branch $BRANCH does not exist in $PROJ" >&2; exit 1; } diff --git a/bin/fm-teardown.sh b/bin/fm-teardown.sh index ddd0a6d..b25d284 100755 --- a/bin/fm-teardown.sh +++ b/bin/fm-teardown.sh @@ -415,6 +415,23 @@ if [ "$KIND" = secondmate ] && [ "$FORCE" = "--force" ]; then cleanup_firstmate_home_children "$HOME_PATH" fi +# Formal completeness gate (AGENTS.md section 7): prove the "done" claim consistent +# with the directives before the irreversible teardown. It FAILS OPEN when the +# solver tooling is absent, so the battle-tested bash checks below remain the hard +# guarantee; when present it gives a provable, named refusal first. Skipped under +# --force (the explicit discard path) and for secondmates (governed elsewhere). +if [ "$FORCE" != "--force" ] && { [ "$KIND" = ship ] || [ "$KIND" = scout ]; }; then + set +e + gate_out=$("$FM_ROOT/bin/fm-completeness-check.sh" --gate teardown --id "$ID" 2>&1) + gate_rc=$? + set -e + if [ "$gate_rc" = 2 ]; then + printf '%s\n' "$gate_out" >&2 + echo "REFUSED: completeness gate blocked teardown of $ID." >&2 + exit 1 + fi +fi + if [ -d "$WT" ] && [ "$FORCE" != "--force" ]; then if [ "$KIND" = secondmate ]; then : diff --git a/tests/fm-completeness.test.sh b/tests/fm-completeness.test.sh new file mode 100755 index 0000000..60e8a4a --- /dev/null +++ b/tests/fm-completeness.test.sh @@ -0,0 +1,118 @@ +#!/usr/bin/env bash +# Tests for bin/fm-completeness-check.sh (the formal completeness gate). +# +# Two tiers: +# - Tooling-agnostic (always run): the off-switch, fail-open behavior, strict +# enforcement, argument parsing. These must hold even where the Z3-backed +# neurosymbolic-evaluator is not installed (e.g. CI), because the gate is +# designed to FAIL OPEN and never wedge the lifecycle. +# - Solver-dependent (run only when neurosymbolic-evaluator + z3 import): the +# actual SAT/UNSAT verdicts for the invariant matrix. +set -u + +ROOT="$(cd "$(dirname "${BASH_SOURCE[0]}")/.." && pwd)" +GATE="$ROOT/bin/fm-completeness-check.sh" +TMP_ROOT= + +fail() { printf 'not ok - %s\n' "$1" >&2; exit 1; } +pass() { printf 'ok - %s\n' "$1"; } +skip() { printf 'ok - %s # SKIP\n' "$1"; } + +cleanup() { [ -n "${TMP_ROOT:-}" ] && rm -rf "$TMP_ROOT"; } +trap cleanup EXIT +TMP_ROOT=$(mktemp -d "${TMPDIR:-/tmp}/fm-completeness-tests.XXXXXX") + +# rc_of runs the gate with the given args and echoes its exit code (never aborts). +rc_of() { + set +e + "$GATE" "$@" >/dev/null 2>&1 + local rc=$? + set -e + printf '%s' "$rc" +} + +# --- Tooling-agnostic tier --------------------------------------------------- + +# Off-switch always yields 0 regardless of facts. +if [ "$(FM_COMPLETENESS_GATE=0 "$GATE" --kind scout --report absent >/dev/null 2>&1; echo $?)" = "0" ]; then + pass "off-switch FM_COMPLETENESS_GATE=0 exits 0" +else + fail "off-switch did not exit 0" +fi + +# Fail-open: a broken rules path is non-fatal by default. +if [ "$(FM_COMPLETENESS_RULES=/no/such.json "$GATE" --kind scout --report present >/dev/null 2>&1; echo $?)" = "0" ]; then + pass "fail-open on missing rules file exits 0" +else + fail "missing rules file did not fail open" +fi + +# Strict mode turns the same breakage into a hard error (exit 3). +if [ "$(FM_COMPLETENESS_STRICT=1 FM_COMPLETENESS_RULES=/no/such.json "$GATE" --kind scout --report present >/dev/null 2>&1; echo $?)" = "3" ]; then + pass "FM_COMPLETENESS_STRICT=1 enforces (exit 3) on tooling breakage" +else + fail "strict mode did not enforce on tooling breakage" +fi + +# Unknown argument is a usage error. +if [ "$(rc_of --bogus x)" = "64" ]; then + pass "unknown argument exits 64" +else + fail "unknown argument did not exit 64" +fi + +# --- Solver-dependent tier --------------------------------------------------- + +if ! python3 -c "import neurosymbolic_evaluator, z3" >/dev/null 2>&1; then + skip "solver matrix (neurosymbolic-evaluator/z3 not importable)" + skip "solver --id derivation (neurosymbolic-evaluator/z3 not importable)" + exit 0 +fi + +assert_rc() { + local want=$1 desc=$2; shift 2 + local got; got=$(rc_of "$@") + if [ "$got" = "$want" ]; then pass "$desc"; else fail "$desc (rc=$got want=$want)"; fi +} + +# Hard invariants gate (exit 2 = UNSAT/blocked, 0 = SAT/clear). +assert_rc 2 "scout without report is blocked" --kind scout --report absent +assert_rc 0 "scout with report clears" --kind scout --report present +assert_rc 2 "ship not landed is blocked" --kind ship --landed none --worktree holds_unlanded_work +assert_rc 0 "ship pushed + clean clears" --kind ship --landed pushed --worktree clean +assert_rc 2 "merge without approval is blocked" --gate merge --kind ship --landed merged --captain-approval pending +assert_rc 0 "merge with approval clears" --gate merge --kind ship --landed merged --captain-approval granted + +# Merge gate reads approval from the environment (directive #2 explicit assert). +if [ "$(FM_CAPTAIN_APPROVED=granted "$GATE" --gate merge --kind ship --landed merged >/dev/null 2>&1; echo $?)" = "0" ]; then + pass "merge gate honors FM_CAPTAIN_APPROVED=granted" +else + fail "merge gate did not honor FM_CAPTAIN_APPROVED=granted" +fi +if [ "$("$GATE" --gate merge --kind ship --landed merged >/dev/null 2>&1; echo $?)" = "2" ]; then + pass "merge gate blocks when approval unset (defaults pending)" +else + fail "merge gate did not block with approval unset" +fi + +# Graded mode never blocks on a soft-only miss, but reports it. +assert_rc 0 "graded ship is SAT despite unmet soft rules" \ + --mode graded --kind ship --landed merged --captain-approval granted --worktree clean + +# --id derivation from a synthetic meta + report. +HOME_DIR="$TMP_ROOT/home" +mkdir -p "$HOME_DIR/state" "$HOME_DIR/data/scout-x" +printf 'kind=scout\nmode=no-mistakes\nworktree=%s\nproject=%s\n' "$TMP_ROOT/wt" "$TMP_ROOT/proj" > "$HOME_DIR/state/scout-x.meta" +# No report yet -> blocked. +if [ "$(FM_HOME="$HOME_DIR" "$GATE" --gate teardown --id scout-x >/dev/null 2>&1; echo $?)" = "2" ]; then + pass "--id scout derivation blocks with no report" +else + fail "--id scout derivation did not block without report" +fi +# Write the report -> clears. +printf '# report\n' > "$HOME_DIR/data/scout-x/report.md" +if [ "$(FM_HOME="$HOME_DIR" "$GATE" --gate teardown --id scout-x >/dev/null 2>&1; echo $?)" = "0" ]; then + pass "--id scout derivation clears once report exists" +else + fail "--id scout derivation did not clear with report present" +fi From 276e1d89f13a8e1adb0184f6c0c8597ca8851454 Mon Sep 17 00:00:00 2001 From: Tom Harper Date: Wed, 24 Jun 2026 22:23:36 -0700 Subject: [PATCH 2/6] refactor: make completeness gate depend only on z3 Replace the neurosymbolic_evaluator dependency with a small self-contained Z3 shim (axes -> EnumSort, hard rules -> Implies, per-rule SAT check for named violations, deterministic soft scoring). The only dependency is now z3-solver (public PyPI), so the gate is portable with no private coupling. Behavior and the JSON/CLI contract are unchanged. --- bin/fm-completeness.py | 177 ++++++++++++++++++++-------------- tests/fm-completeness.test.sh | 16 +-- 2 files changed, 114 insertions(+), 79 deletions(-) diff --git a/bin/fm-completeness.py b/bin/fm-completeness.py index 628d88f..e3cdafd 100755 --- a/bin/fm-completeness.py +++ b/bin/fm-completeness.py @@ -3,14 +3,22 @@ Firstmate's "is this task done / safe to tear down / clear to merge?" calls are really invariants (AGENTS.md prime directives #2 and #3). This turns them into a -formal rule set checked by the Z3-backed neurosymbolic-evaluator: firstmate -proposes a completion claim with the observed facts, and the solver PROVES it -either SAT (consistent with every invariant) or UNSAT (provably premature, with -the violated rule named). Hard rules gate; soft rules score (never block). - -The rules are DATA, not code: they load from fm-completeness.rules.json (or -$FM_COMPLETENESS_RULES). This module only translates that data into the -evaluator's custom-rule callbacks and runs the check. +formal rule set checked by Z3: firstmate proposes a completion claim with the +observed facts, and the solver PROVES it either SAT (consistent with every +invariant) or UNSAT (provably premature, with the violated rule named). Hard +rules gate; soft rules score (never block). + +The only dependency is `z3-solver` (public PyPI) — there is no other library to +install. The rules are DATA, not code: they load from fm-completeness.rules.json +(or $FM_COMPLETENESS_RULES). This module is a small self-contained shim that +compiles that data into a Z3 model and runs the check. + +Encoding: each axis is a Z3 EnumSort variable. A hard rule compiles to +`Implies(when, require AND not-forbid)`. A concrete claim is verified by asking, +per rule, whether (facts AND rule) is satisfiable — UNSAT means that rule is +violated by the facts. `prove_consistency` checks the whole hard rule set is +satisfiable over free axes (a contradiction there is a bug in the directives, +not the task). Soft rules score deterministically over the concrete metadata. I/O: reads one JSON object of facts on stdin, e.g. {"name": "fix-x", "kind": "ship", "landed": "none", @@ -41,92 +49,119 @@ def _default_rules_path() -> str: "fm-completeness.rules.json") -def _applies(axes: dict, when: dict) -> bool: - """A rule with a `when` clause applies only when every condition matches.""" - return all(axes.get(k) == v for k, v in when.items()) +def _build_axes(z3, axes_spec): + """Compile each axis into a Z3 EnumSort plus its value-constant map.""" + consts = {} # axis -> {value_str: z3 const} + variables = {} # axis -> z3 Const (the proposed value of that axis) + for axis, values in axes_spec.items(): + values = list(values) + _sort, members = z3.EnumSort(axis, values) + consts[axis] = dict(zip(values, members)) + variables[axis] = z3.Const(axis + "__proposed", _sort) + return consts, variables -def _make_hard_check(rule: dict): +def _rule_constraint(z3, rule, consts, variables): + """Compile one hard rule to Implies(when, require AND not-forbid).""" when = rule.get("when", {}) require = rule.get("require", {}) forbid = rule.get("forbid", {}) - reason = rule["reason"] - - def check(pred, _vocab): - axes = pred.axes - if not _applies(axes, when): - return None - for key, val in require.items(): - if axes.get(key) != val: - return reason - for key, val in forbid.items(): - if axes.get(key) == val: - return reason - return None - - return check - - -def _make_soft_check(rule: dict): - when = rule.get("when", {}) - meta_key = rule.get("require_meta") - reason = rule["reason"] - def check(pred, _vocab): - if not _applies(pred.axes, when): - return None - if meta_key is not None and pred.metadata.get(meta_key): - return None - return reason + antecedent = [variables[k] == consts[k][v] for k, v in when.items()] + consequent = [variables[k] == consts[k][v] for k, v in require.items()] + consequent += [variables[k] != consts[k][v] for k, v in forbid.items()] - return check + body = z3.And(*consequent) if consequent else z3.BoolVal(True) + if antecedent: + return z3.Implies(z3.And(*antecedent), body) + return body -def build_vocabulary(spec: dict): - """Translate the rules data file into a verified Vocabulary.""" - from neurosymbolic_evaluator import Vocabulary +def _verify_hard(z3, spec, fact_axes, consts, variables): + """Return the list of hard rules whose constraint is violated by the facts. - vocab = Vocabulary(spec.get("name", "firstmate_task_completeness")) - for axis, values in spec.get("axes", {}).items(): - vocab.add_axis(axis, list(values)) + A rule is violated iff (facts AND rule) is UNSAT — i.e. no world with these + facts can also satisfy the rule. Checking per rule names every violation, + not just one minimal unsat core. + """ + fact_asserts = [variables[k] == consts[k][v] for k, v in fact_axes.items()] + violated = [] for rule in spec.get("hard_rules", []): - vocab.add_custom_rule(rule["name"], _make_hard_check(rule)) + solver = z3.Solver() + for assertion in fact_asserts: + solver.add(assertion) + solver.add(_rule_constraint(z3, rule, consts, variables)) + if solver.check() == z3.unsat: + violated.append(rule) + return violated + + +def _score_soft(spec, fact_axes, metadata): + """Deterministic weighted compliance over concrete metadata (never gates).""" + total = 0 + satisfied = 0 + unmet = [] for rule in spec.get("soft_rules", []): - vocab.add_custom_soft_rule( - rule["name"], _make_soft_check(rule), weight=int(rule.get("weight", 1))) - return vocab + when = rule.get("when", {}) + if any(fact_axes.get(k) != v for k, v in when.items()): + continue # rule does not apply to this proposal + weight = int(rule.get("weight", 1)) + total += weight + key = rule.get("require_meta") + if key is not None and metadata.get(key): + satisfied += weight + else: + unmet.append(rule["name"]) + compliance = 1.0 if total == 0 else satisfied / total + return compliance, unmet + + +def prove_consistency(z3, spec, consts, variables) -> bool: + """True iff the whole hard rule set is satisfiable over free axis values.""" + solver = z3.Solver() + for rule in spec.get("hard_rules", []): + solver.add(_rule_constraint(z3, rule, consts, variables)) + return solver.check() == z3.sat def check(spec: dict, facts: dict) -> dict: - vocab = build_vocabulary(spec) + import z3 # public PyPI z3-solver; absence -> ImportError -> caller fails open - name = facts.get("name", "task") mode = facts.get("mode", "strict") metadata = facts.get("metadata", {}) or {} - declared_axes = set(spec.get("axes", {})) - axis_values = {k: v for k, v in facts.items() - if k not in ("name", "mode", "metadata") and k in declared_axes} - - # Reject axis values the data file never declared — a typo'd fact must surface - # as a tooling error, not silently pass the gate. - for axis, value in axis_values.items(): - if value not in spec["axes"][axis]: + declared = spec.get("axes", {}) + fact_axes = {k: v for k, v in facts.items() + if k not in ("name", "mode", "metadata") and k in declared} + + # A fact whose value the data file never declared is a typo, not a pass. + for axis, value in fact_axes.items(): + if value not in declared[axis]: raise ValueError( "fact %s=%r is not a declared value of axis %s (%s)" - % (axis, value, axis, ", ".join(spec["axes"][axis]))) + % (axis, value, axis, ", ".join(declared[axis]))) + + consts, variables = _build_axes(z3, declared) + + violated = _verify_hard(z3, spec, fact_axes, consts, variables) + sat = not violated if mode == "graded": - result = vocab.verify_extension_graded(name, metadata=metadata, **axis_values) + compliance, unmet = _score_soft(spec, fact_axes, metadata) + else: + compliance, unmet = 1.0, [] + + if sat: + reason = "consistent with every invariant" else: - result = vocab.verify_extension(name, metadata=metadata, **axis_values) + reason = "; ".join("[%s] %s" % (r["name"], r["reason"]) for r in violated) return { - "sat": bool(result.sat), - "reason": result.reason, - "violated_rules": list(result.violated_rules), - "counterexample": result.counterexample, - "compliance": getattr(result, "compliance", 1.0), - "unmet_soft": list(getattr(result, "unmet_soft", [])), + "sat": bool(sat), + "reason": reason, + "violated_rules": [r["name"] for r in violated], + "counterexample": None, + "compliance": compliance, + "unmet_soft": unmet, "mode": mode, } @@ -151,8 +186,8 @@ def main() -> int: try: out = check(spec, facts) except ImportError as exc: - print(json.dumps({"error": "neurosymbolic-evaluator not importable: %s" - % exc}), file=sys.stderr) + print(json.dumps({"error": "z3-solver not importable: %s" % exc}), + file=sys.stderr) return EXIT_ERROR except (ValueError, KeyError) as exc: print(json.dumps({"error": str(exc)}), file=sys.stderr) diff --git a/tests/fm-completeness.test.sh b/tests/fm-completeness.test.sh index 60e8a4a..e90258b 100755 --- a/tests/fm-completeness.test.sh +++ b/tests/fm-completeness.test.sh @@ -3,11 +3,11 @@ # # Two tiers: # - Tooling-agnostic (always run): the off-switch, fail-open behavior, strict -# enforcement, argument parsing. These must hold even where the Z3-backed -# neurosymbolic-evaluator is not installed (e.g. CI), because the gate is -# designed to FAIL OPEN and never wedge the lifecycle. -# - Solver-dependent (run only when neurosymbolic-evaluator + z3 import): the -# actual SAT/UNSAT verdicts for the invariant matrix. +# enforcement, argument parsing. These must hold even where z3 is not +# installed (e.g. CI), because the gate is designed to FAIL OPEN and never +# wedge the lifecycle. +# - Solver-dependent (run only when z3 imports): the actual SAT/UNSAT verdicts +# for the invariant matrix. set -u ROOT="$(cd "$(dirname "${BASH_SOURCE[0]}")/.." && pwd)" @@ -63,9 +63,9 @@ fi # --- Solver-dependent tier --------------------------------------------------- -if ! python3 -c "import neurosymbolic_evaluator, z3" >/dev/null 2>&1; then - skip "solver matrix (neurosymbolic-evaluator/z3 not importable)" - skip "solver --id derivation (neurosymbolic-evaluator/z3 not importable)" +if ! python3 -c "import z3" >/dev/null 2>&1; then + skip "solver matrix (z3 not importable)" + skip "solver --id derivation (z3 not importable)" exit 0 fi From 6e27fbdb0c1b073084c750b77401933d12f60423 Mon Sep 17 00:00:00 2001 From: Tom Harper Date: Wed, 24 Jun 2026 23:00:13 -0700 Subject: [PATCH 3/6] feat: surface z3 completeness gate as optional bootstrap capability Report COMPLETENESS_GATE: available when python3 can import z3, add a z3 install resolver (pip install z3-solver), and document it alongside TASKS_AXI. Mirrors the optional-capability pattern: silent and never blocking when absent, since the gate fails open. --- AGENTS.md | 4 ++++ bin/fm-bootstrap.sh | 11 ++++++++++- 2 files changed, 14 insertions(+), 1 deletion(-) diff --git a/AGENTS.md b/AGENTS.md index fa55764..b6dea64 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -114,6 +114,10 @@ Otherwise it prints one line per problem or capability fact; handle each: - `FLEET_SYNC: : skipped: ` - bootstrap continued; investigate only if the dirty, diverged, or offline clone blocks work. - `TASKS_AXI: available` - an optional capability fact, not a problem; record it silently and use section 10 for backlog mutations. It prints only after the `tasks-axi` compatibility probe passes for version 0.1.1 or newer; absence or incompatibility only falls back to hand-editing and never blocks work. +- `COMPLETENESS_GATE: available` - an optional capability fact, not a problem; record it silently and never surface it to the captain. + Bootstrap prints this only when `python3` can import `z3`. + When present, the formal completeness gate (`bin/fm-completeness-check.sh`, wired into teardown and the local merge) actively proves each done/teardown/merge claim against the directives; when `z3` is absent the gate fails open and firstmate relies on the existing bash safety checks exactly as before. + It is never a missing tool to install: `fm-bootstrap.sh install z3` opts in, and its absence never blocks work. Bootstrap's fleet refresh is bounded by `FM_FLEET_SYNC_BOOTSTRAP_TIMEOUT` seconds, default 20; a timeout is reported as a `FLEET_SYNC` skip and does not block startup. diff --git a/bin/fm-bootstrap.sh b/bin/fm-bootstrap.sh index a01b385..937993f 100755 --- a/bin/fm-bootstrap.sh +++ b/bin/fm-bootstrap.sh @@ -5,7 +5,12 @@ # Silent = all good. # Lines: "MISSING: (install: )", "NEEDS_GH_AUTH", # "CREW_HARNESS_OVERRIDE: ", "FLEET_SYNC: : skipped: ", -# "TASKS_AXI: available". +# "TASKS_AXI: available", "COMPLETENESS_GATE: available". +# z3 (z3-solver) is an OPTIONAL capability that powers the formal +# completeness gate (bin/fm-completeness-check.sh); reported only when +# python3 can import z3. It is never a MISSING line and never prompts an +# install - the gate fails open without it. Opt in with +# "fm-bootstrap.sh install z3". # treehouse is also MISSING when its installed version lacks # "treehouse get --lease" support. # tasks-axi is an OPTIONAL backlog-management capability reported only @@ -69,6 +74,7 @@ install_cmd() { tmux|node|gh) echo "brew install $1 # or the platform's package manager" ;; treehouse) echo "curl -fsSL https://kunchenguid.github.io/treehouse/install.sh | sh" ;; no-mistakes) echo "curl -fsSL https://raw.githubusercontent.com/kunchenguid/no-mistakes/main/docs/install.sh | sh" ;; + z3) echo "python3 -m pip install z3-solver" ;; gh-axi|chrome-devtools-axi|lavish-axi) echo "npm install -g $1 && $1 setup hooks" ;; *) return 1 ;; esac @@ -103,5 +109,8 @@ crew= [ -f "$CONFIG/crew-harness" ] && crew=$(tr -d '[:space:]' < "$CONFIG/crew-harness" || true) [ -n "$crew" ] && [ "$crew" != "default" ] && echo "CREW_HARNESS_OVERRIDE: $crew" fm_tasks_axi_compatible && echo "TASKS_AXI: available" +if command -v python3 >/dev/null 2>&1 && python3 -c 'import z3' >/dev/null 2>&1; then + echo "COMPLETENESS_GATE: available" +fi fleet_sync exit 0 From f1c3e62b1cab65744f066c4d5b6c7eede7df941f Mon Sep 17 00:00:00 2001 From: Tom Harper Date: Thu, 25 Jun 2026 02:37:04 -0700 Subject: [PATCH 4/6] no-mistakes(review): fix completeness gate absent-worktree false-block and strict exit-3 enforcement --- bin/fm-completeness-check.sh | 12 +++++++++++- bin/fm-merge-local.sh | 2 +- bin/fm-teardown.sh | 2 +- 3 files changed, 13 insertions(+), 3 deletions(-) diff --git a/bin/fm-completeness-check.sh b/bin/fm-completeness-check.sh index 21e459a..eabdd0c 100755 --- a/bin/fm-completeness-check.sh +++ b/bin/fm-completeness-check.sh @@ -80,7 +80,17 @@ git_unlanded_facts() { # the script it guards. Sets LANDED and WORKTREE for a ship task. local wt=$1 proj=$2 mode=$3 dirty unpushed default unmerged if [ ! -d "$wt" ]; then - LANDED="${LANDED:-none}"; WORKTREE="${WORKTREE:-clean}"; return 0 + # No worktree on disk means there is nothing to discard, exactly as + # fm-teardown.sh skips its unlanded check when [ ! -d "$WT" ]. Resolve to a + # non-blocking state: clean worktree and a landed value that clears + # SHIP_REQUIRES_LANDED, rather than landed=none which would false-block. + if [ "$mode" = "local-only" ]; then + LANDED="${LANDED:-local_merged}" + else + LANDED="${LANDED:-pushed}" + fi + WORKTREE="${WORKTREE:-clean}" + return 0 fi dirty=$(git -C "$wt" status --porcelain 2>/dev/null | grep -vE '^\?\? \.claude/' | head -1 || true) unpushed=$(git -C "$wt" log --oneline HEAD --not --remotes -- 2>/dev/null | head -1 || true) diff --git a/bin/fm-merge-local.sh b/bin/fm-merge-local.sh index 1088f4c..5bf1f4b 100755 --- a/bin/fm-merge-local.sh +++ b/bin/fm-merge-local.sh @@ -49,7 +49,7 @@ set +e gate_out=$("$FM_ROOT/bin/fm-completeness-check.sh" --gate merge --id "$ID" 2>&1) gate_rc=$? set -e -if [ "$gate_rc" = 2 ]; then +if [ "$gate_rc" = 2 ] || [ "$gate_rc" = 3 ]; then printf '%s\n' "$gate_out" >&2 echo "REFUSED: completeness gate blocked the local merge of $ID." >&2 echo "Assert the captain's approval explicitly, e.g. FM_CAPTAIN_APPROVED=granted bin/fm-merge-local.sh $ID" >&2 diff --git a/bin/fm-teardown.sh b/bin/fm-teardown.sh index b25d284..d90b585 100755 --- a/bin/fm-teardown.sh +++ b/bin/fm-teardown.sh @@ -425,7 +425,7 @@ if [ "$FORCE" != "--force" ] && { [ "$KIND" = ship ] || [ "$KIND" = scout ]; }; gate_out=$("$FM_ROOT/bin/fm-completeness-check.sh" --gate teardown --id "$ID" 2>&1) gate_rc=$? set -e - if [ "$gate_rc" = 2 ]; then + if [ "$gate_rc" = 2 ] || [ "$gate_rc" = 3 ]; then printf '%s\n' "$gate_out" >&2 echo "REFUSED: completeness gate blocked teardown of $ID." >&2 exit 1 From 3b664f4caaabc9690c00540cd3567ec24fe516d8 Mon Sep 17 00:00:00 2001 From: Tom Harper Date: Thu, 25 Jun 2026 02:45:49 -0700 Subject: [PATCH 5/6] no-mistakes(document): docs: sync README/CONTRIBUTING/AGENTS for completeness gate --- AGENTS.md | 2 +- CONTRIBUTING.md | 2 +- README.md | 11 ++++++++++- 3 files changed, 12 insertions(+), 3 deletions(-) diff --git a/AGENTS.md b/AGENTS.md index b6dea64..12af3d4 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -349,7 +349,7 @@ A ship task's path from `done` to landed on `main` is set by the project's `mode When reviewing any crewmate branch diff, use `bin/fm-review-diff.sh ` rather than `git diff ...branch` directly. Pooled clones keep their local default refs frozen at clone time and can lag `origin`; the helper always compares against the authoritative base. -**yolo (orthogonal).** With `yolo=off` (default) every approval is the captain's: ask-user findings, PR merges, the local-only merge. With `yolo=on`, firstmate makes those calls itself without asking - resolve ask-user findings on your judgment, and run `gh-axi pr merge` / `bin/fm-merge-local.sh` once the work is green/approved - EXCEPT anything destructive, irreversible, or security-sensitive, which still escalates to the captain. Never merge a red PR even under yolo. After any merge you perform without asking the captain, post a one-line "merged after checks passed" FYI so the captain keeps a trail. +**yolo (orthogonal).** With `yolo=off` (default) every approval is the captain's: ask-user findings, PR merges, the local-only merge. With `yolo=on`, firstmate makes those calls itself without asking - resolve ask-user findings on your judgment, and run `gh-axi pr merge` / `FM_CAPTAIN_APPROVED=not_required bin/fm-merge-local.sh` once the work is green/approved - EXCEPT anything destructive, irreversible, or security-sensitive, which still escalates to the captain. Never merge a red PR even under yolo. After any merge you perform without asking the captain, post a one-line "merged after checks passed" FYI so the captain keeps a trail. ### Validate diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index 1820a95..40d4de9 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -38,7 +38,7 @@ See the [no-mistakes quick start](https://kunchenguid.github.io/no-mistakes/star Everything personal to one captain's fleet (`data/`, `state/`, `config/`, `projects/`, `.no-mistakes/`) is gitignored; never commit it. The root `.tasks.toml` is tracked `tasks-axi` config for `data/backlog.md`; compatible `tasks-axi` uses it for routine backlog mutations. It does not make `data/` tracked. -- Helper scripts in `bin/` are plain bash. +- Helper scripts in `bin/` are plain bash, with one exception: `fm-completeness.py` is the optional Z3-backed completeness engine driven by `fm-completeness-check.sh`. Each starts with a usage header comment; keep it accurate when you change behavior. Test scripts and helpers in `tests/` are plain bash too. `shellcheck bin/*.sh tests/*.sh` must pass, and CI enforces it. diff --git a/README.md b/README.md index 1fd3f94..45b5adb 100644 --- a/README.md +++ b/README.md @@ -142,6 +142,8 @@ firstmate works from any terminal - outside tmux, crewmates land in a detached ` - **Local clones stay fresh** - bootstrap and PR-based teardown refresh remote-backed project clones with clean default-branch fast-forwards when the clone is on the default branch and has no local work, and prune local branches whose remote is gone and that no worktree still needs. - **Self-updates stay safe** - `/updatefirstmate` fast-forwards the running firstmate repo and registered secondmate homes from `origin`, then re-reads updated instructions and nudges updated secondmates without touching project clones. The update is fast-forward only: dirty, diverged, offline, and off-default targets are reported and left untouched. +- **Lifecycle claims are provable** - an optional formal completeness gate (`bin/fm-completeness-check.sh`, Z3-backed) wires into teardown and the local merge to prove each done/teardown/merge claim consistent with the directives before the irreversible step. + Bootstrap reports it as available only when `python3` can import `z3`; when the solver is absent the gate fails open and the existing bash safety checks remain the hard guarantee. - **Restart-proof** - all state lives in tmux, status files, local markdown under `data/`, `data/secondmates.md`, and persistent secondmate homes. Kill the first mate session anytime; the next one reconciles and carries on. @@ -161,7 +163,7 @@ The first mate drives these; you rarely need to, but they work by hand too. | `fm-home-seed.sh` | Lease/provision a secondmate home transactionally, clone projects, initialize gates, and maintain `data/secondmates.md` | | `fm-spawn.sh` | Spawn one task, several `id=repo` pairs, or a persistent secondmate with `--secondmate` | | `fm-project-mode.sh` | Resolve a project's delivery mode and `+yolo` flag from `data/projects.md` | -| `fm-merge-local.sh` | Fast-forward a `local-only` project's local default branch after approval | +| `fm-merge-local.sh` | Fast-forward a `local-only` project's local default branch after the captain's approval is asserted via `FM_CAPTAIN_APPROVED` | | `fm-review-diff.sh` | Review a crewmate branch against the authoritative base, with optional `--stat` output | | `fm-watch-arm.sh` | Verified per-home watcher re-arm; reports `started`, `healthy`, or `FAILED`; `--restart` relaunches only this home's watcher | | `fm-watch.sh` | Singleton-safe one-shot watcher; blocks until supervision work is due, queues it durably, then exits with one reason line | @@ -173,6 +175,7 @@ The first mate drives these; you rarely need to, but they work by hand too. | `fm-pr-check.sh` | Record a PR-ready task and arm the watcher's merge poll | | `fm-promote.sh` | Promote a scout task in place so it becomes a protected ship task | | `fm-teardown.sh` | Return the worktree or retire/release a secondmate home; protects ship work, requires scout reports, checks child work, and prints the backlog reminder | +| `fm-completeness-check.sh` | Formal completeness gate wired into teardown and the local merge: proves a done/teardown/merge claim consistent with the directives (Z3-backed via `fm-completeness.py` + `fm-completeness.rules.json`); fails open when the solver is absent | | `fm-harness.sh` | Detect the running harness; resolve the effective crewmate harness | | `fm-lock.sh` | Per-home firstmate session lock | @@ -227,6 +230,11 @@ FM_WATCHER_STALE_GRACE=300 # defaults to FM_GUARD_GRACE; seconds a live watche FM_SIGNAL_GRACE=30 # seconds to coalesce nearby status and turn-end signals into one wake FM_FLEET_SYNC_BOOTSTRAP_TIMEOUT=20 # seconds allowed for bootstrap's best-effort clone refresh FM_FLEET_PRUNE=1 # set to 0 to skip pruning local branches whose upstream is gone +# formal completeness gate (bin/fm-completeness-check.sh); optional, fails open without z3 +FM_COMPLETENESS_GATE=1 # set to 0 to skip the gate entirely (always exits 0) +FM_COMPLETENESS_STRICT=0 # set to 1 to refuse instead of fail open when the solver tooling is absent/broken +FM_COMPLETENESS_RULES= # optional override for the rules file (default bin/fm-completeness.rules.json) +FM_CAPTAIN_APPROVED= # assert the captain's merge approval at the merge gate: granted|yes|1|true, or not_required under yolo FM_BUSY_REGEX='esc (to )?interrupt|Working\.\.\.' # busy-pane signatures, shared by watcher and tmux helper FM_COMPOSER_IDLE_RE= # optional empty-composer regex, applied after dim-ghost and border stripping FM_SEND_RETRIES=3 # fm-send Enter-retry attempts after typing the line once @@ -271,6 +279,7 @@ tests/fm-spawn-batch.test.sh # fm-spawn.sh batch (id=repo) argument tests/fm-bootstrap.test.sh # bootstrap dependency and feature-probe tests tests/fm-update.test.sh # fast-forward-only self-update, reread, nudge, dedup, and skip-safety tests tests/fm-teardown.test.sh # fm-teardown.sh safety and reminder checks: local-only fork-remote allow, truly-unpushed refuse, merged-to-main allow, no-mistakes regression, tasks-axi reminder, --force override +tests/fm-completeness.test.sh # completeness gate: off-switch, fail-open, strict enforcement, argument parsing, and (when z3 imports) the SAT/UNSAT invariant matrix plus --id derivation [ "$(readlink CLAUDE.md)" = "AGENTS.md" ] [ "$(readlink .claude/skills)" = "../.agents/skills" ] FM_HEARTBEAT=2 FM_POLL=1 bin/fm-watch-arm.sh # watcher re-arm smoke test (prints arm status, then "heartbeat") From c98385b3618aa4de29574f74140e8a0776005df6 Mon Sep 17 00:00:00 2001 From: Tom Harper Date: Thu, 25 Jun 2026 02:47:16 -0700 Subject: [PATCH 6/6] no-mistakes(lint): lint clean: shellcheck, ruff, pyflakes, jq all pass --- bin/__pycache__/fm-completeness.cpython-313.pyc | Bin 0 -> 11233 bytes 1 file changed, 0 insertions(+), 0 deletions(-) create mode 100644 bin/__pycache__/fm-completeness.cpython-313.pyc diff --git a/bin/__pycache__/fm-completeness.cpython-313.pyc b/bin/__pycache__/fm-completeness.cpython-313.pyc new file mode 100644 index 0000000000000000000000000000000000000000..0eb96d745b7487058f5fa01112d74cd4085988b8 GIT binary patch literal 11233 zcmcIKTTmR=mA9vRUOmGM!ypL|7A-)Ck$C8d1VUh5dJANlA>$>xqhY28MjCoZ_aF%z zZ&>GJ!E!Aqu}dg%tx)2uaW=J)vZ=C_a+Oz!^4NTVN0uioZA!ICWy^nh?Izm%>^Zlm z2ZTr7xGK36clvSfx#ynuIkz9>4(jyUVM&_-{6hWIQ@co^h3CKgNQ^SA|slP zGDl2`Nx*CND66pams2?UYgWu6qgX^vv5IEJCb~sS2|2|ncF|fw6kfDJ?-1?KJ4GJ) zJTYH%&=D@tBRc7oTPzUs=vDp~J))~AKuD;FkT05hD-G0^f6K`i^UKNZ%!A#Gej$({ zc8K+P$NtAc$bp115{e7q#K>q|N=mY%s=`nxDGAYpB1B_~njFEqWJo<-8;?b$@OU^b z2@xqAQ)3BPt>gIv#-PorP##l-5AP`_wk%07Lid2(bTsRX; z#DN_VeKT@sM2f(^4~3KnkK5B(w-0cMCg~7WP{RpD5~^ifQ5cLT!l!+}vpzsLA;TgO zX;hLUk{kwttHQ~NowfQtP72kdV}tQnSU53$;;>Kn&&k(u_W%lp$q6Ar_ZE*0Dj{VY z_y~JZlfXA*f|OYrm??C&^|jT&1fY5%A~hk5aUq@v0r`}~h!7pAT`75WbQ5)_)P&4e z14$|m91nK%96xcatFNoOOB92>{$pLD!Re7i1X&6Qg#;Bi90%^kqX1A&hGH@hNgW2p zC5J;v9v6ureeh0*gp#3*aDW5|07r^bJ}P6n=;$H>bn21;?U+2&BuJs~un;;2i_!I9 zT9-UFA|{liK;?8WF4YNb!Z5HxXU_7H3I60^(5{%IR-YY~DGS(JiV;SG45sxs917R2}NR{zMxx(dWM68V?)3by8NwB7j{-KBp@-GQT#GY;;F?b5BiI9lu z`-T^Sngp)b2vXfp9WOw8sT_5!yh$jJ#?IBAE3XmCfe{gW075p3_i->o5`7{A4MQ1o?@ z%9!->%lt{4eiGSTuId1U>WCB#jm48SI&-pS2KO8vr^cpy2$%rq0!!j*LbnniFhE`1 zkN~OMK%sOsPbNld<6uDI!f9z-6{K@vX*4MyMe1?!dIM85-@1;R3B|`GbPgvJFv6&y zl+uc%0$LdYC&rR>Lf1L4l-q<@6zyb3kQ$s^ieKL`VgXHs)qJUkf z29%M*gCTWTIIDz4Q6It~85kIg*CxQEQFR6~z;o~cF+c(R4z>I$7{E`^6Z#m<5t2c` zKuy3tcYk7nho7*}h4Hn~tZ)GjnMW@0pKp)><2mWY@vB;|dDR2B6&b_x%EV8qGXp{} z+fOQ2!UK1_y7k&uKOz6$49Dl5k(XnJ%XntsZv2X--I;w?5+B>Gt;uPlCrX}e+I=?7 zlu$LRB%g^X30Y%@q@>1;hLXb?cPbH+H7ghZlrGI08dT|XTQC>}`yC7d&sdnJVkjeM zYX@U;{py8A$2D&-qX9wcP=W|UDS#<8@UOlA-6XloJ7=~`Z<(o^u3IeLmg2WhcKq7P zr`-ip{9Q-h%%15zGtJY@Z<;P!FIg|UFS);G`o8rY>-XL7xZk&>s@qaU?dhUjk66ap zaGx;dhDQLDgjY&LR&LVp<_)rq$R_w_{Y-Br4#OEnGh#F^Idp=MF@Q7LHGrZ>o8N3u zUpD(mE*T4S8C;=rhX!zU+6J-`6!?NM#();6+zev?Vh)+{L(E!>Z!N~Gz%Cqpyq6FM z5lEn7mkA`mG||?fJJ*#ik_KMrTeD7bG(HBd6ZDXp26WO8FF-gn5*n@3_+T&-dqsQ} z%^XWgBdW&5!KQ0A1A@k()6m$7*r;ai!09MQ6FW6Kx|X0iDup!*1*>Xy!zBY0^x2w4 zH%OW}sDj7VtRvFMproj(4K{@K9h>5T8_f|M9E-&xK?JH`NWBaG)zi>Tl6!gX$ph*9 zg2}__qKzmMw~Kt!&MEG$XU%N%>!nkcbn*Jj>o2XJ?@JZ$n08H>@2*?_X2-lGRk|&; zZu^vF+L|YP6DL8B&!KKi@!l@n9~|_M&rR+ zNMQ+bm81soysSbHjn~{@A~LQzVIbAwOs6YzKDGZefQ6tFA`xW=jC&sb)hEz_MmyXy zd#Cr#9++!SIm#wG)2@P<;pySo@%i$UYt!Vxbe?A>Fddj3o;RiPDkcxy+t>O*)AgqJ zn{Rmz{iylcuB+Q;eHUuxtFHJK+OE{xDyX~OJb7fvb11#9{e%72_rKqI%X9Qct+~dv z;<%$}&h^2u>&M>jzEyDaN8Q&tueQxLUTB_gxYD@r!jC^2=4%?Ou2V0 z+IFQK?#ZK19=S=uVdlvl&(SA$3XZDi3vcj6onFh0y+xhHmLFFYb(Yz}D}xu5K5B7h z!CisHF@3h%>NodWK4k^^16|Y#?50-0lEVsEWwUIlCtwMfT-qu!^;v5L)@2;D&JH@T z5>G%Ao8dq~E3GiH5Jo!ud@h{>J;^v=fgA_vL~E3LCrrb0&0_n8IW= zSjGHIrCAW<1t|QOBjnEXc)rcZ1(!#}qER7NR;g!^SP+QSLCy=s07zuwk!+Ddm3TcD z#eiJTE@&RAr=TLbHaX&zQ`AWMhj z6$=}6yvE~>^*GOGqt=Gn9nB)rf{ez7A`zaSI(03}t;W)M@suu`+=l@o|Z=zB~la$6;5kf!wGLBw2n9_B*GN7R3Y zZj#)0k(PtZ2f^#XTl>3jd3!$G-+is`>I<`nFC3dcaOLnq=*qEMh1-_)cTXN$^7eq* zPaaC=7hbeou+5duE4TBvOmS(C_hRvd;;*ltvfL@&G<7)bESNbmePnk0wsYeHmXuW8 zv$%hr~5cre~Q-!j*GIdCcPkz-TZSvb=(-80vT)AERCE1>2q zoa}k>$V&=OFe)nH>qQ+VOL%2?fKHB@jei9B{2K-;ql^qyP?-ZI$JjNadN3;6MAK7B z{w+{li{FxS7=Lie+6cs$?tVR#_+g3=GU-HXb%$ zK&Y{o1_bOg=2){M?SA_pxV+rPbXoDUSV2Y@I7DG{$NvH5nRDfXev3#0EMpZgYj0)D z22h?>h+%i4y@w5S)Plj1`M`H*?9NXd>|9ws5zb1npb5CaomKY1aw;wkk-?-M7TBAI zpwNA8RB6#R1Zja3(t>ERt{KXsV6a;1vhPpf38!(G)&3Vehjza;6Cr_xlE*V) z5{4*^@Y3f{n&5{buh|TuYvv>rE8?0xQ(}`MXw2X(H9&gUbP%R#EYxW*NHUnHqMH&< zW@4k|RA+hbN*{~}!@qh1x<5c<*7 zYtSC8CB+-&J>TRnco*zT)s2%!(vCIHN}#gg%uU!I3SKTzGh8=Ko>p2ln1gLqI07yL zY#Nu#;w;;dfH_d{>CoZ0kkL>smqr_S0G9LwiY}h*Fp;8}_klhr0Yb$rN{eB+%dt-t zfmfDeF&NORbPA@pK8u2@>M?r*USX)L?14^WLEhj!#_GCQF8jt7e1`5%#p0DNV7iid zK2AIu7T5G^Mu5 z7TF3x#*JufI&y`Lf&ZG9ZLtjOdmJoAQm7cpwt1ni@R1$a)C(vZht;t@2#_DLwT!g z$)x5m0)qO&ZrzV^SXGC3l9uz8ZUBa5Ebz0eas+Oa7qP>~6J`uZ1&!H1u@R$tJS+*} z;cOf#@ey6WF)5(uRud+ndcsr#qkWo1FaIm3h!bU)O@Hcia`^yu!n99#%9W$4Q&-*& z7b_h38rVN*nx7FFqyUux&^eJc?sjq+=e;iDygBN;%^EX6cQ#@AYeBQX@eCXQcE1Bz zj?xKmG)7T@sbF2GWl_5DSrHB9bGYm`;jIe60ihndr1zIaH`*4Mg{tq_uWr8@_}6_O zt#AEhk?yWvd9IA1>-wxpKg>`<2-#(X^?D?uxx$GtC_O0Bxe%U_DVr1f_VTfCemo1`dX}8B;r0VHK-cpqpE|*a#yn%oLm*oC<3Bj zrsjJ^*{K<$QapX!{bkX7a#q-RtMt=VR{`r`#QYpKI-FHIp6ptd6O3DeJnsrJEPH_li=b z?Mt@eDgHv&z0$Jzg4wPq{&#R3;Mp;4x#wAf`P5QL@4pwNN_tbC-YLst3&}5>?KnT4 zE-3m{T|=s5&v%CxJagOU`rmAPXLzY(Pr777daaONyB=KI1Gmj-xle3n%acbtktR}&_CIygBmseeQm_qMwYjqjeicIqL^ z*xQ-kW*}K%-X9-B)p6a^AP|MNf9qJPIJj7RDCIhQ*Sq#&(}gCST$Y)8*SYph+oEIR zYTV!2Ij??u?_zOH%2oTrj(3k-JMy8e^(oBNSAjLJv^hFha-)dv+(B-Xuj|~%-e|PJ z^^Z-?17`9Qwxn|d`;+GyI*ZwxUIwmj7Mro($amITZ*FB_z_Nk*l%b)B(T3I!6&~E%+PD_=>tj)eEML* z&=?E;%HIKs5FDR}A0>v@7CkRV|4XG)p;z=hUKwML&i?^jKHppb literal 0 HcmV?d00001