feat(completeness): add formal completeness gate for teardown and merge #81

Open
tomharper wants to merge 6 commits into kunchenguid:main from tomharper:feat-completeness-gate
What Changed

  • Added a Z3-backed completeness gate (bin/fm-completeness-check.sh, bin/fm-completeness.py, bin/fm-completeness.rules.json) that proves each task's done/teardown/merge claim consistent with the directives, wired into fm-teardown.sh and fm-merge-local.sh; it fails open when the solver is absent and is controlled by FM_COMPLETENESS_GATE (off-switch) and FM_COMPLETENESS_STRICT (enforce-when-broken).
  • Surfaced the gate through fm-bootstrap.sh as an optional capability (COMPLETENESS_GATE: available), emitted only when python3 can import z3.
  • Fixed the absent-worktree false-block (treating nothing-on-disk as clean to match the bash semantics) and made the wired callers halt on strict-mode exit 3; added tests/fm-completeness.test.sh and synced README/CONTRIBUTING/AGENTS docs.
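
The interaction between the off-switch, fail-open, and strict mode can be sketched as a small decision function. This is a minimal Python illustration, not the code in bin/fm-completeness-check.sh: the function name `gate_exit_code` and its parameters are hypothetical, and the exit codes (0, 2, 3) are the ones shown in the transcripts in the Testing section.

```python
EXIT_OK = 0       # SAT, gate disabled, or fail-open skip
EXIT_BLOCKED = 2  # claim is provably premature
EXIT_STRICT = 3   # tooling broken while strict mode is requested


def gate_exit_code(env, engine_ok, blocked):
    """Map the gate's state to an exit code (illustrative, names assumed).

    env       -- mapping of environment variables
    engine_ok -- False when the solver or rules file could not be used
    blocked   -- True when a hard rule is violated
    """
    # Off-switch: the gate is skipped entirely.
    if env.get("FM_COMPLETENESS_GATE", "1") == "0":
        return EXIT_OK
    # Broken tooling: fail open unless strict mode asks for a refusal.
    if not engine_ok:
        if env.get("FM_COMPLETENESS_STRICT") == "1":
            return EXIT_STRICT
        return EXIT_OK
    # Working tooling: hard-rule violations block, everything else clears.
    return EXIT_BLOCKED if blocked else EXIT_OK
```

For example, `gate_exit_code({"FM_COMPLETENESS_STRICT": "1"}, engine_ok=False, blocked=False)` yields 3, while the same call without the strict variable yields 0.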

Risk Assessment

✅ Low: The change is well-bounded, both prior-round findings are correctly fixed, and the new gate faithfully mirrors the bash safety checks it guards (verified across local-only, non-local-only, absent-worktree, and squash-to-fork cases) while failing open when the solver tooling is absent.

Testing

Baseline test suite (tests/fm-completeness.test.sh) passes all 15 assertions with z3 4.15.2 present, exercising the solver tier rather than just the fail-open stubs. I then captured operator-visible CLI transcripts showing the gate's actual verdict lines for every hard invariant (SAT vs BLOCKED with the named violated directive rule), plus fail-open/strict/off-switch behavior and the two fixes in the latest commit (strict exit-3 enforcement and the absent-worktree non-false-block). Finally I drove the real fm-merge-local.sh against a sandbox local-only repo to prove the directive-#2 wiring changes actual git state only after captain approval is asserted. No UI surface is involved — this is a CLI/lifecycle gate, so CLI transcripts and persisted git state are the appropriate end-user evidence. Working tree left clean; all sandboxes removed.

Evidence: Completeness gate verdict transcript (SAT/BLOCKED across all invariants)
===== Formal completeness gate — operator-visible verdicts (z3 4.15.2) =====

### 1. SHIP task, work landed + clean worktree -> SAT (teardown proceeds)
completeness gate: SAT - task (teardown) clears every invariant
  [exit 0]

### 2. SHIP task, work NOT landed -> BLOCKED with named directive-#3 rule
completeness gate: BLOCKED - task (teardown) is provably premature
  violated: "SHIP_REQUIRES_LANDED", "NO_UNLANDED_AT_TEARDOWN"
  reason: [SHIP_REQUIRES_LANDED] ship task declared done but the work is not landed (not merged, pushed to a remote/fork, or merged into local main) (directive #3); [NO_UNLANDED_AT_TEARDOWN] worktree still holds unlanded work; teardown would discard it (directive #3)
  [exit 2]

### 3. SCOUT task, no report.md yet -> BLOCKED (report is the deliverable)
completeness gate: BLOCKED - task (teardown) is provably premature
  violated: "SCOUT_REQUIRES_REPORT"
  reason: [SCOUT_REQUIRES_REPORT] scout task declared done with no report.md; the report is the deliverable (directive #3 scout carve-out)
  [exit 2]

### 4. SCOUT task, report present -> SAT
completeness gate: SAT - task (teardown) clears every invariant
  [exit 0]

### 5. MERGE gate, captain approval unset -> BLOCKED (directive #2)
completeness gate: BLOCKED - task (merge) is provably premature
  violated: "MERGE_NEEDS_CAPTAIN_WORD"
  reason: [MERGE_NEEDS_CAPTAIN_WORD] merge performed without the captain's explicit approval (directive #2)
  [exit 2]

### 6. MERGE gate, FM_CAPTAIN_APPROVED=granted -> SAT
completeness gate: SAT - task (merge) clears every invariant
  [exit 0]

### 7. GRADED mode surfaces a compliance score over soft rules (never blocks)
completeness gate: SAT - task clears every invariant (compliance 0.0)
  [exit 0]
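
The verdicts above follow a pattern that can be sketched in plain Python: each hard rule is an implication over a task's axes, a violated rule is reported by name, and soft rules only feed a score. This sketch does not use Z3 and does not reproduce the schema of bin/fm-completeness.rules.json; the `when`/`forbid`/`prefer` keys, the `tests` axis, the `HAS_TESTS` soft rule, and the scoring formula are all assumptions made for illustration.

```python
# Hypothetical rule data; the real rules live in bin/fm-completeness.rules.json.
HARD_RULES = [
    {"name": "SHIP_REQUIRES_LANDED",
     "when": {"kind": "ship"}, "forbid": {"landed": "none"}},
    {"name": "SCOUT_REQUIRES_REPORT",
     "when": {"kind": "scout"}, "forbid": {"report": "absent"}},
]
SOFT_RULES = [
    {"name": "HAS_TESTS",
     "when": {"kind": "ship"}, "prefer": {"tests": "present"}},
]


def matches(pattern, claim):
    """True when every axis named in the pattern has that value in the claim."""
    return all(claim.get(axis) == value for axis, value in pattern.items())


def check(claim):
    """Return a verdict, the named hard-rule violations, and a soft score."""
    # A hard rule is violated when its condition holds and the forbidden
    # state is also present in the claim.
    violated = [r["name"] for r in HARD_RULES
                if matches(r["when"], claim) and matches(r["forbid"], claim)]
    # Soft rules never block; they only contribute to the compliance score.
    applicable = [r for r in SOFT_RULES if matches(r["when"], claim)]
    met = [r for r in applicable if matches(r["prefer"], claim)]
    compliance = len(met) / len(applicable) if applicable else 1.0
    return {"verdict": "BLOCKED" if violated else "SAT",
            "violated": violated,
            "compliance": compliance}
```

Under these assumed rules, a ship claim with `landed` set to `none` is blocked with `SHIP_REQUIRES_LANDED`, while a landed ship claim with an unmet soft rule is still SAT with a compliance of 0.0.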
Evidence: Fail-open, strict enforcement, off-switch, and the two latest-commit fixes
===== Fail-open + strict enforcement + the two latest-commit fixes =====

### A. FAIL-OPEN: tooling broken (bad rules path) -> warns, exits 0, defers to bash checks
completeness gate: engine error: {"error": "cannot read rules file /no/such.json: [Errno 2] No such file or directory: '/no/such.json'"}; skipping formal check (set FM_COMPLETENESS_STRICT=1 to enforce)
  [exit 0]

### B. STRICT enforcement (latest-commit fix): same breakage with FM_COMPLETENESS_STRICT=1 -> hard refusal exit 3
completeness gate: engine error: {"error": "cannot read rules file /no/such.json: [Errno 2] No such file or directory: '/no/such.json'"} (FM_COMPLETENESS_STRICT=1 -> refusing)
  [exit 3]

### C. Off-switch: FM_COMPLETENESS_GATE=0 -> gate disabled entirely, exit 0 even on a would-be-blocked claim
  [exit 0]

### D. ABSENT-WORKTREE FALSE-BLOCK FIX (latest commit): a ship task whose worktree is already gone
        must NOT false-block teardown. Build a synthetic home with meta pointing at a nonexistent worktree.
    meta worktree=/var/folders/pj/563x6y950m9ccbkkgrt5h3y00000gn/T/tmp.jfDIfpRAwh/gone-wt (does not exist on disk)
completeness gate: SAT - ship-x (teardown) clears every invariant
  [exit 0 -> expect 0 SAT, not a false block]

### D2. local-only variant, worktree also absent -> resolves to local_merged, still SAT
completeness gate: SAT - ship-lo (teardown) clears every invariant
  [exit 0 -> expect 0 SAT]
Evidence: End-to-end: real fm-merge-local.sh refused without approval, fast-forwards main with approval
===== fm-merge-local.sh end-to-end (directive #2: captain merge authority) =====

### Attempt 1: NO approval asserted -> the real merge script is REFUSED by the gate
●━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
●  WATCHER DOWN - SUPERVISION IS OFF
●  1 task(s) in flight, but no watcher has a fresh beacon (last beat: never, grace 300s).
●  Trust bin/fm-watch-arm.sh for the true state: it confirms a live watcher and a fresh beacon, or fails loudly.
●  Re-arm it NOW: run bin/fm-watch-arm.sh as the harness-tracked background task (never a shell & that gets reaped).
●━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
completeness gate: BLOCKED - demo-1 (merge) is provably premature
  violated: "MERGE_NEEDS_CAPTAIN_WORD"
  reason: [MERGE_NEEDS_CAPTAIN_WORD] merge performed without the captain's explicit approval (directive #2)
REFUSED: completeness gate blocked the local merge of demo-1.
Assert the captain's approval explicitly, e.g. FM_CAPTAIN_APPROVED=granted bin/fm-merge-local.sh demo-1
  [merge script exit 1]
  main still at:     00374d7 init

### Attempt 2: FM_CAPTAIN_APPROVED=granted -> gate clears, merge fast-forwards main
●━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
●  WATCHER DOWN - SUPERVISION IS OFF
●  1 task(s) in flight, but no watcher has a fresh beacon (last beat: never, grace 300s).
●  Trust bin/fm-watch-arm.sh for the true state: it confirms a live watcher and a fresh beacon, or fails loudly.
●  Re-arm it NOW: run bin/fm-watch-arm.sh as the harness-tracked background task (never a shell & that gets reaped).
●━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
merged fm/demo-1 into local main (00374d7 -> 895120a) in /var/folders/pj/563x6y950m9ccbkkgrt5h3y00000gn/T/tmp.qKgkSW4q1a/home/projects/demo
  [merge script exit 0]
  main now at:     895120a the change
  file.txt contents: v2
Evidence: Test suite result (15/15 pass, solver tier active)
ok - off-switch FM_COMPLETENESS_GATE=0 exits 0
ok - fail-open on missing rules file exits 0
ok - FM_COMPLETENESS_STRICT=1 enforces (exit 3) on tooling breakage
ok - unknown argument exits 64
ok - scout without report is blocked
ok - scout with report clears
ok - ship not landed is blocked
ok - ship pushed + clean clears
ok - merge without approval is blocked
ok - merge with approval clears
ok - merge gate honors FM_CAPTAIN_APPROVED=granted
ok - merge gate blocks when approval unset (defaults pending)
ok - graded ship is SAT despite unmet soft rules
ok - --id scout derivation blocks with no report
ok - --id scout derivation clears once report exists

Pipeline

Updates from git push no-mistakes

⏭️ **intent** - skipped

✅ No issues found.

🔧 **Rebase** - 1 issue found → auto-fixed ✅
  • ⚠️ AGENTS.md - merge conflict rebasing onto origin/main

🔧 Fix applied.
✅ Re-checked - no issues remain.

⚠️ **Review** - 1 info
  • ⚠️ bin/fm-completeness-check.sh:83 - When the worktree directory is absent, git_unlanded_facts sets LANDED=none, which trips the SHIP_REQUIRES_LANDED hard rule (forbid landed=none) and blocks ship teardown (exit 2). This diverges from fm-teardown.sh, which guards its unlanded check with if [ -d "$WT" ] and skips it entirely when the worktree is gone, allowing teardown. On z3-installed setups this false-blocks legitimate cleanup of a task whose worktree was already removed/returned (e.g. an interrupted prior teardown). Resolve the absent-worktree case to a non-blocking state (treat 'nothing on disk to discard' as landed/clean) to match the bash semantics it mirrors.
  • ⚠️ bin/fm-teardown.sh:428 - The wired callers only treat gate exit code 2 as a refusal (if [ "$gate_rc" = 2 ]). Under FM_COMPLETENESS_STRICT=1, fail_open exits 3 to signal 'refuse on broken tooling', but both fm-teardown.sh:428 and fm-merge-local.sh:52 fall through on exit 3 and proceed. This makes strict mode silently ineffective at the exact call sites it is meant to harden, contradicting the documented enforce-when-broken guarantee. Halt on exit 3 as well (or treat any non-zero gate exit as a stop).
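
The second finding comes down to which gate exit codes a caller treats as a stop. A minimal Python sketch of the difference follows. The function names are hypothetical, the real callers are bash scripts, and the sketch shows both suggested remedies rather than asserting which one was applied.

```python
def halts_before_fix(gate_rc):
    """Behaviour described in the finding: only exit 2 is a refusal."""
    return gate_rc == 2


def halts_on_refusals(gate_rc):
    """First suggestion: halt on exit 2 (blocked) and exit 3 (strict refusal)."""
    return gate_rc in (2, 3)


def halts_on_any_failure(gate_rc):
    """Second suggestion: treat any non-zero gate exit as a stop."""
    return gate_rc != 0
```

With the original comparison, a strict-mode exit 3 returns False and the caller proceeds; either remedy makes it halt.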

🔧 Fix: fix completeness gate absent-worktree false-block and strict exit-3 enforcement
1 info still open:

  • ℹ️ bin/fm-completeness.py:119 - prove_consistency() is defined and documented in the module docstring as the check that the whole hard rule set is satisfiable over free axes, but check()/main() never call it. As a result a self-contradictory pair of hard rules in fm-completeness.rules.json would go undetected — every concrete claim would then be reported UNSAT (blocked) with no signal that the rule data itself is broken. Either wire it into check() (e.g. surface a distinct 'rules inconsistent' error so the wrapper fails open) or drop the unused function and its docstring reference.
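
A consistency check of the kind this finding describes can be sketched by brute force: enumerate every assignment of the axes and ask whether at least one violates no hard rule. This is a plain-Python stand-in for a solver query, not the body of prove_consistency(); the axis names, their values, and the `when`/`forbid` rule shape are assumptions.

```python
from itertools import product

# Hypothetical axes; the real ones are defined by the rules file.
AXES = {
    "kind": ["ship", "scout"],
    "landed": ["none", "pushed", "merged", "local_merged"],
    "report": ["absent", "present"],
}


def violates(rule, state):
    """A rule is violated when its condition and its forbidden state both hold."""
    when_holds = all(state[a] == v for a, v in rule["when"].items())
    forbidden = all(state[a] == v for a, v in rule["forbid"].items())
    return when_holds and forbidden


def rules_consistent(axes, rules):
    """True if some assignment of the axes satisfies every hard rule."""
    names = list(axes)
    for values in product(*(axes[n] for n in names)):
        state = dict(zip(names, values))
        if not any(violates(rule, state) for rule in rules):
            return True
    return False


GOOD_RULES = [
    {"name": "SHIP_REQUIRES_LANDED",
     "when": {"kind": "ship"}, "forbid": {"landed": "none"}},
]
# A self-contradictory pair: together they forbid every value of one axis.
BAD_RULES = [
    {"name": "NEEDS_REPORT", "when": {}, "forbid": {"report": "absent"}},
    {"name": "NO_REPORT", "when": {}, "forbid": {"report": "present"}},
]
```

Here `rules_consistent(AXES, GOOD_RULES)` is True and `rules_consistent(AXES, BAD_RULES)` is False. A wrapper could use the False result to report that the rule data is broken instead of blocking every claim.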
✅ **Test** - passed

✅ No issues found.

  • bash tests/fm-completeness.test.sh — all 15 assertions pass including the solver-dependent tier (z3 4.15.2 importable)
  • Operator-visible verdict transcript via bin/fm-completeness-check.sh across the full invariant matrix (ship landed/unlanded, scout with/without report, merge with/without approval, graded compliance score)
  • Fail-open (FM_COMPLETENESS_RULES=/no/such.json -> exit 0), strict enforcement (FM_COMPLETENESS_STRICT=1 -> exit 3), off-switch (FM_COMPLETENESS_GATE=0 -> exit 0)
  • Latest-commit absent-worktree fix: --gate teardown --id against synthetic meta whose worktree path does not exist -> SAT (no false block) for both no-mistakes and local-only modes
  • End-to-end real-script wiring: bin/fm-merge-local.sh demo-1 on a sandbox local-only git repo — refused (main unchanged) without approval, fast-forwarded main to the change with FM_CAPTAIN_APPROVED=granted
✅ **Document** - passed

✅ No issues found.

✅ **Lint** - passed

✅ No issues found.

✅ **Push** - passed

✅ No issues found.

Z3-backed neurosymbolic check (rules as data) that proves a task's
done/teardown/merge claim consistent with prime directives #2 and #3.
Hard rules gate; soft rules score. Wired into fm-teardown.sh and
fm-merge-local.sh; fails open when the solver is absent so the existing
bash checks remain the hard guarantee.

Replace the neurosymbolic_evaluator dependency with a small self-contained
Z3 shim (axes -> EnumSort, hard rules -> Implies, per-rule SAT check for
named violations, deterministic soft scoring). The only dependency is now
z3-solver (public PyPI), so the gate is portable with no private coupling.
Behavior and the JSON/CLI contract are unchanged.

Report COMPLETENESS_GATE: available when python3 can import z3, add a
z3 install resolver (pip install z3-solver), and document it alongside
TASKS_AXI. Mirrors the optional-capability pattern: silent and never
blocking when absent, since the gate fails open.
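
The optional-capability pattern described here can be sketched as an import probe that stays silent on failure. This is an illustrative Python version only: the actual check lives in fm-bootstrap.sh, and the helper names `can_import` and `capability_lines` are hypothetical.

```python
import importlib


def can_import(module_name):
    """Return True if the module imports cleanly, False on any failure."""
    try:
        importlib.import_module(module_name)
    except Exception:
        # Any failure counts as "absent"; the probe must never block.
        return False
    return True


def capability_lines(solver_module="z3"):
    """Lines a bootstrap step could emit; empty (silent) when the solver is absent."""
    if can_import(solver_module):
        return ["COMPLETENESS_GATE: available"]
    return []
```

Calling `capability_lines()` returns the capability line only on a machine where z3-solver is installed; otherwise it returns an empty list and nothing is printed.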