Follow-up to #387, which set up mutation testing and hardened the type checker and verifier (its stated scope): infrastructure + MUTATION.md + the README badge, the baseline (80.8% core), the checker hardening (#788), and the verifier + smt hardening (#793) — core 80.8% → 83.3%. Two pieces remain to take mutation coverage beyond the soundness core; they're tracked here so #387 can close on its scope.
1. Full-sweep reliability (prerequisite)
The full 10,620-mutant soundness-core sweep deadlocks on this setup — a mutmut 3.6 / Python 3.14 long-run worker hang. It failed four separate times (at 30–55% progress, including overnight and while actively watched), whereas targeted runs over a handful of functions complete reliably in minutes. This blocks any large sweep — including the whole-vera/ measurement below and the #387 timeout probe (now folded into #792). Options to investigate:
- per-module / chunked sweeps — run
only_mutate one module at a time and stitch the per-module CSVs, instead of one marathon;
- pin the sweep environment — the hang appeared on Python 3.14; try an older interpreter or a different
mutmut point release for sweep runs;
- a progress watchdog that restarts a stalled run (note: "resume after a hard kill" also proved unreliable here — a plain
mutmut run re-invoke discarded the banked verdicts; see MUTATION.md).
2. Whole-vera/ sweep
Once large sweeps are reliable, extend mutation measurement past the soundness core to codegen/, wasm/, transform.py, parser.py, the CLI, etc. The [tool.mutmut] only_mutate currently scopes to verifier.py / smt.py / checker/* / obligations/*. #421's execute() decomposition unblocked this — it no longer inflates a ~774 MB mutant file mutmut can't index.
Related: #792 (deep-layer soundness residual — smt translate layer + the aggregation cluster + _has_nat_origin — and the verifier timeout probe).
Follow-up to #387, which set up mutation testing and hardened the type checker and verifier (its stated scope): infrastructure +
MUTATION.md+ the README badge, the baseline (80.8% core), the checker hardening (#788), and the verifier + smt hardening (#793) — core 80.8% → 83.3%. Two pieces remain to take mutation coverage beyond the soundness core; they're tracked here so #387 can close on its scope.1. Full-sweep reliability (prerequisite)
The full 10,620-mutant soundness-core sweep deadlocks on this setup — a
mutmut3.6 / Python 3.14 long-run worker hang. It failed four separate times (at 30–55% progress, including overnight and while actively watched), whereas targeted runs over a handful of functions complete reliably in minutes. This blocks any large sweep — including the whole-vera/measurement below and the #387 timeout probe (now folded into #792). Options to investigate:only_mutateone module at a time and stitch the per-module CSVs, instead of one marathon;mutmutpoint release for sweep runs;mutmut runre-invoke discarded the banked verdicts; seeMUTATION.md).2. Whole-
vera/sweepOnce large sweeps are reliable, extend mutation measurement past the soundness core to
codegen/,wasm/,transform.py,parser.py, the CLI, etc. The[tool.mutmut]only_mutatecurrently scopes toverifier.py/smt.py/checker/*/obligations/*. #421'sexecute()decomposition unblocked this — it no longer inflates a ~774 MB mutant file mutmut can't index.Related: #792 (deep-layer soundness residual — smt translate layer + the aggregation cluster +
_has_nat_origin— and the verifier timeout probe).