Skip to content

Mutation testing beyond the soundness core: whole-vera/ sweep + full-sweep reliability #795

Description

@aallan

Follow-up to #387, which set up mutation testing and hardened the type checker and verifier (its stated scope): infrastructure + MUTATION.md + the README badge, the baseline (80.8% core), the checker hardening (#788), and the verifier + smt hardening (#793) — core 80.8% → 83.3%. Two pieces remain to take mutation coverage beyond the soundness core; they're tracked here so #387 can close on its scope.

1. Full-sweep reliability (prerequisite)

The full 10,620-mutant soundness-core sweep deadlocks on this setup — a mutmut 3.6 / Python 3.14 long-run worker hang. It failed four separate times (at 30–55% progress, including overnight and while actively watched), whereas targeted runs over a handful of functions complete reliably in minutes. This blocks any large sweep — including the whole-vera/ measurement below and the #387 timeout probe (now folded into #792). Options to investigate:

  • per-module / chunked sweeps — run only_mutate one module at a time and stitch the per-module CSVs, instead of one marathon;
  • pin the sweep environment — the hang appeared on Python 3.14; try an older interpreter or a different mutmut point release for sweep runs;
  • a progress watchdog that restarts a stalled run (note: "resume after a hard kill" also proved unreliable here — a plain mutmut run re-invoke discarded the banked verdicts; see MUTATION.md).

2. Whole-vera/ sweep

Once large sweeps are reliable, extend mutation measurement past the soundness core to codegen/, wasm/, transform.py, parser.py, the CLI, etc. The [tool.mutmut] only_mutate currently scopes to verifier.py / smt.py / checker/* / obligations/*. #421's execute() decomposition unblocked this — it no longer inflates a ~774 MB mutant file mutmut can't index.

Related: #792 (deep-layer soundness residual — smt translate layer + the aggregation cluster + _has_nat_origin — and the verifier timeout probe).

Metadata

Metadata

Assignees

No one assigned

    Labels

    enhancementNew feature or requesttestingTest suite structure and coverage

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions