From ffb4f93ff2964e97c1d59cbaf39d431d6ec19abf Mon Sep 17 00:00:00 2001 From: T Date: Sat, 27 Jun 2026 00:19:26 +0100 Subject: [PATCH 1/5] fix(798): trap Int/Nat arithmetic overflow + Tier-1 obligation (v0.0.181) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit vera verify modelled Int/Nat as Z3 unbounded integers, so +, -, and * were total — it proved postconditions the i64/u64 runtime violated under two's-complement wraparound (ensures(result > x) for { x + 1 } verified at Tier 1, yet inc(i64.MAX) wraps to i64.MIN). Overflow was the one partial-arithmetic case treated as total by both tiers; Nat underflow and signed-division MIN/-1 already trap. Int/Nat +, -, and * are now partial operations that trap on overflow at runtime, with an auto-synthesised int_overflow obligation that the result stays in i64/u64 range — a two-check mirroring array bounds (#680): provably in range -> Tier 1; provably out of range -> compile error E528; dynamic operands -> runtime-guarded Tier 3 (lenient, so unguarded a + b is not a hard error). Overflow is classified at the operands' common coerced type (Int if either operand is Int, since Nat <: Int) — not one operand's self-type, nor the possibly-narrowed result. Nat subtraction stays the existing underflow obligation (E502). Codegen emits explicit WASM overflow-detection guards (WASM wraps; there is no native trap), trapping via unreachable; the precise overflow trap kind is deferred to #808. The checker's resolved-type side-table is now threaded into codegen so vera compile / vera run classify in lockstep with the verifier. Closes #798. Co-Authored-By: Claude --- AGENTS.md | 6 +- CHANGELOG.md | 9 +- CLAUDE.md | 6 +- FAQ.md | 4 +- HISTORY.md | 3 +- README.md | 2 +- ROADMAP.md | 3 +- SKILL.md | 2 +- TESTING.md | 25 +- docs/SKILL.md | 2 +- docs/index.html | 4 +- docs/index.md | 4 +- docs/llms-full.txt | 15 +- docs/llms.txt | 4 +- pyproject.toml | 2 +- scripts/check_spec_examples.py | 2 +- spec/06-contracts.md | 3 + spec/11-compilation.md | 2 + tests/conformance/ch04_int_overflow.vera | 53 +++ tests/conformance/manifest.json | 15 + tests/test_codegen.py | 54 +-- tests/test_int_overflow.py | 83 +++++ tests/test_int_overflow_codegen.py | 432 +++++++++++++++++++++++ tests/test_int_overflow_differential.py | 336 ++++++++++++++++++ tests/test_verifier.py | 65 +++- uv.lock | 2 +- vera/__init__.py | 2 +- vera/cli.py | 20 +- vera/codegen/api.py | 16 + vera/codegen/closures.py | 3 + vera/codegen/core.py | 13 + vera/codegen/functions.py | 3 + vera/errors.py | 1 + vera/obligations/core.py | 6 + vera/verifier.py | 185 ++++++++++ vera/wasm/context.py | 19 + vera/wasm/operators.py | 380 ++++++++++++++++++++ 37 files changed, 1701 insertions(+), 85 deletions(-) create mode 100644 tests/conformance/ch04_int_overflow.vera create mode 100644 tests/test_int_overflow.py create mode 100644 tests/test_int_overflow_codegen.py create mode 100644 tests/test_int_overflow_differential.py diff --git a/AGENTS.md b/AGENTS.md index aec03119..41de415b 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -8,7 +8,7 @@ Read `SKILL.md` for the full language reference. It covers syntax, slot referenc ### Conformance programs as reference -The conformance suite in `tests/conformance/` contains 92 small programs — often one per language feature — that serve as minimal working examples. Most are self-contained; the exception is `ch07_cross_module_contracts.vera`, which imports its `ch07_cross_module_contracts_lib.vera` companion to exercise cross-module contracts. Each program must pass its declared verification level (see `manifest.json` for mappings: `parse`, `check`, `verify`, or `run`). When you need to see how a specific construct works (e.g. effect handlers, match expressions, closures), check the corresponding conformance program before reading the spec. +The conformance suite in `tests/conformance/` contains 93 small programs — often one per language feature — that serve as minimal working examples. Most are self-contained; the exception is `ch07_cross_module_contracts.vera`, which imports its `ch07_cross_module_contracts_lib.vera` companion to exercise cross-module contracts. Each program must pass its declared verification level (see `manifest.json` for mappings: `parse`, `check`, `verify`, or `run`). When you need to see how a specific construct works (e.g. effect handlers, match expressions, closures), check the corresponding conformance program before reading the spec. ### Workflow @@ -182,7 +182,7 @@ Each stage is a module with a single public API function (`parse_file`, `transfo pytest tests/ -v # Run all tests (see TESTING.md) pytest tests/test_conformance.py -v # Conformance suite only mypy vera/ # Type-check the compiler -python scripts/check_conformance.py # All 92 conformance programs must pass +python scripts/check_conformance.py # All 93 conformance programs must pass python scripts/check_examples.py # All 35 examples must pass ``` @@ -192,7 +192,7 @@ When implementing a new language feature, write the conformance program *first* ### Invariants -- All 92 conformance programs in `tests/conformance/` must pass their declared level +- All 93 conformance programs in `tests/conformance/` must pass their declared level - All 35 examples in `examples/` must pass `vera check` and `vera verify` - `mypy vera/` must be clean - `pytest tests/ -v` must pass diff --git a/CHANGELOG.md b/CHANGELOG.md index 076f1db7..ca4927a2 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -6,6 +6,12 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/). ## [Unreleased] +## [0.0.181] - 2026-06-27 + +### Fixed + +- **Verifier + codegen soundness: `@Int` / `@Nat` arithmetic overflow now traps** (batch 4 of the [#392](https://github.com/aallan/vera/issues/392) `smt.py` soundness audit). The verifier modelled `@Int` / `@Nat` as Z3's *unbounded* integers, so `+` / `-` / `*` were total — it proved postconditions the i64 / u64 runtime then violated under two's-complement wraparound: `ensures(@Int.result > @Int.0)` for `{ @Int.0 + 1 }` verified at Tier 1, yet `inc(i64.MAX)` wraps to `i64.MIN`. `@Int` / `@Nat` `+` / `-` / `*` are now *partial* operations that **trap** on overflow at runtime (consistent with `@Nat` underflow and signed-division `MIN / -1`), and the verifier auto-synthesises an `int_overflow` obligation that the result stays in range — a two-check mirroring array bounds: provably in range → Tier 1; provably out of range (e.g. a literal `i64.MAX + 1`) → a compile error (**E528**); dynamic operands → a runtime-guarded Tier-3 overflow trap. Overflow is classified at the operands' **common (coerced) type** (`@Int` if either operand is `@Int`), so a literal-left `5 + @Int.0` and an `@Int + 1` narrowed into a `@Nat` slot are both i64 adds — not the literal's `@Nat` self-type nor the narrowed result. `@Nat` subtraction stays the existing underflow obligation (E502). The runtime overflow trap is currently the generic `unreachable` kind; a precise `overflow` kind is tracked as [#808](https://github.com/aallan/vera/issues/808). ([#798](https://github.com/aallan/vera/issues/798)) + ## [0.0.180] - 2026-06-26 ### Fixed @@ -2598,7 +2604,8 @@ Small docs sweep — closes six aging documentation issues in one PR. No code c - Grammar: handler body simplified to avoid LALR reduce/reduce conflict - `pyproject.toml`: corrected build backend, package discovery, PEP 639 compliance -[Unreleased]: https://github.com/aallan/vera/compare/v0.0.180...HEAD +[Unreleased]: https://github.com/aallan/vera/compare/v0.0.181...HEAD +[0.0.181]: https://github.com/aallan/vera/compare/v0.0.180...v0.0.181 [0.0.180]: https://github.com/aallan/vera/compare/v0.0.179...v0.0.180 [0.0.179]: https://github.com/aallan/vera/compare/v0.0.178...v0.0.179 [0.0.178]: https://github.com/aallan/vera/compare/v0.0.177...v0.0.178 diff --git a/CLAUDE.md b/CLAUDE.md index 5dfeca1b..5200487a 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -53,7 +53,7 @@ VERA_JS_COVERAGE=1 pytest tests/test_browser.py -v # Browser tests with JS cove VERA_EAGER_GC=1 vera run file.vera # Force GC on every alloc (see ENVIRONMENT.md, debug knob for #593-class GC-rooting bugs) mypy vera/ # Type-check the compiler itself -python scripts/check_conformance.py # Verify all 92 conformance programs pass their declared level +python scripts/check_conformance.py # Verify all 93 conformance programs pass their declared level python scripts/check_examples.py # Verify all 35 examples parse + check + verify python scripts/check_examples_readme.py # Verify vera run commands in examples/README.md python scripts/check_spec_examples.py # Verify spec code blocks parse @@ -82,7 +82,7 @@ See [`TOOLCHAIN.md`](TOOLCHAIN.md) for the CLI cookbook — driving the toolchai - `vera/` — Reference compiler: grammar, parser, AST, transformer, type checker, verifier, codegen, CLI - `examples/` — 35 example Vera programs (all must pass `vera check` and `vera verify`) - `tests/` — Test suite (unit tests + conformance suite) -- `tests/conformance/` — 92 conformance programs validating every language feature against the spec +- `tests/conformance/` — 93 conformance programs validating every language feature against the spec - `scripts/` — CI and validation scripts ## Writing Vera code @@ -119,7 +119,7 @@ Before changing code — **adding or removing** — write the test that proves y ## What not to break - Pre-commit hooks run mypy + pytest + conformance suite + example validation on every commit -- All 92 conformance programs in `tests/conformance/` must pass their declared level +- All 93 conformance programs in `tests/conformance/` must pass their declared level - All 35 examples in `examples/` must pass `vera check` and `vera verify` - Version must stay in sync across `vera/__init__.py`, `pyproject.toml`, and `CHANGELOG.md` - All tests must pass: `pytest tests/ -v` diff --git a/FAQ.md b/FAQ.md index 4699acc4..be3710dd 100644 --- a/FAQ.md +++ b/FAQ.md @@ -173,7 +173,7 @@ None of this is Vera-specific, but it validates the design choices. The thesis i This is a real concern. LLMs are trained on trillions of tokens of Python, TypeScript, and JavaScript. A MojoBench study (NAACL 2025) found that even fine-tuned models achieved only 30–35% improvement over base models on Mojo code generation, illustrating the cold-start problem for new languages. -Vera's approach has three parts. First, the agent-facing documentation (SKILL.md) is designed to be dropped into a model's context window, so the model works from the language specification rather than training data recall. Second, Vera's syntax is deliberately simple and regular — fewer constructs, each with exactly one canonical form — which reduces the surface area a model needs to learn. Third, the conformance test suite (92 programs covering every language feature) gives models concrete examples to learn from and conform to. Simon Willison argued in December 2025 that a language-agnostic conformance suite is the single most important tool for LLM adoption of a new language — LLMs can learn new languages remarkably well when given tests to conform to. +Vera's approach has three parts. First, the agent-facing documentation (SKILL.md) is designed to be dropped into a model's context window, so the model works from the language specification rather than training data recall. Second, Vera's syntax is deliberately simple and regular — fewer constructs, each with exactly one canonical form — which reduces the surface area a model needs to learn. Third, the conformance test suite (93 programs covering every language feature) gives models concrete examples to learn from and conform to. Simon Willison argued in December 2025 that a language-agnostic conformance suite is the single most important tool for LLM adoption of a new language — LLMs can learn new languages remarkably well when given tests to conform to. ## How does Vera compare to Dafny / Lean / Koka / F*? @@ -214,7 +214,7 @@ The reference compiler is under active development. The current release includes - A seven-stage pipeline: parse, transform, resolve, typecheck, verify, compile, execute - A 13-chapter formal specification -- Over 3,000 unit tests plus a 92-program conformance suite +- Over 3,000 unit tests plus a 93-program conformance suite - 35 working example programs - 164 built-in functions covering strings, arrays, math, parsing, and data types - Four built-in abilities (Eq, Ord, Hash, Show) with constrained generics and ADT auto-derivation diff --git a/HISTORY.md b/HISTORY.md index a02de1d4..ffe60430 100644 --- a/HISTORY.md +++ b/HISTORY.md @@ -391,6 +391,7 @@ Proof obligations became first-class records with a warm Z3 session, and the LSP | v0.0.178 | 25 Jun | **`vera builtins/effects/errors --json` compiler introspection subcommands make the compiler the source of truth for its own built-ins, effects, and error codes, each with a best-effort `since` version** ([#539](https://github.com/aallan/vera/issues/539)). | | v0.0.179 | 26 Jun | **The `smt.py` soundness-audit batch lands: signed `/` and `%` truncate toward zero, a body `assert(P)` carries a Tier-1 obligation (loud `E507`), contract-position divisions get a `div_zero` obligation, and a prior `assert`/`assume` now discharges later obligations plus the postcondition at Tier 1** ([#392](https://github.com/aallan/vera/issues/392)). | | v0.0.180 | 26 Jun | **`@Float64` contracts are verified against Z3's IEEE-754 FloatingPoint sort instead of mathematical reals, so Tier-1 proofs respect `NaN`/`Inf`/signed-zero/rounding and `%` (C `fmod`) and match the runtime, so unsound relational / reflexive Float64 contracts are now correctly rejected** ([#797](https://github.com/aallan/vera/issues/797)). | +| v0.0.181 | 27 Jun | **`@Int` / `@Nat` arithmetic overflow now traps: `+` / `-` / `*` are partial operations carrying an auto-synthesised `int_overflow` obligation, so a result that provably leaves i64 / u64 range is a compile error (E528) and a dynamic one is a runtime-guarded trap, instead of the verifier proving postconditions the two's-complement runtime then violated** ([#798](https://github.com/aallan/vera/issues/798)). | --- @@ -406,4 +407,4 @@ Proof obligations became first-class records with a warm Z3 session, and the LSP | Spec chapters | 7 | 10 | 11 | 12 | 13 | 13 | 13 | 13 | | Code coverage | — | — | — | 90% | 91% | 96% | 96% | 95% | -Total: **1,400+ commits, 180 tagged releases, 77 active development days.** +Total: **1,400+ commits, 181 tagged releases, 77 active development days.** diff --git a/README.md b/README.md index 7c47c8bd..4190469a 100644 --- a/README.md +++ b/README.md @@ -215,7 +215,7 @@ cp /path/to/vera/SKILL.md ~/.claude/skills/vera-language/SKILL.md ## Project status -Vera is in **active development** at v0.0.180 — 1,400+ commits, 180 releases, 4,950 tests, 91% code coverage, 92 conformance programs, 35 examples, and a 13-chapter specification. See **[HISTORY.md](HISTORY.md)** for how the compiler was built. +Vera is in **active development** at v0.0.181 — 1,400+ commits, 181 releases, 5,145 tests, 91% code coverage, 93 conformance programs, 35 examples, and a 13-chapter specification. See **[HISTORY.md](HISTORY.md)** for how the compiler was built. The reference compiler — parser, AST, type checker, contract verifier (Z3), WASM code generator, module system, browser runtime, and runtime contract insertion — is working. The language specification is in draft across [13 chapters](spec/). diff --git a/ROADMAP.md b/ROADMAP.md index 64b8a3a4..1392d52d 100644 --- a/ROADMAP.md +++ b/ROADMAP.md @@ -8,7 +8,7 @@ Priority lives in this file and nowhere else — issues carry kind and area labe ## Where we are -4,950 tests, 92 conformance programs, 35 examples, 13 spec chapters. +5,145 tests, 93 conformance programs, 35 examples, 13 spec chapters. ## The roadmap @@ -22,6 +22,7 @@ The infrastructure that catches the next regression before a user does, plus the |---|---| | [#392](https://github.com/aallan/vera/issues/392) | Audit the `smt.py` Z3 translation layer for soundness — a bug here silently bypasses verification. | | [#807](https://github.com/aallan/vera/issues/807) | Tier-1 model the deferred-but-modelable `@Float64` builtins from #797. **Blocked by #798**: `int_to_float`/`float_to_int` cross the Int↔Float boundary that the unbounded-`Int` idealisation (#798) makes unsound, so they wait on that fix; `float_clamp` is unblocked but needs a verify-vs-run codegen differential first. The four format/parse builtins (`float_to_string`, `parse_float64`, `decimal_from_float`/`_to_float`) are Tier-3 by necessity, not deferred. | +| [#808](https://github.com/aallan/vera/issues/808) | Give the `@Int` / `@Nat` arithmetic-overflow runtime trap (#798) a precise `overflow` trap kind, so a dynamic overflow surfaces as a dedicated trap message rather than the generic `unreachable` kind it currently shares. | | [#592](https://github.com/aallan/vera/issues/592) | End-to-end behavioural tests for the five UTF-8 decode sites currently pinned only by structural greps. | | [#645](https://github.com/aallan/vera/issues/645) | Explicit `encoding='utf-8'` at every text-mode file call, with a pre-commit check to hold the line. | | [#657](https://github.com/aallan/vera/issues/657) | Convert `INVARIANT_DEFENSIVE` sites and audit `PROPAGATE` cleanup (follow-up to the #626 error-handling taxonomy). | diff --git a/SKILL.md b/SKILL.md index 84653487..3cfa5efc 100644 --- a/SKILL.md +++ b/SKILL.md @@ -134,7 +134,7 @@ Every diagnostic has a stable error code grouped by compiler phase: - **E120–E176** — Type check: core + expressions (type mismatches, slot resolution, operators) - **E200–E233** — Type check: calls (unresolved functions, argument mismatches, module calls) - **E300–E335** — Type check: control flow (if/match, patterns, effect handlers) -- **E500–E525** — Verification (contract violations, undecidable fallbacks) +- **E500–E528** — Verification (contract violations, undecidable fallbacks, primitive-operation safety incl. arithmetic overflow) - **E600–E614** — Codegen (unsupported features, typed holes block compilation) - **E700–E702** — Testing (contract violations, input generation, execution errors) diff --git a/TESTING.md b/TESTING.md index f23382ce..11712fd7 100644 --- a/TESTING.md +++ b/TESTING.md @@ -6,9 +6,9 @@ This is the single source of truth for Vera's testing infrastructure, coverage d | Metric | Value | |--------|-------| -| **Tests** | 4,950 across 41 files (~62,000 lines of test code; 4,918 passed + 16 stress, 16 skipped) | +| **Tests** | 5,145 across 44 files (~63,000 lines of test code; 5,113 passed + 16 stress, 16 skipped) | | **Compiler code coverage** | 95% Python, 61% JavaScript — 91% combined (CI minimum: 80%) | -| **Conformance programs** | 92 programs across 9 spec chapters, validating every language feature | +| **Conformance programs** | 93 programs across 9 spec chapters, validating every language feature | | **Example programs** | 35, all validated through `vera check` + `vera verify` | | **Spec code blocks** | 164 parseable blocks from 13 spec chapters: 86 parse, 72 type-check, 71 verify | | **README code blocks** | 13 Vera blocks (12 validated, 1 allowlisted future syntax) | @@ -39,7 +39,7 @@ VERA_EAGER_GC=1 pytest tests/test_codegen_closures.py::TestClosureReturnShadowPu mypy vera/ # strict mode # Validation scripts -python scripts/check_conformance.py # conformance suite (92 programs, see manifest.json) +python scripts/check_conformance.py # conformance suite (93 programs, see manifest.json) python scripts/check_examples.py # 35 example programs python scripts/check_spec_examples.py # spec code blocks python scripts/check_readme_examples.py # README code blocks @@ -58,11 +58,14 @@ python scripts/fix_allowlists.py --fix # auto-fix stale allowlists | `test_parser.py` | 129 | 968 | Grammar rules, operator precedence, parse errors | | `test_ast.py` | 128 | 1,122 | AST transformation, node structure, serialisation, string escape sequences, ability declarations | | `test_checker.py` | 548 | 6,347 | Type synthesis, slot resolution, effects, effect subtyping, contracts, exhaustiveness, cross-module typing, visibility, error codes, string built-ins, generic rejection, IO operation types, Markdown types, Regex types, abilities, Map collection, Set collection, Decimal type, Json type, Html type, Http effect, Inference effect, removed legacy name regression | -| `test_obligations.py` | 282 | 1083 | Reified proof obligations + warm `VerificationSession` (#222 Phase A): full-corpus differential oracle (warm session == cold `verify()` on diagnostics, summary, and obligation stream, plus warm-twice determinism, across all 35 examples and every verify/run-level conformance program), summary↔obligation tier-bookkeeping consistency, per-kind unit tests (requires / ensures / decreases / nat_sub / call_pre statuses, counterexamples, error codes), content-key stability + same-text-two-sites span disambiguation, session solver reuse, type-error short-circuit, ADT-registry resync between programs; plus the Phase B incremental suite — identical-source full replay, callee-body-edit replays callers while callee-contract-edit invalidates them, span-shift and ADT-edit conservative invalidation, cross-program isolation, timeout-status never cached (monkeypatched solver), FIFO eviction bound; plus the #727 dedup pin — a violating call in a let RHS records exactly one E501 diagnostic and one call_pre obligation | -| `test_verifier.py` | 473 | 9,239 | Z3 verification, counterexamples, tier classification, call-site preconditions, branch-aware preconditions, pipe operator, cross-module contracts, match/ADT verification, decreases verification, mutual recursion, refined Bool/String/Float64 param sorts, **@Nat subtraction underflow obligation** (#520 — Path-A obligation discharge via requires/path-conditions/path-aware Z3 refutation, pure-literal exclusion, Int-Int and Nat-Int → Int exemptions), **@Nat binding-site narrowing obligation** (#552 — Tier-1 `value >= 0` at let/call-arg/effect-op-arg/ctor-field/match-bind/literal-destructure narrowing, #520 double-emit disjointness, E503 counterexample-witness pin; #747 — generic-instantiation, ADT sub-pattern, non-literal tuple-destructure, and cross-module imported-ctor narrowing, with concrete-site Tier-3 narrowings (incl. the if-expr-source SMT gap, recorded per-component) classified as codegen-guarded `tier3_runtime`, generic function-formal call-args guarded on the monomorphised callee (keyed on `call_target`, not the erased generic decl), and the two genuinely-unguarded residuals — effect-op-arg narrowings (no codegen guard yet, #754) and generic-instantiated constructor-field narrowings (constructors carry no per-field `@Nat` mono metadata) — surfacing the `E504` unguarded warning; #749 — IndexExpr/InterpolatedString walker-recursion pins, `_fresh_slot_var` nat-alias unit test, `_narrows_into_nat` verifier/codegen soundness parity), **refinement-predicate verification** (#746 — Tier-1 discharge at let/call-arg/ctor-field/effect-op-arg/match-bind/tuple-destructure narrowings and refined return positions, E505 violation with counterexample, E506 Tier-3 for untranslatable/non-primitive-base predicates, the R3 already-refined-source exemption via predicate-AST equality, the R5 violating-return pin, the refinement-over-`@Nat` `>= 0 && P` conjoin, and the **refined-ADT-sub-pattern arm-fact carry** — a `Some(@PosInt)` bind on `Option` carries the field's source-type predicate into the arm body so a downstream `@Nat` narrowing discharges instead of a false E503, and reaches call **preconditions** in the arm body via the SMT match-translation fact hook so `Some(@PosInt) -> needs_positive(@PosInt.0)` discharges instead of a false E501, while a genuine `Option` narrowing stays obligated (E505/E501), never silently assumed; an **alias-base refined return** (`{ @Age \| @Age.0 >= 18 }`) is assumed by the caller via the predicate binder name; and a **refined return from a match arm** (`Some(@PosInt) -> @PosInt.0`) discharges via a global `arm-matched => source-fact` SMT implication on both the normal and generic-fast paths, a violating `Option` payload still E505), **per-monomorphization generic verification** (#732 — an instantiated `forall` body is verified per concrete instantiation: a generic-body `@Nat` underflow is caught (naming the instantiation, with a counterexample) or discharged when guarded, the never-instantiated residual stays Tier-3 `E520`, collapsed-type-var (`A=B=Int`) De Bruijn reindex soundness, one-diagnostic dedup across multiple instantiations, and discovery of a generic reached only through a `decreases(...)` measure, plus a mixed-`tier3`/`timeout` aggregate-label completeness pin and a recursive-generic-clone source-name pin so `decreases` resolves on the monomorphised instance), **primitive-operation safety obligations** (#680 — division/modulo by-zero `E526` and array-index-bounds `E527`, the in-bounds/out-of-bounds two-check with float-exemption, honest Tier-3 for opaque lengths, the off-by-one and `i >= 0` lower-bound pins, walker recursion through array-literal / assert / interpolated-string / let-destructure positions, the opaque-shadow Tier-3 routing that stops an untranslatable `let`/destructure from silently discharging a stale same-type-outer divisor, De Bruijn-correct `E526`/`E527` fix hints rendered from the actual operands, Tier-1 projection of literal-constructor destructure components (`let Tuple<@Int, @Int> = Tuple(10, 6); _ / @Int.1` discharges `10 != 0` rather than shadowing to Tier-3, while a non-literal source stays a tracked Tier-3 shadow), a tracked placeholder pushed for *every* destructured component so same-type De Bruijn positions don't collapse (`Tuple(10, )` keeps `@Int.0` on the opaque second component, never silently discharging against the literal `10`), `_contains_opaque_shadow` recursion so a *compound* divisor / subtraction operand embedding a shadow (`shadow + 1`) falls to Tier-3 instead of a spurious `E526`/`E502` counterexample, and tracked shadowing of match-arm pattern slots bound over an untranslatable (effect-op) scrutinee so a primitive op in the arm falls to Tier-3 rather than silently discharging against a stale same-name outer (`match Source.next(()) { Some(@Int) -> 1 / @Int.0 }`); plus a **multi-agent shadow/projection audit battery** (`TestShadowAuditDivision680` / `TestShadowAuditSubtraction680` / `TestShadowAuditIndex680` / `TestDestructureDeBruijnAlignment680` / `TestShadowAuditInteractions680`) — 57 differential tests pinning the invariant trichotomy (safe → verified, opaque/shadow → Tier-3, provably-unsafe → loud) across compound shadows, modulo parity, multi-component De Bruijn alignment with value-distinct siblings, intervening-type namespaces, stacked / nested destructures, opaque match scrutinees, nested blocks, and intra-block scoping; **mutation-validated** — every shadow/projection test was confirmed to flip RED when its target machinery (`_is_opaque_shadow` / `_contains_opaque_shadow` / the De Bruijn placeholder push / the `let`-shadow / the match-arm `_fresh_pattern_env` / literal projection) is deliberately broken, so none is green-for-the-wrong-reason) | +| `test_obligations.py` | 284 | 1083 | Reified proof obligations + warm `VerificationSession` (#222 Phase A): full-corpus differential oracle (warm session == cold `verify()` on diagnostics, summary, and obligation stream, plus warm-twice determinism, across all 35 examples and every verify/run-level conformance program), summary↔obligation tier-bookkeeping consistency, per-kind unit tests (requires / ensures / decreases / nat_sub / call_pre statuses, counterexamples, error codes), content-key stability + same-text-two-sites span disambiguation, session solver reuse, type-error short-circuit, ADT-registry resync between programs; plus the Phase B incremental suite — identical-source full replay, callee-body-edit replays callers while callee-contract-edit invalidates them, span-shift and ADT-edit conservative invalidation, cross-program isolation, timeout-status never cached (monkeypatched solver), FIFO eviction bound; plus the #727 dedup pin — a violating call in a let RHS records exactly one E501 diagnostic and one call_pre obligation | +| `test_verifier.py` | 473 | 9,270 | Z3 verification, counterexamples, tier classification, call-site preconditions, branch-aware preconditions, pipe operator, cross-module contracts, match/ADT verification, decreases verification, mutual recursion, refined Bool/String/Float64 param sorts, **@Nat subtraction underflow obligation** (#520 — Path-A obligation discharge via requires/path-conditions/path-aware Z3 refutation, pure-literal exclusion, Int-Int and Nat-Int → Int exemptions), **@Nat binding-site narrowing obligation** (#552 — Tier-1 `value >= 0` at let/call-arg/effect-op-arg/ctor-field/match-bind/literal-destructure narrowing, #520 double-emit disjointness, E503 counterexample-witness pin; #747 — generic-instantiation, ADT sub-pattern, non-literal tuple-destructure, and cross-module imported-ctor narrowing, with concrete-site Tier-3 narrowings (incl. the if-expr-source SMT gap, recorded per-component) classified as codegen-guarded `tier3_runtime`, generic function-formal call-args guarded on the monomorphised callee (keyed on `call_target`, not the erased generic decl), and the two genuinely-unguarded residuals — effect-op-arg narrowings (no codegen guard yet, #754) and generic-instantiated constructor-field narrowings (constructors carry no per-field `@Nat` mono metadata) — surfacing the `E504` unguarded warning; #749 — IndexExpr/InterpolatedString walker-recursion pins, `_fresh_slot_var` nat-alias unit test, `_narrows_into_nat` verifier/codegen soundness parity), **refinement-predicate verification** (#746 — Tier-1 discharge at let/call-arg/ctor-field/effect-op-arg/match-bind/tuple-destructure narrowings and refined return positions, E505 violation with counterexample, E506 Tier-3 for untranslatable/non-primitive-base predicates, the R3 already-refined-source exemption via predicate-AST equality, the R5 violating-return pin, the refinement-over-`@Nat` `>= 0 && P` conjoin, and the **refined-ADT-sub-pattern arm-fact carry** — a `Some(@PosInt)` bind on `Option` carries the field's source-type predicate into the arm body so a downstream `@Nat` narrowing discharges instead of a false E503, and reaches call **preconditions** in the arm body via the SMT match-translation fact hook so `Some(@PosInt) -> needs_positive(@PosInt.0)` discharges instead of a false E501, while a genuine `Option` narrowing stays obligated (E505/E501), never silently assumed; an **alias-base refined return** (`{ @Age \| @Age.0 >= 18 }`) is assumed by the caller via the predicate binder name; and a **refined return from a match arm** (`Some(@PosInt) -> @PosInt.0`) discharges via a global `arm-matched => source-fact` SMT implication on both the normal and generic-fast paths, a violating `Option` payload still E505), **per-monomorphization generic verification** (#732 — an instantiated `forall` body is verified per concrete instantiation: a generic-body `@Nat` underflow is caught (naming the instantiation, with a counterexample) or discharged when guarded, the never-instantiated residual stays Tier-3 `E520`, collapsed-type-var (`A=B=Int`) De Bruijn reindex soundness, one-diagnostic dedup across multiple instantiations, and discovery of a generic reached only through a `decreases(...)` measure, plus a mixed-`tier3`/`timeout` aggregate-label completeness pin and a recursive-generic-clone source-name pin so `decreases` resolves on the monomorphised instance), **primitive-operation safety obligations** (#680 — division/modulo by-zero `E526` and array-index-bounds `E527`, the in-bounds/out-of-bounds two-check with float-exemption, honest Tier-3 for opaque lengths, the off-by-one and `i >= 0` lower-bound pins, walker recursion through array-literal / assert / interpolated-string / let-destructure positions, the opaque-shadow Tier-3 routing that stops an untranslatable `let`/destructure from silently discharging a stale same-type-outer divisor, De Bruijn-correct `E526`/`E527` fix hints rendered from the actual operands, Tier-1 projection of literal-constructor destructure components (`let Tuple<@Int, @Int> = Tuple(10, 6); _ / @Int.1` discharges `10 != 0` rather than shadowing to Tier-3, while a non-literal source stays a tracked Tier-3 shadow), a tracked placeholder pushed for *every* destructured component so same-type De Bruijn positions don't collapse (`Tuple(10, )` keeps `@Int.0` on the opaque second component, never silently discharging against the literal `10`), `_contains_opaque_shadow` recursion so a *compound* divisor / subtraction operand embedding a shadow (`shadow + 1`) falls to Tier-3 instead of a spurious `E526`/`E502` counterexample, and tracked shadowing of match-arm pattern slots bound over an untranslatable (effect-op) scrutinee so a primitive op in the arm falls to Tier-3 rather than silently discharging against a stale same-name outer (`match Source.next(()) { Some(@Int) -> 1 / @Int.0 }`); plus a **multi-agent shadow/projection audit battery** (`TestShadowAuditDivision680` / `TestShadowAuditSubtraction680` / `TestShadowAuditIndex680` / `TestDestructureDeBruijnAlignment680` / `TestShadowAuditInteractions680`) — 57 differential tests pinning the invariant trichotomy (safe → verified, opaque/shadow → Tier-3, provably-unsafe → loud) across compound shadows, modulo parity, multi-component De Bruijn alignment with value-distinct siblings, intervening-type namespaces, stacked / nested destructures, opaque match scrutinees, nested blocks, and intra-block scoping; **mutation-validated** — every shadow/projection test was confirmed to flip RED when its target machinery (`_is_opaque_shadow` / `_contains_opaque_shadow` / the De Bruijn placeholder push / the `let`-shadow / the match-arm `_fresh_pattern_env` / literal projection) is deliberately broken, so none is green-for-the-wrong-reason) | | `test_soundness_392.py` | 36 | 584 | #392 audit batches 1–2 — verifier soundness/completeness fixes: signed div/mod truncate toward zero (#799), body `assert(P)` carries a Tier-1 obligation (#800), divisions in contract predicates carry a `div_zero` obligation (#801), and the #804 assume-half of #800's `assert` rule — a prior `assert`/`assume` discharges later obligations (including a later call's precondition) + the postcondition at Tier 1, removing false E501/E503/E500/E505. Test-first: each fails on the pre-fix verifier | +| `test_int_overflow.py` | 3 | 83 | #798 — `@Int`/`@Nat` arithmetic-overflow obligations (part of the #392 `smt.py` soundness audit): `+`/`-`/`*` on `@Int`/`@Nat` now emit an `int_overflow` obligation (the analog of `nat_sub`/`div_zero`) rather than modelling the operands as Z3's unbounded integers, so a `ensures(@Int.result > @Int.0)` over `@Int.0 + 1` no longer proves a contract the i64/u64 runtime violates under two's-complement wraparound. Unbounded operands leave the obligation undischarged (Tier-3, runtime-guarded); operand bounds that prove the result stays in range discharge it at Tier 1. Test-first: each fails on the pre-fix verifier (no `int_overflow` obligation emitted) | +| `test_int_overflow_codegen.py` | 44 | 432 | #798 Stage 3 — runtime overflow-trap codegen: the codegen emits a guard at *exactly* the `@Int`/`@Nat` `+`/`-`/`*` sites the verifier obligates, so `vera run`/`vera compile` programs trap on overflow instead of silently wrapping at the i64/u64 boundary. The trap is a bare `unreachable` (Option A — matches the #520 `nat_sub` and #552 nat-bind precedent), classified `kind="unreachable"` today (a precise `"overflow"` trap kind via a host import is a follow-up). Test-first: every `*_traps` test fails on the pre-Stage-3 codegen (the op wraps silently → no trap → `execute` returns a value); every `*_no_trap` test passes before and after (safe arithmetic unchanged) | +| `test_int_overflow_differential.py` | 141 | 336 | #798 Stage 3 verifier↔codegen classification differential (cross-component soundness rule): the codegen overflow guard must fire at exactly the sites the verifier obligates *and* classify each site's operand type (`@Int` i64 vs `@Nat` u64) identically — else a Tier-1-clean program traps spuriously or a wrapping op slips through unguarded. Over a corpus exercising all five operand combos plus the literal-left ambiguity (which the pre-fix codegen mis-classified as `@Nat`), asserts the verifier's per-site gated classification equals the codegen's site for site, both sides driven by the same `ast.span_key` | | `test_monomorphize_differential.py` | 13 | 682 | #732 differential soundness: the verifier's per-monomorphization instantiation discovery covers every instantiation codegen emits (name coverage + per-generic count), over real generic programs (conformance ch02/ch09, `examples/generics.vera`) plus inline cases for the soundness-critical scenarios — collapsed type vars, **prelude combinator emission** (`option_map`), transitive generics, a generic whose type arg is fixed only by a **where-helper's return** (a `Float64`-returning helper, so the unresolved-var `"Bool"` phantom default cannot mask a miss), a generic whose type arg is fixed only by an **imported constructor** (`id2(MkBox(7))` — the verifier's mono-context must include `_module_constructors`, else it phantom-defaults and misses codegen's `id2`), a generic whose type arg is fixed only by an **imported function's return** (`id_g(make_int(...))` — the verifier's mono-context must seed `fn_ret_types` from imported functions, else it phantom-defaults and misses codegen's `id_g`, plus a **private-shadow** case pinning the imported-fn seeding stays unfiltered like codegen since filtering would diverge into a false Tier-1), and a generic reached only through a **contract clause or `where` helper** (codegen must seed Pass 1.5 from the shared node-level walk, not just `decl.body`, or it skips the clone → `CodegenSkip` at run time) — so a missed instantiation (a false Tier-1) is caught. Guards against a vacuous pass when codegen emits nothing, plus a **determinism guard** (`vera compile --wat` is byte-stable across `PYTHONHASHSEED` — the mono worklist sorts its instantiation sets) | -| `test_codegen.py` | 1,213 | 21,093 | WASM compilation, arithmetic, Float64, Byte, arrays (incl. compound element types), ADTs, match (incl. nested patterns), generics, State\, Exn\ handlers, control flow, strings, string escape sequences, IO (read\_line, read\_file, write\_file, args, exit, get\_env, sleep, time, stderr), bounds checking, quantifiers, assert/assume, refinement type aliases, pipe operator, string built-ins, built-in shadowing, parse\_nat Result, GC, Markdown host bindings, Regex host bindings, Map collection, Set collection, Decimal type, Json type, Html type, Http effect, Inference effect, Random effect, example round-trips, GC shadow stack overflow, **WASM tail-call optimization** (#517 — `return_call` emission for tail-position calls, 50K- and 1M-iteration stress, structural assertions on `return_call`/plain `call` boundary, **GC-aware TCO for allocating fns (#549 — `$gc_sp` restore before each `return_call`)**, postcondition-fallback regression (still reverts to plain `call`), analyzer unit tests covering Block-trailing / IfExpr-both-branches / MatchExpr-arm-bodies / let-value-NOT-marked / call-args-NOT-marked / ExprStmt-statement-NOT-marked / IfExpr-condition-NOT-marked / MatchExpr-scrutinee-NOT-marked), **@Nat narrowing runtime guards** (#552 let site; #747 extends to tuple-destructure / top-level match-bind / ADT sub-pattern / concrete ctor-field / call-arg sites — `i64.lt_s; unreachable` net, @Int targets exempt), **refinement-predicate runtime guards** (#746 — primitive-base (incl. `@Bool` / `@Float64`) refined params/returns traverse a `$vera.contract_fail` predicate guard at the function boundary: violating value traps, valid passes, call-args guarded transitively, generic refined returns guarded on the monomorphised instance; non-primitive `@Array` refinements (`array_length(...) > 0`) guarded too, the alias-aware `@Nat` base conjoins its implicit `>= 0`, and erased `@Unit` refinements emit no guard; **tuple-component boundary guards** decompose a `Tuple` param/return at the FFI boundary — loading each refined / `@Nat` component from the heap value and guarding it recursively for nested tuples, so a violating component traps instead of laundering past the verifier's component assumption; a **generic tuple alias** (`type Box = Tuple`) substitutes its argument so `Box` still guards its component, and a mutually-recursive (infinite) tuple alias **fails closed with E617** rather than silently emitting partial guards; a **refinement OVER a tuple** (`type Pair = { @Tuple \| true }`) is unwrapped so its components are guarded too, recursively through nested `Tuple`) | +| `test_codegen.py` | 1,213 | 21,107 | WASM compilation, arithmetic, Float64, Byte, arrays (incl. compound element types), ADTs, match (incl. nested patterns), generics, State\, Exn\ handlers, control flow, strings, string escape sequences, IO (read\_line, read\_file, write\_file, args, exit, get\_env, sleep, time, stderr), bounds checking, quantifiers, assert/assume, refinement type aliases, pipe operator, string built-ins, built-in shadowing, parse\_nat Result, GC, Markdown host bindings, Regex host bindings, Map collection, Set collection, Decimal type, Json type, Html type, Http effect, Inference effect, Random effect, example round-trips, GC shadow stack overflow, **WASM tail-call optimization** (#517 — `return_call` emission for tail-position calls, 50K- and 1M-iteration stress, structural assertions on `return_call`/plain `call` boundary, **GC-aware TCO for allocating fns (#549 — `$gc_sp` restore before each `return_call`)**, postcondition-fallback regression (still reverts to plain `call`), analyzer unit tests covering Block-trailing / IfExpr-both-branches / MatchExpr-arm-bodies / let-value-NOT-marked / call-args-NOT-marked / ExprStmt-statement-NOT-marked / IfExpr-condition-NOT-marked / MatchExpr-scrutinee-NOT-marked), **@Nat narrowing runtime guards** (#552 let site; #747 extends to tuple-destructure / top-level match-bind / ADT sub-pattern / concrete ctor-field / call-arg sites — `i64.lt_s; unreachable` net, @Int targets exempt), **refinement-predicate runtime guards** (#746 — primitive-base (incl. `@Bool` / `@Float64`) refined params/returns traverse a `$vera.contract_fail` predicate guard at the function boundary: violating value traps, valid passes, call-args guarded transitively, generic refined returns guarded on the monomorphised instance; non-primitive `@Array` refinements (`array_length(...) > 0`) guarded too, the alias-aware `@Nat` base conjoins its implicit `>= 0`, and erased `@Unit` refinements emit no guard; **tuple-component boundary guards** decompose a `Tuple` param/return at the FFI boundary — loading each refined / `@Nat` component from the heap value and guarding it recursively for nested tuples, so a violating component traps instead of laundering past the verifier's component assumption; a **generic tuple alias** (`type Box = Tuple`) substitutes its argument so `Box` still guards its component, and a mutually-recursive (infinite) tuple alias **fails closed with E617** rather than silently emitting partial guards; a **refinement OVER a tuple** (`type Pair = { @Tuple \| true }`) is unwrapped so its components are guarded too, recursively through nested `Tuple`) | | `test_codegen_contracts.py` | 32 | 570 | Runtime pre/postconditions, contract fail messages, old/new state postconditions | | `test_codegen_monomorphize.py` | 71 | 1,320 | Generic instantiation, type inference, monomorphization edge cases, ability constraint satisfaction (Eq/Ord/Hash/Show), operation rewriting (eq/compare), show/hash dispatch, ADT auto-derivation, array operations (slice/map/filter/fold) | | `test_codegen_closures.py` | 50 | 1,618 | Closure lifting, captured variables, higher-order functions, iterative-builder shadow-stack regressions (#570), closure return-value shadow-push balance for both i32-pair and i32-ADT branches across array_map and array_mapi, plus VERA_EAGER_GC injection self-test (#593), IndexExpr-of-FnCall element-type inference (#614), non-contiguous capture and walker-order miscompiles (#615) | @@ -86,7 +89,7 @@ python scripts/fix_allowlists.py --fix # auto-fix stale allowlists | `test_markdown.py` | 59 | 393 | Markdown parser: block/inline parsing, rendering, round-trips, edge cases | | `test_lsp.py` | 94 | 1211 | LSP transport + coordinate layer (#222 Phase C) and language features (#222 Phase D): parametrized code-point↔UTF-16 goldens incl. astral-plane fixtures and surrogate-pair snapping, Span (1-based, exclusive-end) and SourceLocation (0-based col) → LSP Range conversions, point→token-range widening, DocumentStore open/change/close + index invalidation, an in-process handler-drive test, and one stdio end-to-end round-trip against the real `vera lsp` subprocess (initialize → didOpen → shutdown → exit) pinning serverInfo + textDocumentSync capabilities; plus the Phase D feature suite — parse-error single-diagnostic path, type-error verification short-circuit, tier=3 in E520 diagnostic data, per-function tier Hint synthesis (and its suppression for functions with violated obligations), smallest-enclosing-span hover, De Bruijn slot goto (most-recent-parameter jump, out-of-range None, off-slot None), and typed-hole completion (inside/after hole, away-from-hole None); plus the Phase E speculativeEdit suite — identical-text all-unchanged, breaking edit surfaces newly_undischarged (violated nat_sub) with canonical state untouched, strengthening edit surfaces newly_discharged, parse/type errors report ok:false, deleted functions report removed, proof_delta purity; plus the Phase F1 proposeEdit suite — the apply gate (clean and strengthening edits apply, breaking and non-compiling edits refuse), force overriding both gates with the delta still reported, wiring against a structural fake server (apply round-trip with exact full-document replacement range, refuse touches no canonical state, unopened-URI clamp sentinel), and full-document-range goldens (trailing-newline virtual line, UTF-16 end column); plus the Phase F2 strengthenContract suite — splice goldens (first-clause-only replacement with byte-identical remainder, ensures variant, unknown-fn None), the call-site audit pin (tightened precondition refused with newly_undischarged call_pre items, canonical state untouched), provable-ensures strengthening applies, and the three splice-target refusal paths (no analysis, unparseable document, unknown function); plus the Phase F3 addEffect suite — transitive-caller closure goldens (diamond in declaration order, leaf, unknown-fn None, recursion appears once), effect-row rewrite goldens (pure to singleton set, source-preserving append, already-present None, base-name identity blocking State next to State), diamond propagation applying one multi-site candidate with the bystander untouched, mixed append/replace rows with already-satisfied callers skipped, the fully-satisfied no-op shape, and the two refusal paths; plus the #728 instruction-contract suite — the LSP message carries description, rationale, and the Fix: paragraph (also pinning single E501 emission at the LSP surface), and a bare diagnostic maps to the description alone | | `test_browser.py` | 106 | 2,117 | Browser parity: Python/wasmtime vs Node.js/JS-runtime output equivalence across IO, State, contracts, Markdown, Regex, and all compilable examples | -| `test_conformance.py` | 460 | 102 | Parametrized conformance suite: parse, check, verify, run, format idempotency across 92 programs | +| `test_conformance.py` | 465 | 102 | Parametrized conformance suite: parse, check, verify, run, format idempotency across 93 programs | | `test_prelude.py` | 24 | 422 | Prelude injection: Option/Result/array operation detection, combinator shadowing, type aliases, end-to-end compilation | | `test_readme.py` | 2 | 79 | README code sample parsing | | `test_html.py` | 4 | 187 | HTML landing page code samples: parse, check, verify | @@ -124,7 +127,7 @@ Each conformance program declares the deepest pipeline stage it must pass: | `parse` | Source text is syntactically valid | 0 | | `check` | Parses and type-checks cleanly | 4 | | `verify` | Type-checks and all contracts verified by Z3 | 8 | -| `run` | Compiles to WASM and executes correctly | 80 | +| `run` | Compiles to WASM and executes correctly | 81 | Almost all programs are at the `run` level — they compile and execute, producing correct results. Four programs (`ch07_cross_module_contracts_lib`, `ch09_http`, `ch09_inference`, `ch03_typed_holes`) are at the `check` level. Eight programs (`ch03_slot_let_chains`, `ch03_slot_noncommutative`, `ch04_primitive_obligations`, `ch07_cross_module_contracts`, `ch07_io_read_char`, `ch07_io_sleep`, `ch07_random_effect`, `ch09_math_builtins`) are at the `verify` level, using Z3-provable contracts. @@ -168,7 +171,7 @@ tests/conformance/ ├── ch01_int_literals.vera # Chapter 1: Integer literals ├── ch01_float_literals.vera # Chapter 1: Float64 literals ├── ch01_string_escapes.vera # Chapter 1: String escape sequences -├── ... # 92 programs total, organized by spec chapter +├── ... # 93 programs total, organized by spec chapter ├── ch07_state_handler.vera # Chapter 7: State effect handler ├── ch07_exn_handler.vera # Chapter 7: Exn effect handler ├── ch09_numeric_builtins.vera # Chapter 9: Numeric built-in functions @@ -202,7 +205,7 @@ The manifest is the machine-readable feature inventory — agents can query it t ### Running the conformance suite ```bash -# Via pytest (parametrized — 450 tests) +# Via pytest (parametrized — 465 tests) pytest tests/test_conformance.py -v # Via standalone script (used in CI and pre-commit) @@ -299,7 +302,7 @@ How Vera language features (by spec chapter) map to test files and example progr | Ch 2: Types | Refinement types | test_codegen, test_verifier | ch02_refinement_types | refinement_types, safe_divide | | Ch 2: Types | Generics (`forall`) | test_codegen_monomorphize, test_checker | ch02_generics | generics | | Ch 3: Slots | `@T.n` references, De Bruijn indexing | test_checker, test_codegen | ch03_slot_basic, ch03_slot_indexing, ch03_slot_result | all 35 examples | -| Ch 4: Expressions | Arithmetic, comparison, boolean, unary ops | test_codegen, test_checker | ch04_arithmetic, ch04_comparison, ch04_boolean_ops | factorial, absolute_value | +| Ch 4: Expressions | Arithmetic, comparison, boolean, unary ops | test_codegen, test_checker | ch04_arithmetic, ch04_comparison, ch04_boolean_ops, ch04_int_overflow | factorial, absolute_value | | Ch 4: Expressions | If/else, let, match, pipe operator | test_codegen, test_checker | ch04_if_else, ch04_let_binding, ch04_match_basic, ch04_match_nested, ch04_pipe_operator | pattern_matching | | Ch 4: Expressions | String and array builtins | test_codegen | ch04_string_builtins, ch04_array_ops | string_ops | | Ch 5: Functions | Declarations, recursion, mutual recursion | test_codegen, test_checker | ch05_basic_function, ch05_recursion, ch05_mutual_recursion | factorial, mutual_recursion | diff --git a/docs/SKILL.md b/docs/SKILL.md index 67b723ac..b0422406 100644 --- a/docs/SKILL.md +++ b/docs/SKILL.md @@ -134,7 +134,7 @@ Every diagnostic has a stable error code grouped by compiler phase: - **E120–E176** — Type check: core + expressions (type mismatches, slot resolution, operators) - **E200–E233** — Type check: calls (unresolved functions, argument mismatches, module calls) - **E300–E335** — Type check: control flow (if/match, patterns, effect handlers) -- **E500–E525** — Verification (contract violations, undecidable fallbacks) +- **E500–E528** — Verification (contract violations, undecidable fallbacks, primitive-operation safety incl. arithmetic overflow) - **E600–E614** — Codegen (unsupported features, typed holes block compilation) - **E700–E702** — Testing (contract violations, input generation, execution errors) diff --git a/docs/index.html b/docs/index.html index c5ff5e76..91d5d2e9 100644 --- a/docs/index.html +++ b/docs/index.html @@ -268,7 +268,7 @@

A programming language designed for LLMs to write, not For Agents → SKILL.md

- v0.0.180 + v0.0.181 CI

@@ -631,7 +631,7 @@

This page is also a machine-readable specificati Vera is under active development

- A complete compiler with 164 built-in functions, seven algebraic effects (IO, Http, State, Exceptions, Async, Inference, Random), contract-driven testing via Z3, a language server with agent-facing proof deltas, and a 13-chapter specification. A 92-program conformance suite and 35 worked examples are validated against the spec on every pull request. All of it is developed openly on GitHub and released under the MIT licence. + A complete compiler with 164 built-in functions, seven algebraic effects (IO, Http, State, Exceptions, Async, Inference, Random), contract-driven testing via Z3, a language server with agent-facing proof deltas, and a 13-chapter specification. A 93-program conformance suite and 35 worked examples are validated against the spec on every pull request. All of it is developed openly on GitHub and released under the MIT licence.

diff --git a/docs/index.md b/docs/index.md index 0d1a15cb..9c30c61d 100644 --- a/docs/index.md +++ b/docs/index.md @@ -4,7 +4,7 @@ From the Latin *veritas* — truth. In Vera, verification is a first-class citizen. -**Current version:** [0.0.180](https://github.com/aallan/vera/releases/tag/v0.0.180) · [GitHub](https://github.com/aallan/vera) · [SKILL.md](https://veralang.dev/SKILL.md) (agent language reference) +**Current version:** [0.0.181](https://github.com/aallan/vera/releases/tag/v0.0.181) · [GitHub](https://github.com/aallan/vera) · [SKILL.md](https://veralang.dev/SKILL.md) (agent language reference) ## Why? @@ -230,7 +230,7 @@ For other models: point them at [`SKILL.md`](https://veralang.dev/SKILL.md) via ## Status -Vera is under [active development](https://raw.githubusercontent.com/aallan/vera/main/ROADMAP.md). A complete compiler with 164 built-in functions, seven algebraic effects (IO, Http, State, Exceptions, Async, Inference, Random), contract-driven testing via [Z3](https://www.microsoft.com/en-us/research/project/z3-3/), and a 13-chapter specification. A 92-program conformance suite and 35 worked examples are validated against the spec on every pull request. All of it is developed openly on [GitHub](https://github.com/aallan/vera) and released under the MIT licence. +Vera is under [active development](https://raw.githubusercontent.com/aallan/vera/main/ROADMAP.md). A complete compiler with 164 built-in functions, seven algebraic effects (IO, Http, State, Exceptions, Async, Inference, Random), contract-driven testing via [Z3](https://www.microsoft.com/en-us/research/project/z3-3/), and a 13-chapter specification. A 93-program conformance suite and 35 worked examples are validated against the spec on every pull request. All of it is developed openly on [GitHub](https://github.com/aallan/vera) and released under the MIT licence. ## Links diff --git a/docs/llms-full.txt b/docs/llms-full.txt index 1828076e..8381dbab 100644 --- a/docs/llms-full.txt +++ b/docs/llms-full.txt @@ -2,7 +2,7 @@ > Vera is a statically typed, purely functional programming language designed for large language models to write. It uses typed slot references (@T.n) instead of variable names, requires contracts on every function, and compiles to WebAssembly. -This file contains the core Vera language documentation — language reference, agent instructions, FAQ, error codes, and formal grammar — compiled into a single document. Version 0.0.180. For the full documentation index including the 13-chapter specification and supplementary docs, see llms.txt. +This file contains the core Vera language documentation — language reference, agent instructions, FAQ, error codes, and formal grammar — compiled into a single document. Version 0.0.181. For the full documentation index including the 13-chapter specification and supplementary docs, see llms.txt. ======================================================================== @@ -140,7 +140,7 @@ Every diagnostic has a stable error code grouped by compiler phase: - **E120–E176** — Type check: core + expressions (type mismatches, slot resolution, operators) - **E200–E233** — Type check: calls (unresolved functions, argument mismatches, module calls) - **E300–E335** — Type check: control flow (if/match, patterns, effect handlers) -- **E500–E525** — Verification (contract violations, undecidable fallbacks) +- **E500–E528** — Verification (contract violations, undecidable fallbacks, primitive-operation safety incl. arithmetic overflow) - **E600–E614** — Codegen (unsupported features, typed holes block compilation) - **E700–E702** — Testing (contract violations, input generation, execution errors) @@ -2356,7 +2356,7 @@ Read `SKILL.md` for the full language reference. It covers syntax, slot referenc ### Conformance programs as reference -The conformance suite in `tests/conformance/` contains 92 small programs — often one per language feature — that serve as minimal working examples. Most are self-contained; the exception is `ch07_cross_module_contracts.vera`, which imports its `ch07_cross_module_contracts_lib.vera` companion to exercise cross-module contracts. Each program must pass its declared verification level (see `manifest.json` for mappings: `parse`, `check`, `verify`, or `run`). When you need to see how a specific construct works (e.g. effect handlers, match expressions, closures), check the corresponding conformance program before reading the spec. +The conformance suite in `tests/conformance/` contains 93 small programs — often one per language feature — that serve as minimal working examples. Most are self-contained; the exception is `ch07_cross_module_contracts.vera`, which imports its `ch07_cross_module_contracts_lib.vera` companion to exercise cross-module contracts. Each program must pass its declared verification level (see `manifest.json` for mappings: `parse`, `check`, `verify`, or `run`). When you need to see how a specific construct works (e.g. effect handlers, match expressions, closures), check the corresponding conformance program before reading the spec. ### Workflow @@ -2530,7 +2530,7 @@ Each stage is a module with a single public API function (`parse_file`, `transfo pytest tests/ -v # Run all tests (see TESTING.md) pytest tests/test_conformance.py -v # Conformance suite only mypy vera/ # Type-check the compiler -python scripts/check_conformance.py # All 92 conformance programs must pass +python scripts/check_conformance.py # All 93 conformance programs must pass python scripts/check_examples.py # All 35 examples must pass ``` @@ -2540,7 +2540,7 @@ When implementing a new language feature, write the conformance program *first* ### Invariants -- All 92 conformance programs in `tests/conformance/` must pass their declared level +- All 93 conformance programs in `tests/conformance/` must pass their declared level - All 35 examples in `examples/` must pass `vera check` and `vera verify` - `mypy vera/` must be clean - `pytest tests/ -v` must pass @@ -2964,7 +2964,7 @@ None of this is Vera-specific, but it validates the design choices. The thesis i This is a real concern. LLMs are trained on trillions of tokens of Python, TypeScript, and JavaScript. A MojoBench study (NAACL 2025) found that even fine-tuned models achieved only 30–35% improvement over base models on Mojo code generation, illustrating the cold-start problem for new languages. -Vera's approach has three parts. First, the agent-facing documentation (SKILL.md) is designed to be dropped into a model's context window, so the model works from the language specification rather than training data recall. Second, Vera's syntax is deliberately simple and regular — fewer constructs, each with exactly one canonical form — which reduces the surface area a model needs to learn. Third, the conformance test suite (92 programs covering every language feature) gives models concrete examples to learn from and conform to. Simon Willison argued in December 2025 that a language-agnostic conformance suite is the single most important tool for LLM adoption of a new language — LLMs can learn new languages remarkably well when given tests to conform to. +Vera's approach has three parts. First, the agent-facing documentation (SKILL.md) is designed to be dropped into a model's context window, so the model works from the language specification rather than training data recall. Second, Vera's syntax is deliberately simple and regular — fewer constructs, each with exactly one canonical form — which reduces the surface area a model needs to learn. Third, the conformance test suite (93 programs covering every language feature) gives models concrete examples to learn from and conform to. Simon Willison argued in December 2025 that a language-agnostic conformance suite is the single most important tool for LLM adoption of a new language — LLMs can learn new languages remarkably well when given tests to conform to. ## How does Vera compare to Dafny / Lean / Koka / F*? @@ -3005,7 +3005,7 @@ The reference compiler is under active development. The current release includes - A seven-stage pipeline: parse, transform, resolve, typecheck, verify, compile, execute - A 13-chapter formal specification -- Over 3,000 unit tests plus a 92-program conformance suite +- Over 3,000 unit tests plus a 93-program conformance suite - 35 working example programs - 164 built-in functions covering strings, arrays, math, parsing, and data types - Four built-in abilities (Eq, Ord, Hash, Show) with constrained generics and ADT auto-derivation @@ -3156,6 +3156,7 @@ Every diagnostic has a stable error code. Codes are grouped by compiler phase: - **E525**: Cannot verify termination metric - **E526**: Division or modulo by zero - **E527**: Array index out of bounds +- **E528**: Arithmetic overflow - **E600**: Unsupported parameter type - **E601**: Unsupported return type - **E602**: Unsupported body expression type diff --git a/docs/llms.txt b/docs/llms.txt index 3d53c4f2..bf0d45a7 100644 --- a/docs/llms.txt +++ b/docs/llms.txt @@ -4,7 +4,7 @@ Vera uses De Bruijn indexing for bindings: `@Int.0` is the most recent `Int` binding, `@Int.1` the one before. There are no variable names. Contracts are mandatory — every function must declare `requires(...)`, `ensures(...)`, and `effects(...)`. The Z3 SMT solver verifies contracts statically where possible; remaining contracts become runtime assertions. All side effects (IO, Http, State, Exceptions, Async, Inference, Random) are tracked in the type system via algebraic effects. -Current version: 0.0.180. The reference compiler is written in Python. Install with `pip install -e .` from the repository. +Current version: 0.0.181. The reference compiler is written in Python. Install with `pip install -e .` from the repository. ## Homepage @@ -55,4 +55,4 @@ Current version: 0.0.180. The reference compiler is written in Python. Install w - [TESTING.md](https://raw.githubusercontent.com/aallan/vera/main/TESTING.md): Test suite architecture, coverage data, and test conventions. - [KNOWN_ISSUES.md](https://raw.githubusercontent.com/aallan/vera/main/KNOWN_ISSUES.md): Known bugs and limitations. - [CONTRIBUTING.md](https://raw.githubusercontent.com/aallan/vera/main/CONTRIBUTING.md): Contribution guidelines. -- [Conformance Suite](https://github.com/aallan/vera/tree/main/tests/conformance): 92 programs validating every language feature against the spec. +- [Conformance Suite](https://github.com/aallan/vera/tree/main/tests/conformance): 93 programs validating every language feature against the spec. diff --git a/pyproject.toml b/pyproject.toml index 15f94b3a..7aaaf397 100644 --- a/pyproject.toml +++ b/pyproject.toml @@ -1,6 +1,6 @@ [project] name = "vera" -version = "0.0.180" +version = "0.0.181" description = "Vera: a programming language designed for LLMs, with full contracts, algebraic effects, and typed slot references" readme = "README.md" license = "MIT" diff --git a/scripts/check_spec_examples.py b/scripts/check_spec_examples.py index 6b02fb30..6ea16b19 100644 --- a/scripts/check_spec_examples.py +++ b/scripts/check_spec_examples.py @@ -142,7 +142,7 @@ ("05-functions.md", 324): "FRAGMENT", # fn(A -> B) in param position # Chapter 6 — inline function type in type alias - ("06-contracts.md", 344): "FRAGMENT", # type SafeDiv = fn(...) + fn apply_div + ("06-contracts.md", 347): "FRAGMENT", # type SafeDiv = fn(...) + fn apply_div # Chapter 7 — anonymous function at top level ("07-effects.md", 116): "FRAGMENT", # effect Logger + anonymous fn body diff --git a/spec/06-contracts.md b/spec/06-contracts.md index 0765fb2b..79ec26de 100644 --- a/spec/06-contracts.md +++ b/spec/06-contracts.md @@ -244,6 +244,7 @@ The verifier checks the contracts the programmer wrote, and **auto-synthesises** | `@Int` value into a `@Nat` slot | `value >= 0` | E503 | | `a / b`, `a % b` (Int / Nat) | `b != 0` | E526 | | `arr[i]` (`Array`) | `0 <= i < array_length(arr)` | E527 | +| `a + b`, `a - b`, `a * b` (Int / Nat) | result within i64 / u64 range — no overflow | E528 | To discharge an operation obligation, the programmer encodes the constraint in a precondition (`requires(@Int.0 != 0)`), a guarding `if` (whose path condition holds in the relevant branch), or a refinement type (`{ @Int | @Int.0 != 0 }`). A function that performs `@Int.1 / @Int.0` with `requires(true)` therefore no longer verifies cleanly: the unguarded divisor is a compile error (E526). @@ -251,6 +252,8 @@ To discharge an operation obligation, the programmer encodes the constraint in a The `@Nat` obligations (E502 / E503) carry the most nuance, spanning many binding sites. The verifier emits an E502 obligation `lhs >= rhs` at every `@Nat - @Nat` subtraction site (see [#520](https://github.com/aallan/vera/issues/520)), and an E503 obligation `value >= 0` where an `@Int` value narrows into a `@Nat` **binding** slot — `let`, call-argument, effect-operation-argument, constructor-field, top-level match-bind, and literal-tuple-destructure sites (see [#552](https://github.com/aallan/vera/issues/552)), plus the generic-instantiation, ADT sub-pattern, non-literal-destructure, and cross-module imported-constructor sites (see [#747](https://github.com/aallan/vera/issues/747)). (A narrowing at a function **return** position, or a `@Nat` component of a tuple/constructor built in value position, is not yet obligated — see [#758](https://github.com/aallan/vera/issues/758).) The codegen mirrors the subtraction obligation and the `@Nat` binding sites with runtime guards — every concrete site (`let`, destructure, match-bind, sub-pattern, concrete constructor field, concrete call-argument) plus **generic function-formal calls**, which guard on the monomorphised callee (the mangled instance `pick$Nat` carries concrete `@Nat` flags). Two sites stay unguarded, both still obligated statically, so a Tier-3 narrowing the solver cannot discharge at either surfaces an E504 warning: the **effect-operation argument**, whose runtime guard is deferred (see [#754](https://github.com/aallan/vera/issues/754)), and the **generic-instantiated constructor field**, since constructor layouts carry no per-field `@Nat` metadata to monomorphise. Division, modulo, and array indexing now follow the same auto-synthesis pattern ([#680](https://github.com/aallan/vera/issues/680)); lifting dynamic or closure-captured array bounds from a runtime-guarded Tier 3 to a Tier-1 proof is part of the Tier 2 verification work in [#427](https://github.com/aallan/vera/issues/427). +**Integer overflow** ([#798](https://github.com/aallan/vera/issues/798)). `@Int` is a signed 64-bit machine integer and `@Nat` an unsigned one; `+` / `-` / `*` wrap at the i64 / u64 boundary. Like `@Nat` underflow and signed-division `MIN / -1`, an overflowing operation is a *partial* operation that **traps** at runtime rather than silently wrapping, so each `@Int` / `@Nat` `+` / `-` / `*` carries an obligation that the result stays in range. It is classified at the operands' **common (coerced) type** — `@Int` if either operand is `@Int` (since `@Nat <: @Int`), else `@Nat` — not one operand's self-type (a non-negative literal is `@Nat`, but `5 + @Int.0` is an i64 add) nor the possibly-narrowed result type (an `@Int.0 + 1` stored into a `@Nat` slot is still an i64 add). A two-check mirrors array indexing: the result provably in range → **Tier 1**; provably out of range (e.g. a literal `i64.MAX + 1`) → a compile error (**E528**); otherwise — dynamic operands — a runtime-guarded **Tier 3** trap. `@Nat` subtraction is excluded — it is the underflow obligation (E502) above, never a high-overflow. + Runtime traps for unguarded primitives are Vera-native: each trap carries a kind label (`divide_by_zero`, `out_of_bounds`, etc.), a per-kind Fix paragraph naming the precondition that would have prevented it, and a source backtrace — so a missing static guarantee is still a recoverable signal. ### 6.4.4 SMT Solver Integration diff --git a/spec/11-compilation.md b/spec/11-compilation.md index 5e006221..eef57d5f 100644 --- a/spec/11-compilation.md +++ b/spec/11-compilation.md @@ -53,6 +53,8 @@ The one operation that can violate the invariant despite well-typed operands is The trap is classified as `kind="unreachable"` rather than a dedicated `kind="underflow"` because the `unreachable` instruction is the lightest-weight trap mechanism and adding a dedicated kind requires new host-import scaffolding (mirroring `vera.contract_fail`); a precise underflow diagnostic via `vera verify` is the recommended path until the dedicated kind lands. +Arithmetic **overflow** of `@Int`/`@Nat` addition, subtraction, and multiplication (`+`/`-`/`*`) is handled the same way ([#798](https://github.com/aallan/vera/issues/798)). Because `Int`/`Nat` share the i64/u64 representation, these operations can wrap under two's-complement arithmetic, so each such site whose operands have `@Int`/`@Nat` provenance carries a Tier-1 proof obligation `result` stays in range. When Z3 discharges it the codegen emits the plain WASM op; when it cannot, the function drops to Tier 3 and the codegen emits a guarded op that traps on overflow with the same `unreachable` net. The overflow trap is likewise classified `kind="unreachable"` (Option A) rather than a dedicated `kind="overflow"` for the same reason — a precise overflow trap kind needs the same host-import scaffolding and is a follow-up; the obligation itself stays sound because the function traps on overflow before it can return a wrapped value. + Authors lift Tier-3 functions back to Tier 1 by adding `requires lhs >= rhs` clauses. Subtraction sites that do not produce a `Nat`-typed result (e.g., `@Int - @Int`) are not obligation-checked — they may produce negative values, which is well-defined for `Int`. `@Byte` arithmetic is not currently permitted by the type checker (`Byte` is excluded from `NUMERIC_TYPES` in `vera/types.py`), so the underflow obligation needs no `Byte` extension today; allowing `@Byte` arithmetic with both underflow and overflow guards is tracked speculatively as [#564](https://github.com/aallan/vera/issues/564). The verifier checks the `@Nat >= 0` invariant at subtraction sites (in any position, including a function's return expression) and at binding sites where an `@Int` value narrows into a `@Nat` slot — `let` bindings, call arguments, constructor fields, top-level match binds, and literal-tuple destructures, plus the pure-literal `let @Nat = 0 - 1` case the subtraction obligation defers ([#552](https://github.com/aallan/vera/issues/552)). [#747](https://github.com/aallan/vera/issues/747) extended the narrowing obligation to the projection and instantiation sites — ADT sub-pattern binds (`match opt { Some(@Nat) -> }` on `Option`), non-literal tuple destructures, generic constructor / effect-operation / function formals instantiated to `@Nat`, and imported ADT constructors — so every narrowing **binding site** is now statically obligated. (A bare `@Int`→`@Nat` narrowing at a function **return** position, or a `@Nat` component of a tuple/constructor built in value position, is not yet obligated — see [#758](https://github.com/aallan/vera/issues/758).) The Tier-3 runtime guard backs every binding site except two — the effect-operation argument, and the generic-instantiated constructor field (which erases to i64 with no per-field guard) — and a tripped guard reports a generic trap rather than the `requires(... >= 0)` fix. The effect-op guard and the dedicated trap kind are tracked as [#754](https://github.com/aallan/vera/issues/754), the generic constructor field as [#757](https://github.com/aallan/vera/issues/757) (see §11.17). ### 11.2.2 Unit as Void diff --git a/tests/conformance/ch04_int_overflow.vera b/tests/conformance/ch04_int_overflow.vera new file mode 100644 index 00000000..bde2eabd --- /dev/null +++ b/tests/conformance/ch04_int_overflow.vera @@ -0,0 +1,53 @@ +-- Conformance: @Int / @Nat arithmetic overflow obligation (Chapter 4) +-- +-- Tests the proof obligation introduced by #798: every `+` / `-` / `*` on +-- `@Int` / `@Nat` carries a Tier-1 obligation that the result stays within the +-- i64 / u64 range; a site the solver cannot prove safe falls to a runtime +-- overflow trap (spec/06 §6.4.3, alongside @Nat underflow and signed division; +-- the partial-arithmetic story in spec/04 §4.4 and spec/11 §11.2.1). Overflow +-- is classified at the operands' COMMON (coerced) type — `@Int` if either +-- operand is `@Int` (since `@Nat <: @Int`), else `@Nat`. +-- +-- Every site here verifies at Tier 1: the overflow obligation is discharged +-- from explicit `requires` bounds, a path condition, or trivial literal folding. +-- Discharged via explicit requires: bounded operands keep `a + b` in i64. +public fn add_bounded(@Int, @Int -> @Int) + requires(@Int.0 >= 0 && @Int.0 < 1000 && @Int.1 >= 0 && @Int.1 < 1000) + ensures(@Int.result == @Int.0 + @Int.1) + effects(pure) +{ + @Int.0 + @Int.1 +} + +-- Discharged for @Nat (u64 range): a bounded product stays within u64. +public fn nat_mul_bounded(@Nat, @Nat -> @Nat) + requires(@Nat.0 < 1000 && @Nat.1 < 1000) + ensures(@Nat.result == @Nat.0 * @Nat.1) + effects(pure) +{ + @Nat.0 * @Nat.1 +} + +-- Discharged via path condition: the then-branch is reached only when both +-- `0 <= @Int.0` and `@Int.0 < 1000`, so `@Int.0 * 2 < 2000` is in i64 range. +public fn doubler(@Int -> @Int) + requires(true) + ensures(true) + effects(pure) +{ + if @Int.0 >= 0 && @Int.0 < 1000 then { + @Int.0 * 2 + } else { + 0 + } +} + +-- Top-level entry: inline literal arithmetic, discharged trivially (the +-- constant-folded result is provably in range). +public fn main(@Unit -> @Int) + requires(true) + ensures(@Int.result == 30) + effects(pure) +{ + 10 + 20 +} diff --git a/tests/conformance/manifest.json b/tests/conformance/manifest.json index 4a95ab03..36ecaecb 100644 --- a/tests/conformance/manifest.json +++ b/tests/conformance/manifest.json @@ -319,6 +319,21 @@ "path_condition_discharge" ] }, + { + "id": "ch04_int_overflow", + "file": "ch04_int_overflow.vera", + "chapter": 4, + "title": "@Int / @Nat arithmetic overflow obligation (#798)", + "level": "run", + "spec_ref": "Section 6.4.3", + "features": [ + "int_overflow", + "overflow_obligation", + "tier1_discharge", + "requires_discharge", + "path_condition_discharge" + ] + }, { "id": "ch04_comparison", "file": "ch04_comparison.vera", diff --git a/tests/test_codegen.py b/tests/test_codegen.py index 21814a39..1f97c7d8 100644 --- a/tests/test_codegen.py +++ b/tests/test_codegen.py @@ -16541,13 +16541,27 @@ def test_guard_emitted_in_wat_for_nat_sub(self) -> None: f"got: {body!r}" ) - def test_int_subtract_emits_no_guard(self) -> None: - """`@Int - @Int` does not get the guard — Int can be negative. - - Sister to the structural test above: the guard fires only on - sites where the result is statically @Nat AND at least one - operand has @Nat origin. Int-Int sites must emit a bare - `i64.sub` with no `i64.lt_s`/`unreachable` pair adjacent. + def test_int_subtract_emits_no_nat_underflow_guard(self) -> None: + """`@Int - @Int` does not get the #520 *nat-sub underflow* guard. + + Sister to the structural test above: the #520 nat-sub guard fires only + on sites where the result is statically @Nat AND at least one operand + has @Nat origin. @Int - @Int is not such a site (Int can be negative), + so it must NOT carry the nat-sub guard — which would wrongly trap on a + legitimate negative result like ``5 - 10``. + + Note (#798): @Int - @Int now DOES carry the *overflow* guard, a + distinct mechanism — the two-XOR signed-overflow test + ``((a^b) & (a^r)) < 0`` followed by ``unreachable``. That guard does + not fire on ``5 - 10`` (in range), so the runtime behaviour this test + cares about (no spurious trap on a negative Int result) is unchanged. + The discriminator below is therefore the *shape*: the #520 guard + compares the two operands directly (``i64.lt_s`` on lhs/rhs straight + after loading them, no ``i64.xor``), whereas the #798 overflow guard + is XOR-based. We assert the overflow-guard shape is present and the + nat-sub shape (an ``i64.lt_s`` not preceded by the XOR sign-test) is + not the mechanism, by pinning runtime behaviour: ``5 - 10 = -5`` must + return cleanly, never trap. """ src = """ private fn int_sub(@Int, @Int -> @Int) @@ -16555,7 +16569,7 @@ def test_int_subtract_emits_no_guard(self) -> None: ensures(true) effects(pure) { - @Int.0 - @Int.1 + @Int.1 - @Int.0 } public fn main(@Unit -> @Int) @@ -16574,19 +16588,19 @@ def test_int_subtract_emits_no_guard(self) -> None: body = wat[int_sub_idx:body_end] # i64.sub must be present (it's the actual subtraction). assert "i64.sub" in body - # But the guard pieces must NOT be — Int subtraction is unguarded. - # Banning *both* `i64.lt_s` and `i64.lt_u` (regex - # `\bi64\.lt_[su]\b`) defends against a future codegen flip - # to unsigned-comparison or any other compare-then-trap - # variant; the previous `not in body` substring check would - # have silently passed if the guard mechanism changed. - assert not re.search(r"\bi64\.lt_[su]\b", body), ( - f"Unexpected `i64.lt_[su]` in int_sub body — Int subtraction " - f"should not have an underflow guard. Body:\n{body}" + # The #798 overflow guard IS present — and is XOR-based (the signed + # overflow test), distinguishing it from the #520 nat-sub guard which + # never uses i64.xor. + assert "i64.xor" in body, ( + f"Expected the #798 overflow guard's i64.xor sign-test in " + f"int_sub body. Body:\n{body}" ) - assert "unreachable" not in body, ( - f"Unexpected `unreachable` in int_sub body — Int subtraction " - f"should not emit a trap. Body:\n{body}" + # The #520 nat-sub underflow guard would trap on a legitimate negative + # result; pin that it does NOT — main() computes int_sub(5, 10) = 5-10 + # = -5 and must return it cleanly, never trap. + assert _run(src) == -5, ( + "Int subtraction of 5 - 10 must return -5, not trap — the #520 " + "nat-sub underflow guard must not apply to @Int - @Int." ) def test_pure_literal_subtract_emits_no_guard(self) -> None: diff --git a/tests/test_int_overflow.py b/tests/test_int_overflow.py new file mode 100644 index 00000000..acf54ff6 --- /dev/null +++ b/tests/test_int_overflow.py @@ -0,0 +1,83 @@ +"""Regression tests for #798 — @Int/@Nat arithmetic overflow obligations. + +Part of the #392 `smt.py` soundness audit. Before #798 the verifier modelled +`@Int`/`@Nat` as Z3's *unbounded* integers, so `+`/`-`/`*` were treated as total +operations — it proved contracts the i64/u64 runtime then violated under +two's-complement wraparound. The canonical probe: + + public fn inc(@Int -> @Int) + requires(true) ensures(@Int.result > @Int.0) effects(pure) + { @Int.0 + 1 } + +`vera verify` proved `result > input` for all inputs, yet `inc(MAX_i64)` traps +(`x + 1` wraps to `MIN_i64 < x`). + +Per the #798 decision (overflow is a *trapping* partial operation, consistent +with `@Nat` underflow and signed-div `MIN/-1`), every `+`/`-`/`*` on +`@Int`/`@Nat` now emits an ``int_overflow`` obligation — the analog of +``nat_sub`` / ``div_zero``. It discharges at Tier 1 when operand bounds prove +the result stays in range, else falls to Tier-3 (a runtime overflow trap guards +it). The postcondition itself stays Tier-1 sound, because the function now traps +on overflow before it can return a wrapped value. + +Written test-first: each FAILS on the pre-fix verifier, where NO ``int_overflow`` +obligation is emitted at all (overflow is silently assumed). +""" + +from __future__ import annotations + +from vera.parser import parse_to_ast +from vera.checker import typecheck_with_artifacts +from vera.verifier import VerifyResult, verify + + +def _verify(source: str) -> VerifyResult: + ast = parse_to_ast(source) + _diags, arts = typecheck_with_artifacts(ast, source) + return verify( + ast, source, + expr_types=arts.expr_semantic_types, + expr_target_types=arts.expr_target_types, + ) + + +class TestIntOverflowObligations798: + def test_int_add_unbounded_emits_undischarged_overflow_obligation(self) -> None: + # `x + 1` on an unbounded @Int overflows i64 at MAX_i64, so an + # int_overflow obligation must be emitted and left UNdischarged (Tier-3, + # runtime-guarded). Pre-fix: no such obligation — overflow assumed. + result = _verify(""" +public fn inc(@Int -> @Int) + requires(true) ensures(@Int.result > @Int.0) effects(pure) +{ @Int.0 + 1 } +""") + ovf = [o for o in result.obligations if o.kind == "int_overflow"] + assert ovf, [(o.kind, o.status) for o in result.obligations] + assert all(o.status == "tier3" for o in ovf), [(o.kind, o.status) for o in ovf] + + def test_nat_add_unbounded_emits_undischarged_overflow_obligation(self) -> None: + # Same shape at the u64 ceiling for @Nat. + result = _verify(""" +public fn inc(@Nat -> @Nat) + requires(true) ensures(@Nat.result > @Nat.0) effects(pure) +{ @Nat.0 + 1 } +""") + ovf = [o for o in result.obligations if o.kind == "int_overflow"] + assert ovf, [(o.kind, o.status) for o in result.obligations] + assert all(o.status == "tier3" for o in ovf), [(o.kind, o.status) for o in ovf] + + def test_overflow_obligation_discharged_when_operands_bounded(self) -> None: + # Bounded operands → result provably in i64 range → the overflow + # obligation discharges at Tier 1 (exercises the discharge path, not just + # the Tier-3 fallback). A default that happens to skip emission would + # leave `ovf` empty and fail here too. + result = _verify(""" +public fn add_small(@Int, @Int -> @Int) + requires(@Int.0 >= 0 && @Int.0 < 100 && @Int.1 >= 0 && @Int.1 < 100) + ensures(true) effects(pure) +{ @Int.0 + @Int.1 } +""") + ovf = [o for o in result.obligations if o.kind == "int_overflow"] + assert ovf and all(o.status == "verified" for o in ovf), [ + (o.kind, o.status) for o in result.obligations + ] diff --git a/tests/test_int_overflow_codegen.py b/tests/test_int_overflow_codegen.py new file mode 100644 index 00000000..5ce859fb --- /dev/null +++ b/tests/test_int_overflow_codegen.py @@ -0,0 +1,432 @@ +"""Runtime overflow-trap codegen tests for #798 (Stage 3). + +Stage 2 made the verifier emit an ``int_overflow`` obligation at every +``@Int`` / ``@Nat`` ``+`` / ``-`` / ``*`` site (``@Nat`` subtraction is the +separate ``nat_sub`` underflow, excluded). Stage 3 (this file) makes the +codegen emit a runtime guard at *exactly* those sites so ``vera run`` / +``vera compile`` programs trap on overflow instead of silently wrapping at the +i64 / u64 boundary. + +The trap is a bare ``unreachable`` (Option A in the design doc — matches the +#520 ``nat_sub`` and #552 nat-bind precedent), so it classifies as +``kind="unreachable"`` today. A precise ``"overflow"`` trap kind via a host +import is a deliberate follow-up. + +Written test-first: every ``*_traps`` test FAILS on the pre-Stage-3 codegen +(the op wraps silently → no trap → ``execute`` returns a value), and every +``*_no_trap`` test passes both before and after (safe arithmetic is unchanged). + +Constants: + I64_MAX = 9223372036854775807 ( 2^63 - 1 ) + I64_MIN = -9223372036854775808 (-2^63 ) + U64_MAX = 18446744073709551615 ( 2^64 - 1 ) +""" + +from __future__ import annotations + +import tempfile +from pathlib import Path + +import pytest +import wasmtime + +from vera.checker import typecheck_with_artifacts +from vera.codegen import compile as codegen_compile +from vera.codegen import execute +from vera.parser import parse_to_ast + +I64_MAX = 9223372036854775807 +I64_MIN = -9223372036854775808 +U64_MAX = 18446744073709551615 + + +def _compile_with_types(source: str): + """Compile via the same artifact-threaded path as ``cmd_run``. + + The overflow guard's Int/Nat classifier consults the checker's resolved + type table (``expr_semantic_types``), so codegen must be handed it — a bare + ``transform -> compile`` (no typecheck) would leave the table empty and the + classifier blind, exactly the gap the verifier<->codegen differential pins. + """ + with tempfile.NamedTemporaryFile( + mode="w", suffix=".vera", delete=False, encoding="utf-8", + ) as f: + f.write(source) + path = f.name + try: + ast = parse_to_ast(source) + diags, arts = typecheck_with_artifacts(ast, source, file=path) + errors = [d for d in diags if d.severity == "error"] + assert not errors, f"typecheck errors: {[d.description for d in errors]}" + result = codegen_compile( + ast, source=source, file=path, + expr_semantic_types=arts.expr_semantic_types, + ) + errs = [d for d in result.diagnostics if d.severity == "error"] + assert not errs, f"codegen errors: {[d.description for d in errs]}" + return result + finally: + Path(path).unlink(missing_ok=True) + + +_MASK64 = (1 << 64) - 1 + + +def _run(source: str, fn: str, args: list[int]) -> int: + result = _compile_with_types(source) + exec_result = execute(result, fn_name=fn, args=args) + assert exec_result.value is not None + return exec_result.value + + +def _assert_traps(source: str, fn: str, args: list[int]) -> None: + """Assert running ``fn(args)`` traps at the WASM level (overflow guard).""" + result = _compile_with_types(source) + with pytest.raises( + (wasmtime.WasmtimeError, wasmtime.Trap, RuntimeError) + ): + execute(result, fn_name=fn, args=args) + + +def _assert_no_trap(source: str, fn: str, args: list[int], expect: int) -> None: + # wasmtime returns i64 results signed; a @Nat value above 2^63 comes back + # negative. Compare modulo 2^64 so the unsigned intent is checked without + # caring how the host marshals the sign bit. + assert _run(source, fn, args) & _MASK64 == expect & _MASK64 + + +# Dynamic-operand fixtures: args reach the runtime guard (the verifier honestly +# defers to Tier 3, so the guard is what fires — not a verify-time E528). +_INT_ADD = """ +public fn add(@Int, @Int -> @Int) + requires(true) ensures(true) effects(pure) +{ @Int.1 + @Int.0 } +""" + +_INT_SUB = """ +public fn sub(@Int, @Int -> @Int) + requires(true) ensures(true) effects(pure) +{ @Int.1 - @Int.0 } +""" + +_INT_MUL = """ +public fn mul(@Int, @Int -> @Int) + requires(true) ensures(true) effects(pure) +{ @Int.1 * @Int.0 } +""" + +_NAT_ADD = """ +public fn add(@Nat, @Nat -> @Nat) + requires(true) ensures(true) effects(pure) +{ @Nat.1 + @Nat.0 } +""" + +_NAT_MUL = """ +public fn mul(@Nat, @Nat -> @Nat) + requires(true) ensures(true) effects(pure) +{ @Nat.1 * @Nat.0 } +""" + + +# Literal-LEFT @Int: the site is classified on the EXPRESSION's resolved type +# (@Int / i64), not the literal's @Nat self-type — so the overflow is caught +# (#798 RISK-6 fix). +_INT_LITERAL_LEFT = """ +public fn f(@Int -> @Int) + requires(true) ensures(true) effects(pure) +{ 5 + @Int.0 } +""" + + +class TestLiteralLeftIntOverflow798: + """A positive-literal LEFT operand of an @Int add is guarded at the i64 + range — the literal's own @Nat self-type must not widen the site to u64.""" + + def test_literal_left_int_add_traps_at_i64_range(self) -> None: + # 5 + I64_MAX = 2^63 + 4 overflows i64 → must trap. Were the site + # mis-classified @Nat (u64), 2^63+4 would be "in range" and slip through. + _assert_traps(_INT_LITERAL_LEFT, "f", [I64_MAX]) + + def test_literal_left_small_sum_no_trap(self) -> None: + _assert_no_trap(_INT_LITERAL_LEFT, "f", [10], 15) + + +class TestIntAddOverflow798: + """@Int ADD traps on i64 overflow; safe sums pass through.""" + + def test_max_plus_one_traps(self) -> None: + _assert_traps(_INT_ADD, "add", [I64_MAX, 1]) + + def test_max_plus_max_traps(self) -> None: + _assert_traps(_INT_ADD, "add", [I64_MAX, I64_MAX]) + + def test_min_plus_neg_one_traps(self) -> None: + _assert_traps(_INT_ADD, "add", [I64_MIN, -1]) + + def test_max_plus_zero_no_trap(self) -> None: + _assert_no_trap(_INT_ADD, "add", [I64_MAX, 0], I64_MAX) + + def test_small_sum_no_trap(self) -> None: + _assert_no_trap(_INT_ADD, "add", [5, 3], 8) + + def test_min_plus_max_is_neg_one_no_trap(self) -> None: + # Operands differ in sign → cannot overflow → must NOT trap. + _assert_no_trap(_INT_ADD, "add", [I64_MIN, I64_MAX], -1) + + +class TestIntSubOverflow798: + """@Int SUB is order-sensitive; the guard pins minuend - subtrahend.""" + + def test_min_minus_one_traps(self) -> None: + # add args are (first=@Int.1, second=@Int.0); body is @Int.1 - @Int.0, + # so sub(a, b) computes a - b. + _assert_traps(_INT_SUB, "sub", [I64_MIN, 1]) + + def test_max_minus_neg_one_traps(self) -> None: + _assert_traps(_INT_SUB, "sub", [I64_MAX, -1]) + + def test_zero_minus_min_traps(self) -> None: + # 0 - I64_MIN = 2^63, overflows. + _assert_traps(_INT_SUB, "sub", [0, I64_MIN]) + + def test_min_minus_min_is_zero_no_trap(self) -> None: + _assert_no_trap(_INT_SUB, "sub", [I64_MIN, I64_MIN], 0) + + def test_max_minus_max_is_zero_no_trap(self) -> None: + _assert_no_trap(_INT_SUB, "sub", [I64_MAX, I64_MAX], 0) + + def test_small_diff_no_trap(self) -> None: + _assert_no_trap(_INT_SUB, "sub", [5, 3], 2) + + +class TestIntMulOverflow798: + """@Int MUL is the dangerous one — INT_MIN/-1 and 0 special cases.""" + + def test_min_times_neg_one_traps(self) -> None: + # The special case: I64_MIN * -1 = 2^63 overflows. MUST trap and MUST + # NOT surface as a native i64.div_s INT_MIN/-1 trap. + _assert_traps(_INT_MUL, "mul", [I64_MIN, -1]) + + def test_neg_one_times_min_traps(self) -> None: + # Symmetric operand order — both must trap. + _assert_traps(_INT_MUL, "mul", [-1, I64_MIN]) + + def test_max_times_two_traps(self) -> None: + _assert_traps(_INT_MUL, "mul", [I64_MAX, 2]) + + def test_min_times_two_traps(self) -> None: + _assert_traps(_INT_MUL, "mul", [I64_MIN, 2]) + + def test_two_pow_32_squared_traps(self) -> None: + _assert_traps(_INT_MUL, "mul", [2**32, 2**32]) + + def test_zero_times_min_no_trap(self) -> None: + # The a==0 branch — without it, r/a is 0/0 → div-by-zero (wrong trap). + _assert_no_trap(_INT_MUL, "mul", [0, I64_MIN], 0) + + def test_min_times_zero_no_trap(self) -> None: + _assert_no_trap(_INT_MUL, "mul", [I64_MIN, 0], 0) + + def test_one_times_max_no_trap(self) -> None: + _assert_no_trap(_INT_MUL, "mul", [1, I64_MAX], I64_MAX) + + def test_neg_one_times_max_no_trap(self) -> None: + # -1 * I64_MAX = I64_MIN + 1, in range. + _assert_no_trap(_INT_MUL, "mul", [-1, I64_MAX], I64_MIN + 1) + + def test_min_times_one_no_trap(self) -> None: + _assert_no_trap(_INT_MUL, "mul", [I64_MIN, 1], I64_MIN) + + def test_two_pow_31_squared_no_trap(self) -> None: + # 2^31 * 2^31 = 2^62, in range. + _assert_no_trap(_INT_MUL, "mul", [2**31, 2**31], 2**62) + + def test_small_product_no_trap(self) -> None: + _assert_no_trap(_INT_MUL, "mul", [3, 5], 15) + + +class TestNatAddOverflow798: + """@Nat ADD traps on u64 carry-out.""" + + def test_max_plus_one_traps(self) -> None: + _assert_traps(_NAT_ADD, "add", [U64_MAX, 1]) + + def test_max_plus_max_traps(self) -> None: + _assert_traps(_NAT_ADD, "add", [U64_MAX, U64_MAX]) + + def test_half_plus_half_traps(self) -> None: + _assert_traps(_NAT_ADD, "add", [2**63, 2**63]) + + def test_max_plus_zero_no_trap(self) -> None: + _assert_no_trap(_NAT_ADD, "add", [U64_MAX, 0], U64_MAX) + + def test_small_sum_no_trap(self) -> None: + _assert_no_trap(_NAT_ADD, "add", [5, 3], 8) + + def test_half_plus_half_minus_one_no_trap(self) -> None: + # 2^63 + (2^63 - 1) = U64_MAX, exact. + _assert_no_trap(_NAT_ADD, "add", [2**63, 2**63 - 1], U64_MAX) + + +class TestNatMulOverflow798: + """@Nat MUL traps on u64 overflow; the a==0 branch dodges div-by-zero.""" + + def test_max_times_two_traps(self) -> None: + _assert_traps(_NAT_MUL, "mul", [U64_MAX, 2]) + + def test_two_pow_32_squared_traps(self) -> None: + _assert_traps(_NAT_MUL, "mul", [2**32, 2**32]) + + def test_max_times_max_traps(self) -> None: + _assert_traps(_NAT_MUL, "mul", [U64_MAX, U64_MAX]) + + def test_zero_times_max_no_trap(self) -> None: + # The a==0 branch — without it, 0/0 div-by-zero (wrong trap). + _assert_no_trap(_NAT_MUL, "mul", [0, U64_MAX], 0) + + def test_one_times_max_no_trap(self) -> None: + _assert_no_trap(_NAT_MUL, "mul", [1, U64_MAX], U64_MAX) + + def test_two_pow_32_times_below_no_trap(self) -> None: + # 2^32 * (2^32 - 1) < 2^64. + _assert_no_trap(_NAT_MUL, "mul", [2**32, 2**32 - 1], 2**32 * (2**32 - 1)) + + def test_max_times_one_no_trap(self) -> None: + _assert_no_trap(_NAT_MUL, "mul", [U64_MAX, 1], U64_MAX) + + +# ===================================================================== +# RISK A — MUL round-trip soundness (design RISK 5) +# ===================================================================== + + +def _traps(result, fn: str, args: list[int]) -> bool: + """Run a precompiled module; return True iff it trapped (any WASM fault).""" + try: + execute(result, fn_name=fn, args=args) + return False + except (wasmtime.WasmtimeError, wasmtime.Trap, RuntimeError): + return True + + +class TestMulRoundTripDifferential798: + """The MUL division round-trip must trap on EXACTLY the overflowing + products — no more, no less — checked against Python's exact ``*``. + + The single most error-prone line in the change (design RISK 5): if the + round-trip ever disagrees with the true product's range, the MUL guard is + unsound. Compile each ``mul`` once, then sweep thousands of pseudo-random + operand pairs (plus the hand-picked boundary values) and assert + ``codegen-traps(a*b) <=> (a*b outside the i64 / u64 range)``. + """ + + _BOUNDARIES_SIGNED = [ + 0, 1, -1, 2, -2, I64_MAX, I64_MIN, I64_MAX - 1, I64_MIN + 1, + 2**31, -(2**31), 2**32, -(2**32), 2**62, -(2**62), + ] + _BOUNDARIES_UNSIGNED = [ + 0, 1, 2, 3, U64_MAX, U64_MAX - 1, 2**32, 2**32 - 1, 2**63, 2**63 - 1, + 2**31, 2**16, + ] + + def test_int_mul_matches_python_exact(self) -> None: + import random + rng = random.Random(0x5EED798) + result = _compile_with_types(_INT_MUL) + # mul(a, b) computes @Int.1 * @Int.0 = first * second = a * b. + pairs: list[tuple[int, int]] = [] + for a in self._BOUNDARIES_SIGNED: + for b in self._BOUNDARIES_SIGNED: + pairs.append((a, b)) + for _ in range(4000): + pairs.append(( + rng.randint(I64_MIN, I64_MAX), + rng.randint(I64_MIN, I64_MAX), + )) + mismatches: list[tuple[int, int, bool, bool]] = [] + for a, b in pairs: + trapped = _traps(result, "mul", [a, b]) + overflows = not (I64_MIN <= a * b <= I64_MAX) + if trapped != overflows: + mismatches.append((a, b, trapped, overflows)) + assert not mismatches, ( + f"@Int MUL guard disagrees with Python exact product on " + f"{len(mismatches)} of {len(pairs)} pairs (a, b, trapped, " + f"overflows): {mismatches[:10]}" + ) + + def test_nat_mul_matches_python_exact(self) -> None: + import random + rng = random.Random(0xC0FFEE798) + result = _compile_with_types(_NAT_MUL) + pairs: list[tuple[int, int]] = [] + for a in self._BOUNDARIES_UNSIGNED: + for b in self._BOUNDARIES_UNSIGNED: + pairs.append((a, b)) + for _ in range(4000): + pairs.append(( + rng.randint(0, U64_MAX), + rng.randint(0, U64_MAX), + )) + mismatches: list[tuple[int, int, bool, bool]] = [] + for a, b in pairs: + trapped = _traps(result, "mul", [a, b]) + overflows = a * b > U64_MAX + if trapped != overflows: + mismatches.append((a, b, trapped, overflows)) + assert not mismatches, ( + f"@Nat MUL guard disagrees with Python exact product on " + f"{len(mismatches)} of {len(pairs)} pairs (a, b, trapped, " + f"overflows): {mismatches[:10]}" + ) + + def test_int_add_matches_python_exact(self) -> None: + import random + rng = random.Random(0xADD798) + result = _compile_with_types(_INT_ADD) + pairs = [(a, b) for a in self._BOUNDARIES_SIGNED + for b in self._BOUNDARIES_SIGNED] + for _ in range(3000): + pairs.append((rng.randint(I64_MIN, I64_MAX), + rng.randint(I64_MIN, I64_MAX))) + mismatches = [] + for a, b in pairs: + trapped = _traps(result, "add", [a, b]) + overflows = not (I64_MIN <= a + b <= I64_MAX) + if trapped != overflows: + mismatches.append((a, b, trapped, overflows)) + assert not mismatches, mismatches[:10] + + def test_int_sub_matches_python_exact(self) -> None: + import random + rng = random.Random(0x5B798) + result = _compile_with_types(_INT_SUB) + # sub(a, b) = @Int.1 - @Int.0 = first - second = a - b. + pairs = [(a, b) for a in self._BOUNDARIES_SIGNED + for b in self._BOUNDARIES_SIGNED] + for _ in range(3000): + pairs.append((rng.randint(I64_MIN, I64_MAX), + rng.randint(I64_MIN, I64_MAX))) + mismatches = [] + for a, b in pairs: + trapped = _traps(result, "sub", [a, b]) + overflows = not (I64_MIN <= a - b <= I64_MAX) + if trapped != overflows: + mismatches.append((a, b, trapped, overflows)) + assert not mismatches, mismatches[:10] + + def test_nat_add_matches_python_exact(self) -> None: + import random + rng = random.Random(0xADD_4A7) + result = _compile_with_types(_NAT_ADD) + pairs = [(a, b) for a in self._BOUNDARIES_UNSIGNED + for b in self._BOUNDARIES_UNSIGNED] + for _ in range(3000): + pairs.append((rng.randint(0, U64_MAX), rng.randint(0, U64_MAX))) + mismatches = [] + for a, b in pairs: + trapped = _traps(result, "add", [a, b]) + overflows = a + b > U64_MAX + if trapped != overflows: + mismatches.append((a, b, trapped, overflows)) + assert not mismatches, mismatches[:10] diff --git a/tests/test_int_overflow_differential.py b/tests/test_int_overflow_differential.py new file mode 100644 index 00000000..cea6debf --- /dev/null +++ b/tests/test_int_overflow_differential.py @@ -0,0 +1,336 @@ +"""Verifier<->codegen classification differential for #798 (Stage 3, RISK 6/7). + +The soundness contract: the codegen integer-overflow guard must fire at +*exactly* the sites the verifier emits an ``int_overflow`` obligation for, and +classify each site's operand type (``@Int`` i64 vs ``@Nat`` u64) *identically* +— otherwise a Tier-1-clean program traps spuriously, or (worse) a wrapping op +slips through unguarded. + +A green per-combo unit suite can hide a desync between the verifier's +``_overflow_int_type`` (which reads the checker's *resolved* type) and the +codegen ``_overflow_codegen_type`` (which the pre-fix design re-derived from the +AST, mis-classifying a literal-left ``@Int`` operand as ``@Nat``). This is the +required differential (project cross-component-soundness rule): for a corpus +exercising all five combos *and* the literal-left ambiguity, assert the +verifier's per-site (gated) classification equals the codegen's, site for site. + +Both sides are driven by the SAME ``ast.span_key`` so the comparison is exact. +""" + +from __future__ import annotations + +import json +from pathlib import Path + +import pytest + +from vera import ast +from vera.checker import typecheck_with_artifacts +from vera.parser import parse_to_ast +from vera.verifier import ContractVerifier +from vera.wasm import StringPool +from vera.wasm.context import WasmContext + +REPO_ROOT = Path(__file__).parent.parent +EXAMPLES_DIR = REPO_ROOT / "examples" +CONFORMANCE_DIR = REPO_ROOT / "tests" / "conformance" + + +def _corpus_files() -> list[Path]: + """Examples + every verify/run-level conformance program.""" + manifest = json.loads( + (CONFORMANCE_DIR / "manifest.json").read_text(encoding="utf-8"), + ) + conformance = [ + CONFORMANCE_DIR / entry["file"] + for entry in manifest + if entry["level"] in ("verify", "run") + ] + return sorted(EXAMPLES_DIR.glob("*.vera")) + sorted(conformance) + + +def _binary_sites(program: ast.Program) -> list[ast.BinaryExpr]: + """Every ``+`` / ``-`` / ``*`` BinaryExpr node, in source order.""" + sites: list[ast.BinaryExpr] = [] + + def walk(node: object) -> None: + if isinstance(node, ast.BinaryExpr) and node.op in ( + ast.BinOp.ADD, ast.BinOp.SUB, ast.BinOp.MUL, + ): + sites.append(node) + if isinstance(node, ast.Node): + for v in vars(node).values(): + walk(v) + elif isinstance(node, (list, tuple)): + for item in node: + walk(item) + + walk(program) + return sites + + +def _verifier_gated_type(verifier: ContractVerifier, site: ast.BinaryExpr): + """The type the verifier *obligates* this site at, mirroring the gate in + ``_walk_for_primitive_op_obligations``: classification is on the operands' + common (coerced) type (#798 — not one operand's self-type, which a literal + skews, nor the narrowed result), and ``@Nat`` SUB is excluded (nat_sub).""" + ovf = verifier._overflow_arith_type(site) + if ovf is None: + return None + if site.op == ast.BinOp.SUB and ovf == "Nat": + return None + return ovf + + +def _codegen_gated_type(ctx: WasmContext, site: ast.BinaryExpr): + """The type the codegen guard fires at, mirroring the gate in + ``_translate_binary`` — the operands' common (coerced) type (#798).""" + ovf = ctx._overflow_arith_codegen_type(site) + if ovf is None: + return None + if site.op == ast.BinOp.SUB and ovf == "Nat": + return None + return ovf + + +def _assert_in_lockstep(source: str) -> None: + program = parse_to_ast(source) + diags, arts = typecheck_with_artifacts(program, source) + errors = [d for d in diags if d.severity == "error"] + assert not errors, f"typecheck errors: {[d.description for d in errors]}" + + verifier = ContractVerifier( + source=source, + expr_types=arts.expr_semantic_types, + expr_target_types=arts.expr_target_types, + ) + ctx = WasmContext(StringPool()) + ctx.set_expr_semantic_types(arts.expr_semantic_types) + + sites = _binary_sites(program) + assert sites, "corpus program has no +/-/* sites" + mismatches = [] + for site in sites: + v = _verifier_gated_type(verifier, site) + c = _codegen_gated_type(ctx, site) + if v != c: + key = ast.span_key(site) + mismatches.append((key, site.op.name, v, c)) + assert not mismatches, ( + "verifier<->codegen overflow classification desync " + "(span, op, verifier_type, codegen_type): " + repr(mismatches) + ) + + +# Corpus: all five combos, plus the literal-left ambiguity (RISK 6) in both +# Int and Nat context, plus mixed slot/literal orderings and nested arithmetic. +_CORPUS = { + "int_add_slots": """ +public fn f(@Int, @Int -> @Int) + requires(true) ensures(true) effects(pure) +{ @Int.1 + @Int.0 } +""", + "int_sub_slots": """ +public fn f(@Int, @Int -> @Int) + requires(true) ensures(true) effects(pure) +{ @Int.1 - @Int.0 } +""", + "int_mul_slots": """ +public fn f(@Int, @Int -> @Int) + requires(true) ensures(true) effects(pure) +{ @Int.1 * @Int.0 } +""", + "nat_add_slots": """ +public fn f(@Nat, @Nat -> @Nat) + requires(true) ensures(true) effects(pure) +{ @Nat.1 + @Nat.0 } +""", + "nat_mul_slots": """ +public fn f(@Nat, @Nat -> @Nat) + requires(true) ensures(true) effects(pure) +{ @Nat.1 * @Nat.0 } +""", + "nat_sub_excluded": """ +public fn f(@Nat, @Nat -> @Nat) + requires(@Nat.1 >= @Nat.0) ensures(true) effects(pure) +{ @Nat.1 - @Nat.0 } +""", + # RISK 6: a literal LEFT operand whose resolved type is Int. The AST-only + # classifier would call `5` (a non-negative IntLit) @Nat → wrong (u64) + # range; the resolved-type lookup must keep it @Int in both sides. + "int_literal_left_add": """ +public fn f(@Int -> @Int) + requires(true) ensures(true) effects(pure) +{ 5 + @Int.0 } +""", + "int_literal_left_sub": """ +public fn f(@Int -> @Int) + requires(true) ensures(true) effects(pure) +{ 5 - @Int.0 } +""", + "int_literal_left_mul": """ +public fn f(@Int -> @Int) + requires(true) ensures(true) effects(pure) +{ 5 * @Int.0 } +""", + # The same literal-left shape but in @Nat context — must stay @Nat. + "nat_literal_left_add": """ +public fn f(@Nat -> @Nat) + requires(true) ensures(true) effects(pure) +{ 5 + @Nat.0 } +""", + "nat_literal_left_mul": """ +public fn f(@Nat -> @Nat) + requires(true) ensures(true) effects(pure) +{ 5 * @Nat.0 } +""", + # Nested arithmetic — every subexpression site must match. + "int_nested": """ +public fn f(@Int, @Int, @Int -> @Int) + requires(true) ensures(true) effects(pure) +{ @Int.2 * @Int.1 + @Int.0 - 7 } +""", + "nat_nested": """ +public fn f(@Nat, @Nat -> @Nat) + requires(true) ensures(true) effects(pure) +{ @Nat.1 * @Nat.0 + 3 } +""", + # Float arithmetic must be classified None on both sides (not guarded). + "float_not_guarded": """ +public fn f(@Float64, @Float64 -> @Float64) + requires(true) ensures(true) effects(pure) +{ @Float64.1 + @Float64.0 } +""", +} + + +class TestVerifierCodegenOverflowDifferential798: + @pytest.mark.parametrize("name", sorted(_CORPUS)) + def test_classification_in_lockstep(self, name: str) -> None: + _assert_in_lockstep(_CORPUS[name]) + + def test_literal_left_classified_identically_to_verifier(self) -> None: + """The RISK-6 case: a positive-literal LEFT operand. + + The checker records a non-negative literal's own narrow type as ``@Nat`` + (``5`` alone is ``@Nat``), but ``5 + @Int.0`` as a *whole* is ``@Int``. + Both the verifier and codegen classify on the WHOLE expression's + resolved type (#798), so both read ``@Int`` (i64) here — in lockstep and + at the correct range. Reading the operand's type instead would classify + ``@Nat`` and mis-range the site to u64. + """ + source = _CORPUS["int_literal_left_add"] + program = parse_to_ast(source) + diags, arts = typecheck_with_artifacts(program, source) + assert not [d for d in diags if d.severity == "error"] + verifier = ContractVerifier( + source=source, + expr_types=arts.expr_semantic_types, + expr_target_types=arts.expr_target_types, + ) + ctx = WasmContext(StringPool()) + ctx.set_expr_semantic_types(arts.expr_semantic_types) + site = _binary_sites(program)[0] + assert isinstance(site.left, ast.IntLit) and site.left.value == 5 + v = verifier._overflow_arith_type(site) + c = ctx._overflow_arith_codegen_type(site) + assert v == c == "Int", ( + f"literal-left classification: verifier={v}, codegen={c} (want Int)" + ) + + def test_literal_left_int_overflow_is_caught(self) -> None: + """A literal-LEFT ``@Int`` overflow is caught — the #798 fix. + + The checker synthesises a non-negative literal's type as ``@Nat``, so an + *operand*-type read of ``5 + @Int.0`` would classify ``@Nat`` and + obligate the site at u64 — silently dropping an ``@Int`` overflow at + ``[I64_MAX+1, U64_MAX]``. Classifying on the whole expression's resolved + type (``@Int``) closes that gap: the site is obligated, and codegen + guards it, at the i64 range. (Lockstep is pinned by + ``test_literal_left_classified_identically_to_verifier``; + ``test_int_overflow_codegen.py::TestLiteralLeftIntOverflow798`` checks + it end-to-end at runtime.) + """ + source = _CORPUS["int_literal_left_add"] + program = parse_to_ast(source) + diags, arts = typecheck_with_artifacts(program, source) + assert not [d for d in diags if d.severity == "error"] + verifier = ContractVerifier( + source=source, + expr_types=arts.expr_semantic_types, + expr_target_types=arts.expr_target_types, + ) + site = _binary_sites(program)[0] + # The fix: classify on the operands' common type → @Int (i64), not the + # literal left operand's @Nat self-type. + assert verifier._overflow_arith_type(site) == "Int" + + def test_narrowed_result_classified_at_i64_not_result_type(self) -> None: + """An @Int add narrowed into a @Nat slot is i64 arithmetic (#798). + + ``@Int.0 + 1`` stored as @Nat resolves the *expression* to @Nat, but the + addition runs at i64 (both operands @Int) — classifying on the result + would mis-range it to u64 and drop an i64 overflow. The operands' common + type (@Int) is the correct width. This pins the shape a result-type + classifier silently gets wrong. + """ + source = """ +public fn f(@Int -> @Nat) + requires(@Int.0 >= 0) ensures(true) effects(pure) +{ @Int.0 + 1 } +""" + program = parse_to_ast(source) + diags, arts = typecheck_with_artifacts(program, source) + assert not [d for d in diags if d.severity == "error"] + verifier = ContractVerifier( + source=source, + expr_types=arts.expr_semantic_types, + expr_target_types=arts.expr_target_types, + ) + site = _binary_sites(program)[0] + # Result resolves to @Nat (narrowed); the *arithmetic* width is @Int. + assert verifier._overflow_int_type(site) == "Nat" # the result type + assert verifier._overflow_arith_type(site) == "Int" # the i64 width + + +class TestCorpusOverflowDifferential798: + """Corpus-wide RISK-7 sweep: across every example + verify/run conformance + program, the verifier's gated ``int_overflow`` classification must equal + the codegen guard's at every ``+`` / ``-`` / ``*`` site. + + The hand-written ``_CORPUS`` above pins the tricky shapes; this sweep is + the broad safety net — a desync anywhere in real Vera code (a slot, a call + return, a nested arithmetic tree, an aliased type) fails here. + """ + + @pytest.mark.parametrize( + "path", _corpus_files(), ids=lambda p: p.name.removesuffix(".vera"), + ) + def test_corpus_classification_in_lockstep(self, path: Path) -> None: + source = path.read_text(encoding="utf-8") + program = parse_to_ast(source) + diags, arts = typecheck_with_artifacts( + program, source, file=str(path), + ) + if [d for d in diags if d.severity == "error"]: + pytest.skip(f"{path.name}: not standalone-typecheckable") + + verifier = ContractVerifier( + source=source, + file=str(path), + expr_types=arts.expr_semantic_types, + expr_target_types=arts.expr_target_types, + ) + ctx = WasmContext(StringPool()) + ctx.set_expr_semantic_types(arts.expr_semantic_types) + + mismatches = [] + for site in _binary_sites(program): + v = _verifier_gated_type(verifier, site) + c = _codegen_gated_type(ctx, site) + if v != c: + mismatches.append((ast.span_key(site), site.op.name, v, c)) + assert not mismatches, ( + f"{path.name}: verifier<->codegen overflow classification desync " + f"(span, op, verifier, codegen): {mismatches}" + ) diff --git a/tests/test_verifier.py b/tests/test_verifier.py index 213d2cbc..c2e57134 100644 --- a/tests/test_verifier.py +++ b/tests/test_verifier.py @@ -484,7 +484,10 @@ def test_linear_arithmetic_is_tier1(self) -> None: { @Int.0 + @Int.1 } """) assert result.summary.tier1_verified == 2 - assert result.summary.tier3_runtime == 0 + # #798: `@Int.0 + @Int.1` (in the body AND the ensures) each carry an + # int_overflow obligation; unbounded @Int operands → tier3 (may overflow + # i64, runtime-guarded). + assert result.summary.tier3_runtime == 2 def test_generic_function_is_tier3(self) -> None: result = _verify(""" @@ -533,8 +536,10 @@ def test_recursive_call_decreases_verified(self) -> None: # ensures(@Nat.result >= 1) — Tier 1 via modular verification # decreases(@Nat.0) — Tier 1 via termination verification # @Nat.0 - 1 underflow obligation (#520) — Tier 1 via path condition + # #798: @Nat.0 * factorial(...) multiply emits an int_overflow + # obligation; operands are unbounded so it falls to Tier 3. assert result.summary.tier1_verified == 4 - assert result.summary.tier3_runtime == 0 + assert result.summary.tier3_runtime == 1 # ===================================================================== @@ -2569,9 +2574,11 @@ def test_mixed_tiers(self) -> None: # ensures — Tier 1 via modular verification # decreases — Tier 1 via termination verification # @Nat.0 - 1 underflow obligation (#520) — Tier 1 via path condition - assert result.summary.total == 4 + # #798: @Nat.0 + f(...) add emits an int_overflow obligation; operands + # are unbounded so it falls to Tier 3. + assert result.summary.total == 5 assert result.summary.tier1_verified == 4 - assert result.summary.tier3_runtime == 0 + assert result.summary.tier3_runtime == 1 def test_multiple_functions_accumulate(self) -> None: result = _verify(""" @@ -2589,8 +2596,12 @@ def test_multiple_functions_accumulate(self) -> None: """) # f: requires(true) trivial + ensures verified = 2 Tier 1 # g: requires(true) trivial + ensures verified = 2 Tier 1 + # #798: g's `@Int.0 + 1` appears twice — once in the body and once in + # ensures(@Int.result == @Int.0 + 1) — each emitting an int_overflow + # obligation; operands are unbounded so both fall to Tier 3. assert result.summary.tier1_verified == 4 - assert result.summary.total == 4 + assert result.summary.total == 6 + assert result.summary.tier3_runtime == 2 # ===================================================================== @@ -3817,7 +3828,7 @@ def test_mutual_recursion_verified(self) -> None: assert result.summary.tier3_runtime == 0 def test_factorial_example_all_t1(self) -> None: - """factorial.vera should have zero Tier 3 contracts.""" + """factorial.vera: one Tier-3 contract (the #798 overflow guard).""" source = EXAMPLES_DIR / "factorial.vera" if not source.exists(): pytest.skip("factorial.vera not found") @@ -3825,8 +3836,11 @@ def test_factorial_example_all_t1(self) -> None: ast = parse_to_ast(text) typecheck(ast, text) result = verify(ast, text, file=str(source)) - assert result.summary.tier3_runtime == 0, ( - f"factorial.vera should have 0 T3, got {result.summary.tier3_runtime}" + # #798: the `@Nat.0 * factorial(@Nat.0 - 1)` multiply emits an + # int_overflow obligation; operands are unbounded so it falls to + # Tier 3 (runtime overflow trap). All other contracts stay Tier 1. + assert result.summary.tier3_runtime == 1, ( + f"factorial.vera should have 1 T3, got {result.summary.tier3_runtime}" ) @@ -3860,7 +3874,9 @@ def test_list_length_decreases(self) -> None: result = _verify(source) e525 = [d for d in result.diagnostics if d.error_code == "E525"] assert e525 == [], "ADT decreases should be verified (no E525)" - assert result.summary.tier3_runtime == 0 + # #798: the `1 + length(...)` add emits an int_overflow obligation; + # operands are unbounded so it falls to Tier 3 (runtime overflow trap). + assert result.summary.tier3_runtime == 1 def test_list_sum_decreases(self) -> None: """List sum with structural decreases is Tier 1.""" @@ -3887,7 +3903,7 @@ def test_list_sum_decreases(self) -> None: assert e525 == [], "ADT decreases should be verified (no E525)" def test_list_ops_all_tier1(self) -> None: - """list_ops.vera should have zero Tier 3 contracts.""" + """list_ops.vera: two Tier-3 contracts (the #798 overflow guards).""" source = EXAMPLES_DIR / "list_ops.vera" if not source.exists(): pytest.skip("list_ops.vera not found") @@ -3895,8 +3911,11 @@ def test_list_ops_all_tier1(self) -> None: ast = parse_to_ast(text) typecheck(ast, text) result = verify(ast, text, file=str(source)) - assert result.summary.tier3_runtime == 0, ( - f"list_ops.vera should have 0 T3, got {result.summary.tier3_runtime}" + # #798: the `1 + length(...)` and `@Int.0 + sum(...)` adds each emit an + # int_overflow obligation; operands are unbounded so both fall to Tier 3 + # (runtime overflow trap). The Tier-1 count is unchanged. + assert result.summary.tier3_runtime == 2, ( + f"list_ops.vera should have 2 T3, got {result.summary.tier3_runtime}" ) assert result.summary.tier1_verified == 8 @@ -4004,6 +4023,16 @@ def test_overall_tier_counts(self) -> None: a Tier-1 proof obligation. One safe (guarded) contract division and one provable body assert in the corpus each discharge to Tier 1 (+2 T1, +2 total over the pre-fix baseline of 263 / 294). + + #798: every @Int/@Nat `+`/`-`/`*` (in bodies AND contract clauses; + @Nat subtraction is excluded — that's the existing nat_sub underflow + obligation) now carries an int_overflow obligation. The corpus gains + 55 such obligations: 8 discharge at Tier 1 (all in life.vera, where + the cell-coordinate operands are provably bounded into i64 range) and + 47 fall to Tier 3 (unbounded operands → runtime overflow trap). Net: + +8 T1, +47 T3, +55 total, +0 tier3_unguarded — verified by + reconstructing the prior 265/31/296/0 baseline with int_overflow + obligations excluded. """ t1 = t3 = total = t3u = 0 for f in sorted(EXAMPLES_DIR.glob("*.vera")): @@ -4016,9 +4045,9 @@ def test_overall_tier_counts(self) -> None: total += result.summary.total t3u += sum(1 for o in result.obligations if o.status == "tier3_unguarded") - assert t1 == 265, f"Expected 265 T1, got {t1}" - assert t3 == 31, f"Expected 31 T3, got {t3}" - assert total == 296, f"Expected 296 total, got {total}" + assert t1 == 273, f"Expected 273 T1, got {t1}" + assert t3 == 78, f"Expected 78 T3, got {t3}" + assert total == 351, f"Expected 351 total, got {total}" assert t3u == 0, f"Expected 0 tier3_unguarded, got {t3u}" @@ -7855,8 +7884,10 @@ def test_destructure_unprojectable_guarded_tier3(self) -> None: assert len(binds) == 2, [(o.kind, o.status) for o in result.obligations] assert all(o.status == "tier3" for o in binds), [ o.status for o in binds] - # tier3_runtime starts at 0, so 2 guarded sites pin `+= 1` (vs `= 1`). - assert result.summary.tier3_runtime == 2, result.summary.tier3_runtime + # tier3_runtime starts at 0, so 2 guarded sites pin `+= 1` (vs `= 1`), + # plus #798: the body `@Nat.0 + @Nat.1` add emits a third Tier-3 + # int_overflow obligation (opaque operands → runtime overflow trap). + assert result.summary.tier3_runtime == 3, result.summary.tier3_runtime def test_destructure_no_narrowing_no_obligation(self) -> None: """``let Tuple<@Int,@Int> = Tuple(...)`` — neither component narrows diff --git a/uv.lock b/uv.lock index 4a7006cc..9087371d 100644 --- a/uv.lock +++ b/uv.lock @@ -1540,7 +1540,7 @@ wheels = [ [[package]] name = "vera" -version = "0.0.180" +version = "0.0.181" source = { editable = "." } dependencies = [ { name = "lark" }, diff --git a/vera/__init__.py b/vera/__init__.py index 4e51fa3d..e493ebaa 100644 --- a/vera/__init__.py +++ b/vera/__init__.py @@ -1,4 +1,4 @@ """Vera: a programming language designed for LLMs.""" -__version__ = "0.0.180" +__version__ = "0.0.181" version = __version__ diff --git a/vera/cli.py b/vera/cli.py index 97cd302a..36c8db17 100644 --- a/vera/cli.py +++ b/vera/cli.py @@ -334,7 +334,7 @@ def cmd_compile( target: str = "wasm", ) -> int: """Parse, type-check, and compile a .vera file to WebAssembly.""" - from vera.checker import typecheck + from vera.checker import typecheck_with_artifacts from vera.codegen import compile as codegen_compile from vera.resolver import ModuleResolver @@ -347,10 +347,13 @@ def cmd_compile( resolver = ModuleResolver(_root=p.parent) resolved = resolver.resolve_imports(ast, p) - # Type-check first - type_diags = resolver.errors + typecheck( + # Type-check first, retaining the resolved-type artifacts so the + # codegen integer-overflow guard (#798) classifies operands in + # lockstep with the verifier. + check_diags, artifacts = typecheck_with_artifacts( ast, source, file=str(p), resolved_modules=resolved, ) + type_diags = resolver.errors + check_diags type_errors = [d for d in type_diags if d.severity == "error"] type_warnings = [d for d in type_diags if d.severity == "warning"] @@ -371,6 +374,7 @@ def cmd_compile( # Compile (C7e: pass resolved modules for cross-module codegen) result = codegen_compile( ast, source=source, file=str(p), resolved_modules=resolved, + expr_semantic_types=artifacts.expr_semantic_types, ) errors = [d for d in result.diagnostics if d.severity == "error"] @@ -461,7 +465,7 @@ def cmd_run( ) -> int: """Parse, type-check, compile, and execute a .vera file.""" from vera.ast import FnDecl - from vera.checker import typecheck + from vera.checker import typecheck_with_artifacts from vera.codegen import compile as codegen_compile, execute from vera.resolver import ModuleResolver @@ -474,10 +478,13 @@ def cmd_run( resolver = ModuleResolver(_root=p.parent) resolved = resolver.resolve_imports(ast, p) - # Type-check - type_diags = resolver.errors + typecheck( + # Type-check, retaining the resolved-type artifacts so the codegen + # integer-overflow guard (#798) classifies operands in lockstep with + # the verifier. + check_diags, artifacts = typecheck_with_artifacts( ast, source, file=str(p), resolved_modules=resolved, ) + type_diags = resolver.errors + check_diags type_errors = [d for d in type_diags if d.severity == "error"] if type_errors: @@ -496,6 +503,7 @@ def cmd_run( # Compile (C7e: pass resolved modules for cross-module codegen) result = codegen_compile( ast, source=source, file=str(p), resolved_modules=resolved, + expr_semantic_types=artifacts.expr_semantic_types, ) if not result.ok: # pragma: no cover — codegen errors after typecheck pass diff --git a/vera/codegen/api.py b/vera/codegen/api.py index c34568c9..6c5f94b8 100644 --- a/vera/codegen/api.py +++ b/vera/codegen/api.py @@ -55,6 +55,7 @@ from vera.errors import Diagnostic from vera.resolver import ResolvedModule + from vera.types import Type # ===================================================================== @@ -180,17 +181,32 @@ def compile( source: str = "", file: str | None = None, resolved_modules: list[ResolvedModule] | None = None, + expr_semantic_types: dict[tuple[int, int, int, int], Type] | None = None, ) -> CompileResult: """Compile a type-checked Vera Program AST to WebAssembly. Returns a CompileResult with WAT text, WASM binary, exports, and any diagnostics. The program should already have passed type checking and (optionally) verification. + + ``expr_semantic_types`` is the checker's resolved-type side-table + (``CheckerArtifacts.expr_semantic_types``), keyed by ``ast.span_key``. + The #798 integer-overflow guard consults it to classify an arithmetic + operand as ``@Int`` (i64) vs ``@Nat`` (u64) using the *same* resolved + type the verifier's ``int_overflow`` obligation uses, so the runtime + guard fires at exactly the sites — and with the exact signed/unsigned + range — the verifier obligates (keeping codegen in lockstep with + ``vera/verifier.py``). When omitted (a ``transform -> compile`` caller + that skipped typecheck), the guard falls back to the AST-only classifier, + which is sound for slot-/call-typed operands but cannot disambiguate a + bare-literal operand's Int-vs-Nat context — such callers should thread + the table to stay precise. """ from vera.codegen.core import CodeGenerator gen = CodeGenerator( source=source, file=file, resolved_modules=resolved_modules, + expr_semantic_types=expr_semantic_types, ) return gen.compile_program(program) diff --git a/vera/codegen/closures.py b/vera/codegen/closures.py index 66add32b..1d0d36c9 100644 --- a/vera/codegen/closures.py +++ b/vera/codegen/closures.py @@ -232,6 +232,9 @@ def _compile_lifted_closure( # not just the WAT type — same propagation as the per-function # ctx in `functions.py`. ctx.set_fn_ret_type_exprs(self._fn_ret_type_exprs) + # #798: resolved-type side-table for the integer-overflow guard's + # Int/Nat operand classifier, inside closure bodies too. + ctx.set_expr_semantic_types(self._expr_semantic_types) # #747: per-parameter concrete-@Nat flags for the call-site # runtime narrowing guard inside closure bodies too. ctx.set_fn_nat_params(self._fn_nat_params) diff --git a/vera/codegen/core.py b/vera/codegen/core.py index 91ccd5ac..21502f56 100644 --- a/vera/codegen/core.py +++ b/vera/codegen/core.py @@ -37,6 +37,7 @@ if TYPE_CHECKING: from vera.resolver import ResolvedModule + from vera.types import Type from vera.wasm.context import WasmContext @@ -82,11 +83,23 @@ def __init__( source: str = "", file: str | None = None, resolved_modules: list[ResolvedModule] | None = None, + expr_semantic_types: ( + dict[tuple[int, int, int, int], Type] | None + ) = None, ) -> None: self.source = source self.file = file self.diagnostics: list[Diagnostic] = [] self.string_pool = StringPool() + # #798: the checker's resolved-type side-table (keyed by + # ``ast.span_key``). Threaded into every ``WasmContext`` so the + # integer-overflow guard classifies an arithmetic operand's Int/Nat + # type the same way the verifier's ``int_overflow`` obligation does. + # ``None`` when a caller skipped typecheck (the guard then falls back + # to the AST-only classifier). + self._expr_semantic_types: ( + dict[tuple[int, int, int, int], Type] | None + ) = expr_semantic_types # Registered function signatures: name -> (param_types, return_type) self._fn_sigs: dict[str, tuple[list[str | None], str | None]] = {} diff --git a/vera/codegen/functions.py b/vera/codegen/functions.py index 81adb3a6..2ea0a6b6 100644 --- a/vera/codegen/functions.py +++ b/vera/codegen/functions.py @@ -90,6 +90,9 @@ def _compile_fn( # resolve the element type of `f()[i]` when `f` returns # `Array`. ctx.set_fn_ret_type_exprs(self._fn_ret_type_exprs) + # #798: resolved-type side-table for the integer-overflow guard's + # Int/Nat operand classifier (kept in lockstep with the verifier). + ctx.set_expr_semantic_types(self._expr_semantic_types) # #747: per-parameter concrete-@Nat flags for the call-site # runtime narrowing guard. ctx.set_fn_nat_params(self._fn_nat_params) diff --git a/vera/errors.py b/vera/errors.py index 720b6e3d..52129a45 100644 --- a/vera/errors.py +++ b/vera/errors.py @@ -511,6 +511,7 @@ def diagnose_lark_error( "E525": "Cannot verify termination metric", "E526": "Division or modulo by zero", "E527": "Array index out of bounds", + "E528": "Arithmetic overflow", # E6xx — Codegen "E600": "Unsupported parameter type", "E601": "Unsupported return type", diff --git a/vera/obligations/core.py b/vera/obligations/core.py index 420e29e7..59667f09 100644 --- a/vera/obligations/core.py +++ b/vera/obligations/core.py @@ -70,6 +70,12 @@ # index is provably out of bounds; else honest tier3 # (length is uninterpreted — beyond Tier 1, see #427 — # and codegen's `out_of_bounds` trap is the guard). + "int_overflow", # @Int/@Nat `+`/`-`/`*` range obligation at one site + # (#798). Two-check like index_bounds: result provably in + # i64 (@Int) / u64 (@Nat) range -> tier-1; provably out of + # range -> loud E528; else honest tier3 (the codegen + # overflow trap is the guard). @Nat `-` is underflow + # (nat_sub), not high-overflow, so it is excluded here. "assert", # a body `assert(P)` predicate (#800, spec §6.2.5). Two- # check like index_bounds: prove P -> tier-1, prove ¬P -> # loud E507 (always traps at runtime), else tier3 (the diff --git a/vera/verifier.py b/vera/verifier.py index 1a1b0d56..4ce3cebc 100644 --- a/vera/verifier.py +++ b/vera/verifier.py @@ -53,6 +53,14 @@ ) +# i64 / u64 range bounds for the #798 integer-overflow obligation. @Int is a +# signed 64-bit machine integer, @Nat an unsigned one; `+`/`-`/`*` wrap at these +# boundaries at runtime (and, per #798, now trap). +_I64_MIN = -(2**63) +_I64_MAX = 2**63 - 1 +_U64_MAX = 2**64 - 1 + + # ===================================================================== # Public API # ===================================================================== @@ -2291,6 +2299,21 @@ def _walk_for_primitive_op_obligations( self._check_div_zero_obligation( decl, expr, smt, slot_env, assumptions, ) + elif expr.op in (ast.BinOp.ADD, ast.BinOp.SUB, ast.BinOp.MUL): + # #798: @Int/@Nat add/sub/mul wrap at the i64/u64 boundary and + # now trap, so each carries an `int_overflow` range obligation. + # @Nat subtraction is underflow (nat_sub, handled above), not + # high-overflow, so it is excluded here. `_overflow_arith_type` + # classifies on the operands' COMMON (coerced) type — the width + # the i64/u64 op runs at — so a literal-left @Int add and an + # @Int add narrowed into a @Nat slot are both i64 (#798). + ovf_type = self._overflow_arith_type(expr) + if ovf_type is not None and not ( + expr.op == ast.BinOp.SUB and ovf_type == "Nat" + ): + self._check_overflow_obligation( + decl, expr, smt, slot_env, assumptions, + ) return if isinstance(expr, ast.UnaryExpr): @@ -3346,6 +3369,112 @@ def _check_index_bounds_obligation( self.summary.tier3_runtime += 1 self._record_obligation(decl.name, "index_bounds", expr, "tier3") + def _overflow_int_type(self, expr: ast.Expr) -> str | None: + """Return ``"Int"`` / ``"Nat"`` if *expr* resolves to a wrapping machine + integer (i64 / u64), else ``None``. + + A single-expression classifier over the checker's resolved type; + :py:meth:`_overflow_arith_type` combines the two operands of an + arithmetic site into the operation's actual signed/unsigned width. + """ + ty = self._resolved_type_of(expr) + if ty is None: + return None + if self._is_nat_type(ty): + return "Nat" + if self._is_int_type(ty): + return "Int" + return None + + def _overflow_arith_type(self, expr: ast.BinaryExpr) -> str | None: + """The signed/unsigned width of the ``+``/``-``/``*`` in *expr* — the + type the i64 / u64 machine op is performed at, i.e. the operands' common + (coerced) type, NOT the possibly-narrowed result type. + + ``@Nat <: @Int``, so the width is ``Int`` if EITHER operand is ``@Int`` + (the other coerces up), else ``Nat`` if both are ``@Nat``. Classifying + on one operand's self-type is wrong for a literal (a non-negative + literal is ``@Nat``, but ``5 + @Int.0`` is an @Int add); classifying on + the *result* is wrong when it is narrowed (``@Int.0 + 1`` stored into a + ``@Nat`` slot is still an i64 add). Either mistake silently mis-ranges + the site and drops a real overflow (#798). + """ + lt = self._overflow_int_type(expr.left) + rt = self._overflow_int_type(expr.right) + if lt is None or rt is None: + return None + return "Int" if "Int" in (lt, rt) else "Nat" + + def _check_overflow_obligation( + self, + decl: ast.FnDecl, + expr: ast.BinaryExpr, + smt: SmtContext, + slot_env: SlotEnv, + assumptions: list[object], + ) -> None: + """Discharge the i64 / u64 range obligation at one ``+``/``-``/``*`` + site (#798). + + ``@Int`` / ``@Nat`` arithmetic wraps at the machine boundary, and per + #798 a wrapping op now traps at runtime, so each site carries a "result + stays in range" obligation. A two-check mirrors + :py:meth:`_check_index_bounds_obligation`, keeping ``vera verify`` + honest without false positives on dynamic operands: + + 1. ``lo <= result <= hi`` valid → **Tier 1** (bounds pin the result); + 2. else ``result < lo || result > hi`` valid → provably overflows → + **loud E528** (e.g. a literal ``MAX_i64 + 1``); + 3. else (dynamic operands) → **honest Tier 3**, guarded by the codegen + overflow trap. + + The range is i64 for ``@Int``, u64 for ``@Nat``. An untranslatable / + non-integer result, or one embedding an opaque shadow (an untranslatable + ``let`` that rebound a stale outer slot), also falls to Tier 3 — its + value is unknown, so neither a Tier-1 discharge nor a false E528. + """ + ovf_type = self._overflow_arith_type(expr) + if ovf_type is None: # pragma: no cover — guarded by the caller + return + result = smt.translate_expr(expr, slot_env) + if (result is None + or result.sort() != z3.IntSort() + or self._contains_opaque_shadow(result)): + self.summary.total += 1 + self.summary.tier3_runtime += 1 + self._record_obligation(decl.name, "int_overflow", expr, "tier3") + return + + if ovf_type == "Nat": + lo, hi = z3.IntVal(0), z3.IntVal(_U64_MAX) + else: + lo, hi = z3.IntVal(_I64_MIN), z3.IntVal(_I64_MAX) + + self.summary.total += 1 + in_range = z3.And(result >= lo, result <= hi) + safe = smt.check_valid(in_range, list(assumptions)) + if safe.status == "verified": + self.summary.tier1_verified += 1 + self._record_obligation(decl.name, "int_overflow", expr, "verified") + return + + # Not provably in range. Distinguish a real, statically-decidable bug + # (provably OUT of range) from dynamic operands we cannot decide. + out_of_range = z3.Or(result < lo, result > hi) + bad = smt.check_valid(out_of_range, list(assumptions)) + if bad.status == "verified": + self.summary.total -= 1 # don't count — it's an error + self._record_obligation( + decl.name, "int_overflow", expr, "violated", + error_code="E528", + counterexample=safe.counterexample, + ) + self._report_overflow(decl, expr, safe.counterexample) + else: + # Dynamic operands — beyond Tier 1; the codegen overflow trap guards. + self.summary.tier3_runtime += 1 + self._record_obligation(decl.name, "int_overflow", expr, "tier3") + def _fresh_slot_var( self, smt: SmtContext, te: ast.TypeExpr, ) -> object | None: @@ -4397,6 +4526,54 @@ def _report_div_by_zero( error_code="E526", ) + def _report_overflow( + self, + decl: ast.FnDecl, + expr: ast.BinaryExpr, + counterexample: dict[str, str] | None, + ) -> None: + """Emit an E528 diagnostic for a provably-overflowing arithmetic op.""" + ce_lines: list[str] = [] + if counterexample: + ce_lines.append("Counterexample:") + for name, value in sorted(counterexample.items()): + if name != "@result": + ce_lines.append(f" {name} = {value}") + ce_text = "\n ".join(ce_lines) if ce_lines else "" + + op_word = { + ast.BinOp.ADD: "addition", + ast.BinOp.SUB: "subtraction", + ast.BinOp.MUL: "multiplication", + }.get(expr.op, "operation") + operands = ast.format_expr(expr) + description = ( + f"Integer {op_word} in '{decl.name}' provably overflows." + ) + if ce_text: + description += f"\n {ce_text}" + + self._error( + expr, + description, + rationale=( + "@Int / @Nat arithmetic carries a Tier-1 proof obligation that " + "the result stays within the i64 / u64 range. The SMT solver " + "proved the result is always out of range; `+`/`-`/`*` trap at " + "runtime on overflow (#798)." + ), + fix=( + f"Constrain the operands so `{operands}` cannot overflow — add a " + f"precondition bounding them, give an operand a refinement type " + f"that bounds its range, or guard the operation with a path " + f"condition the obligation can discharge." + ), + spec_ref=( + 'Chapter 6, Section 6.4.3 "Primitive Operation Safety"' + ), + error_code="E528", + ) + def _report_assert_violation( self, decl: ast.FnDecl, @@ -5174,6 +5351,14 @@ def _is_nat_type(ty: Type) -> bool: """Check if a type is Nat (non-negative integer).""" return ty == NAT or (isinstance(ty, RefinedType) and ty.base == NAT) + @staticmethod + def _is_int_type(ty: Type) -> bool: + """Check if a type is Int (signed integer), including a refinement over + Int. Disjoint from :py:meth:`_is_nat_type`: ``@Nat`` is a distinct + unsigned machine type (despite ``Nat <: Int`` subtyping), and its + overflow range (u64) differs from ``@Int``'s (i64) (#798).""" + return ty == INT or (isinstance(ty, RefinedType) and ty.base == INT) + @staticmethod def _is_unit_refinement(ty: Type) -> bool: """Whether *ty* is a refinement over the ``@Unit`` base, which codegen diff --git a/vera/wasm/context.py b/vera/wasm/context.py index b57de1dd..591b4f15 100644 --- a/vera/wasm/context.py +++ b/vera/wasm/context.py @@ -231,6 +231,25 @@ def __init__( # codegen base's `_harvest_inference_failures` emits a # specific [E616] before falling through to [E602]. self._apply_fn_inference_failures: list[ast.Expr] = [] + # #798: the checker's resolved-type side-table (keyed by + # ``ast.span_key``), threaded from the CodeGenerator. The + # integer-overflow guard reads it to classify an arithmetic operand + # as @Int (i64) vs @Nat (u64) using the SAME resolved type the + # verifier's ``int_overflow`` obligation uses, so codegen guards + # exactly the sites — at exactly the range — the verifier obligates. + # ``None`` when typecheck was skipped (AST-only fallback). + self._expr_semantic_types: ( + dict[tuple[int, int, int, int], object] | None + ) = None + + def set_expr_semantic_types( + self, + types: dict[tuple[int, int, int, int], object] | None, + ) -> None: + """Seed the checker's resolved-type side-table for the #798 overflow + guard's Int/Nat operand classifier (mirrors the verifier's + ``_resolved_type_of`` / ``_overflow_int_type``).""" + self._expr_semantic_types = types def set_fn_ret_types( self, ret_types: dict[str, str | None], diff --git a/vera/wasm/operators.py b/vera/wasm/operators.py index dbef1f16..f7a772cf 100644 --- a/vera/wasm/operators.py +++ b/vera/wasm/operators.py @@ -95,6 +95,26 @@ def _translate_binary( # skipped verification. if op == ast.BinOp.SUB and self._is_nat_subtraction(expr): return self._emit_nat_sub_guard(left, right) + # #798: @Int/@Nat add/sub/mul wrap at the i64/u64 boundary; emit a + # runtime overflow guard mirroring the verifier's `int_overflow` + # obligation (vera/verifier.py:_check_overflow_obligation). The + # classifier consults the checker's resolved-type table so it + # guards exactly the sites — at the exact signed/unsigned range — + # the verifier obligates. @Nat subtraction is `nat_sub` underflow + # (handled above), not high-overflow, so it is excluded here, in + # lockstep with the verifier's `expr.op == SUB and ovf == "Nat"` + # exclusion. Programs that ran `vera verify` first caught any + # provable overflow statically (loud E528); this guard is the + # safety net for the `vera compile` / `vera run` paths. + # `_overflow_arith_codegen_type` classifies on the operands' COMMON + # (coerced) type — the width the i64/u64 op runs at — so a + # literal-left @Int add and an @Int add narrowed into a @Nat slot + # are both i64, in lockstep with the verifier (#798). + ovf = self._overflow_arith_codegen_type(expr) + if (op in (ast.BinOp.ADD, ast.BinOp.SUB, ast.BinOp.MUL) + and ovf is not None + and not (op == ast.BinOp.SUB and ovf == "Nat")): + return self._emit_overflow_guard(left, right, op, ovf) return left + right + [self._ARITH_OPS[op]] # Comparison — choose i32/i64/f64 based on operand types @@ -1008,3 +1028,363 @@ def _has_underflow_leaf(self, value: ast.Expr) -> bool: return any(self._has_underflow_leaf(arm.body) for arm in value.arms) return False + + # ----------------------------------------------------------------- + # @Int / @Nat integer-overflow runtime guard (#798) + # ----------------------------------------------------------------- + + # WASM literal forms for the two's-complement i64 bounds. ``i64.const`` + # accepts the value ``-9223372036854775808`` directly (the lexer reads a + # negative literal, not ``-(9223372036854775808)`` whose magnitude is out + # of the signed range), so emitting ``str(_I64_MIN_CODEGEN)`` is correct. + _I64_MIN_CODEGEN = -(2**63) + + def _overflow_codegen_type(self, expr: ast.Expr) -> str | None: + """Return ``"Int"`` / ``"Nat"`` if *expr* (the whole arithmetic + expression) is a machine integer subject to the #798 overflow guard, + else ``None``. + + This is the codegen mirror of + :py:meth:`ContractVerifier._overflow_int_type`, classifying on the + binary expression's resolved type. Classifying on the expression (not + the left operand) handles a literal correctly: a non-negative literal + carries its own narrow ``@Nat`` type, so ``5 + @Int.0`` is an ``@Int`` + (i64) add even though ``5`` is ``@Nat`` — reading the operand's type + would mis-range it to u64. + + The verifier classifies on the *checker's resolved type* + (``_resolved_type_of``). Codegen does the same FIRST — consulting the + threaded ``_expr_semantic_types`` side-table — so a bare-literal + operand whose resolved type is context-dependent (``5 + @Int.0`` is + Int, ``5 + @Nat.0`` is Nat) is classified identically to the verifier. + Without this, the AST-only fallback would call any non-negative + ``IntLit`` ``@Nat`` (the ``_is_static_nat_typed`` rule), mis-ranging a + literal-left @Int site to u64 and silently dropping an @Int overflow at + ``[I64_MAX+1, U64_MAX]`` — a verifier/codegen desync (#798 RISK 6). + + The AST-only fallback (``_is_static_nat_typed`` / ``_is_static_int_typed``) + runs only when the side-table is absent or has no entry for this span + — e.g. a ``transform -> compile`` caller that skipped typecheck. It is + sound for slot-/call-typed operands; the literal ambiguity it cannot + resolve is the documented precision gap such callers accept. + """ + resolved = self._resolved_codegen_type(expr) + if resolved is not None: + return resolved + if self._is_static_nat_typed(expr): + return "Nat" + if self._is_static_int_typed(expr): + return "Int" + return None + + def _overflow_arith_codegen_type(self, expr: ast.BinaryExpr) -> str | None: + """The codegen mirror of + :py:meth:`ContractVerifier._overflow_arith_type` — the operation's + signed/unsigned width = the operands' common (coerced) type (``Int`` if + either operand is ``@Int``, else ``@Nat``), NOT the narrowed result + type. Keeps the runtime guard in lockstep with the verifier's + obligation at every ``+``/``-``/``*`` site (#798).""" + lt = self._overflow_codegen_type(expr.left) + rt = self._overflow_codegen_type(expr.right) + if lt is None or rt is None: + return None + return "Int" if "Int" in (lt, rt) else "Nat" + + def _resolved_codegen_type(self, expr: ast.Expr) -> str | None: + """Look up *expr*'s checker-resolved type as ``"Int"`` / ``"Nat"``, + else ``None`` (no table, no entry, or a non-Int/Nat type). + + Mirrors :py:meth:`ContractVerifier._overflow_int_type` over + ``_resolved_type_of``: it dispatches on the resolved type's base so a + ``@Nat`` reached through a refinement/alias still classifies as Nat. + """ + table = self._expr_semantic_types + if table is None: + return None + key = ast.span_key(expr) + if key is None: + return None + ty = table.get(key) + if ty is None: + return None + # Avoid importing the type module at call time on every arithmetic + # site: dispatch on the resolved type's *base name*. ``PrimitiveType`` + # has a ``name``; ``RefinedType`` has a ``base`` carrying it. + base = getattr(ty, "base", ty) + name = getattr(base, "name", None) + if name == "Nat": + return "Nat" + if name == "Int": + return "Int" + return None + + def _is_static_int_typed(self, expr: ast.Expr) -> bool: + """Return True iff *expr* has static type @Int by AST shape alone. + + AST-only fallback companion to :py:meth:`_is_static_nat_typed`, used + only when the resolved-type table is unavailable. Called *after* the + @Nat check in :py:meth:`_overflow_codegen_type`, so a non-negative + ``IntLit`` (which ``_is_static_nat_typed`` already claims as @Nat) does + not reach here — only a negative ``IntLit``, an @Int slot, an @Int + function return, or an arithmetic tree of @Int operands. Conservative + False elsewhere (a @Byte / @Float / @Bool / String operand is not @Int + and must not be guarded). + """ + if isinstance(expr, ast.SlotRef): + return expr.type_name == "Int" + if isinstance(expr, ast.IntLit): + return True + if isinstance(expr, ast.BinaryExpr): + if expr.op in ( + ast.BinOp.ADD, ast.BinOp.SUB, ast.BinOp.MUL, + ast.BinOp.DIV, ast.BinOp.MOD, + ): + return (self._operand_is_int_or_nat(expr.left) + and self._operand_is_int_or_nat(expr.right)) + return False + if isinstance(expr, ast.IfExpr): + if expr.else_branch is None: + return False + return (self._is_static_int_typed(expr.then_branch) + and self._is_static_int_typed(expr.else_branch)) + if isinstance(expr, ast.Block): + return self._is_static_int_typed(expr.expr) + if isinstance(expr, ast.MatchExpr): + if not expr.arms: + return False + return all( + self._is_static_int_typed(arm.body) for arm in expr.arms + ) + if isinstance(expr, ast.FnCall): + return self._infer_fncall_vera_type(expr) == "Int" + if isinstance(expr, ast.ModuleCall): + return self._infer_fncall_vera_type( + ast.FnCall(name=expr.name, args=expr.args, span=expr.span), + ) == "Int" + return False + + def _operand_is_int_or_nat(self, expr: ast.Expr) -> bool: + """True iff *expr* is statically @Int or @Nat (AST-only). + + An arithmetic node is @Int when both operands are integral (Int or + Nat) but at least one is @Int — the Nat<:Int subtyping rule. Used by + :py:meth:`_is_static_int_typed` so ``@Int.0 + 3`` (an @Int slot plus a + non-negative @Nat-by-shape literal) still classifies @Int. + """ + return (self._is_static_nat_typed(expr) + or self._is_static_int_typed(expr)) + + def _emit_overflow_guard( + self, + left: list[str], + right: list[str], + op: ast.BinOp, + ovf: str, + ) -> list[str]: + """Dispatch to the per-(op, type) guarded arithmetic sequence (#798). + + Each sequence computes the wrapping result, checks whether the true + (unbounded) result left the i64 (@Int) / u64 (@Nat) range, traps via a + bare ``unreachable`` if so (classified ``kind="unreachable"`` by the + trap taxonomy — a precise ``"overflow"`` kind via host import is a + follow-up, mirroring the #520 / #552 guards), and leaves the wrapping + result on the stack otherwise. @Nat SUB never reaches here (excluded + by the caller; it is ``nat_sub`` underflow). + """ + if ovf == "Nat": + if op == ast.BinOp.ADD: + return self._emit_nat_add_guard(left, right) + # MUL (SUB is excluded by the caller). + return self._emit_nat_mul_guard(left, right) + # @Int. + if op == ast.BinOp.ADD: + return self._emit_int_add_guard(left, right) + if op == ast.BinOp.SUB: + return self._emit_int_sub_guard(left, right) + return self._emit_int_mul_guard(left, right) + + def _emit_int_add_guard( + self, left: list[str], right: list[str], + ) -> list[str]: + """@Int ADD, signed i64. Overflow iff ``((a^r) & (b^r)) < 0`` — + the Hacker's-Delight 2-12 test: ``a+b`` overflows iff ``a`` and ``b`` + share a sign but the wrapped result ``r`` has the opposite sign. + Leaves ``r`` on the stack.""" + a_tmp = self.alloc_local("i64") + b_tmp = self.alloc_local("i64") + r_tmp = self.alloc_local("i64") + return [ + *left, + f"local.set {a_tmp}", + *right, + f"local.set {b_tmp}", + f"local.get {a_tmp}", + f"local.get {b_tmp}", + "i64.add", + f"local.tee {r_tmp}", # stack: [r] + # (a ^ r): + f"local.get {a_tmp}", + f"local.get {r_tmp}", + "i64.xor", # stack: [r, (a^r)] + # (b ^ r): + f"local.get {b_tmp}", + f"local.get {r_tmp}", + "i64.xor", # stack: [r, (a^r), (b^r)] + "i64.and", # stack: [r, (a^r)&(b^r)] + "i64.const 0", + "i64.lt_s", # stack: [r, cond] + "if", + " unreachable", + "end", # stack: [r] + ] + + def _emit_int_sub_guard( + self, left: list[str], right: list[str], + ) -> list[str]: + """@Int SUB, signed i64, ``a - b`` (left=minuend). Overflow iff + ``((a^b) & (a^r)) < 0``: ``a-b`` overflows iff ``a`` and ``b`` differ + in sign and the result ``r`` differs in sign from ``a``. Operand + order is load-bearing (asymmetric test). Leaves ``r`` on the stack.""" + a_tmp = self.alloc_local("i64") + b_tmp = self.alloc_local("i64") + r_tmp = self.alloc_local("i64") + return [ + *left, + f"local.set {a_tmp}", + *right, + f"local.set {b_tmp}", + f"local.get {a_tmp}", + f"local.get {b_tmp}", + "i64.sub", + f"local.tee {r_tmp}", # stack: [r] + # (a ^ b): + f"local.get {a_tmp}", + f"local.get {b_tmp}", + "i64.xor", # stack: [r, (a^b)] + # (a ^ r): + f"local.get {a_tmp}", + f"local.get {r_tmp}", + "i64.xor", # stack: [r, (a^b), (a^r)] + "i64.and", # stack: [r, (a^b)&(a^r)] + "i64.const 0", + "i64.lt_s", # stack: [r, cond] + "if", + " unreachable", + "end", # stack: [r] + ] + + def _emit_int_mul_guard( + self, left: list[str], right: list[str], + ) -> list[str]: + """@Int MUL, signed i64 — the dangerous one. Division round-trip with + the ``INT_MIN * -1`` special case. + + ``overflow ⟺ a != 0 && ((a == -1 && b == INT_MIN) || (a != -1 && r/a != b))`` + + The ``a == 0`` branch avoids ``r/0``; the ``a == -1`` pre-check avoids + the native ``i64.div_s`` trap on ``INT_MIN / -1`` (testing ``b == + INT_MIN`` instead). Uses ``local.set r_tmp`` to clear the operand + stack before the nested ``if`` blocks (a value left under an ``if`` + whose arms don't symmetrically consume it is a WASM validation error), + then pushes ``r`` at the end. Leaves ``r`` on the stack.""" + a_tmp = self.alloc_local("i64") + b_tmp = self.alloc_local("i64") + r_tmp = self.alloc_local("i64") + return [ + *left, + f"local.set {a_tmp}", + *right, + f"local.set {b_tmp}", + f"local.get {a_tmp}", + f"local.get {b_tmp}", + "i64.mul", + f"local.set {r_tmp}", # stack empty + f"local.get {a_tmp}", + "i64.eqz", + "if", # a == 0 → safe, no checks + "else", + f" local.get {a_tmp}", + " i64.const -1", + " i64.eq", + " if", # a == -1 + f" local.get {b_tmp}", + f" i64.const {self._I64_MIN_CODEGEN}", + " i64.eq", + " if", # b == INT_MIN → overflow + " unreachable", + " end", + " else", # a != 0 && a != -1 → safe to divide + f" local.get {r_tmp}", + f" local.get {a_tmp}", + " i64.div_s", + f" local.get {b_tmp}", + " i64.ne", + " if", # r/a != b → overflow + " unreachable", + " end", + " end", + "end", + f"local.get {r_tmp}", # stack: [r] + ] + + def _emit_nat_add_guard( + self, left: list[str], right: list[str], + ) -> list[str]: + """@Nat ADD, unsigned u64. Overflow iff ``r list[str]: + """@Nat MUL, unsigned u64. Overflow iff ``a != 0 && r/u a != b``. + No ``-1`` / INT_MIN hazard (unsigned div only traps on divide-by-zero, + excluded by the ``a == 0`` branch). Leaves ``r`` on the stack.""" + a_tmp = self.alloc_local("i64") + b_tmp = self.alloc_local("i64") + r_tmp = self.alloc_local("i64") + return [ + *left, + f"local.set {a_tmp}", + *right, + f"local.set {b_tmp}", + f"local.get {a_tmp}", + f"local.get {b_tmp}", + "i64.mul", + f"local.set {r_tmp}", # stack empty + f"local.get {a_tmp}", + "i64.eqz", + "if", # a == 0 → safe + "else", + f" local.get {r_tmp}", + f" local.get {a_tmp}", + " i64.div_u", + f" local.get {b_tmp}", + " i64.ne", + " if", # r/u a != b → overflow + " unreachable", + " end", + "end", + f"local.get {r_tmp}", # stack: [r] + ] From 0c5c56daf01891336319deaf0bf3c6539580d38e Mon Sep 17 00:00:00 2001 From: T Date: Sat, 27 Jun 2026 01:18:25 +0100 Subject: [PATCH 2/5] =?UTF-8?q?fix(798):=20address=20review=20=E2=80=94=20?= =?UTF-8?q?E528=20test=20coverage,=20cross-module=20overflow,=20doc=20accu?= =?UTF-8?q?racy?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Round 2: adversarial-review + CodeRabbit findings on PR #809 (all tests/docs; no core-logic change). Tests: - Add the missing E528 'violated' regression (Int + Nat) — the loud-error arm of the two-check was untested; mutation-validated (disabling the branch flips it RED). Tighten the three existing obligation tests to assert the exact count. - Pin the Nat-subtraction exclusion at the real verifier gate (nat_sub, not int_overflow), not just the differential's re-derived helper. - Add the narrowed Int-into-Nat shape to the codegen/verifier lockstep corpus. - Cross-module coverage: thread ModuleResolver through the codegen helper (mirroring cmd_run) + a test that an imported function's overflow traps; mutation-validated against the guard hook. Docs: - spec/11 section 11.2.1: codegen is type-driven, not tier-driven — it emits the guard at every classified site regardless of the Z3 tier (fixed for both the overflow paragraph and the parallel Nat-subtraction paragraph); spell out the verifier three-way split (Tier 1 / E528 / Tier 3); drop the bare "(Option A)". - spec/06 section 6.4.3: narrow the E528 table row so Nat subtraction (E502) is not implied; fix the wrong "i64.MAX + 1" example (bare literals are Nat, in range) to "u64.MAX + 1". Same example fix in CHANGELOG. - TESTING.md: 92 -> 93 conformance count (3 stale mentions); refresh test counts (5145 -> 5151) across TESTING / README / ROADMAP. Refs #798. Co-Authored-By: Claude --- CHANGELOG.md | 2 +- README.md | 2 +- ROADMAP.md | 2 +- TESTING.md | 14 ++-- spec/06-contracts.md | 4 +- spec/11-compilation.md | 4 +- tests/test_int_overflow.py | 64 +++++++++++++-- tests/test_int_overflow_codegen.py | 103 +++++++++++++++++++++++- tests/test_int_overflow_differential.py | 11 +++ 9 files changed, 186 insertions(+), 20 deletions(-) diff --git a/CHANGELOG.md b/CHANGELOG.md index ca4927a2..078dc4f3 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -10,7 +10,7 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/). ### Fixed -- **Verifier + codegen soundness: `@Int` / `@Nat` arithmetic overflow now traps** (batch 4 of the [#392](https://github.com/aallan/vera/issues/392) `smt.py` soundness audit). The verifier modelled `@Int` / `@Nat` as Z3's *unbounded* integers, so `+` / `-` / `*` were total — it proved postconditions the i64 / u64 runtime then violated under two's-complement wraparound: `ensures(@Int.result > @Int.0)` for `{ @Int.0 + 1 }` verified at Tier 1, yet `inc(i64.MAX)` wraps to `i64.MIN`. `@Int` / `@Nat` `+` / `-` / `*` are now *partial* operations that **trap** on overflow at runtime (consistent with `@Nat` underflow and signed-division `MIN / -1`), and the verifier auto-synthesises an `int_overflow` obligation that the result stays in range — a two-check mirroring array bounds: provably in range → Tier 1; provably out of range (e.g. a literal `i64.MAX + 1`) → a compile error (**E528**); dynamic operands → a runtime-guarded Tier-3 overflow trap. Overflow is classified at the operands' **common (coerced) type** (`@Int` if either operand is `@Int`), so a literal-left `5 + @Int.0` and an `@Int + 1` narrowed into a `@Nat` slot are both i64 adds — not the literal's `@Nat` self-type nor the narrowed result. `@Nat` subtraction stays the existing underflow obligation (E502). The runtime overflow trap is currently the generic `unreachable` kind; a precise `overflow` kind is tracked as [#808](https://github.com/aallan/vera/issues/808). ([#798](https://github.com/aallan/vera/issues/798)) +- **Verifier + codegen soundness: `@Int` / `@Nat` arithmetic overflow now traps** (batch 4 of the [#392](https://github.com/aallan/vera/issues/392) `smt.py` soundness audit). The verifier modelled `@Int` / `@Nat` as Z3's *unbounded* integers, so `+` / `-` / `*` were total — it proved postconditions the i64 / u64 runtime then violated under two's-complement wraparound: `ensures(@Int.result > @Int.0)` for `{ @Int.0 + 1 }` verified at Tier 1, yet `inc(i64.MAX)` wraps to `i64.MIN`. `@Int` / `@Nat` `+` / `-` / `*` are now *partial* operations that **trap** on overflow at runtime (consistent with `@Nat` underflow and signed-division `MIN / -1`), and the verifier auto-synthesises an `int_overflow` obligation that the result stays in range — a two-check mirroring array bounds: provably in range → Tier 1; provably out of range (e.g. a literal `u64.MAX + 1`, or `@Int.0 + 1` under `requires(@Int.0 == i64.MAX)`) → a compile error (**E528**); dynamic operands → a runtime-guarded Tier-3 overflow trap. Overflow is classified at the operands' **common (coerced) type** (`@Int` if either operand is `@Int`), so a literal-left `5 + @Int.0` and an `@Int + 1` narrowed into a `@Nat` slot are both i64 adds — not the literal's `@Nat` self-type nor the narrowed result. `@Nat` subtraction stays the existing underflow obligation (E502). The runtime overflow trap is currently the generic `unreachable` kind; a precise `overflow` kind is tracked as [#808](https://github.com/aallan/vera/issues/808). ([#798](https://github.com/aallan/vera/issues/798)) ## [0.0.180] - 2026-06-26 diff --git a/README.md b/README.md index 4190469a..d97916e8 100644 --- a/README.md +++ b/README.md @@ -215,7 +215,7 @@ cp /path/to/vera/SKILL.md ~/.claude/skills/vera-language/SKILL.md ## Project status -Vera is in **active development** at v0.0.181 — 1,400+ commits, 181 releases, 5,145 tests, 91% code coverage, 93 conformance programs, 35 examples, and a 13-chapter specification. See **[HISTORY.md](HISTORY.md)** for how the compiler was built. +Vera is in **active development** at v0.0.181 — 1,400+ commits, 181 releases, 5,151 tests, 91% code coverage, 93 conformance programs, 35 examples, and a 13-chapter specification. See **[HISTORY.md](HISTORY.md)** for how the compiler was built. The reference compiler — parser, AST, type checker, contract verifier (Z3), WASM code generator, module system, browser runtime, and runtime contract insertion — is working. The language specification is in draft across [13 chapters](spec/). diff --git a/ROADMAP.md b/ROADMAP.md index 1392d52d..023c750e 100644 --- a/ROADMAP.md +++ b/ROADMAP.md @@ -8,7 +8,7 @@ Priority lives in this file and nowhere else — issues carry kind and area labe ## Where we are -5,145 tests, 93 conformance programs, 35 examples, 13 spec chapters. +5,151 tests, 93 conformance programs, 35 examples, 13 spec chapters. ## The roadmap diff --git a/TESTING.md b/TESTING.md index 11712fd7..5ee3bd7f 100644 --- a/TESTING.md +++ b/TESTING.md @@ -6,7 +6,7 @@ This is the single source of truth for Vera's testing infrastructure, coverage d | Metric | Value | |--------|-------| -| **Tests** | 5,145 across 44 files (~63,000 lines of test code; 5,113 passed + 16 stress, 16 skipped) | +| **Tests** | 5,151 across 44 files (~63,000 lines of test code; 5,119 passed + 16 stress, 16 skipped) | | **Compiler code coverage** | 95% Python, 61% JavaScript — 91% combined (CI minimum: 80%) | | **Conformance programs** | 93 programs across 9 spec chapters, validating every language feature | | **Example programs** | 35, all validated through `vera check` + `vera verify` | @@ -61,9 +61,9 @@ python scripts/fix_allowlists.py --fix # auto-fix stale allowlists | `test_obligations.py` | 284 | 1083 | Reified proof obligations + warm `VerificationSession` (#222 Phase A): full-corpus differential oracle (warm session == cold `verify()` on diagnostics, summary, and obligation stream, plus warm-twice determinism, across all 35 examples and every verify/run-level conformance program), summary↔obligation tier-bookkeeping consistency, per-kind unit tests (requires / ensures / decreases / nat_sub / call_pre statuses, counterexamples, error codes), content-key stability + same-text-two-sites span disambiguation, session solver reuse, type-error short-circuit, ADT-registry resync between programs; plus the Phase B incremental suite — identical-source full replay, callee-body-edit replays callers while callee-contract-edit invalidates them, span-shift and ADT-edit conservative invalidation, cross-program isolation, timeout-status never cached (monkeypatched solver), FIFO eviction bound; plus the #727 dedup pin — a violating call in a let RHS records exactly one E501 diagnostic and one call_pre obligation | | `test_verifier.py` | 473 | 9,270 | Z3 verification, counterexamples, tier classification, call-site preconditions, branch-aware preconditions, pipe operator, cross-module contracts, match/ADT verification, decreases verification, mutual recursion, refined Bool/String/Float64 param sorts, **@Nat subtraction underflow obligation** (#520 — Path-A obligation discharge via requires/path-conditions/path-aware Z3 refutation, pure-literal exclusion, Int-Int and Nat-Int → Int exemptions), **@Nat binding-site narrowing obligation** (#552 — Tier-1 `value >= 0` at let/call-arg/effect-op-arg/ctor-field/match-bind/literal-destructure narrowing, #520 double-emit disjointness, E503 counterexample-witness pin; #747 — generic-instantiation, ADT sub-pattern, non-literal tuple-destructure, and cross-module imported-ctor narrowing, with concrete-site Tier-3 narrowings (incl. the if-expr-source SMT gap, recorded per-component) classified as codegen-guarded `tier3_runtime`, generic function-formal call-args guarded on the monomorphised callee (keyed on `call_target`, not the erased generic decl), and the two genuinely-unguarded residuals — effect-op-arg narrowings (no codegen guard yet, #754) and generic-instantiated constructor-field narrowings (constructors carry no per-field `@Nat` mono metadata) — surfacing the `E504` unguarded warning; #749 — IndexExpr/InterpolatedString walker-recursion pins, `_fresh_slot_var` nat-alias unit test, `_narrows_into_nat` verifier/codegen soundness parity), **refinement-predicate verification** (#746 — Tier-1 discharge at let/call-arg/ctor-field/effect-op-arg/match-bind/tuple-destructure narrowings and refined return positions, E505 violation with counterexample, E506 Tier-3 for untranslatable/non-primitive-base predicates, the R3 already-refined-source exemption via predicate-AST equality, the R5 violating-return pin, the refinement-over-`@Nat` `>= 0 && P` conjoin, and the **refined-ADT-sub-pattern arm-fact carry** — a `Some(@PosInt)` bind on `Option` carries the field's source-type predicate into the arm body so a downstream `@Nat` narrowing discharges instead of a false E503, and reaches call **preconditions** in the arm body via the SMT match-translation fact hook so `Some(@PosInt) -> needs_positive(@PosInt.0)` discharges instead of a false E501, while a genuine `Option` narrowing stays obligated (E505/E501), never silently assumed; an **alias-base refined return** (`{ @Age \| @Age.0 >= 18 }`) is assumed by the caller via the predicate binder name; and a **refined return from a match arm** (`Some(@PosInt) -> @PosInt.0`) discharges via a global `arm-matched => source-fact` SMT implication on both the normal and generic-fast paths, a violating `Option` payload still E505), **per-monomorphization generic verification** (#732 — an instantiated `forall` body is verified per concrete instantiation: a generic-body `@Nat` underflow is caught (naming the instantiation, with a counterexample) or discharged when guarded, the never-instantiated residual stays Tier-3 `E520`, collapsed-type-var (`A=B=Int`) De Bruijn reindex soundness, one-diagnostic dedup across multiple instantiations, and discovery of a generic reached only through a `decreases(...)` measure, plus a mixed-`tier3`/`timeout` aggregate-label completeness pin and a recursive-generic-clone source-name pin so `decreases` resolves on the monomorphised instance), **primitive-operation safety obligations** (#680 — division/modulo by-zero `E526` and array-index-bounds `E527`, the in-bounds/out-of-bounds two-check with float-exemption, honest Tier-3 for opaque lengths, the off-by-one and `i >= 0` lower-bound pins, walker recursion through array-literal / assert / interpolated-string / let-destructure positions, the opaque-shadow Tier-3 routing that stops an untranslatable `let`/destructure from silently discharging a stale same-type-outer divisor, De Bruijn-correct `E526`/`E527` fix hints rendered from the actual operands, Tier-1 projection of literal-constructor destructure components (`let Tuple<@Int, @Int> = Tuple(10, 6); _ / @Int.1` discharges `10 != 0` rather than shadowing to Tier-3, while a non-literal source stays a tracked Tier-3 shadow), a tracked placeholder pushed for *every* destructured component so same-type De Bruijn positions don't collapse (`Tuple(10, )` keeps `@Int.0` on the opaque second component, never silently discharging against the literal `10`), `_contains_opaque_shadow` recursion so a *compound* divisor / subtraction operand embedding a shadow (`shadow + 1`) falls to Tier-3 instead of a spurious `E526`/`E502` counterexample, and tracked shadowing of match-arm pattern slots bound over an untranslatable (effect-op) scrutinee so a primitive op in the arm falls to Tier-3 rather than silently discharging against a stale same-name outer (`match Source.next(()) { Some(@Int) -> 1 / @Int.0 }`); plus a **multi-agent shadow/projection audit battery** (`TestShadowAuditDivision680` / `TestShadowAuditSubtraction680` / `TestShadowAuditIndex680` / `TestDestructureDeBruijnAlignment680` / `TestShadowAuditInteractions680`) — 57 differential tests pinning the invariant trichotomy (safe → verified, opaque/shadow → Tier-3, provably-unsafe → loud) across compound shadows, modulo parity, multi-component De Bruijn alignment with value-distinct siblings, intervening-type namespaces, stacked / nested destructures, opaque match scrutinees, nested blocks, and intra-block scoping; **mutation-validated** — every shadow/projection test was confirmed to flip RED when its target machinery (`_is_opaque_shadow` / `_contains_opaque_shadow` / the De Bruijn placeholder push / the `let`-shadow / the match-arm `_fresh_pattern_env` / literal projection) is deliberately broken, so none is green-for-the-wrong-reason) | | `test_soundness_392.py` | 36 | 584 | #392 audit batches 1–2 — verifier soundness/completeness fixes: signed div/mod truncate toward zero (#799), body `assert(P)` carries a Tier-1 obligation (#800), divisions in contract predicates carry a `div_zero` obligation (#801), and the #804 assume-half of #800's `assert` rule — a prior `assert`/`assume` discharges later obligations (including a later call's precondition) + the postcondition at Tier 1, removing false E501/E503/E500/E505. Test-first: each fails on the pre-fix verifier | -| `test_int_overflow.py` | 3 | 83 | #798 — `@Int`/`@Nat` arithmetic-overflow obligations (part of the #392 `smt.py` soundness audit): `+`/`-`/`*` on `@Int`/`@Nat` now emit an `int_overflow` obligation (the analog of `nat_sub`/`div_zero`) rather than modelling the operands as Z3's unbounded integers, so a `ensures(@Int.result > @Int.0)` over `@Int.0 + 1` no longer proves a contract the i64/u64 runtime violates under two's-complement wraparound. Unbounded operands leave the obligation undischarged (Tier-3, runtime-guarded); operand bounds that prove the result stays in range discharge it at Tier 1. Test-first: each fails on the pre-fix verifier (no `int_overflow` obligation emitted) | -| `test_int_overflow_codegen.py` | 44 | 432 | #798 Stage 3 — runtime overflow-trap codegen: the codegen emits a guard at *exactly* the `@Int`/`@Nat` `+`/`-`/`*` sites the verifier obligates, so `vera run`/`vera compile` programs trap on overflow instead of silently wrapping at the i64/u64 boundary. The trap is a bare `unreachable` (Option A — matches the #520 `nat_sub` and #552 nat-bind precedent), classified `kind="unreachable"` today (a precise `"overflow"` trap kind via a host import is a follow-up). Test-first: every `*_traps` test fails on the pre-Stage-3 codegen (the op wraps silently → no trap → `execute` returns a value); every `*_no_trap` test passes before and after (safe arithmetic unchanged) | -| `test_int_overflow_differential.py` | 141 | 336 | #798 Stage 3 verifier↔codegen classification differential (cross-component soundness rule): the codegen overflow guard must fire at exactly the sites the verifier obligates *and* classify each site's operand type (`@Int` i64 vs `@Nat` u64) identically — else a Tier-1-clean program traps spuriously or a wrapping op slips through unguarded. Over a corpus exercising all five operand combos plus the literal-left ambiguity (which the pre-fix codegen mis-classified as `@Nat`), asserts the verifier's per-site gated classification equals the codegen's site for site, both sides driven by the same `ast.span_key` | +| `test_int_overflow.py` | 6 | 137 | #798 — `@Int`/`@Nat` arithmetic-overflow obligations (part of the #392 `smt.py` soundness audit): `+`/`-`/`*` on `@Int`/`@Nat` now emit an `int_overflow` obligation (the analog of `nat_sub`/`div_zero`) rather than modelling the operands as Z3's unbounded integers, so a `ensures(@Int.result > @Int.0)` over `@Int.0 + 1` no longer proves a contract the i64/u64 runtime violates under two's-complement wraparound. Unbounded operands leave the obligation undischarged (Tier-3, runtime-guarded); operand bounds that prove the result stays in range discharge it at Tier 1. Test-first: each fails on the pre-fix verifier (no `int_overflow` obligation emitted) | +| `test_int_overflow_codegen.py` | 46 | 533 | #798 Stage 3 — runtime overflow-trap codegen: the codegen emits a guard at *exactly* the `@Int`/`@Nat` `+`/`-`/`*` sites the verifier obligates, so `vera run`/`vera compile` programs trap on overflow instead of silently wrapping at the i64/u64 boundary. The trap is a bare `unreachable` (Option A — matches the #520 `nat_sub` and #552 nat-bind precedent), classified `kind="unreachable"` today (a precise `"overflow"` trap kind via a host import is a follow-up). Test-first: every `*_traps` test fails on the pre-Stage-3 codegen (the op wraps silently → no trap → `execute` returns a value); every `*_no_trap` test passes before and after (safe arithmetic unchanged) | +| `test_int_overflow_differential.py` | 142 | 347 | #798 Stage 3 verifier↔codegen classification differential (cross-component soundness rule): the codegen overflow guard must fire at exactly the sites the verifier obligates *and* classify each site's operand type (`@Int` i64 vs `@Nat` u64) identically — else a Tier-1-clean program traps spuriously or a wrapping op slips through unguarded. Over a corpus exercising all five operand combos plus the literal-left ambiguity (which the pre-fix codegen mis-classified as `@Nat`), asserts the verifier's per-site gated classification equals the codegen's site for site, both sides driven by the same `ast.span_key` | | `test_monomorphize_differential.py` | 13 | 682 | #732 differential soundness: the verifier's per-monomorphization instantiation discovery covers every instantiation codegen emits (name coverage + per-generic count), over real generic programs (conformance ch02/ch09, `examples/generics.vera`) plus inline cases for the soundness-critical scenarios — collapsed type vars, **prelude combinator emission** (`option_map`), transitive generics, a generic whose type arg is fixed only by a **where-helper's return** (a `Float64`-returning helper, so the unresolved-var `"Bool"` phantom default cannot mask a miss), a generic whose type arg is fixed only by an **imported constructor** (`id2(MkBox(7))` — the verifier's mono-context must include `_module_constructors`, else it phantom-defaults and misses codegen's `id2`), a generic whose type arg is fixed only by an **imported function's return** (`id_g(make_int(...))` — the verifier's mono-context must seed `fn_ret_types` from imported functions, else it phantom-defaults and misses codegen's `id_g`, plus a **private-shadow** case pinning the imported-fn seeding stays unfiltered like codegen since filtering would diverge into a false Tier-1), and a generic reached only through a **contract clause or `where` helper** (codegen must seed Pass 1.5 from the shared node-level walk, not just `decl.body`, or it skips the clone → `CodegenSkip` at run time) — so a missed instantiation (a false Tier-1) is caught. Guards against a vacuous pass when codegen emits nothing, plus a **determinism guard** (`vera compile --wat` is byte-stable across `PYTHONHASHSEED` — the mono worklist sorts its instantiation sets) | | `test_codegen.py` | 1,213 | 21,107 | WASM compilation, arithmetic, Float64, Byte, arrays (incl. compound element types), ADTs, match (incl. nested patterns), generics, State\, Exn\ handlers, control flow, strings, string escape sequences, IO (read\_line, read\_file, write\_file, args, exit, get\_env, sleep, time, stderr), bounds checking, quantifiers, assert/assume, refinement type aliases, pipe operator, string built-ins, built-in shadowing, parse\_nat Result, GC, Markdown host bindings, Regex host bindings, Map collection, Set collection, Decimal type, Json type, Html type, Http effect, Inference effect, Random effect, example round-trips, GC shadow stack overflow, **WASM tail-call optimization** (#517 — `return_call` emission for tail-position calls, 50K- and 1M-iteration stress, structural assertions on `return_call`/plain `call` boundary, **GC-aware TCO for allocating fns (#549 — `$gc_sp` restore before each `return_call`)**, postcondition-fallback regression (still reverts to plain `call`), analyzer unit tests covering Block-trailing / IfExpr-both-branches / MatchExpr-arm-bodies / let-value-NOT-marked / call-args-NOT-marked / ExprStmt-statement-NOT-marked / IfExpr-condition-NOT-marked / MatchExpr-scrutinee-NOT-marked), **@Nat narrowing runtime guards** (#552 let site; #747 extends to tuple-destructure / top-level match-bind / ADT sub-pattern / concrete ctor-field / call-arg sites — `i64.lt_s; unreachable` net, @Int targets exempt), **refinement-predicate runtime guards** (#746 — primitive-base (incl. `@Bool` / `@Float64`) refined params/returns traverse a `$vera.contract_fail` predicate guard at the function boundary: violating value traps, valid passes, call-args guarded transitively, generic refined returns guarded on the monomorphised instance; non-primitive `@Array` refinements (`array_length(...) > 0`) guarded too, the alias-aware `@Nat` base conjoins its implicit `>= 0`, and erased `@Unit` refinements emit no guard; **tuple-component boundary guards** decompose a `Tuple` param/return at the FFI boundary — loading each refined / `@Nat` component from the heap value and guarding it recursively for nested tuples, so a violating component traps instead of laundering past the verifier's component assumption; a **generic tuple alias** (`type Box = Tuple`) substitutes its argument so `Box` still guards its component, and a mutually-recursive (infinite) tuple alias **fails closed with E617** rather than silently emitting partial guards; a **refinement OVER a tuple** (`type Pair = { @Tuple \| true }`) is unwrapped so its components are guarded too, recursively through nested `Tuple`) | | `test_codegen_contracts.py` | 32 | 570 | Runtime pre/postconditions, contract fail messages, old/new state postconditions | @@ -102,7 +102,7 @@ python scripts/fix_allowlists.py --fix # auto-fix stale allowlists ## Conformance Suite -The conformance suite is a collection of 92 small, focused programs in `tests/conformance/` that systematically validate every language feature against the spec. Each program is self-contained and imports nothing, with the single exception of `ch07_cross_module_contracts.vera` which depends on `ch07_cross_module_contracts_lib.vera`. Each program tests one feature or a small group of related features. +The conformance suite is a collection of 93 small, focused programs in `tests/conformance/` that systematically validate every language feature against the spec. Each program is self-contained and imports nothing, with the single exception of `ch07_cross_module_contracts.vera` which depends on `ch07_cross_module_contracts_lib.vera`. Each program tests one feature or a small group of related features. Simon Willison [argues](https://simonwillison.net/tags/conformance-suites/) that conformance suites are a "huge unlock" for language projects — they transform development from trust-based to verification-based. The conformance suite serves as the definitive specification artifact that any implementation (or agent) can validate against. @@ -521,7 +521,7 @@ Twenty scripts in `scripts/` validate cross-cutting concerns beyond unit tests ( | Script | What it validates | |--------|-------------------| -| `check_conformance.py` | All 92 conformance programs pass their declared level (parse/check/verify/run) | +| `check_conformance.py` | All 93 conformance programs pass their declared level (parse/check/verify/run) | | `check_examples.py` | All 35 `.vera` examples pass `vera check` + `vera verify` | | `check_examples_readme.py` | Every `vera run` command in examples/README.md references an existing file and exported function | | `check_spec_examples.py` | 164 parseable code blocks from spec chapters: parse, type-check, and verify | @@ -621,7 +621,7 @@ Every push is checked by 28 configured hooks across two stages: 26 are configure | `mypy vera/` | Type-check compiler in strict mode | | `pytest tests/ -q` | Run full test suite | | `fix_allowlists.py --fix` | Auto-fix stale allowlist line numbers | -| `check_conformance.py` | All 92 conformance programs pass their declared level | +| `check_conformance.py` | All 93 conformance programs pass their declared level | | `check_examples.py` | All 35 examples pass `vera check` + `vera verify` | | `check_examples_readme.py` | `vera run` commands in `examples/README.md` reference existing files and exported functions | | `check_readme_examples.py` | README code blocks parse correctly | diff --git a/spec/06-contracts.md b/spec/06-contracts.md index 79ec26de..f885e2dc 100644 --- a/spec/06-contracts.md +++ b/spec/06-contracts.md @@ -244,7 +244,7 @@ The verifier checks the contracts the programmer wrote, and **auto-synthesises** | `@Int` value into a `@Nat` slot | `value >= 0` | E503 | | `a / b`, `a % b` (Int / Nat) | `b != 0` | E526 | | `arr[i]` (`Array`) | `0 <= i < array_length(arr)` | E527 | -| `a + b`, `a - b`, `a * b` (Int / Nat) | result within i64 / u64 range — no overflow | E528 | +| `a + b`, `a * b` (Int / Nat); `a - b` (Int) | result within i64 / u64 range — no overflow | E528 | To discharge an operation obligation, the programmer encodes the constraint in a precondition (`requires(@Int.0 != 0)`), a guarding `if` (whose path condition holds in the relevant branch), or a refinement type (`{ @Int | @Int.0 != 0 }`). A function that performs `@Int.1 / @Int.0` with `requires(true)` therefore no longer verifies cleanly: the unguarded divisor is a compile error (E526). @@ -252,7 +252,7 @@ To discharge an operation obligation, the programmer encodes the constraint in a The `@Nat` obligations (E502 / E503) carry the most nuance, spanning many binding sites. The verifier emits an E502 obligation `lhs >= rhs` at every `@Nat - @Nat` subtraction site (see [#520](https://github.com/aallan/vera/issues/520)), and an E503 obligation `value >= 0` where an `@Int` value narrows into a `@Nat` **binding** slot — `let`, call-argument, effect-operation-argument, constructor-field, top-level match-bind, and literal-tuple-destructure sites (see [#552](https://github.com/aallan/vera/issues/552)), plus the generic-instantiation, ADT sub-pattern, non-literal-destructure, and cross-module imported-constructor sites (see [#747](https://github.com/aallan/vera/issues/747)). (A narrowing at a function **return** position, or a `@Nat` component of a tuple/constructor built in value position, is not yet obligated — see [#758](https://github.com/aallan/vera/issues/758).) The codegen mirrors the subtraction obligation and the `@Nat` binding sites with runtime guards — every concrete site (`let`, destructure, match-bind, sub-pattern, concrete constructor field, concrete call-argument) plus **generic function-formal calls**, which guard on the monomorphised callee (the mangled instance `pick$Nat` carries concrete `@Nat` flags). Two sites stay unguarded, both still obligated statically, so a Tier-3 narrowing the solver cannot discharge at either surfaces an E504 warning: the **effect-operation argument**, whose runtime guard is deferred (see [#754](https://github.com/aallan/vera/issues/754)), and the **generic-instantiated constructor field**, since constructor layouts carry no per-field `@Nat` metadata to monomorphise. Division, modulo, and array indexing now follow the same auto-synthesis pattern ([#680](https://github.com/aallan/vera/issues/680)); lifting dynamic or closure-captured array bounds from a runtime-guarded Tier 3 to a Tier-1 proof is part of the Tier 2 verification work in [#427](https://github.com/aallan/vera/issues/427). -**Integer overflow** ([#798](https://github.com/aallan/vera/issues/798)). `@Int` is a signed 64-bit machine integer and `@Nat` an unsigned one; `+` / `-` / `*` wrap at the i64 / u64 boundary. Like `@Nat` underflow and signed-division `MIN / -1`, an overflowing operation is a *partial* operation that **traps** at runtime rather than silently wrapping, so each `@Int` / `@Nat` `+` / `-` / `*` carries an obligation that the result stays in range. It is classified at the operands' **common (coerced) type** — `@Int` if either operand is `@Int` (since `@Nat <: @Int`), else `@Nat` — not one operand's self-type (a non-negative literal is `@Nat`, but `5 + @Int.0` is an i64 add) nor the possibly-narrowed result type (an `@Int.0 + 1` stored into a `@Nat` slot is still an i64 add). A two-check mirrors array indexing: the result provably in range → **Tier 1**; provably out of range (e.g. a literal `i64.MAX + 1`) → a compile error (**E528**); otherwise — dynamic operands — a runtime-guarded **Tier 3** trap. `@Nat` subtraction is excluded — it is the underflow obligation (E502) above, never a high-overflow. +**Integer overflow** ([#798](https://github.com/aallan/vera/issues/798)). `@Int` is a signed 64-bit machine integer and `@Nat` an unsigned one; `+` / `-` / `*` wrap at the i64 / u64 boundary. Like `@Nat` underflow and signed-division `MIN / -1`, an overflowing operation is a *partial* operation that **traps** at runtime rather than silently wrapping, so each `@Int` / `@Nat` `+` / `-` / `*` carries an obligation that the result stays in range. It is classified at the operands' **common (coerced) type** — `@Int` if either operand is `@Int` (since `@Nat <: @Int`), else `@Nat` — not one operand's self-type (a non-negative literal is `@Nat`, but `5 + @Int.0` is an i64 add) nor the possibly-narrowed result type (an `@Int.0 + 1` stored into a `@Nat` slot is still an i64 add). A two-check mirrors array indexing: the result provably in range → **Tier 1**; provably out of range (e.g. a literal `u64.MAX + 1`, or `@Int.0 + 1` under `requires(@Int.0 == i64.MAX)`) → a compile error (**E528**); otherwise — dynamic operands — a runtime-guarded **Tier 3** trap. `@Nat` subtraction is excluded — it is the underflow obligation (E502) above, never a high-overflow. Runtime traps for unguarded primitives are Vera-native: each trap carries a kind label (`divide_by_zero`, `out_of_bounds`, etc.), a per-kind Fix paragraph naming the precondition that would have prevented it, and a source backtrace — so a missing static guarantee is still a recoverable signal. diff --git a/spec/11-compilation.md b/spec/11-compilation.md index eef57d5f..44d7eda2 100644 --- a/spec/11-compilation.md +++ b/spec/11-compilation.md @@ -43,7 +43,7 @@ Generic type variables are resolved via monomorphization — each concrete insta `Nat` and `Int` share the same WASM representation (`i64`). The non-negativity invariant of `Nat` is enforced by the contract system (preconditions, postconditions), not by the WASM type. This avoids the overhead of runtime range checks on every arithmetic operation while maintaining the correctness guarantee through verification. -The one operation that can violate the invariant despite well-typed operands is unsigned subtraction (`@Nat - @Nat`). At every such site whose result is statically `@Nat` AND at least one operand has `@Nat` *provenance* (a slot reference, function call returning `@Nat`, or a sub-expression containing one — pure-literal subtractions like `0 - 1` are intentionally exempt because they're commonly consumed at `@Int` positions), the compiler emits a Tier-1 proof obligation `lhs >= rhs`. If Z3 discharges the obligation the codegen emits a plain `i64.sub`. If Z3 cannot, the function drops to Tier 3 and the codegen emits a guarded subtraction: +The one operation that can violate the invariant despite well-typed operands is unsigned subtraction (`@Nat - @Nat`). At every such site whose result is statically `@Nat` AND at least one operand has `@Nat` *provenance* (a slot reference, function call returning `@Nat`, or a sub-expression containing one — pure-literal subtractions like `0 - 1` are intentionally exempt because they're commonly consumed at `@Int` positions), the compiler emits a Tier-1 proof obligation `lhs >= rhs` — discharged at Tier 1 when a precondition or path condition proves it, else dropped to Tier 3. The codegen is type-driven, not tier-driven: it emits the guarded subtraction at *every* such site regardless of the verifier's tier (a Tier-1 discharge means the guard provably never fires, but it is still emitted): ```wat (if (i64.lt_s lhs rhs) @@ -53,7 +53,7 @@ The one operation that can violate the invariant despite well-typed operands is The trap is classified as `kind="unreachable"` rather than a dedicated `kind="underflow"` because the `unreachable` instruction is the lightest-weight trap mechanism and adding a dedicated kind requires new host-import scaffolding (mirroring `vera.contract_fail`); a precise underflow diagnostic via `vera verify` is the recommended path until the dedicated kind lands. -Arithmetic **overflow** of `@Int`/`@Nat` addition, subtraction, and multiplication (`+`/`-`/`*`) is handled the same way ([#798](https://github.com/aallan/vera/issues/798)). Because `Int`/`Nat` share the i64/u64 representation, these operations can wrap under two's-complement arithmetic, so each such site whose operands have `@Int`/`@Nat` provenance carries a Tier-1 proof obligation `result` stays in range. When Z3 discharges it the codegen emits the plain WASM op; when it cannot, the function drops to Tier 3 and the codegen emits a guarded op that traps on overflow with the same `unreachable` net. The overflow trap is likewise classified `kind="unreachable"` (Option A) rather than a dedicated `kind="overflow"` for the same reason — a precise overflow trap kind needs the same host-import scaffolding and is a follow-up; the obligation itself stays sound because the function traps on overflow before it can return a wrapped value. +Arithmetic **overflow** of `@Int`/`@Nat` addition, subtraction, and multiplication (`+`/`-`/`*`) is handled the same way ([#798](https://github.com/aallan/vera/issues/798)). Because `Int`/`Nat` share the i64/u64 representation, these operations wrap under two's-complement arithmetic, so each such site carries a Tier-1 proof obligation that the `result` stays in range — classified at the operands' **common (coerced) arithmetic type** (`@Int` (i64) if either operand is `@Int`, else `@Nat` (u64); `@Nat` subtraction is excluded, being the underflow obligation above). The verifier discharges it three ways: result provably in range → **Tier 1**; provably out of range → a compile error (**E528**) raised *before* codegen; otherwise — dynamic operands — **Tier 3**. As with `@Nat` subtraction, the codegen is type-driven, not tier-driven: it emits the guarded op — which traps on overflow via the same `unreachable` net — at every such site regardless of the verifier's tier (a Tier-1 discharge means the guard provably never fires, but it is still emitted). The overflow trap is classified `kind="unreachable"` rather than a dedicated `kind="overflow"` for the same reason as `@Nat` subtraction — a precise kind needs new host-import scaffolding and is tracked as a follow-up ([#808](https://github.com/aallan/vera/issues/808)); the obligation stays sound because the function traps on overflow before it can return a wrapped value. Authors lift Tier-3 functions back to Tier 1 by adding `requires lhs >= rhs` clauses. Subtraction sites that do not produce a `Nat`-typed result (e.g., `@Int - @Int`) are not obligation-checked — they may produce negative values, which is well-defined for `Int`. `@Byte` arithmetic is not currently permitted by the type checker (`Byte` is excluded from `NUMERIC_TYPES` in `vera/types.py`), so the underflow obligation needs no `Byte` extension today; allowing `@Byte` arithmetic with both underflow and overflow guards is tracked speculatively as [#564](https://github.com/aallan/vera/issues/564). The verifier checks the `@Nat >= 0` invariant at subtraction sites (in any position, including a function's return expression) and at binding sites where an `@Int` value narrows into a `@Nat` slot — `let` bindings, call arguments, constructor fields, top-level match binds, and literal-tuple destructures, plus the pure-literal `let @Nat = 0 - 1` case the subtraction obligation defers ([#552](https://github.com/aallan/vera/issues/552)). [#747](https://github.com/aallan/vera/issues/747) extended the narrowing obligation to the projection and instantiation sites — ADT sub-pattern binds (`match opt { Some(@Nat) -> }` on `Option`), non-literal tuple destructures, generic constructor / effect-operation / function formals instantiated to `@Nat`, and imported ADT constructors — so every narrowing **binding site** is now statically obligated. (A bare `@Int`→`@Nat` narrowing at a function **return** position, or a `@Nat` component of a tuple/constructor built in value position, is not yet obligated — see [#758](https://github.com/aallan/vera/issues/758).) The Tier-3 runtime guard backs every binding site except two — the effect-operation argument, and the generic-instantiated constructor field (which erases to i64 with no per-field guard) — and a tripped guard reports a generic trap rather than the `requires(... >= 0)` fix. The effect-op guard and the dedicated trap kind are tracked as [#754](https://github.com/aallan/vera/issues/754), the generic constructor field as [#757](https://github.com/aallan/vera/issues/757) (see §11.17). diff --git a/tests/test_int_overflow.py b/tests/test_int_overflow.py index acf54ff6..acdb5715 100644 --- a/tests/test_int_overflow.py +++ b/tests/test_int_overflow.py @@ -52,8 +52,11 @@ def test_int_add_unbounded_emits_undischarged_overflow_obligation(self) -> None: { @Int.0 + 1 } """) ovf = [o for o in result.obligations if o.kind == "int_overflow"] - assert ovf, [(o.kind, o.status) for o in result.obligations] - assert all(o.status == "tier3" for o in ovf), [(o.kind, o.status) for o in ovf] + # Exactly one arithmetic site → exactly one int_overflow obligation; a + # walker that double-recorded the site would slip past an "at least one" + # check, so pin the exact count (CR #809). + assert len(ovf) == 1, [(o.kind, o.status) for o in result.obligations] + assert ovf[0].status == "tier3", [(o.kind, o.status) for o in ovf] def test_nat_add_unbounded_emits_undischarged_overflow_obligation(self) -> None: # Same shape at the u64 ceiling for @Nat. @@ -63,8 +66,8 @@ def test_nat_add_unbounded_emits_undischarged_overflow_obligation(self) -> None: { @Nat.0 + 1 } """) ovf = [o for o in result.obligations if o.kind == "int_overflow"] - assert ovf, [(o.kind, o.status) for o in result.obligations] - assert all(o.status == "tier3" for o in ovf), [(o.kind, o.status) for o in ovf] + assert len(ovf) == 1, [(o.kind, o.status) for o in result.obligations] + assert ovf[0].status == "tier3", [(o.kind, o.status) for o in ovf] def test_overflow_obligation_discharged_when_operands_bounded(self) -> None: # Bounded operands → result provably in i64 range → the overflow @@ -78,6 +81,57 @@ def test_overflow_obligation_discharged_when_operands_bounded(self) -> None: { @Int.0 + @Int.1 } """) ovf = [o for o in result.obligations if o.kind == "int_overflow"] - assert ovf and all(o.status == "verified" for o in ovf), [ + assert len(ovf) == 1, [(o.kind, o.status) for o in result.obligations] + assert ovf[0].status == "verified", [(o.kind, o.status) for o in ovf] + + def test_overflow_obligation_violated_when_provably_out_of_range(self) -> None: + # The loud-error third arm of the two-check, untested until now: when the + # operands provably force the result out of range, the int_overflow + # obligation is 'violated' and a compile error (E528) is raised *before* + # codegen — the analog of index_bounds' E527 and nat_sub's E502. A + # regression that silently downgraded this arm to Tier 3 would re-open the + # #798-class soundness hole AND pass the discharge/tier3 tests above, so + # it is pinned explicitly here. @Int at the i64 ceiling. + result = _verify(""" +public fn over(@Int -> @Int) + requires(@Int.0 >= 9223372036854775807) ensures(true) effects(pure) +{ @Int.0 + 1 } +""") + ovf = [o for o in result.obligations if o.kind == "int_overflow"] + assert len(ovf) == 1, [(o.kind, o.status) for o in result.obligations] + assert ovf[0].status == "violated", ovf[0].status + assert ovf[0].error_code == "E528", ovf[0].error_code + assert any(d.error_code == "E528" for d in result.diagnostics), [ + d.error_code for d in result.diagnostics + ] + + def test_nat_overflow_obligation_violated_at_u64_ceiling(self) -> None: + # The @Nat (u64) ceiling variant of the E528 path. + result = _verify(""" +public fn over(@Nat -> @Nat) + requires(@Nat.0 >= 18446744073709551615) ensures(true) effects(pure) +{ @Nat.0 + 1 } +""") + ovf = [o for o in result.obligations if o.kind == "int_overflow"] + assert len(ovf) == 1, [(o.kind, o.status) for o in result.obligations] + assert ovf[0].status == "violated", ovf[0].status + assert ovf[0].error_code == "E528", ovf[0].error_code + + def test_nat_subtraction_excluded_from_overflow_obligation(self) -> None: + # The SUB+Nat exclusion checked at the REAL verifier gate (not the + # differential's re-derived helper, which can't catch an exclusion + # desync): a @Nat - @Nat site is the nat_sub underflow obligation (E502), + # never an int_overflow. Pins the exclusion against a regression that + # started high-overflow-guarding @Nat subtraction. + result = _verify(""" +public fn sub(@Nat, @Nat -> @Nat) + requires(@Nat.1 >= @Nat.0) ensures(true) effects(pure) +{ @Nat.1 - @Nat.0 } +""") + kinds = [o.kind for o in result.obligations] + assert "int_overflow" not in kinds, [ + (o.kind, o.status) for o in result.obligations + ] + assert "nat_sub" in kinds, [ (o.kind, o.status) for o in result.obligations ] diff --git a/tests/test_int_overflow_codegen.py b/tests/test_int_overflow_codegen.py index 5ce859fb..623f2476 100644 --- a/tests/test_int_overflow_codegen.py +++ b/tests/test_int_overflow_codegen.py @@ -34,6 +34,7 @@ from vera.codegen import compile as codegen_compile from vera.codegen import execute from vera.parser import parse_to_ast +from vera.resolver import ModuleResolver I64_MAX = 9223372036854775807 I64_MIN = -9223372036854775808 @@ -55,11 +56,21 @@ def _compile_with_types(source: str): path = f.name try: ast = parse_to_ast(source) - diags, arts = typecheck_with_artifacts(ast, source, file=path) + # Mirror cmd_run / cmd_compile exactly: resolve imports, then thread the + # resolved modules through BOTH typecheck and codegen so the overflow + # classifier sees the same cross-module context production does (CR #809). + # Single-file fixtures resolve to an empty list (a no-op), but a fixture + # that adds an import is then exercised through the real import path. + resolver = ModuleResolver(_root=Path(path).parent) + resolved = resolver.resolve_imports(ast, Path(path)) + diags, arts = typecheck_with_artifacts( + ast, source, file=path, resolved_modules=resolved, + ) errors = [d for d in diags if d.severity == "error"] assert not errors, f"typecheck errors: {[d.description for d in errors]}" result = codegen_compile( ast, source=source, file=path, + resolved_modules=resolved, expr_semantic_types=arts.expr_semantic_types, ) errs = [d for d in result.diagnostics if d.severity == "error"] @@ -69,6 +80,44 @@ def _compile_with_types(source: str): Path(path).unlink(missing_ok=True) +def _compile_multifile(files: dict[str, str], main: str): + """Compile a multi-file program (entry + imported modules) the way ``cmd_run`` + does, so the overflow guard is exercised inside an *imported* body (CR #809). + + ``files`` maps filename -> source; ``main`` is the entry filename. The files + are written to one temp dir so ``ModuleResolver`` resolves the imports by + module name, then ``resolved_modules`` is threaded through both typecheck and + codegen — without it the import would not resolve and the imported function + would never be compiled (let alone guarded). + """ + import shutil + + d = tempfile.mkdtemp() + try: + for name, src in files.items(): + (Path(d) / name).write_text(src, encoding="utf-8") + main_path = str(Path(d) / main) + source = files[main] + ast = parse_to_ast(source) + resolver = ModuleResolver(_root=Path(main_path).parent) + resolved = resolver.resolve_imports(ast, Path(main_path)) + diags, arts = typecheck_with_artifacts( + ast, source, file=main_path, resolved_modules=resolved, + ) + errors = [x for x in diags if x.severity == "error"] + assert not errors, f"typecheck errors: {[x.description for x in errors]}" + result = codegen_compile( + ast, source=source, file=main_path, + resolved_modules=resolved, + expr_semantic_types=arts.expr_semantic_types, + ) + errs = [x for x in result.diagnostics if x.severity == "error"] + assert not errs, f"codegen errors: {[x.description for x in errs]}" + return result + finally: + shutil.rmtree(d, ignore_errors=True) + + _MASK64 = (1 << 64) - 1 @@ -430,3 +479,55 @@ def test_nat_add_matches_python_exact(self) -> None: if trapped != overflows: mismatches.append((a, b, trapped, overflows)) assert not mismatches, mismatches[:10] + + +class TestCrossModuleOverflow798: + """The overflow guard must fire inside an *imported* function body, not just + the entry file (CR #809). + + The single-file fixtures above never cross an import boundary, so they cannot + catch a regression in the cross-module compile path (``resolved_modules`` not + threaded → the imported body is never compiled/guarded). Here an imported + ``lib_inc`` overflows i64; calling it must trap, proving the guard is emitted + in the imported body the same way ``cmd_run`` / ``cmd_compile`` produce it. + """ + + _LIB = """ +module ovf_lib; + +public fn lib_inc(@Int -> @Int) + requires(true) ensures(true) effects(pure) +{ @Int.0 + 1 } +""" + _MAIN = """ +import ovf_lib; + +public fn run_it(@Unit -> @Int) + requires(true) ensures(true) effects(pure) +{ lib_inc(9223372036854775807) } +""" + + def test_imported_int_add_overflow_traps(self) -> None: + result = _compile_multifile( + {"ovf_lib.vera": self._LIB, "main.vera": self._MAIN}, "main.vera", + ) + with pytest.raises( + (wasmtime.WasmtimeError, wasmtime.Trap, RuntimeError) + ): + execute(result, fn_name="run_it", args=[]) + + def test_imported_bounded_add_does_not_trap(self) -> None: + # Companion no-trap case: a safe argument returns cleanly, so the trap + # above is the overflow guard firing — not the import path being broken. + main = """ +import ovf_lib; + +public fn run_it(@Unit -> @Int) + requires(true) ensures(true) effects(pure) +{ lib_inc(41) } +""" + result = _compile_multifile( + {"ovf_lib.vera": self._LIB, "main.vera": main}, "main.vera", + ) + exec_result = execute(result, fn_name="run_it", args=[]) + assert exec_result.value == 42 diff --git a/tests/test_int_overflow_differential.py b/tests/test_int_overflow_differential.py index cea6debf..a90c6d50 100644 --- a/tests/test_int_overflow_differential.py +++ b/tests/test_int_overflow_differential.py @@ -194,6 +194,17 @@ def _assert_in_lockstep(source: str) -> None: public fn f(@Nat, @Nat -> @Nat) requires(true) ensures(true) effects(pure) { @Nat.1 * @Nat.0 + 3 } +""", + # RISK 7: an @Int add narrowed into a @Nat return slot. The result resolves + # to @Nat, but the addition runs at i64 — both sides must classify the + # operands' common type (@Int), not the narrowed result type. This drives + # the narrowed shape through the parametrised codegen<->verifier lockstep, + # complementing test_narrowed_result_classified_* which pins only the + # verifier side (CR #809). + "int_narrowed_into_nat": """ +public fn f(@Int -> @Nat) + requires(@Int.0 >= 0) ensures(true) effects(pure) +{ @Int.0 + 1 } """, # Float arithmetic must be classified None on both sides (not guarded). "float_not_guarded": """ From 6412a2c00d16d21d4d084082d66504695c7ec275 Mon Sep 17 00:00:00 2001 From: T Date: Sat, 27 Jun 2026 08:17:14 +0100 Subject: [PATCH 3/5] =?UTF-8?q?fix(798):=20address=20review=20round=203=20?= =?UTF-8?q?=E2=80=94=20spec=20underflow=20scope,=20nat=5Fsub=20+=20cross-m?= =?UTF-8?q?odule=20test=20coverage?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Round 3: 4 CodeRabbit findings on PR #809 (all tests/docs; no core-logic change). - spec/11 section 11.2.1: scope the "Int - Int not obligation-checked" sentence to the *underflow* obligation and cross-reference the overflow obligation above — the flat wording contradicted the new overflow paragraph. - tests/test_int_overflow.py: assert the user-facing E528 diagnostic on the Nat ceiling test too (it only checked the obligation), mirroring the Int test. - tests/test_int_overflow_codegen.py: add TestNatSubUnderflow798 pinning that Nat subtraction stays the #520 nat_sub underflow guard, not #798 overflow arithmetic. CR's suggested "0 - 1" example is exempt (pure literals don't trap); the tests use slot provenance (3 - 5 traps, 5 - 3 = 2). - tests/test_int_overflow_differential.py: cover cross-module arithmetic by sweeping imported *_lib.vera modules STANDALONE. CR's suggested fix (resolve imports in the consumer + include imported sites) is a false positive: the verifier obligates per-module, so verifier-of-consumer (None) vs codegen-of-imported-body (Int) is a category mismatch, not a desync. Swept standalone, the lib's sites are in lockstep. Doc counts refreshed (5151 -> 5156). Refs #798. Co-Authored-By: Claude --- README.md | 2 +- ROADMAP.md | 2 +- TESTING.md | 8 +++--- spec/11-compilation.md | 2 +- tests/test_int_overflow.py | 6 +++++ tests/test_int_overflow_codegen.py | 34 +++++++++++++++++++++++++ tests/test_int_overflow_differential.py | 21 +++++++++++++-- 7 files changed, 66 insertions(+), 9 deletions(-) diff --git a/README.md b/README.md index d97916e8..cc05be26 100644 --- a/README.md +++ b/README.md @@ -215,7 +215,7 @@ cp /path/to/vera/SKILL.md ~/.claude/skills/vera-language/SKILL.md ## Project status -Vera is in **active development** at v0.0.181 — 1,400+ commits, 181 releases, 5,151 tests, 91% code coverage, 93 conformance programs, 35 examples, and a 13-chapter specification. See **[HISTORY.md](HISTORY.md)** for how the compiler was built. +Vera is in **active development** at v0.0.181 — 1,400+ commits, 181 releases, 5,156 tests, 91% code coverage, 93 conformance programs, 35 examples, and a 13-chapter specification. See **[HISTORY.md](HISTORY.md)** for how the compiler was built. The reference compiler — parser, AST, type checker, contract verifier (Z3), WASM code generator, module system, browser runtime, and runtime contract insertion — is working. The language specification is in draft across [13 chapters](spec/). diff --git a/ROADMAP.md b/ROADMAP.md index 023c750e..9a86edd5 100644 --- a/ROADMAP.md +++ b/ROADMAP.md @@ -8,7 +8,7 @@ Priority lives in this file and nowhere else — issues carry kind and area labe ## Where we are -5,151 tests, 93 conformance programs, 35 examples, 13 spec chapters. +5,156 tests, 93 conformance programs, 35 examples, 13 spec chapters. ## The roadmap diff --git a/TESTING.md b/TESTING.md index 5ee3bd7f..551e5b8c 100644 --- a/TESTING.md +++ b/TESTING.md @@ -6,7 +6,7 @@ This is the single source of truth for Vera's testing infrastructure, coverage d | Metric | Value | |--------|-------| -| **Tests** | 5,151 across 44 files (~63,000 lines of test code; 5,119 passed + 16 stress, 16 skipped) | +| **Tests** | 5,156 across 44 files (~63,000 lines of test code; 5,124 passed + 16 stress, 16 skipped) | | **Compiler code coverage** | 95% Python, 61% JavaScript — 91% combined (CI minimum: 80%) | | **Conformance programs** | 93 programs across 9 spec chapters, validating every language feature | | **Example programs** | 35, all validated through `vera check` + `vera verify` | @@ -61,9 +61,9 @@ python scripts/fix_allowlists.py --fix # auto-fix stale allowlists | `test_obligations.py` | 284 | 1083 | Reified proof obligations + warm `VerificationSession` (#222 Phase A): full-corpus differential oracle (warm session == cold `verify()` on diagnostics, summary, and obligation stream, plus warm-twice determinism, across all 35 examples and every verify/run-level conformance program), summary↔obligation tier-bookkeeping consistency, per-kind unit tests (requires / ensures / decreases / nat_sub / call_pre statuses, counterexamples, error codes), content-key stability + same-text-two-sites span disambiguation, session solver reuse, type-error short-circuit, ADT-registry resync between programs; plus the Phase B incremental suite — identical-source full replay, callee-body-edit replays callers while callee-contract-edit invalidates them, span-shift and ADT-edit conservative invalidation, cross-program isolation, timeout-status never cached (monkeypatched solver), FIFO eviction bound; plus the #727 dedup pin — a violating call in a let RHS records exactly one E501 diagnostic and one call_pre obligation | | `test_verifier.py` | 473 | 9,270 | Z3 verification, counterexamples, tier classification, call-site preconditions, branch-aware preconditions, pipe operator, cross-module contracts, match/ADT verification, decreases verification, mutual recursion, refined Bool/String/Float64 param sorts, **@Nat subtraction underflow obligation** (#520 — Path-A obligation discharge via requires/path-conditions/path-aware Z3 refutation, pure-literal exclusion, Int-Int and Nat-Int → Int exemptions), **@Nat binding-site narrowing obligation** (#552 — Tier-1 `value >= 0` at let/call-arg/effect-op-arg/ctor-field/match-bind/literal-destructure narrowing, #520 double-emit disjointness, E503 counterexample-witness pin; #747 — generic-instantiation, ADT sub-pattern, non-literal tuple-destructure, and cross-module imported-ctor narrowing, with concrete-site Tier-3 narrowings (incl. the if-expr-source SMT gap, recorded per-component) classified as codegen-guarded `tier3_runtime`, generic function-formal call-args guarded on the monomorphised callee (keyed on `call_target`, not the erased generic decl), and the two genuinely-unguarded residuals — effect-op-arg narrowings (no codegen guard yet, #754) and generic-instantiated constructor-field narrowings (constructors carry no per-field `@Nat` mono metadata) — surfacing the `E504` unguarded warning; #749 — IndexExpr/InterpolatedString walker-recursion pins, `_fresh_slot_var` nat-alias unit test, `_narrows_into_nat` verifier/codegen soundness parity), **refinement-predicate verification** (#746 — Tier-1 discharge at let/call-arg/ctor-field/effect-op-arg/match-bind/tuple-destructure narrowings and refined return positions, E505 violation with counterexample, E506 Tier-3 for untranslatable/non-primitive-base predicates, the R3 already-refined-source exemption via predicate-AST equality, the R5 violating-return pin, the refinement-over-`@Nat` `>= 0 && P` conjoin, and the **refined-ADT-sub-pattern arm-fact carry** — a `Some(@PosInt)` bind on `Option` carries the field's source-type predicate into the arm body so a downstream `@Nat` narrowing discharges instead of a false E503, and reaches call **preconditions** in the arm body via the SMT match-translation fact hook so `Some(@PosInt) -> needs_positive(@PosInt.0)` discharges instead of a false E501, while a genuine `Option` narrowing stays obligated (E505/E501), never silently assumed; an **alias-base refined return** (`{ @Age \| @Age.0 >= 18 }`) is assumed by the caller via the predicate binder name; and a **refined return from a match arm** (`Some(@PosInt) -> @PosInt.0`) discharges via a global `arm-matched => source-fact` SMT implication on both the normal and generic-fast paths, a violating `Option` payload still E505), **per-monomorphization generic verification** (#732 — an instantiated `forall` body is verified per concrete instantiation: a generic-body `@Nat` underflow is caught (naming the instantiation, with a counterexample) or discharged when guarded, the never-instantiated residual stays Tier-3 `E520`, collapsed-type-var (`A=B=Int`) De Bruijn reindex soundness, one-diagnostic dedup across multiple instantiations, and discovery of a generic reached only through a `decreases(...)` measure, plus a mixed-`tier3`/`timeout` aggregate-label completeness pin and a recursive-generic-clone source-name pin so `decreases` resolves on the monomorphised instance), **primitive-operation safety obligations** (#680 — division/modulo by-zero `E526` and array-index-bounds `E527`, the in-bounds/out-of-bounds two-check with float-exemption, honest Tier-3 for opaque lengths, the off-by-one and `i >= 0` lower-bound pins, walker recursion through array-literal / assert / interpolated-string / let-destructure positions, the opaque-shadow Tier-3 routing that stops an untranslatable `let`/destructure from silently discharging a stale same-type-outer divisor, De Bruijn-correct `E526`/`E527` fix hints rendered from the actual operands, Tier-1 projection of literal-constructor destructure components (`let Tuple<@Int, @Int> = Tuple(10, 6); _ / @Int.1` discharges `10 != 0` rather than shadowing to Tier-3, while a non-literal source stays a tracked Tier-3 shadow), a tracked placeholder pushed for *every* destructured component so same-type De Bruijn positions don't collapse (`Tuple(10, )` keeps `@Int.0` on the opaque second component, never silently discharging against the literal `10`), `_contains_opaque_shadow` recursion so a *compound* divisor / subtraction operand embedding a shadow (`shadow + 1`) falls to Tier-3 instead of a spurious `E526`/`E502` counterexample, and tracked shadowing of match-arm pattern slots bound over an untranslatable (effect-op) scrutinee so a primitive op in the arm falls to Tier-3 rather than silently discharging against a stale same-name outer (`match Source.next(()) { Some(@Int) -> 1 / @Int.0 }`); plus a **multi-agent shadow/projection audit battery** (`TestShadowAuditDivision680` / `TestShadowAuditSubtraction680` / `TestShadowAuditIndex680` / `TestDestructureDeBruijnAlignment680` / `TestShadowAuditInteractions680`) — 57 differential tests pinning the invariant trichotomy (safe → verified, opaque/shadow → Tier-3, provably-unsafe → loud) across compound shadows, modulo parity, multi-component De Bruijn alignment with value-distinct siblings, intervening-type namespaces, stacked / nested destructures, opaque match scrutinees, nested blocks, and intra-block scoping; **mutation-validated** — every shadow/projection test was confirmed to flip RED when its target machinery (`_is_opaque_shadow` / `_contains_opaque_shadow` / the De Bruijn placeholder push / the `let`-shadow / the match-arm `_fresh_pattern_env` / literal projection) is deliberately broken, so none is green-for-the-wrong-reason) | | `test_soundness_392.py` | 36 | 584 | #392 audit batches 1–2 — verifier soundness/completeness fixes: signed div/mod truncate toward zero (#799), body `assert(P)` carries a Tier-1 obligation (#800), divisions in contract predicates carry a `div_zero` obligation (#801), and the #804 assume-half of #800's `assert` rule — a prior `assert`/`assume` discharges later obligations (including a later call's precondition) + the postcondition at Tier 1, removing false E501/E503/E500/E505. Test-first: each fails on the pre-fix verifier | -| `test_int_overflow.py` | 6 | 137 | #798 — `@Int`/`@Nat` arithmetic-overflow obligations (part of the #392 `smt.py` soundness audit): `+`/`-`/`*` on `@Int`/`@Nat` now emit an `int_overflow` obligation (the analog of `nat_sub`/`div_zero`) rather than modelling the operands as Z3's unbounded integers, so a `ensures(@Int.result > @Int.0)` over `@Int.0 + 1` no longer proves a contract the i64/u64 runtime violates under two's-complement wraparound. Unbounded operands leave the obligation undischarged (Tier-3, runtime-guarded); operand bounds that prove the result stays in range discharge it at Tier 1. Test-first: each fails on the pre-fix verifier (no `int_overflow` obligation emitted) | -| `test_int_overflow_codegen.py` | 46 | 533 | #798 Stage 3 — runtime overflow-trap codegen: the codegen emits a guard at *exactly* the `@Int`/`@Nat` `+`/`-`/`*` sites the verifier obligates, so `vera run`/`vera compile` programs trap on overflow instead of silently wrapping at the i64/u64 boundary. The trap is a bare `unreachable` (Option A — matches the #520 `nat_sub` and #552 nat-bind precedent), classified `kind="unreachable"` today (a precise `"overflow"` trap kind via a host import is a follow-up). Test-first: every `*_traps` test fails on the pre-Stage-3 codegen (the op wraps silently → no trap → `execute` returns a value); every `*_no_trap` test passes before and after (safe arithmetic unchanged) | -| `test_int_overflow_differential.py` | 142 | 347 | #798 Stage 3 verifier↔codegen classification differential (cross-component soundness rule): the codegen overflow guard must fire at exactly the sites the verifier obligates *and* classify each site's operand type (`@Int` i64 vs `@Nat` u64) identically — else a Tier-1-clean program traps spuriously or a wrapping op slips through unguarded. Over a corpus exercising all five operand combos plus the literal-left ambiguity (which the pre-fix codegen mis-classified as `@Nat`), asserts the verifier's per-site gated classification equals the codegen's site for site, both sides driven by the same `ast.span_key` | +| `test_int_overflow.py` | 6 | 143 | #798 — `@Int`/`@Nat` arithmetic-overflow obligations (part of the #392 `smt.py` soundness audit): `+`/`-`/`*` on `@Int`/`@Nat` now emit an `int_overflow` obligation (the analog of `nat_sub`/`div_zero`) rather than modelling the operands as Z3's unbounded integers, so a `ensures(@Int.result > @Int.0)` over `@Int.0 + 1` no longer proves a contract the i64/u64 runtime violates under two's-complement wraparound. Unbounded operands leave the obligation undischarged (Tier-3, runtime-guarded); operand bounds that prove the result stays in range discharge it at Tier 1. Test-first: each fails on the pre-fix verifier (no `int_overflow` obligation emitted) | +| `test_int_overflow_codegen.py` | 50 | 567 | #798 Stage 3 — runtime overflow-trap codegen: the codegen emits a guard at *exactly* the `@Int`/`@Nat` `+`/`-`/`*` sites the verifier obligates, so `vera run`/`vera compile` programs trap on overflow instead of silently wrapping at the i64/u64 boundary. The trap is a bare `unreachable` (Option A — matches the #520 `nat_sub` and #552 nat-bind precedent), classified `kind="unreachable"` today (a precise `"overflow"` trap kind via a host import is a follow-up). Test-first: every `*_traps` test fails on the pre-Stage-3 codegen (the op wraps silently → no trap → `execute` returns a value); every `*_no_trap` test passes before and after (safe arithmetic unchanged) | +| `test_int_overflow_differential.py` | 143 | 364 | #798 Stage 3 verifier↔codegen classification differential (cross-component soundness rule): the codegen overflow guard must fire at exactly the sites the verifier obligates *and* classify each site's operand type (`@Int` i64 vs `@Nat` u64) identically — else a Tier-1-clean program traps spuriously or a wrapping op slips through unguarded. Over a corpus exercising all five operand combos plus the literal-left ambiguity (which the pre-fix codegen mis-classified as `@Nat`), asserts the verifier's per-site gated classification equals the codegen's site for site, both sides driven by the same `ast.span_key` | | `test_monomorphize_differential.py` | 13 | 682 | #732 differential soundness: the verifier's per-monomorphization instantiation discovery covers every instantiation codegen emits (name coverage + per-generic count), over real generic programs (conformance ch02/ch09, `examples/generics.vera`) plus inline cases for the soundness-critical scenarios — collapsed type vars, **prelude combinator emission** (`option_map`), transitive generics, a generic whose type arg is fixed only by a **where-helper's return** (a `Float64`-returning helper, so the unresolved-var `"Bool"` phantom default cannot mask a miss), a generic whose type arg is fixed only by an **imported constructor** (`id2(MkBox(7))` — the verifier's mono-context must include `_module_constructors`, else it phantom-defaults and misses codegen's `id2`), a generic whose type arg is fixed only by an **imported function's return** (`id_g(make_int(...))` — the verifier's mono-context must seed `fn_ret_types` from imported functions, else it phantom-defaults and misses codegen's `id_g`, plus a **private-shadow** case pinning the imported-fn seeding stays unfiltered like codegen since filtering would diverge into a false Tier-1), and a generic reached only through a **contract clause or `where` helper** (codegen must seed Pass 1.5 from the shared node-level walk, not just `decl.body`, or it skips the clone → `CodegenSkip` at run time) — so a missed instantiation (a false Tier-1) is caught. Guards against a vacuous pass when codegen emits nothing, plus a **determinism guard** (`vera compile --wat` is byte-stable across `PYTHONHASHSEED` — the mono worklist sorts its instantiation sets) | | `test_codegen.py` | 1,213 | 21,107 | WASM compilation, arithmetic, Float64, Byte, arrays (incl. compound element types), ADTs, match (incl. nested patterns), generics, State\, Exn\ handlers, control flow, strings, string escape sequences, IO (read\_line, read\_file, write\_file, args, exit, get\_env, sleep, time, stderr), bounds checking, quantifiers, assert/assume, refinement type aliases, pipe operator, string built-ins, built-in shadowing, parse\_nat Result, GC, Markdown host bindings, Regex host bindings, Map collection, Set collection, Decimal type, Json type, Html type, Http effect, Inference effect, Random effect, example round-trips, GC shadow stack overflow, **WASM tail-call optimization** (#517 — `return_call` emission for tail-position calls, 50K- and 1M-iteration stress, structural assertions on `return_call`/plain `call` boundary, **GC-aware TCO for allocating fns (#549 — `$gc_sp` restore before each `return_call`)**, postcondition-fallback regression (still reverts to plain `call`), analyzer unit tests covering Block-trailing / IfExpr-both-branches / MatchExpr-arm-bodies / let-value-NOT-marked / call-args-NOT-marked / ExprStmt-statement-NOT-marked / IfExpr-condition-NOT-marked / MatchExpr-scrutinee-NOT-marked), **@Nat narrowing runtime guards** (#552 let site; #747 extends to tuple-destructure / top-level match-bind / ADT sub-pattern / concrete ctor-field / call-arg sites — `i64.lt_s; unreachable` net, @Int targets exempt), **refinement-predicate runtime guards** (#746 — primitive-base (incl. `@Bool` / `@Float64`) refined params/returns traverse a `$vera.contract_fail` predicate guard at the function boundary: violating value traps, valid passes, call-args guarded transitively, generic refined returns guarded on the monomorphised instance; non-primitive `@Array` refinements (`array_length(...) > 0`) guarded too, the alias-aware `@Nat` base conjoins its implicit `>= 0`, and erased `@Unit` refinements emit no guard; **tuple-component boundary guards** decompose a `Tuple` param/return at the FFI boundary — loading each refined / `@Nat` component from the heap value and guarding it recursively for nested tuples, so a violating component traps instead of laundering past the verifier's component assumption; a **generic tuple alias** (`type Box = Tuple`) substitutes its argument so `Box` still guards its component, and a mutually-recursive (infinite) tuple alias **fails closed with E617** rather than silently emitting partial guards; a **refinement OVER a tuple** (`type Pair = { @Tuple \| true }`) is unwrapped so its components are guarded too, recursively through nested `Tuple`) | | `test_codegen_contracts.py` | 32 | 570 | Runtime pre/postconditions, contract fail messages, old/new state postconditions | diff --git a/spec/11-compilation.md b/spec/11-compilation.md index 44d7eda2..4f67430e 100644 --- a/spec/11-compilation.md +++ b/spec/11-compilation.md @@ -55,7 +55,7 @@ The trap is classified as `kind="unreachable"` rather than a dedicated `kind="un Arithmetic **overflow** of `@Int`/`@Nat` addition, subtraction, and multiplication (`+`/`-`/`*`) is handled the same way ([#798](https://github.com/aallan/vera/issues/798)). Because `Int`/`Nat` share the i64/u64 representation, these operations wrap under two's-complement arithmetic, so each such site carries a Tier-1 proof obligation that the `result` stays in range — classified at the operands' **common (coerced) arithmetic type** (`@Int` (i64) if either operand is `@Int`, else `@Nat` (u64); `@Nat` subtraction is excluded, being the underflow obligation above). The verifier discharges it three ways: result provably in range → **Tier 1**; provably out of range → a compile error (**E528**) raised *before* codegen; otherwise — dynamic operands — **Tier 3**. As with `@Nat` subtraction, the codegen is type-driven, not tier-driven: it emits the guarded op — which traps on overflow via the same `unreachable` net — at every such site regardless of the verifier's tier (a Tier-1 discharge means the guard provably never fires, but it is still emitted). The overflow trap is classified `kind="unreachable"` rather than a dedicated `kind="overflow"` for the same reason as `@Nat` subtraction — a precise kind needs new host-import scaffolding and is tracked as a follow-up ([#808](https://github.com/aallan/vera/issues/808)); the obligation stays sound because the function traps on overflow before it can return a wrapped value. -Authors lift Tier-3 functions back to Tier 1 by adding `requires lhs >= rhs` clauses. Subtraction sites that do not produce a `Nat`-typed result (e.g., `@Int - @Int`) are not obligation-checked — they may produce negative values, which is well-defined for `Int`. `@Byte` arithmetic is not currently permitted by the type checker (`Byte` is excluded from `NUMERIC_TYPES` in `vera/types.py`), so the underflow obligation needs no `Byte` extension today; allowing `@Byte` arithmetic with both underflow and overflow guards is tracked speculatively as [#564](https://github.com/aallan/vera/issues/564). The verifier checks the `@Nat >= 0` invariant at subtraction sites (in any position, including a function's return expression) and at binding sites where an `@Int` value narrows into a `@Nat` slot — `let` bindings, call arguments, constructor fields, top-level match binds, and literal-tuple destructures, plus the pure-literal `let @Nat = 0 - 1` case the subtraction obligation defers ([#552](https://github.com/aallan/vera/issues/552)). [#747](https://github.com/aallan/vera/issues/747) extended the narrowing obligation to the projection and instantiation sites — ADT sub-pattern binds (`match opt { Some(@Nat) -> }` on `Option`), non-literal tuple destructures, generic constructor / effect-operation / function formals instantiated to `@Nat`, and imported ADT constructors — so every narrowing **binding site** is now statically obligated. (A bare `@Int`→`@Nat` narrowing at a function **return** position, or a `@Nat` component of a tuple/constructor built in value position, is not yet obligated — see [#758](https://github.com/aallan/vera/issues/758).) The Tier-3 runtime guard backs every binding site except two — the effect-operation argument, and the generic-instantiated constructor field (which erases to i64 with no per-field guard) — and a tripped guard reports a generic trap rather than the `requires(... >= 0)` fix. The effect-op guard and the dedicated trap kind are tracked as [#754](https://github.com/aallan/vera/issues/754), the generic constructor field as [#757](https://github.com/aallan/vera/issues/757) (see §11.17). +Authors lift Tier-3 functions back to Tier 1 by adding `requires lhs >= rhs` clauses. Subtraction sites that do not produce a `Nat`-typed result (e.g., `@Int - @Int`) carry no *underflow* obligation — they may produce negative values, which is well-defined for `Int` (they remain subject to the overflow obligation described above). `@Byte` arithmetic is not currently permitted by the type checker (`Byte` is excluded from `NUMERIC_TYPES` in `vera/types.py`), so the underflow obligation needs no `Byte` extension today; allowing `@Byte` arithmetic with both underflow and overflow guards is tracked speculatively as [#564](https://github.com/aallan/vera/issues/564). The verifier checks the `@Nat >= 0` invariant at subtraction sites (in any position, including a function's return expression) and at binding sites where an `@Int` value narrows into a `@Nat` slot — `let` bindings, call arguments, constructor fields, top-level match binds, and literal-tuple destructures, plus the pure-literal `let @Nat = 0 - 1` case the subtraction obligation defers ([#552](https://github.com/aallan/vera/issues/552)). [#747](https://github.com/aallan/vera/issues/747) extended the narrowing obligation to the projection and instantiation sites — ADT sub-pattern binds (`match opt { Some(@Nat) -> }` on `Option`), non-literal tuple destructures, generic constructor / effect-operation / function formals instantiated to `@Nat`, and imported ADT constructors — so every narrowing **binding site** is now statically obligated. (A bare `@Int`→`@Nat` narrowing at a function **return** position, or a `@Nat` component of a tuple/constructor built in value position, is not yet obligated — see [#758](https://github.com/aallan/vera/issues/758).) The Tier-3 runtime guard backs every binding site except two — the effect-operation argument, and the generic-instantiated constructor field (which erases to i64 with no per-field guard) — and a tripped guard reports a generic trap rather than the `requires(... >= 0)` fix. The effect-op guard and the dedicated trap kind are tracked as [#754](https://github.com/aallan/vera/issues/754), the generic constructor field as [#757](https://github.com/aallan/vera/issues/757) (see §11.17). ### 11.2.2 Unit as Void diff --git a/tests/test_int_overflow.py b/tests/test_int_overflow.py index acdb5715..90e87308 100644 --- a/tests/test_int_overflow.py +++ b/tests/test_int_overflow.py @@ -116,6 +116,12 @@ def test_nat_overflow_obligation_violated_at_u64_ceiling(self) -> None: assert len(ovf) == 1, [(o.kind, o.status) for o in result.obligations] assert ovf[0].status == "violated", ovf[0].status assert ovf[0].error_code == "E528", ovf[0].error_code + # Also assert the user-facing E528 diagnostic, mirroring the signed + # ceiling test above — the @Nat branch must surface on both the internal + # obligation and the external diagnostic surface (CR #809). + assert any(d.error_code == "E528" for d in result.diagnostics), [ + d.error_code for d in result.diagnostics + ] def test_nat_subtraction_excluded_from_overflow_obligation(self) -> None: # The SUB+Nat exclusion checked at the REAL verifier gate (not the diff --git a/tests/test_int_overflow_codegen.py b/tests/test_int_overflow_codegen.py index 623f2476..31775dcd 100644 --- a/tests/test_int_overflow_codegen.py +++ b/tests/test_int_overflow_codegen.py @@ -176,6 +176,16 @@ def _assert_no_trap(source: str, fn: str, args: list[int], expect: int) -> None: { @Nat.1 * @Nat.0 } """ +# @Nat SUBTRACTION — the #520 nat_sub *underflow* path, NOT a #798 overflow. +# De Bruijn: sub(a, b) computes @Nat.1 - @Nat.0 = a - b. Slot operands give it +# @Nat provenance, so the nat_sub guard applies (a pure-literal `0 - 1` would be +# exempt and would NOT trap — hence slots, not literals). +_NAT_SUB = """ +public fn sub(@Nat, @Nat -> @Nat) + requires(true) ensures(true) effects(pure) +{ @Nat.1 - @Nat.0 } +""" + # Literal-LEFT @Int: the site is classified on the EXPRESSION's resolved type # (@Int / i64), not the literal's @Nat self-type — so the overflow is caught @@ -344,6 +354,30 @@ def test_max_times_one_no_trap(self) -> None: _assert_no_trap(_NAT_MUL, "mul", [U64_MAX, 1], U64_MAX) +class TestNatSubUnderflow798: + """@Nat subtraction stays the #520 nat_sub *underflow* guard and must NOT be + switched to #798 high-overflow arithmetic — it is excluded on both the + verifier and codegen sides. This co-located block pins the #798 exclusion + boundary so a regression that started overflow-guarding @Nat subtraction + surfaces right next to the overflow tests (nat_sub codegen is also covered in + test_codegen.py). Operands are slots so the guard applies — a pure-literal + `0 - 1` is intentionally exempt and would not trap. De Bruijn: sub(a, b) + computes @Nat.1 - @Nat.0 = a - b. + """ + + def test_underflow_traps(self) -> None: + _assert_traps(_NAT_SUB, "sub", [3, 5]) # 3 - 5 underflows + + def test_zero_minus_one_via_slots_traps(self) -> None: + _assert_traps(_NAT_SUB, "sub", [0, 1]) # 0 - 1, slot provenance + + def test_in_range_difference_no_trap(self) -> None: + _assert_no_trap(_NAT_SUB, "sub", [5, 3], 2) # 5 - 3 = 2 + + def test_equal_operands_no_trap(self) -> None: + _assert_no_trap(_NAT_SUB, "sub", [7, 7], 0) # 7 - 7 = 0 + + # ===================================================================== # RISK A — MUL round-trip soundness (design RISK 5) # ===================================================================== diff --git a/tests/test_int_overflow_differential.py b/tests/test_int_overflow_differential.py index a90c6d50..c52622b7 100644 --- a/tests/test_int_overflow_differential.py +++ b/tests/test_int_overflow_differential.py @@ -37,7 +37,18 @@ def _corpus_files() -> list[Path]: - """Examples + every verify/run-level conformance program.""" + """Examples + every verify/run-level conformance program, plus imported + library modules swept *standalone*. + + A cross-module *consumer* (e.g. ``ch07_cross_module_contracts.vera``) cannot + be swept in place: the verifier obligates arithmetic per-module, so an + imported site reads as unobligated in the consumer's verification while the + codegen guards the imported body it inlines — a category mismatch, not a + desync (verifier ``None`` vs codegen ``Int``). Instead we sweep the + imported ``*_lib.vera`` modules on their own, where they typecheck cleanly + and their sites ARE in lockstep, so cross-module arithmetic is still covered + without that false positive (CR #809). + """ manifest = json.loads( (CONFORMANCE_DIR / "manifest.json").read_text(encoding="utf-8"), ) @@ -46,7 +57,13 @@ def _corpus_files() -> list[Path]: for entry in manifest if entry["level"] in ("verify", "run") ] - return sorted(EXAMPLES_DIR.glob("*.vera")) + sorted(conformance) + lib_modules = [ + CONFORMANCE_DIR / entry["file"] + for entry in manifest + if entry["level"] == "check" and entry["file"].endswith("_lib.vera") + ] + return (sorted(EXAMPLES_DIR.glob("*.vera")) + + sorted(conformance) + sorted(lib_modules)) def _binary_sites(program: ast.Program) -> list[ast.BinaryExpr]: From 20bf85c5f9b9d6c1bd8e2b24767da0fae56cef6e Mon Sep 17 00:00:00 2001 From: T Date: Sat, 27 Jun 2026 08:41:53 +0100 Subject: [PATCH 4/5] docs(798): sync spec section 11.3.3 Nat-subtraction note with type-driven codegen Addresses a CodeRabbit outside-diff finding on PR #809: section 11.3.3 still said a Z3-discharged Nat subtraction compiles to a bare i64.sub, contradicting the section 11.2.1 rule (corrected earlier this PR) that the underflow guard is emitted regardless of the verifier's tier. Reworded: the bare i64.sub applies only at guard-exempt sites (pure-literal subtractions); where the guard applies it is emitted type-driven (a Tier-1 discharge means it provably never fires, not that it is omitted). Refs #798. Co-Authored-By: Claude --- spec/11-compilation.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/spec/11-compilation.md b/spec/11-compilation.md index 4f67430e..b6836a21 100644 --- a/spec/11-compilation.md +++ b/spec/11-compilation.md @@ -106,7 +106,7 @@ Binary operators compile to their WASM equivalents: Float64 modulo uses the decomposition `a % b = a - trunc(a / b) * b`, where `trunc` is `f64.trunc` (truncation toward zero). This matches C's `fmod` semantics and is consistent with integer `%` (which uses `i64.rem_s`, also truncated toward zero). WASM has no native `f64.rem` instruction, so the compiler emits a multi-instruction sequence using temporary locals. -`-` on `@Nat` operands is the bare `i64.sub` shown above only when the verifier discharges the underflow obligation `lhs >= rhs`. When it cannot, the codegen emits a guarded sequence that traps on underflow — see §11.2.1. +`-` on `@Nat` operands compiles to the bare `i64.sub` shown above only at sites exempt from the underflow guard (e.g. pure-literal subtractions). Where the guard applies (a statically-`@Nat` result with `@Nat` operand provenance, §11.2.1), it is emitted regardless of the verifier's tier — the codegen is type-driven, not tier-driven, so a Tier-1 discharge of `lhs >= rhs` means the guard provably never fires, not that the bare `i64.sub` is emitted instead. Float64 comparisons return `i32` (0 or 1), matching WASM's native comparison semantics. From c7b5ada7f0506b3074b2e3d18a66b9642957e78c Mon Sep 17 00:00:00 2001 From: T Date: Sat, 27 Jun 2026 09:08:43 +0100 Subject: [PATCH 5/5] docs(798): fix spec 11.2.1 representation/range conflation (i64 vs signed/unsigned 64-bit) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Addresses a second CodeRabbit outside-diff finding on PR #809: section 11.2.1 said Int/Nat "share the i64/u64 representation", contradicting the section's own opening ("share the same WASM representation (i64)"). Both lower to i64; the u64 is the unsigned interpretation/range, not a separate representation. - Representation: both are i64 (one WASM type). - Overflow range: signed 64-bit for Int, unsigned 64-bit for Nat — the load-bearing distinction (the verifier uses I64_MIN/I64_MAX vs 0/U64_MAX), now stated as a range, separate from representation. - Scoped the lead-in to the *non-negativity* invariant so the underflow (Nat - Nat) discussion reads as distinct from the overflow discussion below. Refs #798. Co-Authored-By: Claude --- spec/11-compilation.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/spec/11-compilation.md b/spec/11-compilation.md index b6836a21..457cd804 100644 --- a/spec/11-compilation.md +++ b/spec/11-compilation.md @@ -43,7 +43,7 @@ Generic type variables are resolved via monomorphization — each concrete insta `Nat` and `Int` share the same WASM representation (`i64`). The non-negativity invariant of `Nat` is enforced by the contract system (preconditions, postconditions), not by the WASM type. This avoids the overhead of runtime range checks on every arithmetic operation while maintaining the correctness guarantee through verification. -The one operation that can violate the invariant despite well-typed operands is unsigned subtraction (`@Nat - @Nat`). At every such site whose result is statically `@Nat` AND at least one operand has `@Nat` *provenance* (a slot reference, function call returning `@Nat`, or a sub-expression containing one — pure-literal subtractions like `0 - 1` are intentionally exempt because they're commonly consumed at `@Int` positions), the compiler emits a Tier-1 proof obligation `lhs >= rhs` — discharged at Tier 1 when a precondition or path condition proves it, else dropped to Tier 3. The codegen is type-driven, not tier-driven: it emits the guarded subtraction at *every* such site regardless of the verifier's tier (a Tier-1 discharge means the guard provably never fires, but it is still emitted): +The one operation that can violate the non-negativity invariant despite well-typed operands is unsigned subtraction (`@Nat - @Nat`). At every such site whose result is statically `@Nat` AND at least one operand has `@Nat` *provenance* (a slot reference, function call returning `@Nat`, or a sub-expression containing one — pure-literal subtractions like `0 - 1` are intentionally exempt because they're commonly consumed at `@Int` positions), the compiler emits a Tier-1 proof obligation `lhs >= rhs` — discharged at Tier 1 when a precondition or path condition proves it, else dropped to Tier 3. The codegen is type-driven, not tier-driven: it emits the guarded subtraction at *every* such site regardless of the verifier's tier (a Tier-1 discharge means the guard provably never fires, but it is still emitted): ```wat (if (i64.lt_s lhs rhs) @@ -53,7 +53,7 @@ The one operation that can violate the invariant despite well-typed operands is The trap is classified as `kind="unreachable"` rather than a dedicated `kind="underflow"` because the `unreachable` instruction is the lightest-weight trap mechanism and adding a dedicated kind requires new host-import scaffolding (mirroring `vera.contract_fail`); a precise underflow diagnostic via `vera verify` is the recommended path until the dedicated kind lands. -Arithmetic **overflow** of `@Int`/`@Nat` addition, subtraction, and multiplication (`+`/`-`/`*`) is handled the same way ([#798](https://github.com/aallan/vera/issues/798)). Because `Int`/`Nat` share the i64/u64 representation, these operations wrap under two's-complement arithmetic, so each such site carries a Tier-1 proof obligation that the `result` stays in range — classified at the operands' **common (coerced) arithmetic type** (`@Int` (i64) if either operand is `@Int`, else `@Nat` (u64); `@Nat` subtraction is excluded, being the underflow obligation above). The verifier discharges it three ways: result provably in range → **Tier 1**; provably out of range → a compile error (**E528**) raised *before* codegen; otherwise — dynamic operands — **Tier 3**. As with `@Nat` subtraction, the codegen is type-driven, not tier-driven: it emits the guarded op — which traps on overflow via the same `unreachable` net — at every such site regardless of the verifier's tier (a Tier-1 discharge means the guard provably never fires, but it is still emitted). The overflow trap is classified `kind="unreachable"` rather than a dedicated `kind="overflow"` for the same reason as `@Nat` subtraction — a precise kind needs new host-import scaffolding and is tracked as a follow-up ([#808](https://github.com/aallan/vera/issues/808)); the obligation stays sound because the function traps on overflow before it can return a wrapped value. +Arithmetic **overflow** of `@Int`/`@Nat` addition, subtraction, and multiplication (`+`/`-`/`*`) is handled the same way ([#798](https://github.com/aallan/vera/issues/798)). Because `Int` and `Nat` share the `i64` representation, these operations wrap under two's-complement arithmetic, so each such site carries a Tier-1 proof obligation that the `result` stays in range — classified at the operands' **common (coerced) arithmetic type** (`@Int` if either operand is `@Int`, else `@Nat`; `@Nat` subtraction is excluded, being the underflow obligation above). The range checked is the **signed** 64-bit range for `@Int` and the **unsigned** 64-bit range for `@Nat` — the shared `i64` interpreted with the operation's signedness. The verifier discharges it three ways: result provably in range → **Tier 1**; provably out of range → a compile error (**E528**) raised *before* codegen; otherwise — dynamic operands — **Tier 3**. As with `@Nat` subtraction, the codegen is type-driven, not tier-driven: it emits the guarded op — which traps on overflow via the same `unreachable` net — at every such site regardless of the verifier's tier (a Tier-1 discharge means the guard provably never fires, but it is still emitted). The overflow trap is classified `kind="unreachable"` rather than a dedicated `kind="overflow"` for the same reason as `@Nat` subtraction — a precise kind needs new host-import scaffolding and is tracked as a follow-up ([#808](https://github.com/aallan/vera/issues/808)); the obligation stays sound because the function traps on overflow before it can return a wrapped value. Authors lift Tier-3 functions back to Tier 1 by adding `requires lhs >= rhs` clauses. Subtraction sites that do not produce a `Nat`-typed result (e.g., `@Int - @Int`) carry no *underflow* obligation — they may produce negative values, which is well-defined for `Int` (they remain subject to the overflow obligation described above). `@Byte` arithmetic is not currently permitted by the type checker (`Byte` is excluded from `NUMERIC_TYPES` in `vera/types.py`), so the underflow obligation needs no `Byte` extension today; allowing `@Byte` arithmetic with both underflow and overflow guards is tracked speculatively as [#564](https://github.com/aallan/vera/issues/564). The verifier checks the `@Nat >= 0` invariant at subtraction sites (in any position, including a function's return expression) and at binding sites where an `@Int` value narrows into a `@Nat` slot — `let` bindings, call arguments, constructor fields, top-level match binds, and literal-tuple destructures, plus the pure-literal `let @Nat = 0 - 1` case the subtraction obligation defers ([#552](https://github.com/aallan/vera/issues/552)). [#747](https://github.com/aallan/vera/issues/747) extended the narrowing obligation to the projection and instantiation sites — ADT sub-pattern binds (`match opt { Some(@Nat) -> }` on `Option`), non-literal tuple destructures, generic constructor / effect-operation / function formals instantiated to `@Nat`, and imported ADT constructors — so every narrowing **binding site** is now statically obligated. (A bare `@Int`→`@Nat` narrowing at a function **return** position, or a `@Nat` component of a tuple/constructor built in value position, is not yet obligated — see [#758](https://github.com/aallan/vera/issues/758).) The Tier-3 runtime guard backs every binding site except two — the effect-operation argument, and the generic-instantiated constructor field (which erases to i64 with no per-field guard) — and a tripped guard reports a generic trap rather than the `requires(... >= 0)` fix. The effect-op guard and the dedicated trap kind are tracked as [#754](https://github.com/aallan/vera/issues/754), the generic constructor field as [#757](https://github.com/aallan/vera/issues/757) (see §11.17).