feat(807): Tier-1 Float64 builtin modeling + integer-literal range check (#807, #812)#811
Conversation
Models the three modelable Float64 builtins deferred from #797's FloatingPoint-sort fix, completing #807. - float_clamp(v, lo, hi): modeled unconditionally as the faithful WASM f64.min(f64.max(v, lo), hi) via explicit _wasm_fp_min / _wasm_fp_max helpers. Z3's own fp.min / fp.max DIVERGE from WASM on NaN (SMT-LIB returns the non-NaN operand; WASM propagates) and on signed zero, so a naive model would unsoundly prove !float_is_nan(float_clamp(NaN, ...)) — the helpers build the WASM semantics explicitly. Total op, no obligation. - int_to_float(n) / float_to_int(x): cross the Int-Float boundary where Z3's SYMBOLIC Int-Real-FP reasoning is unreliable (it returns spurious counterexamples that don't satisfy their own constraints, non- deterministically across timeouts). So they are modeled at Tier 1 ONLY for a concrete (constant-foldable) argument; a symbolic argument defers to a sound Tier 3 (matching the #392 audit principle of deferring what Z3 cannot soundly model). float_to_int is partial (i64.trunc_f64_s traps on NaN / Inf / out-of-i64-range), so a concrete out-of-domain argument is a loud E529 and a symbolic one a runtime-guarded Tier-3 trap. Verify-vs-run differentials confirm each model agrees with wasmtime bit-for-bit (+/-0, +/-inf, NaN, ties, lo>hi, the 2^53 rounding boundary, i64 max, and the trap cases). The four format/parse Float64 builtins (float_to_string, parse_float64, decimal_from_float, decimal_to_float) remain Tier-3 by necessity. New: error code E529 (float_to_int domain) and obligation kind float_to_int_domain. Release prep for v0.0.183. Closes #807. Co-Authored-By: Claude <noreply@anthropic.invalid>
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: ASSERTIVE Plan: Pro Run ID: 📒 Files selected for processing (4)
📝 WalkthroughWalkthroughAdds Float64 builtin modelling, E529 float_to_int domain checks, E149 integer-literal range validation, and matching spec, tests, release metadata, and version updates. ChangesFloat64 modelling and literal range updates
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Suggested labels
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #811 +/- ##
=======================================
Coverage 92.03% 92.04%
=======================================
Files 89 89
Lines 26855 26934 +79
Branches 321 321
=======================================
+ Hits 24717 24791 +74
- Misses 2130 2135 +5
Partials 8 8
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Harness. 🚀 New features to boost your workflow:
|
…oolkit) pr-review-toolkit pass on #811 (code-reviewer + silent-failure-hunter found the fix sound — no critical/important issues; the deferral discipline was cleared end-to-end). Addressed the actionable findings: - verifier.py: the E529 rationale said "The SMT solver proved ..." but the concrete-gated float_to_int domain check classifies the FP literal directly in Python with no solver call (the handler's own docstring says exactly that). Reworded to mechanism-neutral ("a constant the verifier determined ..."). - tests: pin the three DISTINCT E529 reason strings (NaN / infinite / out of i64 range) — previously only error_code == "E529" was asserted, so a swapped or collapsed label would pass silently. Added negative-infinity and negative-out-of-range E529 cases, plus the i64.MIN edge to the int_to_float verify-vs-run differential (the asymmetric two's-complement boundary). Also filed #812 (pre-existing, found by silent-failure-hunter: an out-of-range integer literal is accepted by the checker, then fails at codegen with an opaque `i64.const ... out of range` error — codegen-guarded, so not a soundness hole) and recorded it in ROADMAP Tier 1. No behaviour change (diagnostic-text + tests + roadmap). Refs #807. Co-Authored-By: Claude <noreply@anthropic.invalid>
…ype (E149) Found during the #807 review (silent-failure-hunter pass). Closes a silent Tier-1 soundness hole and a loud codegen wart with one check. An integer literal must fit its target type — Int is i64, Nat is u64 — now checked at type-check time via the bidirectional `expected` type (refinements stripped to the base). A literal past its bound is a clean E149: - SILENT + unsound (the important one): a literal in (i64.MAX, u64.MAX] used in an Int context (e.g. 18446744073709551615) made `vera verify` prove the postcondition `result == 18446744073709551615` while the runtime returned -1 (the i64 reinterpretation of the all-ones bit pattern) — a Tier-1 proof the runtime violates. - LOUD (#812 as filed): a literal >= 2^64 was accepted by `vera check`, then failed at codegen with a raw `i64.const ... out of range` WAT error. The asymmetric edge -9223372036854775808 (i64.MIN) stays valid — its magnitude 2^63 is checked under the u64 bound via unary minus — while -2^64 is caught at the inner magnitude literal. Non-breaking: conformance 93, examples 35, the full checker + overflow + codegen suites green; no corpus program used an out-of-range literal. Filed #813 for the broader, distinct soundness bug this surfaced: a NON-literal Nat value > i64.MAX widened to Int reinterprets the same way (the widen(u64.MAX) repro proves a false postcondition) — the Nat-subtype-of-Int relation needs a coercion obligation, a focused #392-style follow-up. Closes #812. Refs #807. Co-Authored-By: Claude <noreply@anthropic.invalid>
pr-review-toolkit review + a soundness follow-throughRan code-reviewer, silent-failure-hunter, pr-test-analyzer, comment-analyzer over the diff. Zero critical/important issues — the Float64 fix validated sound from every angle: the silent-failure-hunter cleared every deferral path end-to-end ("ship"); the code-reviewer empirically confirmed the Addressed (toolkit findings), in Soundness follow-through, in Filed #813: the literal probe surfaced a broader, distinct soundness bug — a non-literal |
|
@coderabbitai review |
✅ Action performedReview finished.
|
|
@coderabbitai review |
✅ Action performedReview finished.
|
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@tests/test_checker.py`:
- Around line 6423-6442: Add a regression test in the existing negated i64
literal tests to cover the invalid boundary just below i64.MIN, using the same
helper pattern in test_checker.py. Extend the unary-minus literal checks around
test_negated_i64_min_literal_ok and
test_negated_literal_exceeding_u64_magnitude_is_error so that
-9223372036854775809 is rejected with the expected error, ensuring the
parser/type checker does not accept an off-by-one-too-permissive `@Int` boundary.
In `@vera/checker/expressions.py`:
- Around line 135-168: E149 only checks bare IntLit values, so oversized
negative `@Int` literals slip through when parsed as unary negation over a
positive literal. Update the unary-neg literal handling in the same
expression-checking path that uses base_type(expected), _I64_MAX/_U64_MAX, and
self._error so the operand is range-validated before negation: allow the operand
value 2^63 only for forming -9223372036854775808, but reject any larger operand
with the same E149 error. Keep the positive-literal check unchanged and mirror
its bound logic in the unary-neg branch.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: ac5c2765-7304-4907-845c-e2ce0fb524eb
⛔ Files ignored due to path filters (1)
docs/llms-full.txtis excluded by!docs/**
📒 Files selected for processing (11)
CHANGELOG.mdHISTORY.mdROADMAP.mdTESTING.mdscripts/check_spec_examples.pyspec/04-expressions.mdtests/test_checker.pytests/test_float64_builtins_807.pyvera/checker/expressions.pyvera/errors.pyvera/verifier.py
…itical) CodeRabbit caught a Critical hole in the #812 literal range check on the negative side. -9223372036854775809 (= -(2^63 + 1), one below i64.MIN) parses as unary minus over the magnitude literal 9223372036854775809; that operand is checked against the u64 bound (it has no Int context under the negation), so the band (2^63, u64.MAX] slipped past and ran to a wrong POSITIVE value (9223372036854775807) — the same silent reinterpretation the positive check closes. _check_unary now range-checks a negated integer literal: the magnitude may reach 2^63 (forming i64.MIN) but no further -> E149. A magnitude > u64.MAX is already caught at the operand literal, so the band check caps at u64.MAX to avoid a double diagnostic. i64.MIN (-9223372036854775808) stays valid. Mutation-validated (neutralizing the band check flips the new test RED); conformance 93, examples 35, checker + codegen suites green. Refs #807, #812. Co-Authored-By: Claude <noreply@anthropic.invalid>
|
@coderabbitai review |
✅ Action performedReview finished.
|
|
@coderabbitai review |
✅ Action performedReview finished.
|
|
@coderabbitai full review |
✅ Action performedFull review finished. |
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@README.md`:
- Line 218: The project-status snapshot in README.md has an outdated test total,
so update the count in the active-development summary to match the current
documented total used by TESTING.md and ROADMAP.md. Keep the change limited to
the status sentence in README.md and ensure the visible test count is aligned
with the latest collected total.
In `@tests/test_float64_builtins_807.py`:
- Around line 94-105: The temp-file helper currently leaves behind each
NamedTemporaryFile created with delete=False, causing .vera files to accumulate
across parametrized runs. Update the shared helper logic around the
tempfile.NamedTemporaryFile usage to always unlink the generated path after
compile_vera/execute completes, using a Windows-safe cleanup pattern (e.g. in a
finally block) so both this helper and the other referenced helper stop leaking
files.
- Around line 469-474: The `test_float64_builtins_807.py` parametrized runtime
differential for `float_to_int` covers the positive overflow and non-finite
cases, but it is missing the low-side out-of-range trap where the input is less
than `i64.MIN`. Update the `body` list in the `test_float_to_int` coverage to
include a negative overflow case alongside the existing `nan()`, `infinity()`,
and high-side overflow inputs, using the same `float_to_int` pattern so the
runtime differential exercises both bounds.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 6491929d-41a7-49e0-9586-fb3deaa619c7
⛔ Files ignored due to path filters (5)
docs/index.htmlis excluded by!docs/**docs/index.mdis excluded by!docs/**docs/llms-full.txtis excluded by!docs/**docs/llms.txtis excluded by!docs/**uv.lockis excluded by!**/*.lock,!uv.lock
📒 Files selected for processing (18)
CHANGELOG.mdHISTORY.mdREADME.mdROADMAP.mdTESTING.mdpyproject.tomlscripts/check_spec_examples.pyspec/04-expressions.mdspec/06-contracts.mdtests/test_checker.pytests/test_float64_builtins_807.pytests/test_verifier.pyvera/__init__.pyvera/checker/expressions.pyvera/errors.pyvera/obligations/core.pyvera/smt.pyvera/verifier.py
…(CodeRabbit) Three CodeRabbit findings on a7414fe, all verified valid: - tests/test_float64_builtins_807.py: the _run_float_expr / _run_int_expr helpers used NamedTemporaryFile(delete=False) with no cleanup, leaking a .vera file per parametrized run. Wrapped both in try/finally with Path(path).unlink(missing_ok=True) (delete=False stays — Windows can't reopen a held temp file via parse_file). - tests/test_float64_builtins_807.py: added the low-side float_to_int trap case (float_to_int(0.0 - 9000000000000000000.0 * 2.0), < i64.MIN) to the runtime differential, mirroring the existing high-side / NaN / Inf cases so both bounds are exercised. - README.md: the active-development status line said 5,250 tests (stale — check_doc_counts does not gate README); synced to the live 5,263 total. Test-only + README. No behaviour change. Refs #807. Co-Authored-By: Claude <noreply@anthropic.invalid>
Summary
Two related verifier↔runtime integer/float soundness items from the #392 campaign:
#807 — Tier-1 modeling for the modelable
@Float64builtins that #797 left as Tier-3 deferrals.float_clamp(v, lo, hi)f64.min(f64.max(v, lo), hi)(explicit NaN / ±0) — unconditionalint_to_float(n)fpToFP(RNE, ToReal(n))— concrete arg onlyfloat_to_int(x)#812 — integer literals are range-checked against their target machine type (E149), discovered during this PR's review.
@Intis i64,@Natis u64; a literal past its bound is now a clean compile error.Design decisions (grounded in empirical Z3 / runtime behaviour)
float_clampmust not usez3.fpMin/fpMax— they diverge from WASM on NaN (SMT-LIB returns the non-NaN operand; WASM propagates) and ±0. A naive model would unsoundly prove!float_is_nan(float_clamp(NaN, …)). The model builds WASM semantics explicitly and mirrors the codegenmax-then-minorder (solo>hiclamps tohi).int_to_float/float_to_intare concrete-gated — Z3's symbolic Int↔Real↔FP reasoning is unreliable (it returned a spurioussatto a provably-valid claim:n≈5.5e48whose actual conversion is1.88e48 > 0). So Tier-1 modeling applies only to constant-foldable args; symbolic ones defer to a sound Tier 3 — the Auditsmt.pyZ3 translation layer for verification soundness #392 principle of deferring what Z3 cannot soundly model. Corpus aligns: conformance = concrete (Tier-1),examples/json.vera= symbolic (Tier-3, no regression).float_to_intis partial (i64.trunc_f64_straps), so a concrete out-of-domain arg is a loud E529; a symbolic one is a runtime-guarded Tier-3 trap.18446744073709551615(u64.MAX) used as@Intmadevera verifyproveensures(@Int.result == 18446744073709551615)while the runtime returned-1. The asymmetric edge-9223372036854775808(i64.MIN) stays valid.Each Float64 model is confirmed by a verify-vs-run differential vs wasmtime, bit-for-bit, across ±0/±inf/NaN/ties/
lo>hi/2^53-rounding/i64 min+max/trap cases.Exit checklist (§0)
pytest tests/ -v— green (77test_float64_builtins_807.py+ 8TestIntegerLiteralRange812intest_checker.py;test_overall_tier_countsupdated 80→81 T3)mypy vera/— cleancheck_conformance(93) /check_examples(35) — greencheck_version_sync(0.0.183, 6 files) /check_doc_counts(5261/46) /check_site_assets/check_spec_examples— green[0.0.183](Added Tier-1 modeling for the modelable Float64 builtins deferred from #797 (float_clamp, int_to_float, float_to_int) #807 + Fixed checker: out-of-i64-range integer literal accepted, then fails at codegen with an opaque WAT error #812) + link refs; spec §6.4.3 (E529) + §4.2 (E149); HISTORY row; ROADMAP swapped checker: out-of-i64-range integer literal accepted, then fails at codegen with an opaque WAT error #812→verifier: @Nat → @Int widening of a value > i64.MAX is unsound (Tier-1 proves a false postcondition) #813Deliberately separate (recorded per §3)
@Float64builtins stay Tier-3 by necessity (Z3 can't model float formatting/parsing or the opaqueDecimalhandle) — bucket A of Tier-1 modeling for the modelable Float64 builtins deferred from #797 (float_clamp, int_to_float, float_to_int) #807.int_to_float/float_to_intstay Tier-3 — a soundness decision forced by Z3's unreliable symbolic FP↔Real reasoning, not a deferral.@Natvalue > i64.MAX widened to@Intreinterprets the same way (widen(u64.MAX)proves a false postcondition). That's the@Nat <: @Intsubtype relation, needing a coercion obligation (verify + codegen + differential, like verifier/smt.py: integer arithmetic overflow unmodeled — Tier 1 proves contracts false under i64/u64 wraparound #798/smt.py: signed division/modulo translated as Z3 Euclidean — Tier 1 proves wrong quotient/remainder for negatives #799) — a focused Auditsmt.pyZ3 translation layer for verification soundness #392-style follow-up, not an integer-literal fix.To sanity-check before merge
test_overall_tier_counts80→81 T3 (one new sound obligation at json.vera's symbolicfloat_to_intsite).Closes #807.
Closes #812.
Summary by CodeRabbit
float_clamp(WASM-faithful NaN and signed-zero behavior) and tiered handling forint_to_float/float_to_int.float_to_intdomain obligation that flags invalid inputs with E529 (NaN, ±infinity, or out-of-i64-range).