
feat(807): Tier-1 Float64 builtin modeling + integer-literal range check (#807, #812)#811

Merged
aallan merged 5 commits into main from feat/807-float64-tier1-builtins on Jun 27, 2026

@aallan

@aallan aallan commented Jun 27, 2026

Owner

Summary

Two related verifier↔runtime integer/float soundness items from the #392 campaign:

#807 — Tier-1 modeling for the modelable @Float64 builtins that #797 left as Tier-3 deferrals.

| Builtin | Tier-1 model | Obligation |
|---|---|---|
| float_clamp(v, lo, hi) | faithful WASM f64.min(f64.max(v, lo), hi) (explicit NaN / ±0), unconditional | none (total) |
| int_to_float(n) | fpToFP(RNE, ToReal(n)), concrete arg only | none (total) |
| float_to_int(x) | truncated-toward-zero value, concrete arg only | E529 domain check (NaN / Inf / out-of-i64-range) |

#812 — integer literals are range-checked against their target machine type (E149), discovered during this PR's review. @Int is i64, @Nat is u64; a literal past its bound is now a clean compile error.

Design decisions (grounded in empirical Z3 / runtime behaviour)

  1. float_clamp must not use z3.fpMin/fpMax — they diverge from WASM on NaN (SMT-LIB returns the non-NaN operand; WASM propagates) and ±0. A naive model would unsoundly prove !float_is_nan(float_clamp(NaN, …)). The model builds WASM semantics explicitly and mirrors the codegen max-then-min order (so lo>hi clamps to hi).
  2. int_to_float / float_to_int are concrete-gated: Z3's symbolic Int↔Real↔FP reasoning is unreliable (it returned a spurious sat for a provably valid claim: n≈5.5e48, whose actual conversion is 1.88e48 > 0). So Tier-1 modeling applies only to constant-foldable args; symbolic ones defer to a sound Tier 3, following the #392 principle (Audit smt.py Z3 translation layer for verification soundness) of deferring what Z3 cannot soundly model. The corpus aligns: conformance is concrete (Tier-1), examples/json.vera is symbolic (Tier-3, no regression).
  3. float_to_int is partial (i64.trunc_f64_s traps), so a concrete out-of-domain arg is a loud E529; a symbolic one is a runtime-guarded Tier-3 trap.
  4. #812 (checker: out-of-i64-range integer literal accepted, then fails at codegen with an opaque WAT error): the checker now range-checks every integer literal via its bidirectional target type (refinements stripped). This closed a silent soundness hole: 18446744073709551615 (u64.MAX) used as @Int made vera verify prove ensures(@Int.result == 18446744073709551615) while the runtime returned -1. The asymmetric edge -9223372036854775808 (i64.MIN) stays valid.
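A minimal pure-Python reference model of the WASM semantics in item 1 makes the NaN / ±0 divergence concrete. It assumes Python `float` is an IEEE-754 binary64, and every function name below (`wasm_f64_min`, `wasm_f64_max`, `smtlib_fp_min`, `smtlib_fp_max`, `float_clamp`, `naive_clamp`) is a hypothetical illustration, not the `_wasm_fp_min` / `_wasm_fp_max` Z3 helpers the PR adds.

```python
import math


def wasm_f64_max(a: float, b: float) -> float:
    """WASM f64.max: NaN propagates, and +0.0 is treated as greater than -0.0."""
    if math.isnan(a) or math.isnan(b):
        return math.nan
    if a == 0.0 and b == 0.0:  # +0.0 == -0.0 in Python, so inspect the sign bit
        return a if math.copysign(1.0, a) > 0 else b
    return a if a > b else b


def wasm_f64_min(a: float, b: float) -> float:
    """WASM f64.min: NaN propagates, and -0.0 is treated as less than +0.0."""
    if math.isnan(a) or math.isnan(b):
        return math.nan
    if a == 0.0 and b == 0.0:
        return a if math.copysign(1.0, a) < 0 else b
    return a if a < b else b


def float_clamp(v: float, lo: float, hi: float) -> float:
    """Max-then-min order, mirroring codegen, so lo > hi clamps to hi."""
    return wasm_f64_min(wasm_f64_max(v, lo), hi)


def smtlib_fp_max(a: float, b: float) -> float:
    """SMT-LIB fp.max: a NaN operand is dropped in favour of the other one."""
    if math.isnan(a):
        return b
    if math.isnan(b):
        return a
    return a if a > b else b  # the sign of a ±0 tie is unspecified in SMT-LIB


def smtlib_fp_min(a: float, b: float) -> float:
    """SMT-LIB fp.min: a NaN operand is dropped in favour of the other one."""
    if math.isnan(a):
        return b
    if math.isnan(b):
        return a
    return a if a < b else b


def naive_clamp(v: float, lo: float, hi: float) -> float:
    """The unsound model: same shape, but built on SMT-LIB-style min/max."""
    return smtlib_fp_min(smtlib_fp_max(v, lo), hi)


print(float_clamp(math.nan, 0.0, 1.0))  # nan (what the runtime produces)
print(naive_clamp(math.nan, 0.0, 1.0))  # 0.0 (would "prove" !float_is_nan)
```

The naive model swallows the NaN at the first step, so a verifier built on it would discharge !float_is_nan(float_clamp(NaN, …)) even though the runtime returns NaN.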

Each Float64 model is confirmed bit-for-bit by a verify-vs-run differential against wasmtime, across ±0/±inf/NaN/ties/lo>hi/2^53-rounding/i64 min+max/trap cases.
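Two of the listed edge cases, the 2^53 rounding boundary and the i64 min/max endpoints, can be reproduced with a small Python oracle. It assumes a concrete int_to_float(n) should agree with WASM's f64.convert_i64_s, which rounds to nearest with ties to even, like the fpToFP(RNE, …) model in the table; `int_to_float` below is a hypothetical reference function, not vera's implementation.

```python
def int_to_float(n: int) -> float:
    """Reference model for a concrete int_to_float(n) on an i64 argument.

    CPython's int -> float conversion is correctly rounded with
    round-half-to-even, the same mode as RNE.
    """
    if not (-(2**63) <= n <= 2**63 - 1):
        raise OverflowError(f"{n} is outside the i64 range")
    return float(n)


for n in (2**53, 2**53 + 1, 2**53 + 3, 2**63 - 1, -(2**63)):
    print(n, "->", int(int_to_float(n)))
```

Above 2^53 consecutive doubles are 2 apart, so 2^53 + 1 and 2^53 + 3 are exact ties that resolve to the even neighbour (2^53 and 2^53 + 4). i64.MAX rounds up to 2^63, while i64.MIN is exactly representable.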

Exit checklist (§0)

Deliberately separate (recorded per §3)

To sanity-check before merge

Closes #807.
Closes #812.

Summary by CodeRabbit

  • New Features
    • Added tier-1 verification modelling for Float64 builtins, including float_clamp (WASM-faithful NaN and signed-zero behavior) and tiered handling for int_to_float / float_to_int.
    • Introduced a float_to_int domain obligation that flags invalid inputs with E529 (NaN, ±infinity, or out-of-i64-range).
  • Bug Fixes
    • Integer literals are now range-checked at compile time with clear E149 diagnostics.
  • Tests
  • Documentation
    • Updated specs, changelog/history, version, and repository/testing metrics for v0.0.183.

Models the three modelable Float64 builtins deferred from #797's
FloatingPoint-sort fix, completing #807.

- float_clamp(v, lo, hi): modeled unconditionally as the faithful WASM
  f64.min(f64.max(v, lo), hi) via explicit _wasm_fp_min / _wasm_fp_max helpers.
  Z3's own fp.min / fp.max DIVERGE from WASM on NaN (SMT-LIB returns the
  non-NaN operand; WASM propagates) and on signed zero, so a naive model would
  unsoundly prove !float_is_nan(float_clamp(NaN, ...)) — the helpers build the
  WASM semantics explicitly.  Total op, no obligation.
- int_to_float(n) / float_to_int(x): cross the Int-Float boundary where Z3's
  SYMBOLIC Int-Real-FP reasoning is unreliable (it returns spurious
  counterexamples that don't satisfy their own constraints, non-
  deterministically across timeouts).  So they are modeled at Tier 1 ONLY for a
  concrete (constant-foldable) argument; a symbolic argument defers to a sound
  Tier 3 (matching the #392 audit principle of deferring what Z3 cannot soundly
  model).  float_to_int is partial (i64.trunc_f64_s traps on NaN / Inf /
  out-of-i64-range), so a concrete out-of-domain argument is a loud E529 and a
  symbolic one a runtime-guarded Tier-3 trap.
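The concrete gate and the three trap conditions of i64.trunc_f64_s can be sketched in plain Python. Everything here is hypothetical: a `str` argument stands in for a symbolic (non-constant-foldable) argument, and the returned strings, including the E529 reason wording, are illustrative rather than vera's actual diagnostics. The range test truncates first because i64.trunc_f64_s traps only when the truncated value falls outside [i64.MIN, i64.MAX]; `math.trunc` on a float returns an exact Python int, so the comparison involves no rounding.

```python
import math
from typing import Optional, Union

I64_MIN = -(2**63)
I64_MAX = 2**63 - 1


def domain_violation(x: float) -> Optional[str]:
    """Why i64.trunc_f64_s would trap on x, or None if it would not."""
    if math.isnan(x):
        return "NaN"
    if math.isinf(x):
        return "infinite"
    if not (I64_MIN <= math.trunc(x) <= I64_MAX):
        return "out of i64 range"
    return None


def classify_float_to_int(arg: Union[float, str]) -> str:
    """Tier decision for float_to_int(arg); a str models a symbolic argument."""
    if isinstance(arg, str):
        # Not constant-foldable: defer to Tier 3, guarded by the runtime trap.
        return "tier3: runtime-guarded trap"
    reason = domain_violation(arg)
    if reason is not None:
        return f"E529: argument is {reason}"
    return f"tier1: value {math.trunc(arg)}"
```

For example, `-(2.0**63)` is accepted (i64.MIN is exactly representable), while `2.0**63` and the low-side case `0.0 - 9000000000000000000.0 * 2.0` both take the out-of-range branch.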

Verify-vs-run differentials confirm each model agrees with wasmtime
bit-for-bit (+/-0, +/-inf, NaN, ties, lo>hi, the 2^53 rounding boundary, i64
max, and the trap cases).  The four format/parse Float64 builtins
(float_to_string, parse_float64, decimal_from_float, decimal_to_float) remain
Tier-3 by necessity.

New: error code E529 (float_to_int domain) and obligation kind
float_to_int_domain.  Release prep for v0.0.183.

Closes #807.

Co-Authored-By: Claude <noreply@anthropic.invalid>
@coderabbitai

coderabbitai Bot commented Jun 27, 2026


No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: f6f9dea8-0ae8-4c6d-a7dd-27e1ed48f32f

📥 Commits

Reviewing files that changed from the base of the PR and between a7414fe and 3c17e49.

📒 Files selected for processing (4)
  • README.md
  • ROADMAP.md
  • TESTING.md
  • tests/test_float64_builtins_807.py

📝 Walkthrough

Walkthrough

Adds Float64 builtin modelling, E529 float_to_int domain checks, E149 integer-literal range validation, and matching spec, tests, release metadata, and version updates.

Changes

Float64 modelling and literal range updates

| Layer | File(s) | Summary |
|---|---|---|
| Version and release notes | pyproject.toml, `vera/__init__.py`, CHANGELOG.md, HISTORY.md, README.md, ROADMAP.md, TESTING.md | Bumps the package and module version to 0.0.183 and refreshes release notes, project status, roadmap counts, test totals, and changelog compare links. |
| Spec, obligation kind, and error codes | scripts/check_spec_examples.py, spec/04-expressions.md, spec/06-contracts.md, vera/errors.py, vera/obligations/core.py | Adds E149 and E529, extends ObligationKind with float_to_int_domain, documents integer-literal range checking and Float64 conversion tiering, and updates the spec example allowlist offsets. |
| Integer literal range validation | vera/checker/expressions.py, tests/test_checker.py | Adds i64/u64 bounds for integer literals, emits E149 for out-of-range @Int/@Nat literals and negation cases, and adds boundary and propagation regression tests. |
| Float64 SMT translation, verifier checks, and tests | vera/smt.py, vera/verifier.py, tests/test_float64_builtins_807.py, tests/test_verifier.py | Adds WASM-faithful Float64 helpers and call translation, routes float_to_int into a new domain obligation check, emits E529, and adds Tier-1/Tier-3 and runtime-differential tests for float_clamp, int_to_float, and float_to_int. |

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

  • aallan/vera#778: Adds new obligation kinds and verifier reporting in the same area of the codebase.

Suggested labels

compiler, tests, spec, ci, docs

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 32.69%, which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title is clear, concise, and accurately summarises the main changes for #807 and #812. |
| Linked Issues check | ✅ Passed | The changes implement the required Tier-1 Float64 modelling for #807 and checker-time integer-literal range checks for #812. |
| Out of Scope Changes check | ✅ Passed | No unrelated or out-of-scope code changes stand out; supporting docs, tests, and version bumps all align with #807/#812. |

@codecov

codecov Bot commented Jun 27, 2026


Codecov Report

❌ Patch coverage is 93.75000% with 5 lines in your changes missing coverage. Please review.
✅ Project coverage is 92.04%. Comparing base (d725d2d) to head (3c17e49).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| vera/verifier.py | 90.62% | 3 Missing ⚠️ |
| vera/smt.py | 94.44% | 2 Missing ⚠️ |
Additional details and impacted files
@@           Coverage Diff           @@
##             main     #811   +/-   ##
=======================================
  Coverage   92.03%   92.04%           
=======================================
  Files          89       89           
  Lines       26855    26934   +79     
  Branches      321      321           
=======================================
+ Hits        24717    24791   +74     
- Misses       2130     2135    +5     
  Partials        8        8           
| Flag | Coverage Δ |
|---|---|
| javascript | 65.33% <ø> (ø) |
| python | 95.09% <93.75%> (-0.01%) ⬇️ |


…oolkit)

pr-review-toolkit pass on #811 (code-reviewer + silent-failure-hunter found the
fix sound — no critical/important issues; the deferral discipline was cleared
end-to-end).  Addressed the actionable findings:

- verifier.py: the E529 rationale said "The SMT solver proved ..." but the
  concrete-gated float_to_int domain check classifies the FP literal directly in
  Python with no solver call (the handler's own docstring says exactly that).
  Reworded to mechanism-neutral ("a constant the verifier determined ...").
- tests: pin the three DISTINCT E529 reason strings (NaN / infinite / out of i64
  range) — previously only error_code == "E529" was asserted, so a swapped or
  collapsed label would pass silently.  Added negative-infinity and
  negative-out-of-range E529 cases, plus the i64.MIN edge to the int_to_float
  verify-vs-run differential (the asymmetric two's-complement boundary).

Also filed #812 (pre-existing, found by silent-failure-hunter: an out-of-range
integer literal is accepted by the checker, then fails at codegen with an opaque
`i64.const ... out of range` error — codegen-guarded, so not a soundness hole)
and recorded it in ROADMAP Tier 1.

No behaviour change (diagnostic-text + tests + roadmap).  Refs #807.

Co-Authored-By: Claude <noreply@anthropic.invalid>
…ype (E149)

Found during the #807 review (silent-failure-hunter pass).  Closes a silent
Tier-1 soundness hole and a loud codegen wart with one check.

An integer literal must fit its target type — Int is i64, Nat is u64 — now
checked at type-check time via the bidirectional `expected` type (refinements
stripped to the base).  A literal past its bound is a clean E149:

- SILENT + unsound (the important one): a literal in (i64.MAX, u64.MAX] used in
  an Int context (e.g. 18446744073709551615) made `vera verify` prove the
  postcondition `result == 18446744073709551615` while the runtime returned -1
  (the i64 reinterpretation of the all-ones bit pattern) — a Tier-1 proof the
  runtime violates.
- LOUD (#812 as filed): a literal >= 2^64 was accepted by `vera check`, then
  failed at codegen with a raw `i64.const ... out of range` WAT error.
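The silent case in the first bullet comes down to the verifier reasoning about the literal as an unbounded mathematical integer while the runtime keeps only its 64-bit pattern as a signed i64. A hypothetical `as_i64` helper (not part of vera) reproduces the runtime side with the standard `struct` module:

```python
import struct


def as_i64(n: int) -> int:
    """Reinterpret the low 64 bits of an integer as a two's-complement i64."""
    return struct.unpack("<q", struct.pack("<Q", n % 2**64))[0]


literal = 18446744073709551615   # u64.MAX, used in an @Int context
verifier_view = literal          # unbounded integer: ensures(result == literal) holds
runtime_view = as_i64(literal)   # all-ones bit pattern read back as i64
print(verifier_view == literal, runtime_view)  # True -1
```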

The asymmetric edge -9223372036854775808 (i64.MIN) stays valid — its magnitude
2^63 is checked under the u64 bound via unary minus — while -2^64 is caught at
the inner magnitude literal.  Non-breaking: conformance 93, examples 35, the
full checker + overflow + codegen suites green; no corpus program used an
out-of-range literal.

Filed #813 for the broader, distinct soundness bug this surfaced: a NON-literal
Nat value > i64.MAX widened to Int reinterprets the same way (the widen(u64.MAX)
repro proves a false postcondition) — the Nat-subtype-of-Int relation needs a
coercion obligation, a focused #392-style follow-up.

Closes #812.  Refs #807.

Co-Authored-By: Claude <noreply@anthropic.invalid>
@aallan aallan changed the title from "feat(807): Tier-1 modeling for float_clamp / int_to_float / float_to_int" to "feat(807): Tier-1 Float64 builtin modeling + integer-literal range check (#807, #812)" on Jun 27, 2026
@aallan

aallan commented Jun 27, 2026

Owner Author

pr-review-toolkit review + a soundness follow-through

Ran code-reviewer, silent-failure-hunter, pr-test-analyzer, comment-analyzer over the diff. Zero critical/important issues — the Float64 fix validated sound from every angle: the silent-failure-hunter cleared every deferral path end-to-end ("ship"); the code-reviewer empirically confirmed the float_clamp model matches codegen order + NaN/±0, the concrete-gating is airtight, and the float_to_int value model + E529 obligation gate identically with bit-exact i64 boundaries; the guards are mutation-resistant.

Addressed (toolkit findings), in 117328b: reworded the E529 rationale (it claimed an SMT-solver proof the concrete-gated path doesn't perform); pinned the three distinct E529 reason strings; added negative-inf / negative-out-of-range / i64.MIN coverage.

Soundness follow-through, in 8367a4f (closes #812): the silent-failure-hunter flagged an out-of-range integer literal reaching codegen. Probing it turned up something worse than the filed wart — a silent Tier-1 soundness hole: 18446744073709551615 (u64.MAX) used as @Int makes vera verify prove ensures(@Int.result == 18446744073709551615) while the runtime returns -1. Rather than park it, the checker now range-checks every integer literal against its target machine type (E149): @Int ≤ i64.MAX, @Nat ≤ u64.MAX, the asymmetric -9223372036854775808 (i64.MIN) stays valid. Non-breaking (conformance 93, examples 35, full checker + overflow + codegen suites green).

Filed #813: the literal probe surfaced a broader, distinct soundness bug — a non-literal @Nat value > i64.MAX widened to @Int reinterprets identically (widen(u64.MAX) proves a false postcondition). That's the @Nat <: @Int subtype relation, needing a coercion obligation (a focused #392-style follow-up, like #798/#799), not an integer-literal fix — so it's tracked separately, not bundled here.

@aallan

aallan commented Jun 27, 2026

Owner Author

@coderabbitai review

@coderabbitai

coderabbitai Bot commented Jun 27, 2026

✅ Action performed

Review finished.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@aallan

aallan commented Jun 27, 2026

Owner Author

@coderabbitai review

@coderabbitai

coderabbitai Bot commented Jun 27, 2026

✅ Action performed

Review finished.


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@tests/test_checker.py`:
- Around line 6423-6442: Add a regression test in the existing negated i64
literal tests to cover the invalid boundary just below i64.MIN, using the same
helper pattern in test_checker.py. Extend the unary-minus literal checks around
test_negated_i64_min_literal_ok and
test_negated_literal_exceeding_u64_magnitude_is_error so that
-9223372036854775809 is rejected with the expected error, ensuring the
parser/type checker does not accept an off-by-one-too-permissive `@Int` boundary.

In `@vera/checker/expressions.py`:
- Around line 135-168: E149 only checks bare IntLit values, so oversized
negative `@Int` literals slip through when parsed as unary negation over a
positive literal. Update the unary-neg literal handling in the same
expression-checking path that uses base_type(expected), _I64_MAX/_U64_MAX, and
self._error so the operand is range-validated before negation: allow the operand
value 2^63 only for forming -9223372036854775808, but reject any larger operand
with the same E149 error. Keep the positive-literal check unchanged and mirror
its bound logic in the unary-neg branch.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: ac5c2765-7304-4907-845c-e2ce0fb524eb

📥 Commits

Reviewing files that changed from the base of the PR and between 99eb467 and 8367a4f.

⛔ Files ignored due to path filters (1)
  • docs/llms-full.txt is excluded by !docs/**
📒 Files selected for processing (11)
  • CHANGELOG.md
  • HISTORY.md
  • ROADMAP.md
  • TESTING.md
  • scripts/check_spec_examples.py
  • spec/04-expressions.md
  • tests/test_checker.py
  • tests/test_float64_builtins_807.py
  • vera/checker/expressions.py
  • vera/errors.py
  • vera/verifier.py

…itical)

CodeRabbit caught a Critical hole in the #812 literal range check on the
negative side.  -9223372036854775809 (= -(2^63 + 1), one below i64.MIN) parses
as unary minus over the magnitude literal 9223372036854775809; that operand is
checked against the u64 bound (it has no Int context under the negation), so the
band (2^63, u64.MAX] slipped past and ran to a wrong POSITIVE value
(9223372036854775807) — the same silent reinterpretation the positive check
closes.

_check_unary now range-checks a negated integer literal: the magnitude may reach
2^63 (forming i64.MIN) but no further -> E149.  A magnitude > u64.MAX is already
caught at the operand literal, so the band check caps at u64.MAX to avoid a
double diagnostic.  i64.MIN (-9223372036854775808) stays valid.
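The combined positive and negative rules can be sketched as a standalone function. This is a simplified illustration: the `IntLit` / `Neg` node classes, the `check_literal_range` name, and the diagnostic text are hypothetical, not the actual `_check_unary` code in vera/checker/expressions.py. Only the bounds come from the text: i64.MAX for @Int, u64.MAX for @Nat and for a bare magnitude under negation, and 2^63 as the largest negatable magnitude.

```python
from dataclasses import dataclass
from typing import List, Optional, Union

I64_MAX = 2**63 - 1
U64_MAX = 2**64 - 1


@dataclass
class IntLit:
    value: int  # literal magnitude; a leading minus parses as Neg


@dataclass
class Neg:
    operand: "Expr"


Expr = Union[IntLit, Neg]


def check_literal_range(expr: Expr, expected: Optional[str] = None) -> List[str]:
    """Collect E149 diagnostics for integer literals in expr.

    expected is the base target type ("Int" or "Nat", refinements already
    stripped), or None when there is no Int context, e.g. under a negation.
    """
    errors: List[str] = []
    if isinstance(expr, IntLit):
        bound = I64_MAX if expected == "Int" else U64_MAX
        if expr.value > bound:
            errors.append(f"E149: {expr.value} out of range for {expected or 'u64'}")
    elif isinstance(expr, Neg):
        # The operand itself is only held to the u64 bound ...
        errors.extend(check_literal_range(expr.operand, None))
        inner = expr.operand
        # ... so the negation checks the (2**63, u64.MAX] band. Magnitudes above
        # u64.MAX were already reported for the operand, hence the cap.
        if isinstance(inner, IntLit) and I64_MAX + 1 < inner.value <= U64_MAX:
            errors.append(f"E149: -{inner.value} is below i64.MIN")
    return errors
```

With this sketch, `Neg(IntLit(2**63))` (i64.MIN) passes, `Neg(IntLit(2**63 + 1))` yields one E149 from the band check, and `Neg(IntLit(2**64))` yields exactly one E149, from the operand.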

Mutation-validated (neutralizing the band check flips the new test RED);
conformance 93, examples 35, checker + codegen suites green.

Refs #807, #812.

Co-Authored-By: Claude <noreply@anthropic.invalid>
@aallan

aallan commented Jun 27, 2026

Owner Author

@coderabbitai review

@coderabbitai

coderabbitai Bot commented Jun 27, 2026

✅ Action performed

Review finished.


@aallan

aallan commented Jun 27, 2026

Owner Author

@coderabbitai review

@coderabbitai

coderabbitai Bot commented Jun 27, 2026

✅ Action performed

Review finished.


@aallan

aallan commented Jun 27, 2026

Owner Author

@coderabbitai full review

@coderabbitai

coderabbitai Bot commented Jun 27, 2026

✅ Action performed

Full review finished.

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@README.md`:
- Line 218: The project-status snapshot in README.md has an outdated test total,
so update the count in the active-development summary to match the current
documented total used by TESTING.md and ROADMAP.md. Keep the change limited to
the status sentence in README.md and ensure the visible test count is aligned
with the latest collected total.

In `@tests/test_float64_builtins_807.py`:
- Around line 94-105: The temp-file helper currently leaves behind each
NamedTemporaryFile created with delete=False, causing .vera files to accumulate
across parametrized runs. Update the shared helper logic around the
tempfile.NamedTemporaryFile usage to always unlink the generated path after
compile_vera/execute completes, using a Windows-safe cleanup pattern (e.g. in a
finally block) so both this helper and the other referenced helper stop leaking
files.
- Around line 469-474: The `test_float64_builtins_807.py` parametrized runtime
differential for `float_to_int` covers the positive overflow and non-finite
cases, but it is missing the low-side out-of-range trap where the input is less
than `i64.MIN`. Update the `body` list in the `test_float_to_int` coverage to
include a negative overflow case alongside the existing `nan()`, `infinity()`,
and high-side overflow inputs, using the same `float_to_int` pattern so the
runtime differential exercises both bounds.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 6491929d-41a7-49e0-9586-fb3deaa619c7

📥 Commits

Reviewing files that changed from the base of the PR and between d725d2d and a7414fe.

⛔ Files ignored due to path filters (5)
  • docs/index.html is excluded by !docs/**
  • docs/index.md is excluded by !docs/**
  • docs/llms-full.txt is excluded by !docs/**
  • docs/llms.txt is excluded by !docs/**
  • uv.lock is excluded by !**/*.lock, !uv.lock
📒 Files selected for processing (18)
  • CHANGELOG.md
  • HISTORY.md
  • README.md
  • ROADMAP.md
  • TESTING.md
  • pyproject.toml
  • scripts/check_spec_examples.py
  • spec/04-expressions.md
  • spec/06-contracts.md
  • tests/test_checker.py
  • tests/test_float64_builtins_807.py
  • tests/test_verifier.py
  • vera/__init__.py
  • vera/checker/expressions.py
  • vera/errors.py
  • vera/obligations/core.py
  • vera/smt.py
  • vera/verifier.py

…(CodeRabbit)

Three CodeRabbit findings on a7414fe, all verified valid:

- tests/test_float64_builtins_807.py: the _run_float_expr / _run_int_expr helpers
  used NamedTemporaryFile(delete=False) with no cleanup, leaking a .vera file per
  parametrized run.  Wrapped both in try/finally with
  Path(path).unlink(missing_ok=True) (delete=False stays — Windows can't reopen a
  held temp file via parse_file).
- tests/test_float64_builtins_807.py: added the low-side float_to_int trap case
  (float_to_int(0.0 - 9000000000000000000.0 * 2.0), < i64.MIN) to the runtime
  differential, mirroring the existing high-side / NaN / Inf cases so both bounds
  are exercised.
- README.md: the active-development status line said 5,250 tests (stale —
  check_doc_counts does not gate README); synced to the live 5,263 total.
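The cleanup pattern from the first bullet can be sketched as a generic helper. `with_temp_source` and its `run` callback are hypothetical stand-ins for the test module's `_run_float_expr` / `_run_int_expr` and their compile-and-execute step. The file handle is closed before `run` reopens the path (the Windows constraint noted above), and the file is unlinked in `finally` even when `run` raises.

```python
import tempfile
from pathlib import Path
from typing import Callable, TypeVar

T = TypeVar("T")


def with_temp_source(source: str, run: Callable[[str], T], suffix: str = ".vera") -> T:
    """Write source to a temp file, pass its path to run, and always delete it."""
    # delete=False: the handle is closed before run() reopens the path, which
    # Windows requires; cleanup is therefore our job, done in the finally below.
    with tempfile.NamedTemporaryFile(
        "w", suffix=suffix, delete=False, encoding="utf-8"
    ) as f:
        f.write(source)
        path = f.name
    try:
        return run(path)
    finally:
        Path(path).unlink(missing_ok=True)
```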

Test-only + README.  No behaviour change.  Refs #807.

Co-Authored-By: Claude <noreply@anthropic.invalid>
@aallan aallan merged commit daf1cb9 into main Jun 27, 2026
28 checks passed
@aallan aallan deleted the feat/807-float64-tier1-builtins branch June 27, 2026 21:26