fix(802): defer string_length to Tier 3 for UTF-8 byte soundness (v0.0.182) by aallan · Pull Request #810 · aallan/vera

aallan · 2026-06-27T10:15:57Z

Closes #802 — the final sub-task of the #392 smt.py soundness audit. With this, all six soundness bugs the audit found (#797–#802) are fixed, so this PR also Closes #392.

The bug

vera verify translated string_length to Z3's Length, which counts Unicode code points, but Vera's runtime string_length counts UTF-8 bytes. They disagree on every multibyte character, so the verifier proved false contracts:

public fn f(@String -> @Int)
  requires(@String.0 == "é")      # U+00E9: 1 code point, 2 UTF-8 bytes
  ensures(@Int.result == 1)       # FALSE — runtime returns 2
  effects(pure)
{ string_length(@String.0) }

vera verify → proved at Tier 1 (Z3 Length("é") == 1), yet vera run raises a postcondition violation (string_length("é") is 2). Silent for ASCII (1 byte = 1 code point); wrong for any multibyte character.

The fix

string_length (vera/smt.py): modeled at Tier 1 only for a string literal, whose exact UTF-8 byte length is known (string_length("é") now proves == 2, disproves == 1); on any non-literal argument it defers to a runtime-guarded Tier 3 obligation — Z3's string theory has no byte-length operator, so this matches the numeric-cast / quantifier / decimal precedent the audit established. Sound, and it preserves the literal-arg precision the original z3.Length was added for (with the correct byte count).
Audited the other modeled string builtins and confirmed they're sound, so they're untouched: string_contains / string_starts_with / string_ends_with stay Tier 1 because UTF-8 is self-synchronizing — a valid substring/prefix/suffix matches at the byte level exactly when it matches at the code-point level (verified by a verify↔run differential on non-ASCII inputs). The offset-sensitive builtins (string_slice, string_index_of, string_char_code, string_chars) were never translated to Z3 and already defer to Tier 3.

Also found in review — astral-plane string literals (folded in)

The adversarial review of this PR surfaced a second Z3-string unsoundness in the same class, so I fixed it here (this PR closes the #392 audit — shipping a known Z3-string hole would make that close dishonest). Z3's string-sort alphabet only reaches U+2FFFF, and its Python binding silently stores any higher code point as the literal's escape text — so string_contains("\u{10FFFF}", "f") proved true at Tier 1 (matching a phantom escape byte) while the runtime returns false. Reproduced, then fixed in vera/smt.py: a string literal containing a code point > U+2FFFF defers to Tier 3 (StringVal returns None), so the predicates never match phantom bytes. string_length is unaffected (it byte-counts the decoded value, not StringVal). Regression test: TestAstralStringLiteral802. The review also prompted a runtime byte-length test (string_length("é") == 2, "😀" == 4) — pinning the runtime premise the verifier's byte model relies on — and escaped-unicode / 4-byte / empty-literal coverage.

Exit checklist (§0)

pytest tests/ — green (5,138 passed, 16 skipped, 16 deselected)
mypy vera/ — clean
check_conformance.py — 93/93 (ch04_string_builtins.vera gains a non-ASCII byte-length case that verifies at Tier 1 and runs as 2)
check_examples.py — 35/35
check_version_sync.py — 0.0.182
CHANGELOG [0.0.182] + link refs; doc-counts / site-assets / spec-examples green
New behaviour covered by tests that fail on the pre-fix verifier — tests/test_string_length_soundness.py (RED-confirmed: pre-fix proves == 1 / disproves == 2)

Tests

New tests/test_string_length_soundness.py (8): the literal byte-model ("é" → 2, not 1), slot-arg deferral (the issue's probe no longer proves at Tier 1), ASCII preservation, and the predicate-soundness guards (verify↔run agree on non-ASCII).
Rewritten tests that pinned the old unsound behaviour — TestStringLengthVerification asserted string_length was Tier-1 via z3.Length (it now asserts the deferral / literal byte-model). Refinement, E501-rendering, and Expand smt.translate_expr coverage: FloatLit, ArrayLit, IndexExpr #667 call-typing tests that used string_length only as an example builtin switch to string_starts_with (a still-Tier-1 predicate), preserving each test's actual purpose. The aggregate tier-count test moves -2 T1 / +2 T3 (two example contracts defer), total unchanged.

Reproduce-before-fix note (CR's suggested option)

The issue offered a byte-indexed Z3 model as the alternative to deferral. A full byte model would require re-encoding Vera strings as Z3 byte-sequences (SeqSort of BitVec8) — a large rework that would itself need an audit. The literal-byte-length model here fixes the unsoundness, keeps the documented literal-arg precision, and minimises collateral weakening (sound checks like string_length("") > 0 over a literal still decide statically); a richer byte model can follow if Tier-1 string-length reasoning over variables is ever needed.

🤖 Generated with Claude Code

Summary by CodeRabbit

Bug Fixes
- Improved verifier soundness for string_length by modelling supported string literals using exact UTF-8 byte length; non-literal inputs now defer to runtime-guarded Tier 3 checks, with added safeguards for astral and lone-surrogate edge cases.
- Retained Tier 1 handling for string_contains, string_starts_with, and string_ends_with where sound, with Tier 3 deferrals for problematic literals.
Tests
- Added a new string_length soundness regression suite and updated tiering/verification expectations, including an end-to-end contract-violation case for issue 802.
Documentation
- Updated spec chapter 6.4.3 and release materials for v0.0.182, refreshing documented release and test-status totals.

…0.182) vera verify translated string_length to Z3's Length, which counts Unicode code points, but Vera's runtime counts UTF-8 bytes — so ensures(string_length("é") == 1) verified at Tier 1 while the runtime returns 2 (a false proof for every non-ASCII string). Final sub-task of the #392 smt.py soundness audit. string_length is now modeled at Tier 1 only for a string literal (its exact byte length is known); a non-literal argument defers to a runtime-guarded Tier 3 obligation (Z3's string theory has no byte-length operator), matching the numeric-cast / quantifier / decimal precedent. The boolean predicates string_contains / string_starts_with / string_ends_with stay Tier 1 — UTF-8 is self-synchronizing, so a valid substring/prefix/suffix matches at the byte level exactly when it matches at the code-point level. The offset-sensitive builtins (string_slice / string_index_of / string_char_code / string_chars) were never modeled in Z3 and already defer. Tests that pinned the old z3.Length Tier-1 behavior are rewritten to the sound deferral; refinement / E501 / #667 tests that used string_length only as an example builtin switch to string_starts_with (a still-Tier-1 predicate). New test_string_length_soundness.py covers the literal byte model, slot deferral, and predicate soundness; ch04_string_builtins gains a non-ASCII byte-length case. Closes #802. Closes #392. Co-Authored-By: Claude <noreply@anthropic.invalid>

coderabbitai · 2026-06-27T10:16:08Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 1d219998-b6ce-47e3-834a-c7ce0cc8eda4

📥 Commits

Reviewing files that changed from the base of the PR and between 2690a08 and 2f4ba0b.

⛔ Files ignored due to path filters (6)

docs/index.html is excluded by !docs/**
docs/index.md is excluded by !docs/**
docs/llms-full.txt is excluded by !docs/**
docs/llms.txt is excluded by !docs/**
tests/conformance/ch04_string_builtins.vera is excluded by !**/*.vera
uv.lock is excluded by !**/*.lock, !uv.lock

📒 Files selected for processing (15)

CHANGELOG.md
HISTORY.md
README.md
ROADMAP.md
TESTING.md
pyproject.toml
scripts/check_spec_examples.py
spec/06-contracts.md
tests/test_codegen.py
tests/test_obligations.py
tests/test_string_length_soundness.py
tests/test_verifier.py
tests/test_verifier_coverage.py
vera/__init__.py
vera/smt.py

📝 Walkthrough

Walkthrough

This PR changes string_length to use UTF-8 byte counts for string literals and defer non-literal cases to Tier 3. It adds regression tests, updates verifier expectations for related string predicates, and bumps release/version metadata to 0.0.182.

Changes

String length soundness

Layer / File(s)	Summary
Translation and contract `spec/06-contracts.md`, `scripts/check_spec_examples.py`, `CHANGELOG.md`, `vera/smt.py`	`string_length` is reclassified for literals vs non-literals, the spec example allowlist is updated, the release note is added, and SMT translation now uses UTF-8 byte lengths or Tier 3 fallback.
Regression coverage `tests/test_string_length_soundness.py`, `tests/test_obligations.py`, `tests/test_verifier.py`, `tests/test_verifier_coverage.py`, `tests/test_codegen.py`	New regression cases cover literal and non-literal `string_length` verification, non-ASCII string predicates, and updated verifier/codegen expectations for the revised string semantics.
Release metadata `HISTORY.md`, `README.md`, `ROADMAP.md`, `TESTING.md`, `pyproject.toml`, `vera/__init__.py`	Release notes, status text, test counts, version fields, and package version are bumped to 0.0.182.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related issues

smt.py: string_length translated as Z3 code-point Length — Tier 1 proves wrong length for non-ASCII (Vera counts UTF-8 bytes) #802: Matches the same string_length UTF-8 byte-length soundness fix and the new regression coverage.

Possibly related PRs

aallan/vera#456: Directly overlaps with the vera/smt.py string_length translation path changed here.

Suggested labels

compiler, tests, spec, docs, ci

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 54.29% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title clearly and accurately summarises the main change: deferring non-literal string_length to Tier 3 for UTF-8 byte soundness.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch fix/802-string-utf8-bytes

_{Comment @coderabbitai help to get the list of available commands.}

codecov · 2026-06-27T10:16:50Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 92.03%. Comparing base (2690a08) to head (2f4ba0b).

Additional details and impacted files

@@           Coverage Diff           @@
##             main     #810   +/-   ##
=======================================
  Coverage   92.03%   92.03%           
=======================================
  Files          89       89           
  Lines       26853    26855    +2     
  Branches      321      321           
=======================================
+ Hits        24713    24717    +4     
+ Misses       2132     2130    -2     
  Partials        8        8

Flag	Coverage Δ
javascript	`65.33% <ø> (ø)`
python	`95.09% <100.00%> (+<0.01%)`	⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

…runtime/edge tests Adversarial review of #810 found a second Z3-string unsoundness in the same class as #802: Z3's string sort alphabet only reaches U+2FFFF, and the Python binding silently stores any higher code point as the literal's escape string — so string_contains("\u{10FFFF}", "f") proved true at Tier 1 while the runtime returns false. Since this PR closes the #392 smt.py string-soundness audit, fixing it here keeps that close honest. - vera/smt.py: a string literal containing a code point > U+2FFFF defers to Tier 3 (StringVal returns None), so the predicates never match phantom escape bytes. string_length is unaffected (it byte-counts the decoded value, not StringVal), so an astral literal's byte length is still soundly Tier-1. - tests/test_string_length_soundness.py: TestAstralStringLiteral802 (the regression — pre-fix proved the false contract at Tier 1) plus literal-edge coverage (escaped-unicode decode, 4-byte emoji, empty string). - tests/test_codegen.py: a non-ASCII runtime byte-length test (string_length("é") == 2, "😀" == 4) — pins the runtime premise the verifier's byte model relies on, so a codegen regression to code-point counting can't hide behind green verifier-side tests. - spec/06 §6.4.3 + CHANGELOG: note the astral-literal Tier-3 deferral. Refs #802, #392. Co-Authored-By: Claude <noreply@anthropic.invalid>

coderabbitai

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

vera/smt.py (1)
1075-1079: 🩺 Stability & Availability | 🟡 Minor | ⚡ Quick win

Reject lone surrogate escapes before UTF-8 sizing vera/smt.py:1075-1079

_decode_string_escapes() accepts \u{D800}-style values, so len(arg_node.value.encode("utf-8")) can raise UnicodeEncodeError instead of falling back to Tier 3. Reject surrogate code points upstream or handle the encode failure here.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@vera/smt.py` around lines 1075 - 1079, The string_length fast path in
`vera.smt` can crash on lone surrogate escapes because
`arg_node.value.encode("utf-8")` assumes valid UTF-8. Update the `string_length`
handling in the SMT translation logic to either reject surrogate code points
before calling `encode` or catch `UnicodeEncodeError` and fall back to the Tier
3 path, using the existing `call`/`arg_node` string-literal branch as the place
to make the change.
Source: Path instructions

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@tests/test_string_length_soundness.py`:
- Around line 124-132: Add an explicit boundary test around the ord(ch) >
0x2FFFF check in the string length soundness suite: extend
test_four_byte_emoji_literal_byte_length or add a nearby case that verifies both
\u{2FFFF} and \u{30000} so the compiler behavior at the exact cutoff is covered.
Use the existing _verify helper and string_length assertions to ensure \u{2FFFF}
is treated on the lower side and \u{30000} on the higher side, catching any
accidental > versus >= regression.
- Around line 192-206: The astral-literal test only checks that Tier 1 did not
verify, which can still pass if verification fails differently or the obligation
disappears. In test_astral_predicate_not_proved_at_tier1, keep the existing
tier1_verified assertion but also assert _ok(result) and that
result.summary.tier3_runtime is at least 1 so the test explicitly covers the
intended Tier-3 fallback behavior.

---

Outside diff comments:
In `@vera/smt.py`:
- Around line 1075-1079: The string_length fast path in `vera.smt` can crash on
lone surrogate escapes because `arg_node.value.encode("utf-8")` assumes valid
UTF-8. Update the `string_length` handling in the SMT translation logic to
either reject surrogate code points before calling `encode` or catch
`UnicodeEncodeError` and fall back to the Tier 3 path, using the existing
`call`/`arg_node` string-literal branch as the place to make the change.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: c7827713-5433-4a0e-bc61-ad9a8cebf0dd

📥 Commits

Reviewing files that changed from the base of the PR and between 35579b4 and 0195d9c.

📒 Files selected for processing (8)

CHANGELOG.md
README.md
ROADMAP.md
TESTING.md
spec/06-contracts.md
tests/test_codegen.py
tests/test_string_length_soundness.py
vera/smt.py

…toff/astral tests (review) CodeRabbit review of #810: - vera/smt.py: a lone surrogate (U+D800..U+DFFF) is not UTF-8-encodable, so the new string_length literal byte-model (value.encode) AND z3.StringVal both crashed `vera verify` with UnicodeEncodeError. Guard both: string_length catches UnicodeEncodeError -> Tier 3; the StringVal literal-deferral now covers surrogates alongside the > U+2FFFF astral cutoff. - tests/test_string_length_soundness.py: a surrogate regression test (no crash, defers), an exact-cutoff boundary test (\u{2FFFF} Tier-1 vs \u{30000} Tier-3, pinning > vs >=), and the astral predicate test reworked to the literal-on-literal form so it asserts the clean Tier-3 fallback (_ok + tier3 >= 1). - spec/06 section 6.4.3 + CHANGELOG: note the surrogate deferral. Note: CR's finding 1 suggested a string_length boundary test, but string_length byte-counts via Python and does not exercise the z3.StringVal cutoff, so the boundary test is on a predicate instead. Refs #802, #392. Co-Authored-By: Claude <noreply@anthropic.invalid>

aallan · 2026-06-27T11:15:13Z

@coderabbitai Good catch — reproduced (it crashed vera verify with UnicodeEncodeError) and fixed in 2961d91. A lone surrogate broke both paths: string_length's value.encode("utf-8") (now wrapped → Tier 3 on UnicodeEncodeError) and z3.StringVal in the predicate path (the literal-deferral guard now covers U+D800–U+DFFF alongside the > U+2FFFF astral cutoff). Regression: test_surrogate_literal_does_not_crash exercises both a string_length and a string_contains over "\u{D800}".

coderabbitai · 2026-06-27T11:15:33Z

Tip

For best results, initiate chat on the files or code changes.

@aallan Nice fix — that closes the hole I was worried about.

Catching UnicodeEncodeError in the string_length literal path and widening the literal-deferral guard to include U+D800–U+DFFF on the predicate path is the right soundness-preserving behavior here. The regression coverage in test_surrogate_literal_does_not_crash also looks well targeted since it exercises both string_length and string_contains on "\u{D800}".

✏️ Learnings added

Learnt from: aallan
URL: https://github.com/aallan/vera/pull/810

Timestamp: 2026-06-27T11:15:31.091Z
Learning: In `vera/smt.py`, string literals containing lone surrogates (`U+D800–U+DFFF`) must not be translated at Tier 1: the `string_length` literal path can raise `UnicodeEncodeError` on UTF-8 encoding and should defer to Tier 3, and the predicate literal path should also defer rather than calling `z3.StringVal`. Regression coverage exists in `tests/test_string_length_soundness.py` via `test_surrogate_literal_does_not_crash`, which exercises both `string_length` and `string_contains` on `"\u{D800}"`.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.

coderabbitai

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@spec/06-contracts.md`:
- Around line 257-258: The String length/spec paragraph overstates the U+2FFFF
fallback and conflicts with `vera/smt.py` and
`tests/test_string_length_soundness.py`. Update the wording in the string
builtins section so the U+2FFFF cutoff applies only to `z3.StringVal`-based
predicate translation (with lone surrogates still deferring to Tier 3), while
`string_length` on literals remains Tier 1 via byte-counting. Keep the existing
references to `string_length`, `string_contains`, `string_starts_with`,
`string_ends_with`, and `z3.StringVal` to anchor the revised scope.

In `@tests/test_string_length_soundness.py`:
- Around line 243-254: The current assertions in test_string_length_soundness.py
only check that tier3_runtime is reached, which still allows a deferred
verification error; update the soundness checks around _verify for string_length
and string_contains to also assert _ok(length) and _ok(predicate). Keep the
existing Tier-3 fallback assertion, but make the test fail if the verifier
reports an E500/checker error instead of a clean graceful fallback, using the
same length and predicate variables.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 161dfdbf-8c72-49e1-9346-2798caa1e07e

📥 Commits

Reviewing files that changed from the base of the PR and between 0195d9c and 2961d91.

📒 Files selected for processing (7)

CHANGELOG.md
README.md
ROADMAP.md
TESTING.md
spec/06-contracts.md
tests/test_string_length_soundness.py
vera/smt.py

…r-3 fallback (review) CodeRabbit review of #810: - spec/06 section 6.4.3: the string-length paragraph overstated the U+2FFFF cutoff. It applies only to the z3.StringVal-based predicate translation; string_length byte-counts the decoded literal, so an astral (> U+2FFFF) literal's length stays Tier 1. Reworded to scope the cutoff to the predicates, with the lone surrogate (no UTF-8 encoding) still deferring on both paths. - tests/test_string_length_soundness.py: the surrogate test now also asserts _ok(length) and _ok(predicate), so it fails on an E500 / checker error rather than only requiring tier3_runtime — pinning the graceful fallback, not just any deferral. Refs #802, #392. Co-Authored-By: Claude <noreply@anthropic.invalid>

…s-loop test (review-pr) pr-review-toolkit review of #810 (4 agents — all found the fix sound; 0 critical/important code issues). Addressed the agreed findings: - Doc accuracy: comments / spec / CHANGELOG / test-docstring claimed z3.StringVal "raises" on a lone surrogate. It does not (on z3 4.16 it silently stores the code point's escape text, same as the > U+2FFFF astral case); only string_length's value.encode("utf-8") genuinely raises. Reworded vera/smt.py, spec/06, CHANGELOG, and the surrogate test docstring; fixed the astral escape-text illustration ("\U000xxxxx" -> the actual "\u{...}" form). - tests/test_codegen.py: added a runtime soundness-loop test — a slot-arg string_length contract that `vera verify` DEFERS to Tier 3 is enforced at runtime (ensures(@Int.result == 1) over "é" raises a postcondition violation), closing the "deferral is sound, not merely imprecise" loop the verifier-side tests leave open. No behaviour change (comment / test / doc only). Refs #802, #392. Co-Authored-By: Claude <noreply@anthropic.invalid>

aallan · 2026-06-27T13:07:13Z

pr-review-toolkit review (4 agents, parallel)

Ran a comprehensive review of this PR — code-reviewer, pr-test-analyzer, silent-failure-hunter, comment-analyzer — against head b350578.

Critical / Important: none. All four agents independently validated the fix as sound:

The literal byte-model + non-literal Tier-3 deferral is conservative (None → Tier 3 verified at every translate_expr consumer — never a false discharge); the except UnicodeEncodeError is correctly narrow.
The astral cutoff (> U+2FFFF) and surrogate guards are complete w.r.t. the inputs Z3 can't model; mutation-validated (reverting the byte-model / flipping > to >= / removing the surrogate guard each flips the right tests RED).
The UTF-8 self-synchronization soundness of the kept predicates was re-checked over ~20k randomized cases (0 mismatches).

Addressed (the only findings — doc accuracy + one test), in b350578:

Two agents flagged that the comments/spec/CHANGELOG/test-docstring said z3.StringVal raises on a lone surrogate. Verified on z3 4.16: it does not — it silently stores the code point's escape text (same as the astral case); only string_length's encode("utf-8") genuinely raises. The guard returns None before StringVal either way, so behaviour was always sound — only the rationale was wrong. Reworded vera/smt.py, spec/06, CHANGELOG, the surrogate test docstring; corrected the astral escape-text illustration to the actual \u{...} form.
Added a runtime soundness-loop test (tests/test_codegen.py): a slot-arg string_length contract that vera verify defers to Tier 3 raises a postcondition violation at runtime — closing the "deferral is sound, not just imprecise" loop the verifier-side tests leave open.

No behaviour change in this round.

coderabbitai

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

tests/test_string_length_soundness.py (1)

217-236: 🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win

This boundary test is exercising the requires(...) fallback, not just the z3.StringVal cut-off.

The "\u{30000}" literal in requires(@String.0 == ...) takes the repo’s known untranslatable-literal path before string_starts_with(...) becomes the interesting part, so tier3_runtime >= 1 can still pass without pinning the > 0x2FFFF predicate boundary itself. Use literal-on-literal probes for both sides of the cut-off and assert the deferred case stays _ok(...).

Suggested fix

-        modeled = _verify("""
-public fn f(`@String` -> `@Bool`)
-  requires(`@String.0` == "\\u{2FFFF}")
+        modeled = _verify("""
+public fn f(`@Unit` -> `@Bool`)
+  requires(true)
   ensures(`@Bool.result` == true)
   effects(pure)
-{ string_starts_with(`@String.0`, "\\u{2FFFF}") }
+{ string_starts_with("\\u{2FFFF}", "\\u{2FFFF}") }
 """)
         assert _ok(modeled) and modeled.summary.tier3_runtime == 0, (
             modeled.summary.tier1_verified, modeled.summary.tier3_runtime,
         )
         deferred = _verify("""
-public fn f(`@String` -> `@Bool`)
-  requires(`@String.0` == "\\u{30000}")
+public fn f(`@Unit` -> `@Bool`)
+  requires(true)
   ensures(`@Bool.result` == true)
   effects(pure)
-{ string_starts_with(`@String.0`, "\\u{30000}") }
+{ string_starts_with("\\u{30000}", "\\u{30000}") }
 """)
+        assert _ok(deferred), [d.description for d in deferred.diagnostics]
         assert deferred.summary.tier3_runtime >= 1, (
             deferred.summary.tier1_verified, deferred.summary.tier3_runtime,
         )

Based on learnings, astral literals inside requires(...) are dropped as untranslatable in this repo, and as per path instructions this test should pin the exact observable it claims to cover.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/test_string_length_soundness.py` around lines 217 - 236, The current
test in test_string_length_soundness is hitting the untranslatable-literal
fallback in requires(...) instead of validating the string boundary in
string_starts_with(...). Update the probe to use literal-on-literal checks on
both sides of the 0x2FFFF cutoff, and make the deferred case assert _ok(...)
with the expected tier3_runtime behavior so the test directly covers the
intended boundary rather than the fallback path. Use the existing _verify, _ok,
and string_starts_with symbols to keep the assertions focused on the cutoff
behavior.

Sources: Path instructions, Learnings

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@tests/test_codegen.py`:
- Around line 1239-1243: The test in `test_codegen.py` is too broad because it
accepts raw `wasmtime.Trap`, `WasmtimeError`, and any `RuntimeError`, which can
hide unrelated failures in `execute()`. Narrow the assertion to the
contract-violation wrapper exposed by `vera/codegen/api.py::execute`, and verify
the observable trap shape it normalises to instead of matching generic exception
types. Keep the test focused on the `execute` call with the invalid raw argument
and assert the specific wrapper/error class that represents contract violations.

---

Outside diff comments:
In `@tests/test_string_length_soundness.py`:
- Around line 217-236: The current test in test_string_length_soundness is
hitting the untranslatable-literal fallback in requires(...) instead of
validating the string boundary in string_starts_with(...). Update the probe to
use literal-on-literal checks on both sides of the 0x2FFFF cutoff, and make the
deferred case assert _ok(...) with the expected tier3_runtime behavior so the
test directly covers the intended boundary rather than the fallback path. Use
the existing _verify, _ok, and string_starts_with symbols to keep the assertions
focused on the cutoff behavior.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: b261ff19-6151-4a8c-9540-3dd409c58d5a

📥 Commits

Reviewing files that changed from the base of the PR and between 7376df2 and b350578.

📒 Files selected for processing (8)

CHANGELOG.md
README.md
ROADMAP.md
TESTING.md
spec/06-contracts.md
tests/test_codegen.py
tests/test_string_length_soundness.py
vera/smt.py

CodeRabbit review of b350578 (1 inline + 1 outside-diff, both "Functional Correctness | Minor"). Both findings are the same anti-pattern — an assertion satisfiable by something other than its named target — verified valid by reproduction + mutation: - tests/test_codegen.py: the round-5 runtime soundness-loop test used pytest.raises((WasmtimeError, Trap, RuntimeError)). execute() normalises every wasmtime trap into WasmTrapError (a RuntimeError subclass), so the broad form green-passes on any failure — a setup error or a regression that traps for a different reason. Narrowed to raises(WasmTrapError) + assert kind == "contract_violation" (confirmed empirically: a postcondition violation raises exactly this kind). Mutation: asserting the wrong kind flips the test RED. - tests/test_string_length_soundness.py (test_astral_cutoff_boundary): the slot-arg precondition form (a String param compared to an astral literal) produces TWO tier3 obligations — the astral requires literal also defers — so tier3 >= 1 would still pass if only the requires deferred and the body predicate regressed to Tier 1, masking the soundness-critical path. Rewrote to literal-on-literal with a trivial precondition so a SINGLE postcondition obligation is driven solely by the body predicate's astral literal, and added an explicit clean-deferral assertion. Mutation: removing the > 0x2FFFF guard now flips the isolated assertion RED. Test-only, plus the two TESTING.md line counts. No behaviour change. Refs #802, #392. Co-Authored-By: Claude <noreply@anthropic.invalid>

aallan · 2026-06-27T13:39:55Z

@coderabbitai Re: the outside-diff finding on tests/test_string_length_soundness.py test_astral_cutoff_boundary (217-236) — valid on a precision point, fixed in 2d2db6e. The mechanism is slightly different from the writeup: the slot-arg requires(@String.0 == "\u{30000}") form produces two tier3 obligations (the astral requires literal defers as an obligation, not just an assumption), so tier3 >= 1 couldn't isolate the body-predicate cutoff — a body-only regression to Tier 1 would still pass. Rewrote both sides to literal-on-literal with requires(true), so a single ensures obligation is driven solely by the body predicate's astral literal, and added assert _ok(deferred). Mutation-checked: removing the > 0x2FFFF guard now flips the isolated tier3 >= 1 assertion RED.

coderabbitai · 2026-06-27T13:40:14Z

Tip

For best results, initiate chat on the files or code changes.

@aallan Nice catch on the precision issue — thanks for tightening that.

You're right: with requires(@String.0 == "\u{30000}"), the test was conflating two Tier 3 sources, so tier3 >= 1 wasn't sufficient to prove the body-predicate cutoff itself was what deferred. Rewriting it to literal-on-literal under requires(true) makes the obligation accounting much cleaner, and adding assert _ok(deferred) plus the mutation check is exactly the kind of isolation I was hoping for.

That addresses my concern.

✏️ Learnings added

Learnt from: aallan
URL: https://github.com/aallan/vera/pull/810

Timestamp: 2026-06-27T13:40:10.152Z
Learning: In `tests/test_string_length_soundness.py`, the `test_astral_cutoff_boundary` regression must isolate the body-predicate astral cutoff in `vera/smt.py` without introducing extra Tier 3 obligations from assumptions. A form like `requires(String.0 == "\u{30000}")` creates two Tier 3 obligations, so `tier3 >= 1` is not specific enough. The robust pattern is literal-on-literal with `requires(true)`, so a single `ensures` obligation is driven solely by the body predicate's astral literal, together with `assert _ok(deferred)` and a mutation check that removing the `> 0x2FFFF` guard flips the isolated assertion red.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.

aallan · 2026-06-27T13:56:58Z

@coderabbitai resume

aallan · 2026-06-27T13:56:59Z

@coderabbitai full review

coderabbitai · 2026-06-27T13:57:03Z

✅ Action performed

Reviews resumed.

coderabbitai · 2026-06-27T13:57:04Z

✅ Action performed

Full review finished.

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@TESTING.md`:
- Line 78: Update the TESTING.md table summary for
test_string_length_soundness.py to also mention the astral/surrogate-literal
predicate regression; the current row only describes the string_length
byte-count behavior and Tier 1 predicates. Edit the row text so it references
the astral/surrogate literal path and its Tier 3 deferral, using the existing
test_string_length_soundness.py entry as the anchor.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: bb3cf7ef-55e5-494f-9c17-021c199aef95

📥 Commits

Reviewing files that changed from the base of the PR and between 7376df2 and 2d2db6e.

📒 Files selected for processing (8)

CHANGELOG.md
README.md
ROADMAP.md
TESTING.md
spec/06-contracts.md
tests/test_codegen.py
tests/test_string_length_soundness.py
vera/smt.py

… row (CodeRabbit) CodeRabbit full review of 2d2db6e flagged that the test_string_length_soundness.py summary row described the string_length byte-count fix and the Tier-1 predicate cases but omitted the astral/surrogate literal predicate path the file also covers (TestAstralStringLiteral802) — which defers to Tier 3 because z3.StringVal cannot model those code points. Appended that clause so the row matches the file's actual coverage. Doc-only. Refs #802, #392. Co-Authored-By: Claude <noreply@anthropic.invalid>

aallan · 2026-06-27T14:05:17Z

@coderabbitai full review

coderabbitai · 2026-06-27T14:05:22Z

✅ Action performed

Full review finished.

coderabbitai Bot reviewed Jun 27, 2026

View reviewed changes

Comment thread tests/test_string_length_soundness.py

Comment thread tests/test_string_length_soundness.py Outdated

coderabbitai Bot reviewed Jun 27, 2026

View reviewed changes

Comment thread spec/06-contracts.md Outdated

Comment thread tests/test_string_length_soundness.py Outdated

T and others added 2 commits June 27, 2026 12:26

coderabbitai Bot reviewed Jun 27, 2026

View reviewed changes

Comment thread tests/test_codegen.py

coderabbitai Bot reviewed Jun 27, 2026

View reviewed changes

Comment thread TESTING.md Outdated

aallan merged commit d725d2d into main Jun 27, 2026
27 checks passed

aallan deleted the fix/802-string-utf8-bytes branch June 27, 2026 15:48

aallan mentioned this pull request Jun 27, 2026

Audit smt.py Z3 translation layer for verification soundness #392

Closed

Conversation

aallan commented Jun 27, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

The bug

The fix

Also found in review — astral-plane string literals (folded in)

Exit checklist (§0)

Tests

Reproduce-before-fix note (CR's suggested option)

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Jun 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Possibly related issues

Possibly related PRs

Suggested labels

❌ Failed checks (1 warning)

Uh oh!

codecov Bot commented Jun 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

aallan commented Jun 27, 2026

Uh oh!

coderabbitai Bot commented Jun 27, 2026

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

aallan commented Jun 27, 2026

pr-review-toolkit review (4 agents, parallel)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

aallan commented Jun 27, 2026

Uh oh!

coderabbitai Bot commented Jun 27, 2026

Uh oh!

aallan commented Jun 27, 2026

Uh oh!

aallan commented Jun 27, 2026

Uh oh!

coderabbitai Bot commented Jun 27, 2026

Uh oh!

coderabbitai Bot commented Jun 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

aallan commented Jun 27, 2026

Uh oh!

coderabbitai Bot commented Jun 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

aallan commented Jun 27, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Jun 27, 2026 •

edited

Loading

codecov Bot commented Jun 27, 2026 •

edited

Loading

coderabbitai Bot commented Jun 27, 2026 •

edited

Loading

coderabbitai Bot commented Jun 27, 2026 •

edited

Loading