Improved models for SIMD intrinsics by karthikbhargavan · Pull Request #1481 · celabshq/libcrux

karthikbhargavan · 2026-06-17T10:53:00Z

This PR does some cleanup in the SIMD intrinsics models, with a view to unifying
the proofs in SHA-3, ML-KEM, and ML-DSA and adding more tests for these models:

-crates/utils/core-models — executable reference models, differential tests, and lemmas
-crates/utils/intrinsics — the trusted F* axioms in avx2_extract.rs / arm64_extract.rs

This PR is layered on top of the upgrade-0.3.7 branch and is a pre-requisite for merging the SHA-3, ML-KEM, and ML-DSA proofs.

The executable bit-vector reference models the SIMD trust axioms are validated against, unioned from the per-branch work: - ARM/NEON core_arch (arm.rs + arm/{neon,neon_handwritten,interpretations}). - x86 interpretations superset (3-way merge) incl. the per-lane lift bodies; deduped _mm256_madd_epi16, reconciled the equivalent _mm256_bsrli_epi128 idiom. - bitvec.rs keeps the hax-0.3.7 #_v_F fix + the u8/64-bit lane interps. x86/interpretations.rs: mark the _mm256_{loadu,storeu}_si256_{i16,i32,u8} typed wrappers #[hax_lib::exclude]. They call BitVec::{from_slice,to_vec} (generic, themselves excluded) and are only the Rust-side reference for the differential tests; left un-excluded they extract as dangling references and break F* typechecking of the Int_vec module for any consumer that builds it.

The harness used to validate the SIMD intrinsic reference models against the real hardware semantics and to audit the trust surface: - scripts/cross-validate.py, intrinsics-audit.py, classify-nospec.py, check-intrinsics-parity.sh. - the trust-surface index (intrinsics-trust-index.{md,csv}), the no-spec classification, and the cross-validation findings.

avx2_extract.rs / arm64_extract.rs = ml-kem ⊕ sha3 (trust-base avx2 dropped: never F*-verified). cmpgt_epi16 uses ml-kem's sound VPCMPGTW form. sha3's u64x4 Keccak lane view + ARM64 Keccak fallback bodies are folded in. avx2_extract.rs: - Relocate sha3's u64x4 lane view (vec256_as_u64x4 / get_lane_u64x4 / lemma_get_lane_u64x4_bit) out of the Vec256 fstar::replace(interface) block into a fstar::before(interface) block on get_lane_u64, so it follows the Vec256/Vec128 struct typeclass instances. Keeps ml-kem's i16-view block byte-identical to its verified original. - Restore ml-kem's i16-view lemma_mm256_xor_si256 / lemma_mm256_srli_epi16 alongside sha3's u64-view lemmas; they are explicitly called by Libcrux_ml_kem.Vector.Avx2.Compress. avx2.rs: add the #[hax_lib::ensures(|_r| future(output).len() == output.len())] length-preservation post to the six &mut [T] storeu wrappers, matching mm_storeu_bytes_si128 which already had it. #[hax_lib::opaque] without an explicit ensures extracts a trivial post under hax 0.3.7, dropping the output-length fact ML-DSA's Simd.Avx2.Rejection_sample proofs depend on. Affects only the real Libcrux_intrinsics.Avx2 (ML-DSA's path); ml-kem/sha3 use the Avx2_extract model and never build it. fstar-helpers/fstar-bitvec: BitVec.Intrinsics.fsti union (ml-kem base + sha3 u64x4 keccak ops) + Bitvec.{U64Rotate,VxarqProof} keccak-rotate helpers.

…q/vbcaxq/vrax1q/vxarq) The four ARM64 NEON SHA3-extension fallbacks in crates/utils/intrinsics/src/arm64_extract.rs _veor3q_u64 == (a ^ b) ^ c _vbcaxq_u64 == a ^ (b & ~c) _vrax1q_u64 == a ^ rotate_left(b, 1) _vxarq_u64 == rotate_left(a ^ b, LEFT) have REAL bodies (compositions of the basic NEON ops), not unimplemented!() model stubs, and carry #[hax_lib::ensures] specs. But until now the only F* artifact any consumer built for the intrinsics crate was the .fsti interface, so these fallback specs were trusted as axioms — nobody proved the bodies. This adds a dedicated, isolated F* verification that actually proves them: - fstar-helpers/fstar-bitvec/Bitvec.Sha3FallbackProof.fst: states each fallback body (mirrored verbatim from the hax-extracted Arm64_extract.fst) as a Pure function whose ensures is the fallback's spec, and discharges it. The proofs rest ONLY on the basic NEON op specs (e_veorq_u64 / e_vbicq_u64 / e_vshlq_n_u64 / e_vshrq_n_u64 from Arm64_extract.fsti) plus the one auditable u64 rotate identity in Bitvec.U64Rotate. They do NOT depend on any of the unimplemented!() model ops that keep Arm64_extract.fst from verifying whole — which is why the fallback proofs are isolated into their own module rather than rooting Arm64_extract.fst / Avx2_extract.fst. - crates/utils/intrinsics/proofs/fstar/Makefile: dedicated verification whose ROOTS are Bitvec.U64Rotate / Bitvec.VxarqProof / Bitvec.Sha3FallbackProof. Only Libcrux_intrinsics.Arm64_extract.fsti is pulled in (the .fst bodies and the model ops stay trusted as .fsti axioms, as in every consumer crate). - .github/workflows/intrinsics-hax.yml: extracts the intrinsics crate and runs the verification, modeled on sha3-hax.yml (hax pinned via get-hax-ref, F* v2026.03.24). Verified clean from a cold cache: 41 modules, all VCs discharged, no admits on the four fallbacks. The .gitignore is narrowed so the hand-written Makefile is tracked while generated extraction artifacts stay ignored.

`get_lane_u64` stores a Vec256 into a `[u64; 4]` by reinterpreting it as `&mut [u8]` via `core::slice::from_raw_parts_mut`. hax rejects raw pointers (`reject_RawOrMutPointer`), so any consumer that extracts the real `avx2.rs` (the core-models world, i.e. without `--cfg pre_core_models` — ml-dsa) failed to extract the intrinsics crate. Every other raw-pointer store/load wrapper in this file is already `#[hax_lib::opaque]` (e.g. mm256_storeu_si256_u8, mm_storeu_si128_u128); get_lane_u64 was the lone exception. Mark it opaque too, so hax extracts it as an uninterpreted `val` instead of trying to model the pointer body. Consumers that need a functional spec use the `avx2_extract` model (pre_core_models world); ml-dsa does not use get_lane_u64 at all. Fixes the ML-DSA `extract` CI job (reject_RawOrMutPointer at avx2.rs).

franziskuskiefer

Looks like CI isn't happy.
Also, should this be against the hax-0.3.7 branch instead of main then?

karthikbhargavan · 2026-06-17T11:23:24Z

Looks like CI isn't happy. Also, should this be against the hax-0.3.7 branch instead of main then?

yep, fixing.

Review re-requested

…jection_sample Upstreamed from the libcrux-ml-dsa-proofs campaign. Adds the trusted F* spec layer (D6.3) for the AVX2 intrinsics ml-dsa relies on; each `val` is the trusted spec for a CPU-differential-tested core-models model. - Spec.Intrinsics.fsti: 15 `val` SMTPat lemmas — store/load/setzero/ blendv/cmpeq/or/sign (compute_hint / use_hint) and movemask / storeu_si128 / loadu_si256_u8 + projections (rejection_sample). - intrinsics-trust-index.{md,csv}: refresh the trust ledger to match (T1 193 -> 194, D6.3 F* spec coverage 76.7% -> 86.6%).

franziskuskiefer

I think this needs a bit of work/cleanup/splitup.

The checked in F* code is stale. Should it be dropped?

franziskuskiefer · 2026-06-18T08:06:42Z

+#!/usr/bin/env python3
+"""
+classify-nospec.py — For each genuinely spec-free T1 wrapper, find call sites
+in ml-kem / ml-dsa / sha-3 and report the verification status of the caller.


All these things look global from the outside but only do something for those three algorithms. That needs to be made clear somewhere, and we need a way to extend this for other algorithms like AES or anything else that's using intrinsics.

franziskuskiefer · 2026-06-18T08:12:10Z

+    /// `result[i] = saturate_to_i16( (2 * a[i] * b[i]) >> 16 )`.
+    /// Saturation only fires for the (i16::MIN, i16::MIN) corner case (which
+    /// would yield a value >= 2^15 = i16 overflow).
+    pub fn vqdmulhq_s16(a: i16x8, b: i16x8) -> i16x8 {


This doesn't appear to handle the (i16::MIN, i16::MIN) case correctly.
This makes me think that there may be more issues (vqdmulhq_n_s32 appears to have it too). Can you improve testing, esp. for corner cases and make the PR more reviewable, i.e. into smaller digestible chunks?

franziskuskiefer · 2026-06-18T08:15:48Z

@@ -0,0 +1,132 @@
+#!/usr/bin/env bash


This script fails. Please make sure it works and we should run it on CI to make sure it's an actual CI gate.

franziskuskiefer · 2026-06-18T08:19:44Z

+    in FStar.Classical.forall_intro aux
+"#
+)]
+#[hax_lib::requires(LEFT >= 0 && LEFT <= 64)]


Is it <= 64 or <64. lemma_mm256_slli_epi64_u64x4 says <64.

franziskuskiefer · 2026-06-18T08:27:27Z

+            // clamped to 16), it is NOT taken modulo 16. The old `tmp % 16` here
+            // mirrored a Rust core bug whose fix we upstreamed, so the model must
+            // follow the spec rather than the (now-corrected) implementation.
            let tmp = IMM8 % 256;


Some are using rem_euclid and other like this one % 256. We should make sure it's consistent and what we want.

franziskuskiefer · 2026-06-18T08:29:17Z

+    if args.print_summary:
+        print(summary)
+    else:
+        print(summary)
+    return 0


This seems a bit silly.

franziskuskiefer · 2026-06-18T08:31:08Z

+# SIMD intrinsics trust index
+
+Generated by `crates/utils/core-models/scripts/intrinsics-audit.py`.
+See `crates/utils/core-models/INTRINSICS-TRUST-PLAN.md` for the trust-ladder definition (L0..L4) and the D6 sub-percentages.


There's no INTRINSICS-TRUST-PLAN.md.

franziskuskiefer · 2026-06-18T08:31:58Z

+  Rust_primitives.Integers.v sh > 0 /\ Rust_primitives.Integers.v sh < 64
+
+(* Bridge lemma — see module header. *)
+assume


I don't think this is listed as axiom?

…(parity with ml-dsa-proofs) coeff_gather_bv_lemma (masked 3-byte coefficient bit decomposition; analog of the existing shl_casted_u8_bv_lemma) + u8_to_bv_logand15_lemma / u8_to_bv_shr4_lemma (u8 low/high nibble bit lemmas). These are the D6.3 intrinsic-model val axioms backing the AVX2 rejection_sample functional proof landed on ml-dsa-proofs (6bcc4f0); CPU/Python-validatable, D6.5 (machine-int proof) deferred. Byte-identical to the versions verified there. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…8_as_i16x8 bit-decomp axiom For the ml-kem NEON from_bytes/to_bytes functional proofs, give the byte load/store intrinsics their bit_vec ensures (mirror of the AVX2 mm256_loadu/storeu_si256_u8 ensures), and add the canonical i16x8 lane-bit decomposition axiom for the 128-bit register: - e_vld1q_bytes: len==16 ==> bit_vec_equal result (bit_vec_of_int_t_array bytes 8) - e_vst1q_bytes: length + (len==16 ==> bit_vec_equal (bit_vec_of_int_t_array out 8) v) - bit_vec_of_int_t_array_vec128_as_i16x8_lemma (Arm64) — hardware-agnostic bit math, identical statement to the core-models-validated AVX2 analog. VALIDATION (hardware-differential, native arm64): the load/store LE-reinterpret semantics induced by these ensures are validated bit-exact against real NEON hardware — 2,000,000 differential checks, 0 fails (libcrux-notes/agent-status/neon_vld1q_bytes_hwdiff_validate-2026-06-22.rs). e_vst1q_bytes uses a single F* ensures (length /\ bit_vec) like the AVX2 storeu (the Lean-only Rust-expr length ensures is dropped; hax rejects two stacked ensures here). Shared crate — strengthening posts only, additive. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Give mm256_srai_epi32 its lane32-form ensures (was l_True): for 0<=s<32, lane32 result j == (lane32 vector j) / pow2 s. F* integer division is Euclidean floor, which matches the sign-filling arithmetic shift, and shift 0 is the identity (sound, unlike the logical srli). Needed by the ml-kem AVX2 montgomery_reduce_i32s proof (shift 16 sign-extends the low i16 lane). VALIDATION: the srai_epi32_general_lane_formula transcription test in interpretations.rs checks the axiom RHS against the core-models _mm256_srai_epi32 model (signed a[i] >> s) over random inputs incl. negative lanes and i32 edges; cargo test -p core-models green (23/23). The model is hardware-differential-tested via the existing mk! entry. Shared crate; additive (strengthens a post only). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

… axiom The i32 and u32 lane VIEWS of the same 128-bit NEON register (vec128_as_i32x4 / vec128_as_u32x4) are independent abstract vals; the existing _vreinterpretq_s32_u32 / _vreinterpretq_u32_s32 ensures only assert register equality (result == a), so a value cannot be tracked across the signedness reinterpret. Add e_vreinterpret_i32_u32_lane_bridge stating the two views are 2's-complement related (the bit-preserving bitcast semantics). Hardware-differential validated bit-exact against real arm64 vreinterpretq_{s32_u32,u32_s32}: 24,000,000 lane-checks, 0 fails (neon_vreinterpret_s32_u32_hwdiff_validate-2026-06-23.rs). Used by the NEON d-bit compress functional proof (ML-KEM A3). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Strengthen the F* model ensures of _vst1q_u8 in arm64_extract.rs from length-only to per-byte content: when the output slice has length 16, out_future[k] == get_lane_u8x16 v k for k<16 (lane k -> byte k, LE), anchored to the same get_lane_u8x16 convention as the trusted _vld1q_u8. Validated bit-exact against real arm64 hardware (14,400,000 differential checks, 0 fails: load->store round-trip + store of vreinterpretq_u8_s64; see libcrux-notes neon_vst1q_u8_hwdiff_validate-2026-06-24). The body stays lean::replace_body("()"). This is the trusted axiom that the Neon serialize_10/12 proofs (landing in the same session) consume. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

karthikbhargavan added 4 commits June 17, 2026 11:42

karthikbhargavan requested a review from a team as a code owner June 17, 2026 10:53

karthikbhargavan requested a review from franziskuskiefer June 17, 2026 10:53

karthikbhargavan added 2 commits June 17, 2026 13:00

intrinsics/core-models: cargo fmt

b10a024

franziskuskiefer previously requested changes Jun 17, 2026

View reviewed changes

karthikbhargavan marked this pull request as draft June 17, 2026 11:21

karthikbhargavan changed the base branch from main to upgrade-hax-0.3.7 June 17, 2026 11:23

karthikbhargavan added 3 commits June 17, 2026 13:51

changelog: add #1481 (improved models for SIMD intrinsics)

1566068

changelog: trim #1481 entry to title

9ab6877

changelog: tag #1481 with affected crates

f92c59f

karthikbhargavan marked this pull request as ready for review June 17, 2026 14:55

karthikbhargavan requested a review from franziskuskiefer June 17, 2026 14:55

karthikbhargavan added 3 commits June 17, 2026 20:11

Merge branch 'upgrade-hax-0.3.7' into pr2-intrinsics

038d7e0

Merge branch 'upgrade-hax-0.3.7' into pr2-intrinsics

a749383

franziskuskiefer requested changes Jun 18, 2026

View reviewed changes

karthikbhargavan and others added 5 commits June 19, 2026 18:26

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Improved models for SIMD intrinsics#1481

Improved models for SIMD intrinsics#1481
karthikbhargavan wants to merge 17 commits into
upgrade-hax-0.3.7from
pr2-intrinsics

karthikbhargavan commented Jun 17, 2026 •

edited

Loading

Uh oh!

franziskuskiefer left a comment

Uh oh!

karthikbhargavan commented Jun 17, 2026

Uh oh!

franziskuskiefer left a comment

Uh oh!

franziskuskiefer Jun 18, 2026

Uh oh!

franziskuskiefer Jun 18, 2026

Uh oh!

franziskuskiefer Jun 18, 2026

Uh oh!

franziskuskiefer Jun 18, 2026

Uh oh!

franziskuskiefer Jun 18, 2026

Uh oh!

franziskuskiefer Jun 18, 2026

Uh oh!

franziskuskiefer Jun 18, 2026

Uh oh!

franziskuskiefer Jun 18, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

karthikbhargavan commented Jun 17, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

franziskuskiefer left a comment

Choose a reason for hiding this comment

Uh oh!

karthikbhargavan commented Jun 17, 2026

Uh oh!

franziskuskiefer left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

karthikbhargavan commented Jun 17, 2026 •

edited

Loading