Fieldnote-Echo · project-navi-bot · May 27, 2026 · May 27, 2026 · May 27, 2026 · May 27, 2026
@@ -78,11 +78,25 @@ independent uniform top-bucket sets — overlap **hypergeometrically**,
 unrelated document clears a given overlap threshold — is closed-form and
 data-independent, not a tuned cutoff. (Whether *true* neighbours clear the bar
 is empirical; this is an exact candidate-generation null, not a
-retrieval-optimality theorem.) The invariance underneath it — that the rank
-transform is unchanged by any strictly monotone reparametrisation of the
-coordinates — is separately machine-checked in Lean, with the formalisation
-accompanying the paper. Details in
-[`docs/RANK_MODES.md`](docs/RANK_MODES.md).
+retrieval-optimality theorem.) Two pieces of this are separately machine-checked in Lean 4, both `sorry`-free
+on Lean's standard axiom base (`propext`, `Classical.choice`, `Quot.sound`):
+
+- the **ordinal invariance** on which the rank transform rests — that a vector's
+  sorting permutation is unchanged by any strictly monotone reparametrisation
+  of its coordinates — in
+  [`takens-formalization`](https://github.com/Project-Navi/takens-formalization)
+  (theorem `isOrdinalPatternOf_comp_strictMono`); and
+- the **shape of the bitmap candidate filter** — that under a finite
+  monotone-likelihood-ratio overlap-tilt model, an overlap-count *threshold*
+  (the popcount cutoff) is the Bayes-optimal deterministic admission rule, and
+  the uniform constant-weight null gives that threshold exactly the
+  hypergeometric upper tail — in
+  [`ordvec-formalization`](https://github.com/Fieldnote-Echo/ordvec-formalization)
+  (theorem `exists_uniformBitmapOverlapTail_finiteBayesRisk_le_and_hypergeomTail`).
+  This is an *in-model* result: it fixes the optimal rule's shape under the
+  stated assumptions, not a claim that any given corpus obeys the model.
+
+Details in [`docs/RANK_MODES.md`](docs/RANK_MODES.md).
 
 ## Quickstart
 

@@ -128,15 +128,23 @@ don't have a hypergeometric null because they don't have fixed
 bucket cardinalities — their score distribution depends on the
 unknown embedding distribution.
 
-**A research program this suggests.** The chain — representation →
-statistic → retrieval theorem → systems implementation — has a
-plausible formal target. Under a shared-latent-support model where
-relevant documents have elevated coordinates on a query-specific
-support set `S_q`, the top-bucket overlap statistic is monotone in
-the likelihood ratio for relevance, suggesting that bitmap probing
-may approach Bayes-optimality under that model. We do not claim
-that theorem here; this section flags it as the natural
-mathematical direction for the empirical results below.
+**A research program this suggests — now partly machine-checked.** The
+chain — representation → statistic → retrieval theorem → systems
+implementation — now has a checked link in the middle. Under a finite
+monotone-likelihood-ratio overlap-tilt model (the formal cousin of a
+shared-latent-support model, where relevant documents have elevated
+coordinates on a query-specific support set `S_q`), the top-bucket
+overlap statistic is monotone in the likelihood ratio for relevance,
+and an overlap-count threshold — the popcount cutoff — is the
+Bayes-optimal deterministic admission rule, with the uniform
+constant-weight null assigning that threshold exactly the
+hypergeometric upper tail. That is proved `sorry`-free in
+[`ordvec-formalization`](https://github.com/Fieldnote-Echo/ordvec-formalization)
+(`exists_uniformBitmapOverlapTail_finiteBayesRisk_le_and_hypergeomTail`).
+It is an *in-model* result: it fixes the shape of the optimal rule
+under the stated assumptions, not a claim that any given corpus obeys
+the model — whether real neighbours clear the bar stays empirical,
+which is what the bench and the paper measure.
 
 The systems consequence is what the bench measures: at a moderate M
 the bitmap probe captures most of exact RankQuant's top-10 neighbours,

@@ -180,6 +180,7 @@ impl Bitmap {
     /// `Vec<u32>` of all N scores and `select_nth_unstable` the
     /// top-`m`: O(N + m log m). The ~808 KiB temp at N=207k is
     /// cheap relative to the cost it saves at M ≥ 1000.
+    #[must_use = "this scans the corpus to generate candidates; dropping the result discards that work"]
     pub fn top_m_candidates(&self, q: &[f32], m: usize) -> Vec<u32> {
         assert_all_finite(q);
         let m_eff = m.min(self.n_vectors);
@@ -224,6 +225,7 @@ impl Bitmap {
     /// `queries` is a flat `batch * dim` f32 slice. Returns
     /// `Vec<Vec<u32>>` with one top-`m` set per query, sorted by
     /// overlap descending.
+    #[must_use = "this scans the corpus per query to generate candidates; dropping the result discards that work"]
     pub fn top_m_candidates_batched(&self, queries: &[f32], m: usize) -> Vec<Vec<u32>> {
         let dim = self.dim;
         let batch = queries.len() / dim;
@@ -294,6 +296,7 @@ impl Bitmap {
     /// [`Self::top_m_candidates_batched`] in parallel via rayon. Use
     /// when the full query workload is larger than one batch fits
     /// efficiently in L2/L3.
+    #[must_use = "this scans the corpus per query to generate candidates; dropping the result discards that work"]
     pub fn top_m_candidates_batched_chunked(
         &self,
         queries: &[f32],

@@ -109,6 +109,7 @@ pub type RankQuantFastscanIndex = RankQuantFastscan;
 /// type. Prefer the slice accessors above for read-only per-query access —
 /// exposing the flat buffers as the stable representation is the trade-off for
 /// that zero-copy interop.
+#[must_use = "search runs the full scan to produce these results; dropping them discards that work"]
 pub struct SearchResults {
     pub scores: Vec<f32>,
     pub indices: Vec<i64>,

@@ -125,6 +125,7 @@ impl SignBitmap {
     /// `m_eff` produce a deterministic survivor set across runs and
     /// SIMD dispatch paths — same audit discipline as
     /// [`crate::Bitmap::top_m_candidates`].
+    #[must_use = "this scans the corpus to generate candidates; dropping the result discards that work"]
     pub fn top_m_candidates(&self, q: &[f32], m: usize) -> Vec<u32> {
         let m_eff = m.min(self.n_vectors);
         if m_eff == 0 {
@@ -157,6 +158,7 @@ impl SignBitmap {
     /// top-`m` candidate sets for `batch` queries in parallel. Mirrors
     /// [`crate::Bitmap::top_m_candidates_batched`] in kernel
     /// shape (CHUNK=8 hot + tail) and tie-break semantics.
+    #[must_use = "this scans the corpus per query to generate candidates; dropping the result discards that work"]
     pub fn top_m_candidates_batched(&self, queries: &[f32], m: usize) -> Vec<Vec<u32>> {
         let dim = self.dim;
         let batch = queries.len() / dim;