feat(experimental): indexed MultiBucketBitmap contingency kernels + batched projections (#219)#225
Conversation
…ection API (#219) New experimental-gated `Contingency` (src/contingency.rs): the full nb×nb bucket-overlap table C[a][b] = |{coords: query∈a ∧ doc∈b}| for two equal-length &[u8] bucket-code slices, built in one O(dim) histogram pass. Projections mirror ordgraph::edge: top_overlap / diagonal_agreement / band_agreement / top_group_overlap / bucket_l1_distance / coarsened_counts / rankquant_symmetric_score + a general project(&weights) for learned/custom nb×nb matrices, and a Projection enum. Stateless — no index, no persistence, never wired into a search path; this is the pairwise-evidence container ordgraph migrates to. Kernel: scalar reference + AVX-512 (avx512f/bw/vpopcntdq) histogram with a masked 64-byte tail (live-lane masking avoids tail bucket-0 false matches), runtime is_x86_feature_detected dispatch matching #[target_feature], portable scalar fallback. #![deny(unsafe_op_in_unsafe_fn)] honored; no new deps. bucket_l1_distance returns u64 (the distance-weighted sum (nb-1)·dim overflows u32 for accepted dim up to u32::MAX). Counts are u32 (a cell <= dim). Tests reproduce ordgraph::edge's exact projection values + simd==scalar parity + constructor validation + the u32-overflow regression. Behind `experimental`, absent from the default build. Verified: fmt/clippy(-D warnings)/test green. Refs #219. Targets 0.6 (may land in 0.5.0 — maintainer decides). Signed-off-by: Nelson Spence <nelson@projectnavi.ai>
…atched projections (#219) Stacked on feat/contingency-dense (parity test cross-checks the dense Contingency). Indexed counterpart to the stateless API — one query bucket code vs many doc bitmaps, emitting tables/projections per doc: - contingency_row(q_bitmaps, doc_idx) -> full nb×nb table |Q_a ∩ D_b|, single pass. - diagonal_overlap_row(..) -> the nb diagonal cells (nb popcount-AND passes, not nb²). - project_all_batched(q_bitmaps, &[&weights]) -> docs×projections: builds each doc's table ONCE then applies every weight matrix to the cached integers (no per-projection rescan, unlike bilinear_score). rayon over docs. Plus diagonal_weights()/banded_weights() weight-matrix constructors. Kernels: scalar + AVX-512 VPOPCNTDQ (contingency/diagonal_accumulate_avx512vpop) mirroring bitmap.rs masked-tail popcount-AND, runtime-dispatched, portable fallback; #![deny(unsafe_op_in_unsafe_fn)] honored; no new deps; not wired into any search path. Correctness gate: scalar == SIMD == diagonal == dense Contingency, EXACT integer equality across dims 384/768/1024 (full + masked tail) × bits {2,4}. bench_contingency.rs (SYNTHETIC, 3 regimes): build-once/project-many 5.7-6.5×, no-rescan 4.4-5.5×, SIMD 2.3-3.4×. Verified: fmt/clippy(-D warnings)/test(experimental + default) green. Behind `experimental`. Refs #219. Targets 0.6 (may land in 0.5.0 — maintainer decides). Signed-off-by: Nelson Spence <nelson@projectnavi.ai>
|
You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard. |
Code Review by Qodo
Context used 1.
|
PR Summary by QodoAdd indexed contingency kernels and batched projections to MultiBucketBitmap (experimental) WalkthroughsDescription• Add per-doc indexed contingency table + diagonal fast path on MultiBucketBitmap. • Add batched projections to reuse per-doc tables across many weight matrices. • Add AVX-512 VPOPCNTDQ runtime-dispatched kernels with parity tests and benchmarks. Diagramgraph TD
A["bench_contingency example"] --> B["MultiBucketBitmap (API 2)"] --> C["contingency/diagonal accumulate"] --> D["runtime dispatch"]
D --> E["AVX-512 VPOPCNTDQ kernel"]
D --> F["scalar kernel"]
B --> G["project_all_batched"]
H["tests: parity + batching"] --> B
High-Level AssessmentThe following are alternative approaches to this PR: 1. Reuse existing Bitmap overlap kernel via a shared primitive
2. Use portable SIMD (std::simd) instead of AVX-512 intrinsics
3. Cache per-doc contingency tables inside the index
Recommendation: The PR’s approach (build per-doc tables on demand, then batch-apply weight matrices) is the right tradeoff for issue #219: it captures the dominant algorithmic wins (build-once/project-many and no-rescan) without permanently increasing index memory. The main thing to consider is extracting shared masked-tail SIMD utilities to avoid long-term duplication with other bitmap kernels; portable SIMD is attractive but may not match AVX-512 VPOPCNTDQ performance today. File ChangesEnhancement (2)
|
There was a problem hiding this comment.
Code Review
This pull request introduces an indexed contingency and projection surface for MultiBucketBitmap (addressing issue #219), including AVX-512 VPOPCNTDQ optimized kernels with portable scalar fallbacks, batched projection methods, and a comprehensive profiling harness in examples/bench_contingency.rs. The review feedback recommends optimizing the parallel batched projection methods (project_all_batched and its scalar twin) by replacing heap-allocated temporary vectors with stack-allocated arrays to avoid allocator lock contention and overhead.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
Codecov Report✅ All modified and coverable lines are covered by tests. 📢 Thoughts on this report? Let us know! |
- AVX-512 histogram kernel: precompute the nb bucket-value broadcast vectors once before the 64-byte block loop instead of recomputing set1_epi8 per block per bucket (gemini high). - band_agreement: iterate only the in-band columns instead of scanning all columns with an abs_diff filter. - rankquant_symmetric_score: factor the query weight out of the inner loop (nb multiplies instead of nb²). All exact: simd==scalar parity, rankquant-direct-sum parity, and projection parity tests unchanged and green. fmt/clippy clean. Signed-off-by: Nelson Spence <nelson@projectnavi.ai>
gemini: project_all_batched and its scalar twin allocated a Vec<u32> per document inside the rayon parallel map. nb is 2/4/16 (bits 1/2/4), so nb*nb <= 256 — use a stack [0u32; 256] sliced to nb*nb, removing the per-doc heap allocation and allocator contention from the batched hot loop. Exact parity unchanged (scalar==SIMD==diagonal==dense green). fmt/clippy clean. Signed-off-by: Nelson Spence <nelson@projectnavi.ai>
|
/agentic_review |
|
Code review by qodo was updated up to the latest commit f9df6e9 |
…flow) (#224) radius is an uncapped public parameter (Projection::BandAgreement{radius}); a near-usize::MAX value overflowed qb + radius (panic in debug, silent wrap to a wrong/zero total in release). Use qb.saturating_add(radius). Regression test asserts band_agreement(usize::MAX) == total_count. Signed-off-by: Nelson Spence <nelson@projectnavi.ai>
qodo correctly flagged the infallible nb*nb allocation: nb is a caller-supplied usize (codes are u8, but nb is independent), so a large nb (e.g. 1<<20) would allocate a terabyte-scale table and abort the host. Cap nb at the u8 code domain (<=256) before the vec! — >256 buckets is also meaningless for u8 codes. (I had wrongly dismissed this as a false-positive; it is real.) Signed-off-by: Nelson Spence <nelson@projectnavi.ai>
…(Codex) The nb<=256 bound (36d0a0b) rejected the large_nb_uses_scalar_and_skips_range_scan test's nb=300. nb=256 is the new max and is still > 255, so it exercises the same behavior: find_out_of_range is skipped (every u8 code is < 256) and the scalar path is taken (nb > 16). Full suite green (I had filtered the prior run to the new test and missed this — fixed). Signed-off-by: Nelson Spence <nelson@projectnavi.ai>
Per qodo finding #2 (PR #224): the comments in lib.rs for MultiBucketBitmap and Contingency/Projection did not make explicit that: - MultiBucketBitmap is NOT the default single-score retrieval path - MultiBucketBitmap is gated behind the non-default `experimental` feature, is unstable, and is excluded from semver guarantees - Contingency/Projection are the stable side of the `experimental` gate and ARE covered by semver guarantees Expanded the lib.rs comments to include an explicit warning on MultiBucketBitmap's niche (bilinear decomposition research, not production retrieval), its storage overhead (2-4x vs RankQuant), and its instability. Clarified that Contingency/Projection are the stable surface. Added matching feature-gate and stability notes to the contingency.rs module-level docs. Doc-only change; no logic touched. Signed-off-by: Nelson Spence <nelson@projectnavi.ai>
…doc, speedup consistency) - Add release-active assert!(nb * nb <= 256, ...) before the stack-table slice in both project_all_batched and project_all_batched_scalar in src/multi_bucket.rs, making the nb<=16 invariant fail-loud locally instead of relying solely on MultiBucketBitmap::new's bits restriction. - Add a doc note to the indexed contingency/projection section header explaining that this surface is gated behind the non-default `experimental` feature, excluded from semver, and is not the default single-score retrieval path (RankQuant/Bitmap are preferred). - Fix inverted speedup output in examples/bench_contingency.rs: regime_a printed 1.0/speedup in the human table but speedup in DATA; regime_b printed 1.0/fastpath_speedup in the table column but fastpath_speedup in DATA. Both now print the non-inverted ratio consistently. Signed-off-by: Nelson Spence <nelson@projectnavi.ai>
… to module rustdoc qodo's 'Missing MultiBucketBitmap non-default docs' finding wants the experimental/non-default/semver-excluded note on the user-visible rustdoc, not an internal comment. Add a prominent stability banner to the module-level //! docs (what docs.rs renders): MultiBucketBitmap and the indexed contingency / projection kernels are gated behind the non-default `experimental` feature, are not stable public API, and are excluded from semver guarantees; the stable surface is the stateless dense Contingency / Projection API. Signed-off-by: Nelson Spence <nelson@projectnavi.ai>
…e references) Signed-off-by: Nelson Spence <nelson@projectnavi.ai>
…t/contingency-indexed Signed-off-by: Nelson Spence <nelson@projectnavi.ai>
|
/agentic_review |
|
Code review by qodo was updated up to the latest commit d62b411 |
…oning) qodo's 'Missing MultiBucketBitmap non-default docs' finding asks the docs to explicitly position MultiBucketBitmap as niche — not the default single-score retrieval path — and name the alternatives that dominate. Extend the module banner with a 'Positioning' note: MultiBucketBitmap is a research/analysis substrate (never kernel-optimized for retrieval); for primary retrieval use RankQuant (symmetric/asymmetric), Bitmap (top-bucket popcount), and the two-stage candidate-gen -> rerank flow. The indexed contingency/projection surface is for analyzing bucket-overlap structure, not primary retrieval. Signed-off-by: Nelson Spence <nelson@projectnavi.ai>
|
Resolved in
This states (a) the surface is niche, (b) it is not the default single-score retrieval path, and (c) the alternatives that dominate for retrieval — directly addressing the requirement-gap. Doc build is clean under |
|
/agentic_review |
|
Code review by qodo was updated up to the latest commit 766e3fa |
…ace) The retrieval-positioning note belongs on the crate-root docs.rs front page that every user sees, not only in the experimental-gated multi_bucket module (which a caller who never enables `experimental` never reads — yet they are exactly the audience choosing a retrieval path). Add the 'not a primary retrieval path; use RankQuant / Bitmap / two-stage' note to the lib.rs crate docs, right after the four headline substrate families. Uses a plain `MultiBucketBitmap` reference so the default (no-experimental) doc build stays link-clean; verified under RUSTDOCFLAGS=-D warnings both with and without the feature. Signed-off-by: Nelson Spence <nelson@projectnavi.ai>
|
Correction / stronger fix in
Verified link-clean under |
|
/agentic_review |
|
Code review by qodo was updated up to the latest commit 0f7d470 |
Signed-off-by: Nelson Spence <nelson@projectnavi.ai> # Conflicts: # src/lib.rs
What
The indexed counterpart (#219, API 2 of 2),
experimental-gated. Stacked on #224 — its parity test cross-checks the denseContingency, so merge #224 first (this PR will retarget tomainautomatically).Indexed methods on
MultiBucketBitmap— one query bucket code vs many doc bitmaps:contingency_row(q, doc_idx)— fullnb×nbtable per doc, single pass.diagonal_overlap_row(..)— thenbdiagonal cells (nbpopcount-AND passes, notnb²).project_all_batched(q, &[&weights])— docs×projections, builds each doc's table once then applies all weight matrices (no per-projection rescan). rayon over docs.diagonal_weights()/banded_weights()weight-matrix constructors.Kernel
Scalar + AVX-512 VPOPCNTDQ mirroring
bitmap.rsmasked-tail popcount-AND, runtime-dispatched, portable fallback.#![deny(unsafe_op_in_unsafe_fn)]honored, no new deps, not wired into any search path.Correctness
scalar == SIMD == diagonal == dense Contingency, exact integer equality, across dims 384/768/1024 (full + masked tail) × bits {2,4}.Benchmarks (
bench_contingency.rs, SYNTHETIC, 3 regimes)build-once/project-many 5.7–6.5×; no-rescan 4.4–5.5×; SIMD 2.3–3.4×. The algorithmic wins (build-once, no-rescan) dominate; SIMD is the constant factor on top.
Refs #219. Targets 0.6; may land in 0.5.0 — maintainer decides.