Context
ordgraph (ordgraph-proto/src/edge.rs) builds full bucket-overlap contingency tables C[a][b] = |{coords: query∈bucket a ∧ doc∈bucket b}| over fixed-composition BucketCodes, plus projections: top overlap, diagonal agreement, band agreement, top-group overlap, bucket L1 distance, coarsened counts, rankquant symmetric score. ordvec's experimental MultiBucketBitmap is the same bilinear bucket-overlap math (Σ_{a,b} W[a,b]·|Q_a ∩ D_b|) in bitmap form.
Profiling (branch experiment/multibucket-profile) established the durable conclusions:
- Symmetric scoring's recall loss (~0.10 @10 vs asymmetric) is real and kernel-independent.
- Full-table materialization is strictly more work than one scalar projection.
- Dense contingency dominates for one pair; dedicated scalar kernels dominate for one score.
MultiBucketBitmap earns its keep only when the workload is many docs AND multiple projections / full-table evidence per doc.
- ⇒
MultiBucketBitmap is NOT the default single-score retrieval path (two-stage / FastScan / RankQuant-asym dominate it on the full recall/latency/bytes frontier at every dim).
Reframe: two APIs, not one
1. Stateless dense-code contingency API (ordgraph migrates to this first)
- Inputs: two bucket-code / rank-code vectors.
- Output: full
buckets × buckets contingency table.
- Projection helpers: diagonal agreement, band agreement, top overlap, top-group overlap, bucket L1, rankquant-like score, learned/custom weight matrices.
- One
O(dim) histogram pass builds C; every projection is then cheap. This is the right container for one (query, doc) pair.
2. Indexed MultiBucketBitmap API (future indexed companion)
- Inputs: query bucket code + many document bitmaps.
- Output: full tables per doc or batched projections per doc.
- Must avoid rescanning per projection (compute table/accumulators once).
- Needs a VPOPCNTDQ / AVX-512-style popcount kernel with a portable fallback.
- Niche: one query vs many docs, full joint table or many projections per doc. Not positioned as a single-score retriever.
Boundary
- ordvec owns: contingency math, projection scoring, packed/vectorized kernels.
- ordgraph owns: "this projection result is evidence for X" (
EdgeEvidence graph/event payload).
Acceptance criteria
Notes
Not for 0.5.0. See docs/multibucket_profile_findings.md on the branch for the full profiling + adversarial verdict.
Context
ordgraph(ordgraph-proto/src/edge.rs) builds full bucket-overlap contingency tablesC[a][b] = |{coords: query∈bucket a ∧ doc∈bucket b}|over fixed-compositionBucketCodes, plus projections: top overlap, diagonal agreement, band agreement, top-group overlap, bucket L1 distance, coarsened counts, rankquant symmetric score. ordvec's experimentalMultiBucketBitmapis the same bilinear bucket-overlap math (Σ_{a,b} W[a,b]·|Q_a ∩ D_b|) in bitmap form.Profiling (branch
experiment/multibucket-profile) established the durable conclusions:MultiBucketBitmapearns its keep only when the workload is many docs AND multiple projections / full-table evidence per doc.MultiBucketBitmapis NOT the default single-score retrieval path (two-stage / FastScan / RankQuant-asym dominate it on the full recall/latency/bytes frontier at every dim).Reframe: two APIs, not one
1. Stateless dense-code contingency API (ordgraph migrates to this first)
buckets × bucketscontingency table.O(dim)histogram pass buildsC; every projection is then cheap. This is the right container for one(query, doc)pair.2. Indexed
MultiBucketBitmapAPI (future indexed companion)Boundary
EdgeEvidencegraph/event payload).Acceptance criteria
ordgraph::edge::Projection/EdgeEvidence) before the local copy is deleted.MultiBucketBitmapas a second indexed API, gated on table / batched-projection output (no per-projection rescan).MultiBucketBitmapis not the default single-score retrieval path.Notes
Not for 0.5.0. See
docs/multibucket_profile_findings.mdon the branch for the full profiling + adversarial verdict.