Context
Downstream users need a default way to choose the SignBitmap / Bitmap candidate budget before exact RankQuant rerank. If every integration hardcodes max(min_candidates, k * multiplier) differently, OrdVec’s two-stage behavior becomes fragmented and hard to benchmark.
This is related to #130, but distinct: #130 estimates allocation/scan cost. This issue is about a blessed retrieval-policy helper for candidate counts, especially for the SignBitmap → RankQuant default path used by OrdinalDB.
Related: #172 evaluates sign scaling and soft candidate-gate strategies; the helper can start conservative and evolve with evidence.
Evidence
- SignBitmap::top_m_candidates takes an explicit m and clamps only to len: src/sign_bitmap.rs:133-159.
- Bitmap::top_m_candidates takes an explicit m and clamps only to len: src/bitmap.rs:192-237.
- RankQuant::search_asymmetric_subset clamps final k to the candidate list length: src/quant.rs:569-572.
- Persisted-format docs say candidate-count selection is tracked outside the index bytes: docs/PERSISTED_FORMAT.md:89.
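To make the consequence of these clamps concrete, here is a minimal sketch of how they compose. The function names are hypothetical stand-ins written for this issue, not the crate's API; they only model the two clamps cited above (candidate budget limited by index length, final k limited by the candidate list length).

```rust
/// Hypothetical stand-in for the stage-1 clamp: the requested candidate
/// budget `m` is limited only by the number of indexed vectors.
fn effective_candidates(m: usize, len: usize) -> usize {
    m.min(len)
}

/// Hypothetical stand-in for the stage-2 clamp: the final `k` is limited
/// by how many candidates stage 1 actually produced.
fn effective_k(k: usize, candidates: usize) -> usize {
    k.min(candidates)
}

/// Number of results a two-stage query can return for a caller-chosen `m`.
fn results_returned(k: usize, m: usize, len: usize) -> usize {
    effective_k(k, effective_candidates(m, len))
}
```

Under this model nothing relates m to k: a caller that passes m = 4 with k = 10 gets at most 4 results and no error. Whether that is the intended behavior is exactly the kind of decision a shared helper would pin down.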
Proposed Shape
Sketch:
    pub struct TwoStageCandidatePolicy {
        pub min_candidates: usize,
        pub k_multiplier: usize,
        pub max_candidates: Option<usize>,
    }

    impl Default for TwoStageCandidatePolicy { /* conservative default */ }

    impl TwoStageCandidatePolicy {
        pub fn candidate_count(&self, k: usize, n_vectors: usize) -> Result<usize, CandidatePolicyError>;
    }
Equivalent naming is fine. A small free function is also acceptable if a struct is overkill.
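One possible way to fill in the sketch, offered as a starting point for discussion rather than a proposed final implementation. Everything not in the signature above is an assumption: the default values (64 and 8) are placeholders and not benchmarked, the `InvalidBounds` error variant is hypothetical, and returning `Ok(0)` for `k == 0` or `n_vectors == 0` is one choice among several.

```rust
/// Hypothetical error type; the single variant is an assumption.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum CandidatePolicyError {
    /// `max_candidates` was set below `min_candidates`.
    InvalidBounds { min_candidates: usize, max_candidates: usize },
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct TwoStageCandidatePolicy {
    pub min_candidates: usize,
    pub k_multiplier: usize,
    pub max_candidates: Option<usize>,
}

impl Default for TwoStageCandidatePolicy {
    fn default() -> Self {
        // Placeholder values, NOT benchmarked recommendations.
        Self { min_candidates: 64, k_multiplier: 8, max_candidates: None }
    }
}

impl TwoStageCandidatePolicy {
    /// Computes max(min_candidates, k * k_multiplier), applies the optional
    /// cap, then clamps to the number of indexed vectors.
    pub fn candidate_count(
        &self,
        k: usize,
        n_vectors: usize,
    ) -> Result<usize, CandidatePolicyError> {
        if let Some(max) = self.max_candidates {
            if max < self.min_candidates {
                return Err(CandidatePolicyError::InvalidBounds {
                    min_candidates: self.min_candidates,
                    max_candidates: max,
                });
            }
        }
        // Assumption: an empty request or an empty index needs no candidates.
        if k == 0 || n_vectors == 0 {
            return Ok(0);
        }
        // Saturate instead of overflowing for very large k or multiplier.
        let wanted = k.saturating_mul(self.k_multiplier).max(self.min_candidates);
        let capped = self.max_candidates.map_or(wanted, |max| wanted.min(max));
        Ok(capped.min(n_vectors))
    }
}
```

With these placeholder defaults, `candidate_count(10, 1_000_000)` yields `Ok(80)` and `candidate_count(10, 50)` yields `Ok(50)`. Note that in this sketch a `max_candidates` smaller than `k` wins over `k`, so rerank would return fewer than `k` results; the real helper may prefer to reject or widen that case.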
Non-goals
- No automatic quality guarantee or benchmark claim.
- No query planner.
- No hidden dynamic tuning unless benchmarked and documented.
Acceptance Criteria
- … k == 0, n_vectors == 0, k > n_vectors, and overflow without panicking.
- … k, zero cases, and saturated/overflow boundaries.
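The edge cases named in these criteria can be exercised against the free-function form that the proposal also allows. The following is a sketch under stated assumptions: the function name, its parameter order, and the choice to return 0 for the zero cases are all hypothetical, and the constants 64 and 8 are arbitrary test inputs, not recommended defaults.

```rust
/// Hypothetical free-function form of the helper:
/// max(min_candidates, k * k_multiplier), clamped to `n_vectors`.
pub fn two_stage_candidate_count(
    k: usize,
    n_vectors: usize,
    min_candidates: usize,
    k_multiplier: usize,
) -> usize {
    // Assumption: zero requested results or an empty index means zero candidates.
    if k == 0 || n_vectors == 0 {
        return 0;
    }
    // saturating_mul cannot panic on overflow; it pins the product at usize::MAX.
    k.saturating_mul(k_multiplier)
        .max(min_candidates)
        .min(n_vectors)
}

/// Runs the boundary checks; panics on the first failed assertion.
pub fn check_edge_cases() {
    // Zero cases.
    assert_eq!(two_stage_candidate_count(0, 100, 64, 8), 0);
    assert_eq!(two_stage_candidate_count(10, 0, 64, 8), 0);
    // Ordinary k: the floor applies to small k, the multiplier to larger k.
    assert_eq!(two_stage_candidate_count(3, 1_000_000, 64, 8), 64);
    assert_eq!(two_stage_candidate_count(20, 1_000_000, 64, 8), 160);
    // k > n_vectors: the budget is clamped to the index size.
    assert_eq!(two_stage_candidate_count(500, 100, 64, 8), 100);
    // Overflow: the product saturates, then clamps to the index size.
    assert_eq!(two_stage_candidate_count(usize::MAX, 1_000, 64, 2), 1_000);
    assert_eq!(two_stage_candidate_count(usize::MAX, usize::MAX, 64, 8), usize::MAX);
}
```

In a real crate these checks would more naturally live in `#[test]` functions; they are grouped in one plain function here only to keep the sketch self-contained.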