Audit-driven integration + Rust core fidelity work (7 waves)#83
Merged
Conversation
The store's side-car schema cherry-picked `{id, content, meta}` on
write and reconstructed Documents with only those fields plus `score`.
`Document.blob` (binary payloads, e.g. images/audio) and
`Document.sparse_embedding` (hybrid-search sparse vectors) were
silently dropped on every write_documents call and never re-attached
on filter_documents / embedding_retrieval / storage / save_to_disk.
This is the same shape of bug as #81 — fields available at write time
were not propagated to the result-object constructor. The reference
`InMemoryDocumentStore` preserves both fields verbatim.
Fix:
- Store `blob` and `sparse_embedding` in the in-memory dict alongside
`id` / `content` / `meta`.
- Pass them through `_reconstruct` so every retrieval path (filter,
embedding_retrieval, storage, save/load) returns Documents with both
fields populated when they were set on write.
- Add JSON serializers (`_serialize_doc_data` / `_deserialize_doc_data`)
for save_to_disk / load_from_disk — `ByteStream.to_dict()` and
`SparseEmbedding.to_dict()` give us JSON-safe forms.
- Bump docstore schema version 1 → 2; keep v1 loadable with blob /
sparse_embedding defaulting to None so existing on-disk stores
reload without manual migration.
Also in this PR:
- Tighten `_validate_filters` to match `InMemoryDocumentStore`
(`document_store.py:504-509`): a bare `{"field": "x"}` with no
`operator` or `conditions` is malformed; the reference rejects it
and we now do too.
- Scope back the module docstring's drop-in parity claim — BM25
retrieval (`bm25_retrieval` / `bm25_retrieval_async`) is not
implemented, so the unqualified "matches the public surface of
InMemoryDocumentStore" sentence was overpromising. Pipelines that
need keyword search alongside vector search should wire in an
explicit `InMemoryBM25Retriever` against a separate store.
Tests added (6):
- `test_blob_field_round_trips_through_filter_and_retrieval`
- `test_sparse_embedding_field_round_trips_through_filter_and_retrieval`
- `test_blob_and_sparse_embedding_survive_save_load_roundtrip`
- `test_documents_without_blob_or_sparse_embedding_round_trip_as_none`
- `test_load_accepts_v1_schema_with_no_blob_or_sparse_fields`
- `test_filter_documents_rejects_field_without_operator`
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The side-car schema cherry-picked {text, metadata, ref_doc_id} on
write, so `_reconstruct_node` could only build a bare
`TextNode(id_, text, metadata)` plus the SOURCE relationship. Every
other field LlamaIndex preserves through `SimpleVectorStore` was
silently dropped on every `query` / `get_nodes` / persist+load:
- relationships other than SOURCE (PREVIOUS / NEXT / PARENT / CHILD)
— these power hierarchical RAG retrievers like AutoMergingRetriever
and RecursiveRetriever
- `excluded_embed_metadata_keys` / `excluded_llm_metadata_keys`
— silently changed downstream prompt content and embedding inputs
- `text_template`, `metadata_template`, `metadata_separator`
— change `get_content()` output for callers using non-default
templates
- `start_char_idx` / `end_char_idx` — citations / highlighting
- `mimetype` — defaulted to text/plain on retrieval
Fix: use llama-index's own `node_to_metadata_dict` /
`metadata_dict_to_node` (the framework's canonical full-fidelity
round-trip) and store the resulting `node_dict` alongside the existing
metadata + ref_doc_id (kept at top level for fast filter / doc-id
lookup without re-parsing _node_content per hit).
`_reconstruct_node` dispatches on schema shape: v2 entries carry
`node_dict` and round-trip via `metadata_dict_to_node`; v1 entries
(persisted before this fix) keep loading with their original narrow
fidelity rather than failing. Nodes schema bumps 1 → 2.
Also in this commit (filter / query parity against
`SimpleVectorStore` and `utils.build_metadata_filter_fn`):
- `FilterCondition.NOT` is implemented (was raising
NotImplementedError): matches when none of the inner filters match,
per the reference.
- `FilterOperator.TEXT_MATCH` is now case-SENSITIVE; previous impl
silently lowercased both sides (semantic divergence — our results
disagreed with SimpleVectorStore on mixed-case keys).
- `FilterOperator.TEXT_MATCH_INSENSITIVE` is now supported as the
opt-in case-folding form.
- `FilterOperator.ALL` and `FilterOperator.ANY` are now supported
(list-of-tags matching) instead of raising NotImplementedError.
- Both TEXT_MATCH variants raise `TypeError` on non-string operands,
matching the reference (`utils.py:141-144, 148-151`).
- `query.mode != VectorStoreQueryMode.DEFAULT` now raises
NotImplementedError instead of silently treating MMR / SVM / hybrid
as DEFAULT — those modes need full-precision vectors which the
quantized index discards.
Tests:
- Renamed `test_query_text_match_is_case_insensitive` →
`test_query_text_match_is_case_sensitive` (semantics changed).
- Replaced the now-stale `test_query_unsupported_filter_operator_raises`
with `test_query_text_match_raises_type_error_on_non_string_operands`.
- Added: TEXT_MATCH_INSENSITIVE, ALL, ANY, NOT, query.mode!=DEFAULT,
v1 backward-compat load, full-node fidelity through query +
get_nodes + persist (asserts every relationship type, excluded
metadata keys, char_idx, mimetype, templates).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`_build_results` constructed returned Documents without `embedder=`, so downstream code couldn't call `doc.embed()` / `doc.async_embed()` on a retrieved hit — both methods fall back to `self.embedder`, which was None, raising "No embedder provided" at runtime. Matches `LanceDb._build_search_results`, which is the named drop-in target in the module docstring and passes `embedder=self.embedder` explicitly. Tests: - `test_search_results_carry_embedder` asserts the field is set on hits returned from `db.search`. - `test_async_search_results_also_carry_embedder` asserts the same for `db.async_search`. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`_search_vector` constructed the returned Documents without `id=sid`, so `similarity_search`, `similarity_search_with_score`, and `similarity_search_by_vector` all returned `Document.id = None` — breaking downstream LangChain code (retrievers, chains) that expects to identify search hits by id. The fix is one kwarg in two places: - The Document used in the callable-filter predicate (so user filters can match on `doc.id`, matching the InMemoryVectorStore convention). - The Document attached to each (Document, score) result tuple. Both call sites already had `sid` in scope; it just wasn't being passed to the constructor. `get_by_ids` already used the right pattern (`Document(id=sid, page_content=..., metadata=...)`). Adds two regression tests in test_langchain.py covering: - Each of the three similarity_search variants returns Documents with `.id` populated (both explicit ids and store-generated UUIDs). - The callable filter receives a Document with `.id` set so predicates can filter on it. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Tests langchain-core's in-tree InMemoryVectorStore suite covers that we didn't replicate, surfaced by an audit of the reference test file. Each pins behaviour that's a drop-in regression risk: - `test_async_methods_await_aembed_functions` — patches aembed_* with AsyncMock to verify our async paths actually await the async embedder methods, not silently fall back to sync (would block the event loop with a real async embedder). - `test_add_documents_upsert_replaces_metadata` — re-adding a Document with the same id and new metadata must let the new metadata win. - `test_add_documents_does_not_mutate_inputs` — caller-passed Document objects must not be mutated (no in-place .id assignment, metadata dict identity preserved). - `test_add_documents_with_ids_is_idempotent` — re-running ingestion on an unchanged corpus must not accrete duplicates. - `test_get_by_ids_empty_input_and_order_preserved` — empty input returns [] without error; non-empty preserves input id order so callers can zip with parallel arrays. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Covers operators we implement but don't exercise — each one is its own branch in the filter dispatch (llama_index.py:382-433), and untested branches are easy to silently swap or regress: - `test_query_is_empty_treats_missing_key_as_match` — IS_EMPTY's trickiest semantic: missing key counts as a hit. - `test_query_contains_operator_matches_list_membership` — CONTAINS (scalar-in-list), distinct from ALL / ANY. - `test_query_with_lt_filter` / `test_query_with_lte_filter` — the entire LT / LTE half of the numeric range was untested. - `test_query_with_gte_filter` — boundary case (GTE includes the threshold; GT excludes). - `test_query_with_nin_filter` — complements existing IN coverage. - `test_query_contradictive_same_key_and_returns_empty` — confirms AND is genuinely conjunctive over same-key duplicates rather than last-wins / OR. - `test_query_returns_results_sorted_by_similarity` — top-1 of a self-query is the matching node; similarities are monotonically non-increasing. Only guard against a regression that returns hits in insertion order. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Tests the haystack in-tree DocumentStoreBaseTests suite covers that we didn't replicate. Each pins behaviour that's a drop-in regression risk: - `test_filter_documents_rejects_malformed_filter_shapes` (parametrised over 3 shapes) — our outer `_validate_filters` catches one shape; haystack's `document_matches_filter` catches the others. Both must fail loudly, not silently match everything. - `test_filter_documents_with_and_or_not_operators` — compound filter dicts (the standard production shape) work end-to-end through both `filter_documents` and `embedding_retrieval` (the latter as an allowlist). - `test_embedding_retrieval_rejects_empty_query_embedding` — empty list surfaces as a clean ValueError, not a kernel panic. - `test_filter_documents_equality_with_missing_meta_key` — `== None` matches docs where the field is absent. - `test_delete_documents_on_lazy_empty_store_is_noop` — lazy store with no committed dim doesn't blow up on delete-unknown-id. - `test_get_metadata_field_min_max_handles_float_meta_prefix_and_single_value` — three uncovered branches: float values, "meta.x" prefix, single- value collection (min == max). - `test_two_stores_have_independent_state` — refactor canary against accidental class-level mutable state. - `test_async_concurrent_embedding_retrievals_are_consistent` — 10 concurrent async retrievals all produce the same top-k as a single sync call, proving the to_thread wrappers under load. - `test_return_embedding_flag_is_inert_for_turbovec` — pins the deliberate divergence (quantization discards full-precision embeddings) so a caller doesn't quietly start relying on the flag. - `test_shutdown_closes_async_executor` — owned executor rejects new tasks after shutdown. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Tests agno's LanceDb unit-test suite covers that we didn't replicate. Each pins behaviour that's a drop-in regression risk: - `test_update_metadata_updates_all_docs_sharing_content_id` — content_id is many-to-one; update must hit every doc with that cid, not just the first match. - `test_update_metadata_preserves_quantized_codes` — search results are bit-identical before and after a metadata update (no silent re-embed of stored content). - `test_insert_reembeds_document_with_empty_list_embedding` — agno's async pipeline surfaces failed embeds as embedding=[] (distinct path from None); the integration must re-embed those. - `test_insert_empty_document_list_is_noop` — empty input from upstream filtering doesn't change store state or raise. - `test_search_with_empty_query_returns_empty` — empty / None query short-circuits to [], matching LanceDb. Without the short-circuit the embedder would hash "" and return arbitrary near-matches. Required a 2-line addition to `search` and `async_search`. - `test_delete_by_metadata_handles_non_string_values` — bool / float values in metadata equality predicates work natively. - `test_update_metadata_with_empty_dict_is_noop` — empty update dict preserves existing metadata. - `test_search_does_not_dedupe_distinct_documents_with_identical_content` — pins our deliberate divergence from LanceDb (which dedupes by content string). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2 tasks
Follow-up round of audit surfaced four #81-family gaps where we populate a field but no test asserts on it (silent-regression risk): - `test_similarity_search_with_score_returns_descending_scores_and_self_match` — pins the actual semantics of the float in `(Document, float)`: self-match wins, scores are monotonically non-increasing. Existing tests only asserted `isinstance(score, float)`. - `test_load_then_add_assigns_fresh_handles_without_collision` — `_next_u64` is persisted across dump/load; if it were dropped, new handles would collide with old ones and corrupt search results. - `test_embeddings_property_returns_supplied_embedder` — pins the `embeddings` property override so a refactor dropping it doesn't silently break `similarity_search_with_relevance_scores` discovery. - `test_aget_by_ids_preserves_order_and_returns_documents_with_id` — async mirror of the get_by_ids order test. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Active bug fix: `add()` deduplicated against the existing store but not against the input batch itself. If a single call contained two nodes with the same node_id, the store ended up with N vectors in the index but only the last node_id mapped back to one of them — the earlier handles became orphans that `query` later resolved through the duplicate node_id, returning the second node's payload attached to the first node's vector. Now raises `ValueError` on intra-batch duplicates rather than silently corrupting state. Plus three field-completeness tests: - `test_add_raises_on_intra_batch_duplicate_node_id` — covers the fix above, and verifies the store is left in a clean state (no half-written index). - `test_query_round_trips_image_node_subtype` — pins ImageNode fidelity (image_url, image_mimetype, etc.) through query. PR #83 verified full BaseNode fidelity but exclusively via TextNode; ImageNode / IndexNode coverage was missing. - `test_query_round_trips_index_node_subtype` — same for IndexNode (used by composable indexes / routers / sub-question pipelines). - `test_query_returned_node_always_has_none_embedding` — pins the contract that turbovec discards full-precision embeddings on add, so query / get_nodes results have `embedding=None`. Aligns with `get()` raising NotImplementedError. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Six gaps surfaced by an exhaustive return-field audit. Each pins a value the current code returns but no existing test asserts — the same shape as #81 but at finer granularity: - `test_filter_documents_returns_documents_with_score_none` — score is only set on embedding_retrieval; filter_documents must return docs with `score=None`. Guards against a future cache leak that carries a stale score between read paths. - `test_storage_property_documents_have_no_blob_sparse_or_score_when_unset` — extends the existing storage test to assert all four optional fields default to None when not set on write. - `test_embedding_retrieval_preserves_content_and_meta` — no existing retrieval test asserts the actual `Document.content` string survives the round-trip; if `_reconstruct` ever dropped content, only the blob / sparse-embedding tests would notice indirectly. - `test_to_dict_includes_all_init_params_and_type_key` — pins all four `init_parameters` keys + the outer `type` key (used by Haystack pipeline serialization to resolve the class) in a single test. Existing tests cover subsets but never the full shape. - `test_save_load_preserves_similarity_function_and_return_embedding` — `load_from_disk` uses `.get()` with defaults for both non-bit_width params; if `save_to_disk` ever stopped writing them, load would silently fall back to defaults. - `test_embedding_retrieval_all_results_have_finite_float_scores` — the existing top-k test only asserts `results[0].score is not None`; this extends to all results and adds `math.isfinite` so a kernel NaN-on-tail-hit regression doesn't slip through. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Five gaps from the round-2 audit. Three pin deliberate divergences from LanceDb (the named drop-in target) so a future change to either side becomes a deliberate decision rather than a silent surprise. Two pin behavioural contracts that lacked assertions. - `test_search_results_content_origin_divergence_from_lancedb` — LanceDb drops `Document.content_origin` on insert; turbovec mirrors that behaviour. Pin the divergence explicitly. - `test_search_results_size_divergence_from_lancedb` — same pin for `Document.size`. - `test_search_results_embedding_is_none_divergence_from_lancedb` — LanceDb sets `embedding=item["vector"]` on returned hits so callers can read the original vector. turbovec discards full-precision vectors after quantization, so we return `embedding=None`. Pin the divergence so callers don't get a runtime surprise. - `test_reranker_output_documents_carry_reranking_score` — pins that `reranking_score` survives the post-rerank result list. Existing reranker tests only check ordering; this catches a refactor that drops fields the reranker mutated. - `test_delete_by_metadata_returns_false_when_no_match` — pins the False branch of the bool return contract; other delete tests cover True but not False, so a regression always-returning-True wouldn't be detected. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`add_with_ids_2d` was mutating the ID tables (`id_to_slot` / `slot_to_id`) BEFORE delegating to `self.inner.add_2d`. If the inner call returned `Err` — most commonly `DimMismatch` when a caller passed the wrong dim on a committed-dim index — the ID tables retained `n` ghost entries pointing at slots that don't exist in the inner index. The next `search_with_allowlist` would index a ghost id and read out-of-bounds packed-code bytes; the next `remove` would corrupt the table further. Fix: capture `base_slot = self.inner.len()` before the inner add, call `self.inner.add_2d(vectors, dim)?` first, then mutate the ID tables only on success. `add_with_ids_2d_rolls_back_id_tables_on_inner_dim_mismatch` is the regression test — without the fix it would either reject the second correctly-dim'd add (ids 40/50 stuck as ghosts) or leak ghosts into later searches. Plus six audit-driven IdMapIndex tests: - `add_with_ids_2d_rejects_non_multiple_buffer` — the `VectorBufferNotMultipleOfDim` error variant was never asserted. - `add_with_ids_2d_rejects_zero_dim` — same variant, dim=0 sub-branch. - `search_returns_descending_scores_aligned_with_ids` — the `(Vec<f32>, Vec<u64>)` parallel-array contract had no length / ordering / finiteness assertion at the IdMapIndex layer (same shape as #81: return value populated, never checked). - `search_multi_query_results_are_row_major` — every existing test used a single query, leaving the documented row-major layout for multi-query unverified at this layer. - `remove_keeps_swapped_id_addressable_in_both_tables` — the swap- and-pop consistency between `slot_to_id` and `id_to_slot` was only verified indirectly via self-query; a bug updating only one table could mask itself. - `prepare_does_not_change_search_results` — `prepare()` had no observable post-condition assertion at the IdMapIndex layer. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three tests covering populated-but-unasserted state on TurboQuantIndex that prior tests didn't reach: - `search_single_query_sets_nq_to_one` — `SearchResults.nq` was only asserted in multi-query batches; a regression dropping nq to 0 in the single-query path wouldn't fail any existing test. - `is_empty_tracks_len` — `is_empty()` was never called by any test; a wrong-polarity regression (e.g. `self.n_vectors > 0`) would compile and pass the suite. - `add_2d_rejects_non_multiple_of_8_dim_on_lazy_index` — the `DimNotMultipleOf8` AddError variant is only reachable from a lazy index's first `add_2d` and was never asserted; a regression flipping the branch to Ok or DimMismatch wouldn't have failed. Also extends `search_on_lazy_uncommitted_returns_empty` to assert `res.nq == 0` (the lazy-empty path for nq). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Most existing tests exercise operations in isolation; bugs that live in the transition between two operations (cache invalidation, slot reuse, persistence capturing post-mutation state) can slip past those. The audit that surfaced the LlamaIndex intra-batch `add()` corruption pointed at the same risk in the Rust core. Nine sequences pinned in a new test file: High-priority (plausible bug surfaces): - `second_add_after_search_lets_new_vectors_be_found` — blocked-cache invalidation correctness after a second add. - `add_swap_remove_add_then_self_query_finds_all_three_phases` — mixed shrink+grow keeps packed_codes / n_vectors consistent. - `swap_remove_after_load_produces_correct_search` — the load-then-mutate-then-search path through OnceLock state. - `swap_remove_then_round_trip_matches_in_memory_search` — persistence captures the post-removal state, not stale tail bytes. - `id_map_re_added_id_returns_new_vector_not_old` — re-add after remove returns the NEW vector; existing test only checked `contains`. - `prepare_then_add_invalidates_blocked_cache` — prepare-warmed cache must still invalidate on subsequent add. - `id_map_remove_last_then_add_keeps_slot_tables_consistent` — the no-swap branch of `remove(last)` leaves no stale tail entry in `slot_to_id` for the next add. Defensive: - `add_after_load_extends_index` — loaded index can be extended. - `prepare_then_swap_remove_invalidates_cache` — the prepare-then- delete-then-search path. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Four tests pinning Rust internal-module branches the audit found under-covered: - `tv_truncated_payload_errors_cleanly` — write a v3 .tv file, then truncate it. Loader must surface UnexpectedEof, not panic. The `read_exact` error path was previously unverified. - `tv_unsupported_version_errors_with_useful_message` — hand-construct a .tv with version=99. Loader must surface InvalidData with the "unsupported format version" message; the catch-all arm in `read_core_versioned` was unreached. - `tv_v3_invalid_n_calib_errors_cleanly` — hand-construct a v3 file with n_calib=7 (neither 0 nor dim). Loader must reject with InvalidData per the io.rs contract; this branch was unreached. - `produces_expected_shape_for_bit_width_three` — extends the parametrised shape test (which only covered 2 and 4) to bit_width=3, whose pack layout is its own branch. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The encode and search pipelines silently corrupted state on degenerate
inputs:
- NaN coord: `simd_norm` returns NaN, the `n_val > 1e-10` branch goes
to `inv = 0.0`, but `0.0 * NaN = NaN` propagates through rotation
and poisons `vec_scales[slot] = NaN`. The kernel's `s > heap_min`
comparison is false for NaN, so the slot exists in `len()` but is
**silently unreachable** through search.
- +/- Inf coord: same NaN-poisoning path via `1.0 / Inf = 0.0`,
followed by `0.0 * Inf = NaN`.
- Magnitude >= ~2e19: the f32 sum-of-squares in `simd_norm` overflows
to `+Inf`, `scale[i] = Inf` gets stored, and the slot **silently
wins top-k against every query** because `Inf * anything = Inf`.
- NaN/Inf query value: kernel produces NaN scores, heap silently
drops them, returned indices are arbitrary with NEG_INFINITY scores.
Fix: validate at the entry of `add`, `add_2d`, `search`, and
`search_with_mask` that every value is finite AND `|value| < 1e16`
(conservative bound that prevents f32 overflow for dims up to 2^16).
The validation cost is one pass over the input — cheaper than making
every downstream stage NaN/Inf-safe.
API surfaces:
- `TurboQuantIndex::add_2d` and `IdMapIndex::add_with_ids_2d` return
a new `AddError::InvalidInputValue { vector_index, coord_index,
value }` variant. Callers handling untrusted input should prefer
these `_2d` paths.
- `TurboQuantIndex::add` and the `search` family panic with a clear
message — consistent with their existing "caller violated
precondition" panic style.
`AddError` drops its `Eq` derive (still `PartialEq`) because the new
variant carries an `f32`, which is not `Eq` (NaN != NaN).
15 tests across the entry points (NaN, +Inf, -Inf, huge magnitude,
boundary value just below the threshold, multi-vector batch reporting,
lazy-uncommitted skip-validation path).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Four binding-layer bugs surfacing as Rust panics (PanicException to
Python) instead of typed exceptions:
- Non-contiguous numpy arrays passed to `add` / `search` /
`add_with_ids` panicked via `as_slice().expect("vectors must be
contiguous")`. Now raise `ValueError("vectors must be C-contiguous;
call np.ascontiguousarray(...) first")` via `ok_or_else`.
- Query with wrong `ncols` propagated to the core's
`assert_eq!(queries.len(), nq * dim)` and surfaced as PanicException.
The binding now validates `ncols == self.dim_opt()` before calling
inner and raises `ValueError("query dim X does not match index
dim Y")`.
- `swap_remove(idx)` with idx >= len panicked via core's
`assert!(idx < n_vectors)`. Binding now raises `IndexError` with a
range-style message.
- `IdMapIndex.search` returned shape `(0, k)` for empty queries while
`TurboQuantIndex.search` returned `(0, min(k, n_vectors, n_allowed))`.
Cross-class inconsistency — fixed by computing the effective_k
identically for the nq=0 path.
Plus 11 tests covering the new behaviour and the Rust core's
`InvalidInputValue` error surfacing through the binding as
`ValueError`:
- `test_add_noncontiguous_vectors_raises_value_error`
- `test_search_query_dim_mismatch_raises_value_error`
- `test_search_noncontiguous_query_raises_value_error`
- `test_swap_remove_out_of_bounds_raises_index_error`
- `test_add_rejects_nan_with_value_error`
- `test_add_rejects_huge_magnitude_with_value_error`
- `test_search_with_nan_query_raises`
- `test_search_empty_queries_returns_consistent_shape_across_index_types`
- `test_search_query_dim_mismatch_raises_value_error` (IdMap)
- `test_add_with_ids_noncontiguous_vectors_raises_value_error`
- `test_add_with_ids_rejects_nan_with_value_error`
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three test gaps surfaced by the concurrency audit. None of these exercised actual bugs (Rust's borrow checker rules out the race classes the audit considered), but they pin contracts the existing `concurrent_search.rs` doesn't: - `concurrent_search_after_load_is_safe` — a `TurboQuantIndex::load`d index starts with empty OnceLock caches; the race window mirrors the freshly-built one. The doc explicitly mentions `load` as a `prepare()`-skip target, but no test covered concurrent search immediately after `load`. - `id_map_concurrent_search_is_deterministic_across_threads` — `IdMapIndex::search` had zero concurrent-search coverage. - `concurrent_prepare_races_with_search_safely` — multiple `prepare` threads interleaved with multiple `search` threads on a fresh (unprepared) index. Pins that OnceLock's "closure runs at most once" guarantee holds end-to-end. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two audit-found bugs in the TQ+ calibration state machine that silently degrade index correctness without any error signal. ## Bug F1: v2-loaded index + add silently mis-encoded new vectors A v2 (pre-TQ+) file loads with empty `tqplus_shift`. On the next `add()`, the `existing` slot was set to `None` (the lazy-first-add signal), `encode()` fit fresh calibration on the new batch and baked it into the packed codes — but the `n_vectors != 0` else branch only extended `packed_codes` / `scales`, never persisting the fitted shift / scale_tq. The new vectors ended up encoded with calibration that search later treated as identity, silently corrupting scores. Fix: in `TurboQuantIndex::from_parts`, populate explicit identity TQ+ vectors when `tqplus_shift` is empty and `n_vectors > 0`. The loaded state then matches the on-the-wire reality (those vectors were encoded without calibration), and the next `add` sees `existing = Some(identity)` and encodes new vectors against identity too — keeping the entire index in a single coordinate system. ## Bug F2: Empty first add froze identity calibration forever `add(&[])` with n=0 hit `n < TQPLUS_MIN_SAMPLES` in `encode`, returned identity `(zeros, ones)` of length dim, and the `n_vectors == 0` branch in `add` copied that identity into `self.tqplus_shift` / `self.tqplus_scale`. Now `tqplus_shift.is_empty()` was false, so every subsequent add — even a million-vector batch with rich distribution — saw `existing = Some(identity)` and silently skipped fitting fresh calibration. The user lost TQ+ entirely with no warning. Fix: short-circuit `add` to a true no-op when n=0, before touching any state. Empty input now means "no change", end-to-end. ## Regression tests - `empty_first_add_does_not_freeze_identity_calibration` — writes the index after empty + 1500-vec add, parses the persisted TQ+ trailer, asserts at least one shift / scale value is non-trivial (pre-fix the trailer was exactly identity). - `v2_loaded_index_populates_identity_calibration` — hand-constructs a v2 .tv file, loads it, adds 1500 vectors, writes back, asserts the trailer is exactly identity (pre-fix the fitted-then-discarded calibration would have leaked through silently). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Bug in the wave-5 fix for IdMap empty-query shape consistency: the `effective_k` computation for `nq == 0` used `allow_slice.len()` (raw length), but the kernel internally dedups the allowlist via a packed bool mask. So `allowlist=[1, 1, 1]` would return shape `(0, 3)` for empty queries and `(N, 1)` for non-empty queries — still divergent, just in a different shape than before wave 5. Fix: dedup via HashSet in the nq=0 path to match the kernel's mask-based dedup for nq>0. Test: `test_search_empty_queries_dedups_allowlist_for_effective_k` pins both shapes (empty and real query) and asserts they share the trailing `effective_k` dim. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The wave-5 input validation (NaN/Inf/overflow rejection) added new panic conditions to `add` / `add_2d` / `search` / `search_with_mask`, but the rustdoc comments still listed only the pre-wave-5 panic triggers. Users reading the docs would expect their indexes to handle non-finite inputs gracefully when they actually panic loudly. Doc updates: - `TurboQuantIndex::add` — now lists all three panic conditions (dim-not-set, non-multiple buffer, invalid input value) in a # Panics section. - `TurboQuantIndex::add_2d` — adds `InvalidInputValue` to the typed return list and documents that bad buffer length still panics. - `TurboQuantIndex::search` — adds a # Panics section covering both buffer-length and invalid-query-value cases. - `TurboQuantIndex::search_with_mask` — same plus the mask-length panic. Plus two small cleanups: - Delete a misleading internal comment in `error.rs` that referenced a manual `PartialEq` impl which doesn't exist (the derive handles it fine; the only change wave 5 made was dropping `Eq`). - Drop the `IndexIDMap2` FAISS analogy from `id_map.rs` per the positioning memory entry: turbovec docs should describe features on their own terms. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`search` on `x86_64` runtime-dispatches to AVX-512 BW or AVX2 inside
an `unsafe { if/else if }` block with no `else` branch. If neither
`is_x86_feature_detected!("avx512bw")` nor `is_x86_feature_detected!("avx2")`
returned true — pre-Haswell x86 (rare in 2026), or a VM / emulator
that doesn't expose AVX2 to userspace — neither kernel ran,
`heap_sizes` stayed at `[0; batch_nq]`, and `search` returned **empty
top-k results** for every query with no error signal.
The `cfg(not(any(target_arch = "aarch64", target_arch = "x86_64")))`
scalar fallback at the bottom of the function is gated on the
architecture, not the runtime feature flag, so it didn't cover this
case.
Fix: extract the per-query scalar scoring (previously inlined in the
catch-all `cfg` block) into a free helper `score_query_into_heap`,
then add an `else` branch inside the x86_64 `unsafe { }` that calls
it once per query in the current batch. The catch-all scalar block
is also refactored to call the same helper, eliminating the prior
duplication.
The workspace `.cargo/config.toml` sets `target-cpu=x86-64-v3` which
requires AVX2 at the compile level, so this affects downstream Rust
users building turbovec without that config, not turbovec's own
release wheels. But the silent-empty-results failure mode is the
worst kind, so it gets a fix even though the reproducer is rare.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`from_parts` is `pub(crate)` and the integrity gateway for both `TurboQuantIndex::load` and `IdMapIndex::load`. The wave-6 fix added identity-population logic that relies on length invariants holding (`tqplus_shift.len() == tqplus_scale.len()`, packed/scales lengths matching `n_vectors * dim * bit_width / 8` and `n_vectors` respectively), but neither `from_parts` nor any test pinned them. Today the read-side validation in `io::load_*` upholds these invariants — but a future caller bypassing the IO layer (or a refactor that drops a read-layer check) could silently construct a malformed `TurboQuantIndex`. `search` and `swap_remove` index `packed_codes`, `scales`, and `tqplus_*` assuming the invariants hold; out-of-bounds reads / underflows would manifest as silent garbage scores rather than a clear error. Fix: assert structural invariants at `from_parts` entry, before the identity-population logic that depends on them: - `tqplus_shift.len() == tqplus_scale.len()` always. - For an eager (`dim = Some(d)`) state: - `packed_codes.len() == n_vectors * d * bit_width / 8` - `scales.len() == n_vectors` - non-empty TQ+ must have `tqplus_shift.len() == d` - For a lazy (`dim = None`) state: `n_vectors == 0` and every storage field empty. Plus 7 unit tests in a new `from_parts_tests` module covering each panic-on-violation case and two acceptance cases (lazy uncommitted, eager with consistent lengths). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This was referenced May 30, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Six waves of audit-driven work on the integrations and the Rust core. All in one PR because they share the same investigation thread.
Wave 1 — Structural fidelity bug fixes in integrations (4 commits)
Follow-up to #81. Audited each integration vs its reference store; found analogous "cherry-picked side-car schema silently drops fields" bugs in haystack, llama-index, agno. Plus the original #81 fix cherry-picked from #82.
Wave 2 — Reference-parity tests (4 commits, +30 tests)
Audited each library's test surface vs ours; pinned 30 behaviours their in-tree reference store tests but we didn't.
Wave 3 — Python integration field-completeness (4 commits, 1 fix + 17 tests)
LlamaIndex
add()was silently corrupting state on intra-batch duplicate node_ids. Plus exhaustive return-field audit on all four integrations.Wave 4 — Rust core audit (4 commits, 1 fix + 22 tests)
IdMapIndex::add_with_ids_2dwas mutating ID tables before delegating to inner add; onErrreturns the tables retained ghost entries pointing at slots that didn't exist. Plus TurboQuantIndex API gaps, state-sequence tests, internal module gaps.Wave 5 — Numerical + bindings + concurrency (3 commits, 8 fixes + 29 tests)
4 silent data-corruption bugs in the encode/search pipeline:
len()but is silently unreachable.>= 1e16→ f32 norm overflows to +Inf, slot wins top-k against every query.Fix:
AddError::InvalidInputValue+ finiteness/magnitude validation atadd/add_2d/search/search_with_maskentry.4 Python binding hygiene fixes: non-contiguous arrays → typed
ValueError; wrong query dim → typedValueError;swap_removeout-of-bounds → typedIndexError; cross-class shape consistency betweenTurboQuantIndex.searchandIdMapIndex.searchfor empty queries.Plus 3 new concurrency tests (
load+search races,IdMapIndexconcurrent search, prepare-with-search interleaving).Wave 6 — TQ+ calibration + allowlist dedup + doc drift (3 commits, 3 fixes + 3 tests)
🔴 TQ+ F1 — v2-loaded index + add silently mis-encoded new vectors. Loading a v2 (pre-TQ+) file left
tqplus_shiftempty; the nextaddsawexisting = None, encoded vectors with fresh-fitted calibration, but the else branch silently dropped the fitted shift/scale_tq. New vectors were encoded with calibration but searched as identity → silently corrupted scores. Fix infrom_parts: populate explicit identity when loading a v2-shaped state.🔴 TQ+ F2 — Empty first add silently froze identity calibration forever.
add(&[])returned identity fromencode, then_vectors == 0branch wrote it toself.tqplus_shift, and every subsequent add (even 1M-vector batches) sawexisting = Some(identity)and skipped fitting. Fix:addshort-circuits to a true no-op whenn == 0.🔴 Wave-5 allowlist dedup bug (in my own wave-5 fix). For
nq == 0,effective_kcounted raw allowlist length but the kernel internally dedups via packed-bool mask. Soallowlist=[1,1,1]returned shape(0, 3)for empty queries vs(N, 1)for non-empty. Fix: dedup via HashSet in the nq=0 path.Plus doc updates for the wave-5 panic conditions (rustdoc on
add/add_2d/search/search_with_mask), a deleted misleading comment inerror.rs, and dropped the FAISS analogy fromid_map.rsper the no-faiss-analogues memory entry.Verified during the audits
VectorDbABC only requiresname_exists/id_exists/content_hash_exists; an audit flag about missingcontent_id_existswas a false positive.Out of scope — follow-ups worth filing
bm25_retrieval/bm25_retrieval_asyncfor the haystack store.Document.content_origin/size/embeddingpreservation (currently pinned as deliberate divergences from LanceDb).packmodule (currently exercised transitively viakernel_correctness.rs).TQPLUS_MIN_SAMPLESthreshold could be made discoverable / logged on first-batch-too-small (currently silent).Headline totals
Test plan
🤖 Generated with Claude Code