Audit-driven integration + Rust core fidelity work (7 waves) by RyanCodrai · Pull Request #83 · RyanCodrai/turbovec

RyanCodrai · 2026-05-30T10:12:41Z

Summary

Six waves of audit-driven work on the integrations and the Rust core. All in one PR because they share the same investigation thread.

Wave 1 — Structural fidelity bug fixes in integrations (4 commits)

Follow-up to #81. Audited each integration vs its reference store; found analogous "cherry-picked side-car schema silently drops fields" bugs in haystack, llama-index, agno. Plus the original #81 fix cherry-picked from #82.

Wave 2 — Reference-parity tests (4 commits, +30 tests)

Audited each library's test surface against ours; pinned 30 behaviours that the in-tree reference-store tests cover but ours didn't.

Wave 3 — Python integration field-completeness (4 commits, 1 fix + 17 tests)

LlamaIndex add() was silently corrupting state on intra-batch duplicate node_ids. Plus exhaustive return-field audit on all four integrations.

Wave 4 — Rust core audit (4 commits, 1 fix + 22 tests)

IdMapIndex::add_with_ids_2d was mutating ID tables before delegating to inner add; on Err returns the tables retained ghost entries pointing at slots that didn't exist. Plus TurboQuantIndex API gaps, state-sequence tests, internal module gaps.

Wave 5 — Numerical + bindings + concurrency (3 commits, 8 fixes + 29 tests)

4 silent data-corruption bugs in the encode/search pipeline:

NaN input → vec_scales poisoned, slot exists in len() but is silently unreachable.
Inf input → same NaN-poisoning path.
Huge magnitude (roughly 2e19 and up for a single coordinate; the fix rejects |value| >= 1e16) → f32 norm overflows to +Inf, slot wins top-k against every query.
NaN/Inf query → kernel produces NaN scores, arbitrary indices returned.

Fix: AddError::InvalidInputValue + finiteness/magnitude validation at add / add_2d / search / search_with_mask entry.

4 Python binding hygiene fixes: non-contiguous arrays → typed ValueError; wrong query dim → typed ValueError; swap_remove out-of-bounds → typed IndexError; cross-class shape consistency between TurboQuantIndex.search and IdMapIndex.search for empty queries.

Plus 3 new concurrency tests (load+search races, IdMapIndex concurrent search, prepare-with-search interleaving).

Wave 6 — TQ+ calibration + allowlist dedup + doc drift (3 commits, 3 fixes + 3 tests)

🔴 TQ+ F1 — v2-loaded index + add silently mis-encoded new vectors. Loading a v2 (pre-TQ+) file left tqplus_shift empty; the next add saw existing = None, encoded vectors with fresh-fitted calibration, but the else branch silently dropped the fitted shift/scale_tq. New vectors were encoded with calibration but searched as identity → silently corrupted scores. Fix in from_parts: populate explicit identity when loading a v2-shaped state.

🔴 TQ+ F2 — Empty first add silently froze identity calibration forever. add(&[]) returned identity from encode, the n_vectors == 0 branch wrote it to self.tqplus_shift, and every subsequent add (even 1M-vector batches) saw existing = Some(identity) and skipped fitting. Fix: add short-circuits to a true no-op when n == 0.

🔴 Wave-5 allowlist dedup bug (in my own wave-5 fix). For nq == 0, effective_k counted raw allowlist length but the kernel internally dedups via packed-bool mask. So allowlist=[1,1,1] returned shape (0, 3) for empty queries vs (N, 1) for non-empty. Fix: dedup via HashSet in the nq=0 path.
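The shape mismatch above comes down to how the effective `k` is computed. A minimal sketch, assuming the effective `k` is `min(k, n_vectors, n_allowed)` as described for `TurboQuantIndex.search` later in this PR; `effective_k` here is a hypothetical stand-alone helper, not the binding's actual function:

```python
def effective_k(k, n_vectors, allowlist=None):
    """Column count of the result array, for both empty and non-empty queries.

    Hypothetical helper: the allowlist is deduplicated before counting, which
    mirrors what a packed-bool mask does implicitly inside the kernel.
    """
    if allowlist is None:
        n_allowed = n_vectors
    else:
        n_allowed = len(set(allowlist))  # [1, 1, 1] counts as one allowed id
    return min(k, n_vectors, n_allowed)
```

With the raw list length, `allowlist=[1, 1, 1]` would count three allowed ids; after deduplication it counts one, so the `nq == 0` path and the kernel path agree on the result width.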

Plus doc updates for the wave-5 panic conditions (rustdoc on add / add_2d / search / search_with_mask), removal of a misleading comment in error.rs, and removal of the FAISS analogy from id_map.rs per the no-faiss-analogues memory entry.

Verified during the audits

agno's VectorDb ABC only requires name_exists / id_exists / content_hash_exists; an audit flag about missing content_id_exists was a false positive.

Out of scope — follow-ups worth filing

Full bm25_retrieval / bm25_retrieval_async for the haystack store.
agno Document.content_origin / size / embedding preservation (currently pinned as deliberate divergences from LanceDb).
Direct tests for the pack module (currently exercised transitively via kernel_correctness.rs).
Wider encode coverage (TQ+ identity-fallback branch on per-coord degeneracy).
TQPLUS_MIN_SAMPLES threshold could be made discoverable / logged on first-batch-too-small (currently silent).

Headline totals

13 active bugs found and fixed across waves 1, 3, 4, 5, 6.
~125 tests added across the 6 waves (regression + parity + field-completeness + state-sequence + numerical + concurrency + audit-driven coverage).

Test plan

Full Rust test suite locally: 135/135 pass.
Full Python test suite locally: 331/331 pass.
CI on Linux / macOS / Windows.

🤖 Generated with Claude Code

The store's side-car schema cherry-picked `{id, content, meta}` on write and reconstructed Documents with only those fields plus `score`. `Document.blob` (binary payloads, e.g. images/audio) and `Document.sparse_embedding` (hybrid-search sparse vectors) were silently dropped on every `write_documents` call and never re-attached on `filter_documents` / `embedding_retrieval` / `storage` / `save_to_disk`.

This is the same shape of bug as #81: fields available at write time were not propagated to the result-object constructor. The reference `InMemoryDocumentStore` preserves both fields verbatim.

Fix:

- Store `blob` and `sparse_embedding` in the in-memory dict alongside `id` / `content` / `meta`.
- Pass them through `_reconstruct` so every retrieval path (filter, embedding_retrieval, storage, save/load) returns Documents with both fields populated when they were set on write.
- Add JSON serializers (`_serialize_doc_data` / `_deserialize_doc_data`) for save_to_disk / load_from_disk. `ByteStream.to_dict()` and `SparseEmbedding.to_dict()` give us JSON-safe forms.
- Bump docstore schema version 1 → 2; keep v1 loadable with blob / sparse_embedding defaulting to None so existing on-disk stores reload without manual migration.

Also in this PR:

- Tighten `_validate_filters` to match `InMemoryDocumentStore` (`document_store.py:504-509`): a bare `{"field": "x"}` with no `operator` or `conditions` is malformed; the reference rejects it and we now do too.
- Scope back the module docstring's drop-in parity claim. BM25 retrieval (`bm25_retrieval` / `bm25_retrieval_async`) is not implemented, so the unqualified "matches the public surface of InMemoryDocumentStore" sentence was overpromising. Pipelines that need keyword search alongside vector search should wire in an explicit `InMemoryBM25Retriever` against a separate store.

Tests added (6):

- `test_blob_field_round_trips_through_filter_and_retrieval`
- `test_sparse_embedding_field_round_trips_through_filter_and_retrieval`
- `test_blob_and_sparse_embedding_survive_save_load_roundtrip`
- `test_documents_without_blob_or_sparse_embedding_round_trip_as_none`
- `test_load_accepts_v1_schema_with_no_blob_or_sparse_fields`
- `test_filter_documents_rejects_field_without_operator`

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
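The v1 → v2 backward-compatibility rule described in this commit can be sketched roughly as follows. This is an illustration only: `normalise_entry` and `OPTIONAL_FIELDS` are hypothetical names, and the real on-disk layout of the docstore is not shown in this PR text.

```python
OPTIONAL_FIELDS = ("blob", "sparse_embedding")  # fields introduced by schema v2

def normalise_entry(entry, schema_version):
    """Return a v2-shaped copy of one stored document entry.

    v1 entries predate the optional fields, so they are filled with None;
    v2 entries keep whatever was written. Unknown versions fail loudly.
    """
    if schema_version not in (1, 2):
        raise ValueError(f"unsupported docstore schema version: {schema_version}")
    upgraded = dict(entry)  # never mutate the caller's dict
    for field in OPTIONAL_FIELDS:
        upgraded.setdefault(field, None)
    return upgraded
```

Filling defaults at load time keeps every later code path working against a single entry shape, so retrieval does not need to branch on the schema version.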

The side-car schema cherry-picked `{text, metadata, ref_doc_id}` on write, so `_reconstruct_node` could only build a bare `TextNode(id_, text, metadata)` plus the SOURCE relationship. Every other field LlamaIndex preserves through `SimpleVectorStore` was silently dropped on every `query` / `get_nodes` / persist+load:

- relationships other than SOURCE (PREVIOUS / NEXT / PARENT / CHILD): these power hierarchical RAG retrievers like AutoMergingRetriever and RecursiveRetriever
- `excluded_embed_metadata_keys` / `excluded_llm_metadata_keys`: silently changed downstream prompt content and embedding inputs
- `text_template`, `metadata_template`, `metadata_separator`: change `get_content()` output for callers using non-default templates
- `start_char_idx` / `end_char_idx`: citations / highlighting
- `mimetype`: defaulted to text/plain on retrieval

Fix: use llama-index's own `node_to_metadata_dict` / `metadata_dict_to_node` (the framework's canonical full-fidelity round-trip) and store the resulting `node_dict` alongside the existing metadata + ref_doc_id (kept at top level for fast filter / doc-id lookup without re-parsing `_node_content` per hit).

`_reconstruct_node` dispatches on schema shape: v2 entries carry `node_dict` and round-trip via `metadata_dict_to_node`; v1 entries (persisted before this fix) keep loading with their original narrow fidelity rather than failing. Nodes schema bumps 1 → 2.

Also in this commit (filter / query parity against `SimpleVectorStore` and `utils.build_metadata_filter_fn`):

- `FilterCondition.NOT` is implemented (was raising NotImplementedError): matches when none of the inner filters match, per the reference.
- `FilterOperator.TEXT_MATCH` is now case-SENSITIVE; the previous implementation silently lowercased both sides (a semantic divergence: our results disagreed with SimpleVectorStore on mixed-case keys).
- `FilterOperator.TEXT_MATCH_INSENSITIVE` is now supported as the opt-in case-folding form.
- `FilterOperator.ALL` and `FilterOperator.ANY` are now supported (list-of-tags matching) instead of raising NotImplementedError.
- Both TEXT_MATCH variants raise `TypeError` on non-string operands, matching the reference (`utils.py:141-144, 148-151`).
- `query.mode != VectorStoreQueryMode.DEFAULT` now raises NotImplementedError instead of silently treating MMR / SVM / hybrid as DEFAULT. Those modes need full-precision vectors, which the quantized index discards.

Tests:

- Renamed `test_query_text_match_is_case_insensitive` → `test_query_text_match_is_case_sensitive` (semantics changed).
- Replaced the now-stale `test_query_unsupported_filter_operator_raises` with `test_query_text_match_raises_type_error_on_non_string_operands`.
- Added: TEXT_MATCH_INSENSITIVE, ALL, ANY, NOT, query.mode!=DEFAULT, v1 backward-compat load, full-node fidelity through query + get_nodes + persist (asserts every relationship type, excluded metadata keys, char_idx, mimetype, templates).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
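The filter semantics listed in this commit can be illustrated with a small stand-alone sketch. It deliberately avoids importing llama-index: `text_match` and `combine` are hypothetical helpers, the string condition names stand in for `FilterCondition` members, and substring containment of the filter value in the metadata value is an assumption about the reference behaviour rather than something stated here.

```python
def text_match(metadata_value, filter_value, case_sensitive=True):
    """TEXT_MATCH-style check; case_sensitive=False mimics the opt-in folding form."""
    if not isinstance(metadata_value, str) or not isinstance(filter_value, str):
        raise TypeError("TEXT_MATCH operands must both be strings")
    if not case_sensitive:
        metadata_value = metadata_value.casefold()
        filter_value = filter_value.casefold()
    return filter_value in metadata_value  # assumed: substring containment

def combine(condition, inner_results):
    """Combine per-filter booleans under an AND / OR / NOT condition."""
    inner_results = list(inner_results)
    if condition == "and":
        return all(inner_results)
    if condition == "or":
        return any(inner_results)
    if condition == "not":
        return not any(inner_results)  # matches when none of the inner filters match
    raise ValueError(f"unknown condition: {condition!r}")
```

Under this sketch, `text_match("Hello World", "hello")` is false while the case-folding form is true, which is the mixed-case disagreement the rename of the case-sensitivity test reflects.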

`_build_results` constructed returned Documents without `embedder=`, so downstream code couldn't call `doc.embed()` / `doc.async_embed()` on a retrieved hit. Both methods fall back to `self.embedder`, which was None, raising "No embedder provided" at runtime.

The fix matches `LanceDb._build_search_results`, which is the named drop-in target in the module docstring and passes `embedder=self.embedder` explicitly.

Tests:

- `test_search_results_carry_embedder` asserts the field is set on hits returned from `db.search`.
- `test_async_search_results_also_carry_embedder` asserts the same for `db.async_search`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

`_search_vector` constructed the returned Documents without `id=sid`, so `similarity_search`, `similarity_search_with_score`, and `similarity_search_by_vector` all returned `Document.id = None`, breaking downstream LangChain code (retrievers, chains) that expects to identify search hits by id.

The fix is one kwarg in two places:

- The Document used in the callable-filter predicate (so user filters can match on `doc.id`, matching the InMemoryVectorStore convention).
- The Document attached to each (Document, score) result tuple.

Both call sites already had `sid` in scope; it just wasn't being passed to the constructor. `get_by_ids` already used the right pattern (`Document(id=sid, page_content=..., metadata=...)`).

Adds two regression tests in test_langchain.py covering:

- Each of the three similarity_search variants returns Documents with `.id` populated (both explicit ids and store-generated UUIDs).
- The callable filter receives a Document with `.id` set so predicates can filter on it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Tests that langchain-core's in-tree InMemoryVectorStore suite covers and we didn't replicate, surfaced by an audit of the reference test file. Each pins behaviour that's a drop-in regression risk:

- `test_async_methods_await_aembed_functions`: patches aembed_* with AsyncMock to verify our async paths actually await the async embedder methods rather than silently falling back to sync (which would block the event loop with a real async embedder).
- `test_add_documents_upsert_replaces_metadata`: re-adding a Document with the same id and new metadata must let the new metadata win.
- `test_add_documents_does_not_mutate_inputs`: caller-passed Document objects must not be mutated (no in-place .id assignment, metadata dict identity preserved).
- `test_add_documents_with_ids_is_idempotent`: re-running ingestion on an unchanged corpus must not accrete duplicates.
- `test_get_by_ids_empty_input_and_order_preserved`: empty input returns [] without error; non-empty preserves input id order so callers can zip with parallel arrays.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Covers operators we implement but don't exercise. Each one is its own branch in the filter dispatch (llama_index.py:382-433), and untested branches are easy to silently swap or regress:

- `test_query_is_empty_treats_missing_key_as_match`: IS_EMPTY's trickiest semantic, where a missing key counts as a hit.
- `test_query_contains_operator_matches_list_membership`: CONTAINS (scalar-in-list), distinct from ALL / ANY.
- `test_query_with_lt_filter` / `test_query_with_lte_filter`: the entire LT / LTE half of the numeric range was untested.
- `test_query_with_gte_filter`: boundary case (GTE includes the threshold; GT excludes).
- `test_query_with_nin_filter`: complements existing IN coverage.
- `test_query_contradictive_same_key_and_returns_empty`: confirms AND is genuinely conjunctive over same-key duplicates rather than last-wins / OR.
- `test_query_returns_results_sorted_by_similarity`: top-1 of a self-query is the matching node; similarities are monotonically non-increasing. This is the only guard against a regression that returns hits in insertion order.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Tests that the haystack in-tree DocumentStoreBaseTests suite covers and we didn't replicate. Each pins behaviour that's a drop-in regression risk:

- `test_filter_documents_rejects_malformed_filter_shapes` (parametrised over 3 shapes): our outer `_validate_filters` catches one shape; haystack's `document_matches_filter` catches the others. Both must fail loudly, not silently match everything.
- `test_filter_documents_with_and_or_not_operators`: compound filter dicts (the standard production shape) work end-to-end through both `filter_documents` and `embedding_retrieval` (the latter as an allowlist).
- `test_embedding_retrieval_rejects_empty_query_embedding`: empty list surfaces as a clean ValueError, not a kernel panic.
- `test_filter_documents_equality_with_missing_meta_key`: `== None` matches docs where the field is absent.
- `test_delete_documents_on_lazy_empty_store_is_noop`: lazy store with no committed dim doesn't blow up on delete-unknown-id.
- `test_get_metadata_field_min_max_handles_float_meta_prefix_and_single_value`: three uncovered branches (float values, "meta.x" prefix, single-value collection where min == max).
- `test_two_stores_have_independent_state`: refactor canary against accidental class-level mutable state.
- `test_async_concurrent_embedding_retrievals_are_consistent`: 10 concurrent async retrievals all produce the same top-k as a single sync call, proving the to_thread wrappers under load.
- `test_return_embedding_flag_is_inert_for_turbovec`: pins the deliberate divergence (quantization discards full-precision embeddings) so a caller doesn't quietly start relying on the flag.
- `test_shutdown_closes_async_executor`: owned executor rejects new tasks after shutdown.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Tests that agno's LanceDb unit-test suite covers and we didn't replicate. Each pins behaviour that's a drop-in regression risk:

- `test_update_metadata_updates_all_docs_sharing_content_id`: content_id is many-to-one; update must hit every doc with that cid, not just the first match.
- `test_update_metadata_preserves_quantized_codes`: search results are bit-identical before and after a metadata update (no silent re-embed of stored content).
- `test_insert_reembeds_document_with_empty_list_embedding`: agno's async pipeline surfaces failed embeds as embedding=[] (distinct path from None); the integration must re-embed those.
- `test_insert_empty_document_list_is_noop`: empty input from upstream filtering doesn't change store state or raise.
- `test_search_with_empty_query_returns_empty`: empty / None query short-circuits to [], matching LanceDb. Without the short-circuit the embedder would hash "" and return arbitrary near-matches. Required a 2-line addition to `search` and `async_search`.
- `test_delete_by_metadata_handles_non_string_values`: bool / float values in metadata equality predicates work natively.
- `test_update_metadata_with_empty_dict_is_noop`: empty update dict preserves existing metadata.
- `test_search_does_not_dedupe_distinct_documents_with_identical_content`: pins our deliberate divergence from LanceDb (which dedupes by content string).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

A follow-up round of auditing surfaced four #81-family gaps where we populate a field but no test asserts on it (silent-regression risk):

- `test_similarity_search_with_score_returns_descending_scores_and_self_match`: pins the actual semantics of the float in `(Document, float)`: self-match wins, scores are monotonically non-increasing. Existing tests only asserted `isinstance(score, float)`.
- `test_load_then_add_assigns_fresh_handles_without_collision`: `_next_u64` is persisted across dump/load; if it were dropped, new handles would collide with old ones and corrupt search results.
- `test_embeddings_property_returns_supplied_embedder`: pins the `embeddings` property override so a refactor dropping it doesn't silently break `similarity_search_with_relevance_scores` discovery.
- `test_aget_by_ids_preserves_order_and_returns_documents_with_id`: async mirror of the get_by_ids order test.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Active bug fix: `add()` deduplicated against the existing store but not against the input batch itself. If a single call contained two nodes with the same node_id, the store ended up with N vectors in the index but only the last node_id mapped back to one of them. The earlier handles became orphans that `query` later resolved through the duplicate node_id, returning the second node's payload attached to the first node's vector. `add()` now raises `ValueError` on intra-batch duplicates rather than silently corrupting state.

Tests (one covering the fix, plus three field-completeness tests):

- `test_add_raises_on_intra_batch_duplicate_node_id`: covers the fix above, and verifies the store is left in a clean state (no half-written index).
- `test_query_round_trips_image_node_subtype`: pins ImageNode fidelity (image_url, image_mimetype, etc.) through query. PR #83 verified full BaseNode fidelity but exclusively via TextNode; ImageNode / IndexNode coverage was missing.
- `test_query_round_trips_index_node_subtype`: same for IndexNode (used by composable indexes / routers / sub-question pipelines).
- `test_query_returned_node_always_has_none_embedding`: pins the contract that turbovec discards full-precision embeddings on add, so query / get_nodes results have `embedding=None`. Aligns with `get()` raising NotImplementedError.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
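The intra-batch check described above has to run before any state is touched, so that a rejected batch leaves the store clean. A minimal sketch of such a pre-flight check; `check_no_intra_batch_duplicates` is a hypothetical name and the real `add()` signature is not reproduced here:

```python
from collections import Counter

def check_no_intra_batch_duplicates(node_ids):
    """Raise ValueError if any node_id appears more than once in one batch.

    Intended to be called at the top of add(), before the index or the
    side-car tables are mutated.
    """
    dupes = sorted(nid for nid, count in Counter(node_ids).items() if count > 1)
    if dupes:
        raise ValueError(f"duplicate node_id(s) within one add() batch: {dupes}")
```

Validating first and mutating second is what allows the regression test to assert that no half-written index is left behind.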

Six gaps surfaced by an exhaustive return-field audit. Each pins a value the current code returns but no existing test asserts: the same shape as #81 but at finer granularity.

- `test_filter_documents_returns_documents_with_score_none`: score is only set on embedding_retrieval; filter_documents must return docs with `score=None`. Guards against a future cache leak that carries a stale score between read paths.
- `test_storage_property_documents_have_no_blob_sparse_or_score_when_unset`: extends the existing storage test to assert all four optional fields default to None when not set on write.
- `test_embedding_retrieval_preserves_content_and_meta`: no existing retrieval test asserts the actual `Document.content` string survives the round-trip; if `_reconstruct` ever dropped content, only the blob / sparse-embedding tests would notice indirectly.
- `test_to_dict_includes_all_init_params_and_type_key`: pins all four `init_parameters` keys + the outer `type` key (used by Haystack pipeline serialization to resolve the class) in a single test. Existing tests cover subsets but never the full shape.
- `test_save_load_preserves_similarity_function_and_return_embedding`: `load_from_disk` uses `.get()` with defaults for both non-bit_width params; if `save_to_disk` ever stopped writing them, load would silently fall back to defaults.
- `test_embedding_retrieval_all_results_have_finite_float_scores`: the existing top-k test only asserts `results[0].score is not None`; this extends to all results and adds `math.isfinite` so a kernel NaN-on-tail-hit regression doesn't slip through.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Five gaps from the round-2 audit. Three pin deliberate divergences from LanceDb (the named drop-in target) so a future change to either side becomes a deliberate decision rather than a silent surprise. Two pin behavioural contracts that lacked assertions.

- `test_search_results_content_origin_divergence_from_lancedb`: LanceDb drops `Document.content_origin` on insert; turbovec mirrors that behaviour. Pin the divergence explicitly.
- `test_search_results_size_divergence_from_lancedb`: same pin for `Document.size`.
- `test_search_results_embedding_is_none_divergence_from_lancedb`: LanceDb sets `embedding=item["vector"]` on returned hits so callers can read the original vector. turbovec discards full-precision vectors after quantization, so we return `embedding=None`. Pin the divergence so callers don't get a runtime surprise.
- `test_reranker_output_documents_carry_reranking_score`: pins that `reranking_score` survives the post-rerank result list. Existing reranker tests only check ordering; this catches a refactor that drops fields the reranker mutated.
- `test_delete_by_metadata_returns_false_when_no_match`: pins the False branch of the bool return contract; other delete tests cover True but not False, so a regression always-returning-True wouldn't be detected.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

`add_with_ids_2d` was mutating the ID tables (`id_to_slot` / `slot_to_id`) BEFORE delegating to `self.inner.add_2d`. If the inner call returned `Err` (most commonly `DimMismatch` when a caller passed the wrong dim on a committed-dim index), the ID tables retained `n` ghost entries pointing at slots that don't exist in the inner index. The next `search_with_allowlist` would index a ghost id and read out-of-bounds packed-code bytes; the next `remove` would corrupt the table further.

Fix: capture `base_slot = self.inner.len()` before the inner add, call `self.inner.add_2d(vectors, dim)?` first, then mutate the ID tables only on success.

`add_with_ids_2d_rolls_back_id_tables_on_inner_dim_mismatch` is the regression test. Without the fix it would either reject the second correctly-dim'd add (ids 40/50 stuck as ghosts) or leak ghosts into later searches.

Plus six audit-driven IdMapIndex tests:

- `add_with_ids_2d_rejects_non_multiple_buffer`: the `VectorBufferNotMultipleOfDim` error variant was never asserted.
- `add_with_ids_2d_rejects_zero_dim`: same variant, dim=0 sub-branch.
- `search_returns_descending_scores_aligned_with_ids`: the `(Vec<f32>, Vec<u64>)` parallel-array contract had no length / ordering / finiteness assertion at the IdMapIndex layer (same shape as #81: return value populated, never checked).
- `search_multi_query_results_are_row_major`: every existing test used a single query, leaving the documented row-major layout for multi-query unverified at this layer.
- `remove_keeps_swapped_id_addressable_in_both_tables`: the swap-and-pop consistency between `slot_to_id` and `id_to_slot` was only verified indirectly via self-query; a bug updating only one table could mask itself.
- `prepare_does_not_change_search_results`: `prepare()` had no observable post-condition assertion at the IdMapIndex layer.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
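The ordering fix (delegate first, mutate the ID tables only on success) can be modelled in a few lines. This is a Python toy, not the Rust implementation: `Inner`, `IdMap` and the `DimMismatch` exception are hypothetical stand-ins that borrow the names used in this commit message.

```python
class DimMismatch(Exception):
    """Stand-in for the inner index's dimension-mismatch error."""

class Inner:
    """Toy inner index with a committed dimension."""
    def __init__(self, dim):
        self.dim = dim
        self.rows = []

    def add_2d(self, vectors, dim):
        if dim != self.dim:
            raise DimMismatch(f"expected dim {self.dim}, got {dim}")
        self.rows.extend(vectors)

    def __len__(self):
        return len(self.rows)

class IdMap:
    """Toy id-mapped wrapper: external ids <-> inner slots."""
    def __init__(self, dim):
        self.inner = Inner(dim)
        self.id_to_slot = {}
        self.slot_to_id = []

    def add_with_ids_2d(self, vectors, dim, ids):
        base_slot = len(self.inner)      # capture before the inner add
        self.inner.add_2d(vectors, dim)  # may raise; tables are still untouched
        for offset, ext_id in enumerate(ids):
            self.id_to_slot[ext_id] = base_slot + offset
            self.slot_to_id.append(ext_id)
```

In this toy, a failed add with ids 40 and 50 leaves both tables unchanged, so a later add with the correct dimension can still claim those ids.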

Three tests covering populated-but-unasserted state on TurboQuantIndex that prior tests didn't reach:

- `search_single_query_sets_nq_to_one`: `SearchResults.nq` was only asserted in multi-query batches; a regression dropping nq to 0 in the single-query path wouldn't fail any existing test.
- `is_empty_tracks_len`: `is_empty()` was never called by any test; a wrong-polarity regression (e.g. `self.n_vectors > 0`) would compile and pass the suite.
- `add_2d_rejects_non_multiple_of_8_dim_on_lazy_index`: the `DimNotMultipleOf8` AddError variant is only reachable from a lazy index's first `add_2d` and was never asserted; a regression flipping the branch to Ok or DimMismatch wouldn't have failed.

Also extends `search_on_lazy_uncommitted_returns_empty` to assert `res.nq == 0` (the lazy-empty path for nq).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Most existing tests exercise operations in isolation; bugs that live in the transition between two operations (cache invalidation, slot reuse, persistence capturing post-mutation state) can slip past those. The audit that surfaced the LlamaIndex intra-batch `add()` corruption pointed at the same risk in the Rust core. Nine sequences pinned in a new test file:

High-priority (plausible bug surfaces):

- `second_add_after_search_lets_new_vectors_be_found`: blocked-cache invalidation correctness after a second add.
- `add_swap_remove_add_then_self_query_finds_all_three_phases`: mixed shrink+grow keeps packed_codes / n_vectors consistent.
- `swap_remove_after_load_produces_correct_search`: the load-then-mutate-then-search path through OnceLock state.
- `swap_remove_then_round_trip_matches_in_memory_search`: persistence captures the post-removal state, not stale tail bytes.
- `id_map_re_added_id_returns_new_vector_not_old`: re-add after remove returns the NEW vector; the existing test only checked `contains`.
- `prepare_then_add_invalidates_blocked_cache`: prepare-warmed cache must still invalidate on subsequent add.
- `id_map_remove_last_then_add_keeps_slot_tables_consistent`: the no-swap branch of `remove(last)` leaves no stale tail entry in `slot_to_id` for the next add.

Defensive:

- `add_after_load_extends_index`: loaded index can be extended.
- `prepare_then_swap_remove_invalidates_cache`: the prepare-then-delete-then-search path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Four tests pinning Rust internal-module branches the audit found under-covered:

- `tv_truncated_payload_errors_cleanly`: write a v3 .tv file, then truncate it. Loader must surface UnexpectedEof, not panic. The `read_exact` error path was previously unverified.
- `tv_unsupported_version_errors_with_useful_message`: hand-construct a .tv with version=99. Loader must surface InvalidData with the "unsupported format version" message; the catch-all arm in `read_core_versioned` was unreached.
- `tv_v3_invalid_n_calib_errors_cleanly`: hand-construct a v3 file with n_calib=7 (neither 0 nor dim). Loader must reject with InvalidData per the io.rs contract; this branch was unreached.
- `produces_expected_shape_for_bit_width_three`: extends the parametrised shape test (which only covered 2 and 4) to bit_width=3, whose pack layout is its own branch.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
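The loader contracts pinned above (truncation surfaces as a clean end-of-file error, an unknown version as an "unsupported format version" error) can be illustrated with a Python analogue. The header layout of three little-endian u32 fields and the set of supported versions are assumptions made for this sketch; the real .tv format is not described in this PR text.

```python
import io
import struct

SUPPORTED_VERSIONS = {2, 3}  # assumed for illustration

def read_exact(stream, n):
    """Read exactly n bytes or raise EOFError (analogue of Rust's read_exact)."""
    chunks = []
    remaining = n
    while remaining:
        chunk = stream.read(remaining)
        if not chunk:
            raise EOFError(f"unexpected end of file: {remaining} of {n} bytes missing")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)

def read_header(stream):
    """Parse a hypothetical (version, dim, n_vectors) header."""
    version, dim, n_vectors = struct.unpack("<III", read_exact(stream, 12))
    if version not in SUPPORTED_VERSIONS:
        raise ValueError(f"unsupported format version {version}")
    return version, dim, n_vectors
```

Both failure modes raise ordinary exceptions that a caller can catch, which is the Python-side equivalent of returning an error instead of panicking.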

The encode and search pipelines silently corrupted state on degenerate inputs:

- NaN coord: `simd_norm` returns NaN, the `n_val > 1e-10` branch goes to `inv = 0.0`, but `0.0 * NaN = NaN` propagates through rotation and poisons `vec_scales[slot] = NaN`. The kernel's `s > heap_min` comparison is false for NaN, so the slot exists in `len()` but is **silently unreachable** through search.
- +/- Inf coord: same NaN-poisoning path via `1.0 / Inf = 0.0`, followed by `0.0 * Inf = NaN`.
- Magnitude >= ~2e19: the f32 sum-of-squares in `simd_norm` overflows to `+Inf`, `scale[i] = Inf` gets stored, and the slot **silently wins top-k against every query** because `Inf * anything = Inf`.
- NaN/Inf query value: kernel produces NaN scores, heap silently drops them, returned indices are arbitrary with NEG_INFINITY scores.

Fix: validate at the entry of `add`, `add_2d`, `search`, and `search_with_mask` that every value is finite AND `|value| < 1e16` (a conservative bound that prevents f32 overflow for dims up to 2^16). The validation cost is one pass over the input, which is cheaper than making every downstream stage NaN/Inf-safe.

API surfaces:

- `TurboQuantIndex::add_2d` and `IdMapIndex::add_with_ids_2d` return a new `AddError::InvalidInputValue { vector_index, coord_index, value }` variant. Callers handling untrusted input should prefer these `_2d` paths.
- `TurboQuantIndex::add` and the `search` family panic with a clear message, consistent with their existing "caller violated precondition" panic style.

`AddError` drops its `Eq` derive (still `PartialEq`) because the new variant carries an `f32`, which is not `Eq` (NaN != NaN).

15 tests across the entry points (NaN, +Inf, -Inf, huge magnitude, boundary value just below the threshold, multi-vector batch reporting, lazy-uncommitted skip-validation path).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
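The entry-point validation and the arithmetic behind the 1e16 bound can be sketched as follows. This is a Python illustration rather than the Rust code: `first_invalid` is a hypothetical helper whose return tuple mirrors the fields of `AddError::InvalidInputValue`, and the input is assumed to be a flat row-major buffer.

```python
import math

MAX_ABS = 1e16                   # bound quoted in the commit message
F32_MAX = 3.4028234663852886e38  # largest finite IEEE-754 binary32 value

def first_invalid(values, dim):
    """Return (vector_index, coord_index, value) for the first bad value, else None.

    A value is rejected when it is NaN, +/-Inf, or has |value| >= MAX_ABS.
    """
    for i, v in enumerate(values):
        if not math.isfinite(v) or abs(v) >= MAX_ABS:
            return (i // dim, i % dim, v)
    return None

# Worst-case sum of squares at dim = 2**16 with every coordinate at the bound.
# It stays below F32_MAX, which is why the bound is safe for such dims.
worst_case_sum_of_squares = (2 ** 16) * MAX_ABS ** 2

# NaN never compares greater than anything, so a NaN score cannot enter a
# top-k heap guarded by an `s > heap_min` style check.
nan_beats_heap_min = float("nan") > float("-inf")
```

The single pass is linear in the input size, and rejecting at the boundary means the rotation, packing and scoring stages never see a non-finite value.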

Four binding-layer bugs surfacing as Rust panics (PanicException to Python) instead of typed exceptions:

- Non-contiguous numpy arrays passed to `add` / `search` / `add_with_ids` panicked via `as_slice().expect("vectors must be contiguous")`. Now raise `ValueError("vectors must be C-contiguous; call np.ascontiguousarray(...) first")` via `ok_or_else`.
- Query with wrong `ncols` propagated to the core's `assert_eq!(queries.len(), nq * dim)` and surfaced as PanicException. The binding now validates `ncols == self.dim_opt()` before calling inner and raises `ValueError("query dim X does not match index dim Y")`.
- `swap_remove(idx)` with idx >= len panicked via core's `assert!(idx < n_vectors)`. Binding now raises `IndexError` with a range-style message.
- `IdMapIndex.search` returned shape `(0, k)` for empty queries while `TurboQuantIndex.search` returned `(0, min(k, n_vectors, n_allowed))`. This cross-class inconsistency is fixed by computing the effective_k identically for the nq=0 path.

Plus 11 tests covering the new behaviour and the Rust core's `InvalidInputValue` error surfacing through the binding as `ValueError`:

- `test_add_noncontiguous_vectors_raises_value_error`
- `test_search_query_dim_mismatch_raises_value_error`
- `test_search_noncontiguous_query_raises_value_error`
- `test_swap_remove_out_of_bounds_raises_index_error`
- `test_add_rejects_nan_with_value_error`
- `test_add_rejects_huge_magnitude_with_value_error`
- `test_search_with_nan_query_raises`
- `test_search_empty_queries_returns_consistent_shape_across_index_types`
- `test_search_query_dim_mismatch_raises_value_error` (IdMap)
- `test_add_with_ids_noncontiguous_vectors_raises_value_error`
- `test_add_with_ids_rejects_nan_with_value_error`

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Three test gaps surfaced by the concurrency audit. None of these exercised actual bugs (Rust's borrow checker rules out the race classes the audit considered), but they pin contracts the existing `concurrent_search.rs` doesn't:

- `concurrent_search_after_load_is_safe`: a `TurboQuantIndex::load`d index starts with empty OnceLock caches; the race window mirrors the freshly-built one. The doc explicitly mentions `load` as a `prepare()`-skip target, but no test covered concurrent search immediately after `load`.
- `id_map_concurrent_search_is_deterministic_across_threads`: `IdMapIndex::search` had zero concurrent-search coverage.
- `concurrent_prepare_races_with_search_safely`: multiple `prepare` threads interleaved with multiple `search` threads on a fresh (unprepared) index. Pins that OnceLock's "closure runs at most once" guarantee holds end-to-end.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Two audit-found bugs in the TQ+ calibration state machine that silently degrade index correctness without any error signal.

## Bug F1: v2-loaded index + add silently mis-encoded new vectors

A v2 (pre-TQ+) file loads with empty `tqplus_shift`. On the next `add()`, the `existing` slot was set to `None` (the lazy-first-add signal), `encode()` fit fresh calibration on the new batch and baked it into the packed codes, but the `n_vectors != 0` else branch only extended `packed_codes` / `scales`, never persisting the fitted shift / scale_tq. The new vectors ended up encoded with calibration that search later treated as identity, silently corrupting scores.

Fix: in `TurboQuantIndex::from_parts`, populate explicit identity TQ+ vectors when `tqplus_shift` is empty and `n_vectors > 0`. The loaded state then matches the on-the-wire reality (those vectors were encoded without calibration), and the next `add` sees `existing = Some(identity)` and encodes new vectors against identity too, keeping the entire index in a single coordinate system.

## Bug F2: Empty first add froze identity calibration forever

`add(&[])` with n=0 hit `n < TQPLUS_MIN_SAMPLES` in `encode`, returned identity `(zeros, ones)` of length dim, and the `n_vectors == 0` branch in `add` copied that identity into `self.tqplus_shift` / `self.tqplus_scale`. Now `tqplus_shift.is_empty()` was false, so every subsequent add, even a million-vector batch with rich distribution, saw `existing = Some(identity)` and silently skipped fitting fresh calibration. The user lost TQ+ entirely with no warning.

Fix: short-circuit `add` to a true no-op when n=0, before touching any state. Empty input now means "no change", end-to-end.

## Regression tests

- `empty_first_add_does_not_freeze_identity_calibration`: writes the index after empty + 1500-vec add, parses the persisted TQ+ trailer, asserts at least one shift / scale value is non-trivial (pre-fix the trailer was exactly identity).
- `v2_loaded_index_populates_identity_calibration`: hand-constructs a v2 .tv file, loads it, adds 1500 vectors, writes back, asserts the trailer is exactly identity (pre-fix the fitted-then-discarded calibration would have leaked through silently).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
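A toy model of the two fixes may make the state machine easier to follow. Everything here is an assumption made for illustration: `ToyIndex`, `fit`, and `MIN_SAMPLES` are hypothetical, the "calibration" is just a per-dimension mean with unit scale, and no encoding is performed. Only the control flow (empty add is a no-op; a loaded non-empty index with no calibration gets explicit identity) mirrors the description above.

```rust
/// Hypothetical stand-in for `TQPLUS_MIN_SAMPLES`.
const MIN_SAMPLES: usize = 4;

struct ToyIndex {
    dim: usize,
    n_vectors: usize,
    shift: Vec<f32>, // empty means "calibration not fitted yet"
    scale: Vec<f32>,
}

/// Toy calibration: identity below MIN_SAMPLES rows, otherwise the
/// per-dimension mean as shift and a unit scale.
fn fit(vectors: &[f32], dim: usize) -> (Vec<f32>, Vec<f32>) {
    let n = vectors.len() / dim;
    if n < MIN_SAMPLES {
        return (vec![0.0; dim], vec![1.0; dim]);
    }
    let mut shift = vec![0.0f32; dim];
    for row in vectors.chunks_exact(dim) {
        for (acc, v) in shift.iter_mut().zip(row) {
            *acc += *v;
        }
    }
    for acc in shift.iter_mut() {
        *acc /= n as f32;
    }
    (shift, vec![1.0; dim])
}

impl ToyIndex {
    fn new(dim: usize) -> Self {
        Self { dim, n_vectors: 0, shift: Vec::new(), scale: Vec::new() }
    }

    /// F1-style fix: a loaded index that already holds vectors but has no
    /// calibration is given explicit identity, so later adds reuse it.
    fn from_loaded(dim: usize, n_vectors: usize, shift: Vec<f32>, scale: Vec<f32>) -> Self {
        let (shift, scale) = if shift.is_empty() && n_vectors > 0 {
            (vec![0.0; dim], vec![1.0; dim])
        } else {
            (shift, scale)
        };
        Self { dim, n_vectors, shift, scale }
    }

    fn add(&mut self, vectors: &[f32]) {
        let n = vectors.len() / self.dim;
        // F2-style fix: empty input returns before touching any state.
        if n == 0 {
            return;
        }
        if self.shift.is_empty() {
            let (shift, scale) = fit(vectors, self.dim);
            self.shift = shift;
            self.scale = scale;
        }
        self.n_vectors += n;
    }
}
```

In this model an empty first add leaves `shift` empty, so the next real batch still gets fitted, while an index built through `from_loaded` with existing vectors keeps identity across later adds.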

Bug in the wave-5 fix for IdMap empty-query shape consistency: the `effective_k` computation for `nq == 0` used `allow_slice.len()` (raw length), but the kernel internally dedups the allowlist via a packed bool mask. So `allowlist=[1, 1, 1]` would return shape `(0, 3)` for empty queries and `(N, 1)` for non-empty queries: still divergent, just in a different shape than before wave 5.

Fix: dedup via HashSet in the nq=0 path to match the kernel's mask-based dedup for nq>0.

Test: `test_search_empty_queries_dedups_allowlist_for_effective_k` pins both shapes (empty and real query) and asserts they share the trailing `effective_k` dim.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
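As a rough sketch of the dedup step (not the binding's actual code), the trailing dimension for the empty-query path could be computed like this. The function name, the `u64` id type, and the `Option` allowlist parameter are assumptions; the sketch also assumes every allowlisted id exists in the index.

```rust
use std::collections::HashSet;

/// Trailing result dimension: min(k, n_vectors, n_allowed), where
/// n_allowed counts distinct allowlist ids rather than raw entries.
fn effective_k(k: usize, n_vectors: usize, allowlist: Option<&[u64]>) -> usize {
    let n_allowed = match allowlist {
        // Dedup so repeated ids count once, like a bool mask would.
        Some(ids) => ids.iter().collect::<HashSet<_>>().len(),
        None => n_vectors,
    };
    k.min(n_vectors).min(n_allowed)
}
```

With `allowlist = [1, 1, 1]` and `k = 3` this yields 1, which is the width the non-empty query path produced in the example above.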

The wave-5 input validation (NaN/Inf/overflow rejection) added new panic conditions to `add` / `add_2d` / `search` / `search_with_mask`, but the rustdoc comments still listed only the pre-wave-5 panic triggers. Users reading the docs would expect their indexes to handle non-finite inputs gracefully when they actually panic loudly.

Doc updates:

- `TurboQuantIndex::add`: now lists all three panic conditions (dim-not-set, non-multiple buffer, invalid input value) in a # Panics section.
- `TurboQuantIndex::add_2d`: adds `InvalidInputValue` to the typed return list and documents that bad buffer length still panics.
- `TurboQuantIndex::search`: adds a # Panics section covering both buffer-length and invalid-query-value cases.
- `TurboQuantIndex::search_with_mask`: same plus the mask-length panic.

Plus two small cleanups:

- Delete a misleading internal comment in `error.rs` that referenced a manual `PartialEq` impl which doesn't exist (the derive handles it fine; the only change wave 5 made was dropping `Eq`).
- Drop the `IndexIDMap2` FAISS analogy from `id_map.rs` per the positioning memory entry: turbovec docs should describe features on their own terms.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
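To show what a `# Panics` section covering all three conditions might look like, here is a small free-standing sketch. It is not the rustdoc from this PR: the function operates on a plain `Vec<f32>`, `dim == 0` stands in for "dim not set", and `MAX_ABS` is an assumed threshold borrowed from the "magnitude >= 1e16" case in the summary.

```rust
/// Assumed magnitude limit for this sketch.
const MAX_ABS: f32 = 1e16;

/// Appends row-major `vectors` of width `dim` to `store`.
///
/// # Panics
///
/// Panics if:
/// - `dim` is zero (standing in for "dim not set"),
/// - `vectors.len()` is not a multiple of `dim`,
/// - any value is NaN, infinite, or has magnitude `>= MAX_ABS`.
fn add(store: &mut Vec<f32>, dim: usize, vectors: &[f32]) {
    assert!(dim > 0, "dim must be set before add");
    assert!(
        vectors.len() % dim == 0,
        "buffer length {} is not a multiple of dim {}",
        vectors.len(),
        dim
    );
    assert!(
        vectors.iter().all(|v| v.is_finite() && v.abs() < MAX_ABS),
        "invalid input value: NaN, Inf, or magnitude >= {}",
        MAX_ABS
    );
    store.extend_from_slice(vectors);
}
```

Keeping the doc list in the same order as the assertions makes it easier to spot drift when a new check is added.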

`search` on `x86_64` runtime-dispatches to AVX-512 BW or AVX2 inside an `unsafe { if/else if }` block with no `else` branch. If neither `is_x86_feature_detected!("avx512bw")` nor `is_x86_feature_detected!("avx2")` returned true (pre-Haswell x86, rare in 2026, or a VM / emulator that doesn't expose AVX2 to userspace), neither kernel ran, `heap_sizes` stayed at `[0; batch_nq]`, and `search` returned **empty top-k results** for every query with no error signal. The `cfg(not(any(target_arch = "aarch64", target_arch = "x86_64")))` scalar fallback at the bottom of the function is gated on the architecture, not the runtime feature flag, so it didn't cover this case.

Fix: extract the per-query scalar scoring (previously inlined in the catch-all `cfg` block) into a free helper `score_query_into_heap`, then add an `else` branch inside the x86_64 `unsafe { }` that calls it once per query in the current batch. The catch-all scalar block is also refactored to call the same helper, eliminating the prior duplication.

The workspace `.cargo/config.toml` sets `target-cpu=x86-64-v3` which requires AVX2 at the compile level, so this affects downstream Rust users building turbovec without that config, not turbovec's own release wheels. But the silent-empty-results failure mode is the worst kind, so it gets a fix even though the reproducer is rare.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
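The shape of the fix, a dispatch chain that always ends in a scalar branch, can be sketched as follows. This is not turbovec's kernel code: `Kernel`, `select_kernel`, `score_query_scalar`, and `search_top1` are hypothetical names, and the SIMD arms simply call the scalar helper as placeholders so the example stays portable.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Kernel {
    Avx512Bw,
    Avx2,
    Scalar,
}

/// Runtime dispatch on x86_64. The final `else` is the important part:
/// without it, a CPU with neither feature would select nothing.
#[cfg(target_arch = "x86_64")]
fn select_kernel() -> Kernel {
    if std::is_x86_feature_detected!("avx512bw") {
        Kernel::Avx512Bw
    } else if std::is_x86_feature_detected!("avx2") {
        Kernel::Avx2
    } else {
        Kernel::Scalar
    }
}

/// Every other architecture uses the scalar path in this sketch.
#[cfg(not(target_arch = "x86_64"))]
fn select_kernel() -> Kernel {
    Kernel::Scalar
}

/// Shared scalar helper: dot product of `query` with each row.
fn score_query_scalar(query: &[f32], vectors: &[f32], dim: usize) -> Vec<f32> {
    vectors
        .chunks_exact(dim)
        .map(|row| row.iter().zip(query).map(|(a, b)| a * b).sum())
        .collect()
}

/// Index of the best-scoring row, or `None` for an empty index.
fn search_top1(kernel: Kernel, query: &[f32], vectors: &[f32], dim: usize) -> Option<usize> {
    let scores = match kernel {
        // Placeholders: a real build would call SIMD kernels here.
        Kernel::Avx512Bw => score_query_scalar(query, vectors, dim),
        Kernel::Avx2 => score_query_scalar(query, vectors, dim),
        Kernel::Scalar => score_query_scalar(query, vectors, dim),
    };
    let mut best: Option<(usize, f32)> = None;
    for (i, &s) in scores.iter().enumerate() {
        if best.map_or(true, |(_, b)| s > b) {
            best = Some((i, s));
        }
    }
    best.map(|(i, _)| i)
}
```

Because `match` on an enum is exhaustive, this formulation makes a missing fallback a compile error rather than a silent empty result, which is one possible design choice and not necessarily the one the PR took.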

`from_parts` is `pub(crate)` and the integrity gateway for both `TurboQuantIndex::load` and `IdMapIndex::load`. The wave-6 fix added identity-population logic that relies on length invariants holding (`tqplus_shift.len() == tqplus_scale.len()`, packed/scales lengths matching `n_vectors * dim * bit_width / 8` and `n_vectors` respectively), but neither `from_parts` nor any test pinned them.

Today the read-side validation in `io::load_*` upholds these invariants, but a future caller bypassing the IO layer (or a refactor that drops a read-layer check) could silently construct a malformed `TurboQuantIndex`. `search` and `swap_remove` index `packed_codes`, `scales`, and `tqplus_*` assuming the invariants hold; out-of-bounds reads / underflows would manifest as silent garbage scores rather than a clear error.

Fix: assert structural invariants at `from_parts` entry, before the identity-population logic that depends on them:

- `tqplus_shift.len() == tqplus_scale.len()` always.
- For an eager (`dim = Some(d)`) state:
  - `packed_codes.len() == n_vectors * d * bit_width / 8`
  - `scales.len() == n_vectors`
  - non-empty TQ+ must have `tqplus_shift.len() == d`
- For a lazy (`dim = None`) state: `n_vectors == 0` and every storage field empty.

Plus 7 unit tests in a new `from_parts_tests` module covering each panic-on-violation case and two acceptance cases (lazy uncommitted, eager with consistent lengths).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
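The invariants listed above can be written down as a small checker. This sketch differs from the PR in two labelled ways: it works on lengths only through a hypothetical `Parts` struct, and it returns `Err` instead of asserting so that each violation is easy to test without catching panics.

```rust
/// Hypothetical length-only view of the fields passed to `from_parts`.
struct Parts {
    dim: Option<usize>,
    bit_width: usize,
    n_vectors: usize,
    packed_len: usize,
    scales_len: usize,
    shift_len: usize,
    scale_len: usize,
}

/// Returns the first violated invariant, if any.
fn check_parts(p: &Parts) -> Result<(), String> {
    if p.shift_len != p.scale_len {
        return Err("tqplus_shift and tqplus_scale lengths differ".to_string());
    }
    match p.dim {
        // Eager state: storage lengths must agree with n_vectors and d.
        Some(d) => {
            if p.packed_len != p.n_vectors * d * p.bit_width / 8 {
                return Err("packed_codes length mismatch".to_string());
            }
            if p.scales_len != p.n_vectors {
                return Err("scales length mismatch".to_string());
            }
            if p.shift_len != 0 && p.shift_len != d {
                return Err("non-empty TQ+ length must equal dim".to_string());
            }
        }
        // Lazy state: nothing may have been stored yet.
        None => {
            if p.n_vectors != 0 || p.packed_len != 0 || p.scales_len != 0 || p.shift_len != 0 {
                return Err("lazy state must be empty".to_string());
            }
        }
    }
    Ok(())
}
```

For example, with `dim = Some(8)`, `bit_width = 4`, and `n_vectors = 3`, the expected packed length is 3 * 8 * 4 / 8 = 12 bytes.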

RyanCodrai and others added 3 commits May 30, 2026 11:12

RyanCodrai changed the title ~~Round-trip Document.blob + sparse_embedding through Haystack store~~ Field-fidelity audit fixes across Haystack, LlamaIndex, and Agno integrations May 30, 2026

RyanCodrai and others added 5 commits May 30, 2026 11:40

RyanCodrai changed the title ~~Field-fidelity audit fixes across Haystack, LlamaIndex, and Agno integrations~~ Audit-driven integration fidelity: bug fixes + 30 reference-parity tests May 30, 2026

RyanCodrai mentioned this pull request May 30, 2026

Populate Document.id in langchain similarity_search results (issue #81) #82

Closed

2 tasks

RyanCodrai and others added 8 commits May 30, 2026 12:09

RyanCodrai changed the title ~~Audit-driven integration fidelity: bug fixes + 30 reference-parity tests~~ Audit-driven integration + Rust core fidelity work May 30, 2026

RyanCodrai and others added 6 commits May 30, 2026 12:59

RyanCodrai changed the title ~~Audit-driven integration + Rust core fidelity work~~ Audit-driven integration + Rust core fidelity work (6 waves) May 30, 2026

RyanCodrai and others added 2 commits May 30, 2026 13:43

RyanCodrai changed the title ~~Audit-driven integration + Rust core fidelity work (6 waves)~~ Audit-driven integration + Rust core fidelity work (7 waves) May 30, 2026

RyanCodrai merged commit 73866a3 into main May 30, 2026
6 checks passed

RyanCodrai deleted the fix/haystack-blob-sparse-roundtrip branch May 30, 2026 12:49

This was referenced May 30, 2026

bug: langchain don't return id with documents #81

Closed

Release: turbovec 0.7.0 (Python) + 0.8.0 (Rust crate) #84

Merged

RyanCodrai merged 24 commits into main from fix/haystack-blob-sparse-roundtrip


Summary

Wave 1 — Structural fidelity bug fixes in integrations (4 commits)

Wave 2 — Reference-parity tests (4 commits, +30 tests)

Wave 3 — Python integration field-completeness (4 commits, 1 fix + 17 tests)

Wave 4 — Rust core audit (4 commits, 1 fix + 22 tests)

Wave 5 — Numerical + bindings + concurrency (3 commits, 8 fixes + 29 tests)

Wave 6 — TQ+ calibration + allowlist dedup + doc drift (3 commits, 3 fixes + 3 tests)

Verified during the audits

Out of scope — follow-ups worth filing

Headline totals

Test plan
