rfsaliev
Collaborator

Describe the changes in the pull request

This PR adds Python flow tests for TieredSVSIndex.

Which issues this PR fixes

  1. #...
  2. MOD...

Main objects this PR modified

  1. ...
  2. ...

Mark if applicable

  • This PR introduces API changes
  • This PR introduces serialization changes


codecov bot commented Jun 26, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 96.86%. Comparing base (63ba350) to head (b21ed77).
Report is 2 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #707      +/-   ##
==========================================
+ Coverage   96.84%   96.86%   +0.01%     
==========================================
  Files         122      122              
  Lines        7393     7393              
==========================================
+ Hits         7160     7161       +1     
+ Misses        233      232       -1     


Contributor

@Copilot Copilot AI left a comment

Pull Request Overview

This PR introduces a suite of Python flow tests for the TieredSVSIndex to verify its behavior under various operations such as insertion, deletion, batch querying, and range queries.

  • Added comprehensive tests for single-label and FLOAT16 configurations.
  • Updated helper functions in common.py to support SVS parameters and enhanced Python bindings in src/python_bindings/bindings.cpp.

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

File | Description
--- | ---
tests/flow/test_svs_tiered.py | Added multiple tests covering creation, insertion, search, deletion, batch iteration, and recall measurement.
tests/flow/common.py | Introduced a helper to create SVS parameters with configurable options.
src/python_bindings/bindings.cpp | Updated comments for clarity, added default num_threads logic, and exposed svs_label_count.

Collaborator

@alonre24 alonre24 left a comment

Looks very good, just a few comments and requests.

# Wait for all repair jobs to be done.
index.wait_for_index(5)
test_logger.info(f"Done deleting half of the index")
assert index.svs_label_count() >= (num_elements / 2) - indices_ctx.tiered_svs_params.updateTriggerThreshold
Collaborator

Why is this the check here? When we delete, we mark the vector as deleted in SVS and the label count decreases in place, no?

Collaborator Author

This test is copied from the HNSW tiered flow test, but under the SVS tiered batch-update logic some vectors can remain in the flat index, so the label count in the backend index can be less than num_elements / 2.

Collaborator

Right, but to validate that vectors were deleted, I would also check that index.svs_label_count() <= (num_elements / 2) + indices_ctx.tiered_svs_params.updateTriggerThreshold

Collaborator Author

Added one more assert.
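The two-sided bound can be illustrated with a toy model. This is a minimal sketch under an assumed simplification, not TieredSVSIndex code: `ToyTieredIndex`, `check_backend_label_count`, and the flush rule (move the whole flat buffer to the backend once it holds `update_threshold` labels) are hypothetical stand-ins for the batch-update behavior described above.

```python
class ToyTieredIndex:
    """Hypothetical two-tier model (not the real TieredSVSIndex): new labels
    land in a flat buffer and move to the backend in one batch once the
    buffer holds `update_threshold` labels."""

    def __init__(self, update_threshold):
        self.update_threshold = update_threshold
        self.flat = set()
        self.backend = set()

    def add(self, label):
        self.flat.add(label)
        if len(self.flat) >= self.update_threshold:
            # Batch update: move the whole buffer into the backend at once.
            self.backend |= self.flat
            self.flat.clear()

    def delete(self, label):
        # A label lives in exactly one tier; remove it from whichever holds it.
        self.flat.discard(label)
        self.backend.discard(label)

    def backend_label_count(self):
        return len(self.backend)


def check_backend_label_count(label_count, expected, threshold):
    """Two-sided check from the review: the backend count may differ from the
    expected number of live labels by at most `threshold`."""
    assert expected - threshold <= label_count <= expected + threshold, (
        label_count, expected, threshold)


# Insert 1000 labels, then delete the even ones (half of the index).
num_elements, threshold = 1000, 64
index = ToyTieredIndex(threshold)
for label in range(num_elements):
    index.add(label)
for label in range(0, num_elements, 2):
    index.delete(label)

check_backend_label_count(index.backend_label_count(), num_elements // 2, threshold)
```

In this toy run, 40 labels never leave the flat buffer, so the backend ends up holding 480 of the 500 surviving labels: below num_elements / 2, but within the threshold. The upper bound only matters if deletions are also applied lazily, which this toy model does not simulate.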

test_logger.info("Test create FLOAT16 tiered svs index")
create_tiered_index(test_logger, is_multi=False, data_type=VecSimType_FLOAT16)

def test_search_insert(test_logger):
Collaborator

Can we also cover test_search_insert for compressed SVS types with FP16 and FP32? These are the interesting cases to check with a large random dataset, where we would probably expect somewhat lower recall.
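Recall is the usual yardstick for that comparison. As a minimal, stdlib-only sketch (the helper names are hypothetical and not taken from the PR's test code), recall@k compares the labels an approximate search returns against exact brute-force neighbors:

```python
def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def exact_knn(data, query, k):
    """Ground-truth k nearest neighbors (row indices) by brute-force L2 search."""
    order = sorted(range(len(data)), key=lambda i: squared_l2(data[i], query))
    return order[:k]


def recall_at_k(approx_ids, exact_ids):
    """Fraction of the exact top-k that the approximate search also returned."""
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)


def mean_recall(approx_results, data, queries, k):
    """Average recall@k over a batch of queries."""
    total = 0.0
    for approx_ids, query in zip(approx_results, queries):
        total += recall_at_k(approx_ids[:k], exact_knn(data, query, k))
    return total / len(queries)
```

In a flow test, `approx_results` would be the labels returned by the index under test, and the mean recall would be compared against a threshold chosen per compression type.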

Collaborator Author

Added LVQ8 and LVQ4 tests.
LeanVec makes no sense for a randomized dataset: recall = 0.
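For context on why LVQ4 is expected to lose more recall than LVQ8, here is a simplified sketch of the idea: center the data, then store each vector as B-bit codes over its own [min, max] range. This is an illustrative approximation written for this discussion, not the SVS implementation, and the function names are hypothetical. Fewer bits mean a coarser grid and a larger reconstruction error.

```python
import random


def center(vectors):
    """Subtract the per-dimension dataset mean (LVQ-style preprocessing)."""
    n, dim = len(vectors), len(vectors[0])
    mean = [sum(v[j] for v in vectors) / n for j in range(dim)]
    return [[x - m for x, m in zip(v, mean)] for v in vectors]


def quantize(vec, bits):
    """Uniform scalar quantization over a per-vector [lo, hi] range."""
    lo, hi = min(vec), max(vec)
    levels = (1 << bits) - 1
    step = (hi - lo) / levels if hi > lo else 1.0
    codes = [round((x - lo) / step) for x in vec]
    return codes, lo, step


def dequantize(codes, lo, step):
    """Map integer codes back to approximate float values."""
    return [lo + c * step for c in codes]


def mean_squared_error(vectors, bits):
    """Average squared reconstruction error over all vector components."""
    total, count = 0.0, 0
    for v in vectors:
        codes, lo, step = quantize(v, bits)
        for x, y in zip(v, dequantize(codes, lo, step)):
            total += (x - y) ** 2
            count += 1
    return total / count


rng = random.Random(0)
data = center([[rng.gauss(0.0, 1.0) for _ in range(64)] for _ in range(200)])
mse8 = mean_squared_error(data, 8)  # 255 quantization steps per vector
mse4 = mean_squared_error(data, 4)  # only 15 steps per vector
```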

Collaborator

That's interesting; can you explain why?
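One plausible intuition, offered as an assumption rather than taken from the thread: LeanVec searches in a lower-dimensional linear projection of the vectors, which only pays off when the variance is concentrated in a few directions. For i.i.d. random vectors every direction carries about the same variance, so a d-dimensional orthogonal projection keeps only roughly d/D of it. This sketch uses the first d coordinates as a stand-in for such a projection; `variance_fraction_kept` is a hypothetical helper.

```python
import random


def variance_fraction_kept(vectors, d):
    """Share of total variance retained when keeping only the first d coordinates."""
    n, dim = len(vectors), len(vectors[0])
    means = [sum(v[j] for v in vectors) / n for j in range(dim)]
    variances = [sum((v[j] - means[j]) ** 2 for v in vectors) / n for j in range(dim)]
    return sum(variances[:d]) / sum(variances)


rng = random.Random(42)
D, d = 32, 8
iid = [[rng.gauss(0.0, 1.0) for _ in range(D)] for _ in range(2000)]
kept = variance_fraction_kept(iid, d)  # expected to be close to d / D = 0.25
```

With three quarters of the variance discarded, distances in the reduced space say little about true neighbors, which is consistent with very poor recall on random data.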

Collaborator Author

Collaborator

Can you think of an alternative test that suits LeanVec? It does not have to use random vectors, but we do want to validate that search executes correctly during indexing.

Collaborator Author

OK, I will look for a way to generate a better sample dataset for LeanVec.

Collaborator Author

Added a correlated data generator and LeanVec tests.

Collaborator Author

I've created a dataset generator with correlated values to emulate real-world data, and added LeanVec search tests.
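The generator itself is not reproduced in this conversation. A common way to get correlated values, sketched here as an assumption (the function names and the latent-factor approach are hypothetical and not necessarily what the PR uses), is to mix a few latent Gaussian factors into every coordinate and add a little noise. The mean absolute pairwise correlation between coordinates then separates such data from i.i.d. random vectors.

```python
import math
import random


def make_correlated_dataset(n, dim, rank, noise, seed=0):
    """Hypothetical generator: each vector mixes `rank` latent Gaussian factors
    into all `dim` coordinates, plus independent Gaussian noise."""
    rng = random.Random(seed)
    # mixing[r][j] is the weight of latent factor r in coordinate j.
    mixing = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(rank)]
    data = []
    for _ in range(n):
        z = [rng.gauss(0.0, 1.0) for _ in range(rank)]
        data.append([sum(z[r] * mixing[r][j] for r in range(rank)) + rng.gauss(0.0, noise)
                     for j in range(dim)])
    return data


def mean_abs_correlation(vectors):
    """Average |Pearson correlation| over all pairs of coordinates."""
    n, dim = len(vectors), len(vectors[0])
    cols = [[v[j] for v in vectors] for j in range(dim)]
    means = [sum(c) / n for c in cols]
    stds = [math.sqrt(sum((x - m) ** 2 for x in c) / n) for c, m in zip(cols, means)]
    total, pairs = 0.0, 0
    for i in range(dim):
        for j in range(i + 1, dim):
            cov = sum((a - means[i]) * (b - means[j]) for a, b in zip(cols[i], cols[j])) / n
            total += abs(cov / (stds[i] * stds[j]))
            pairs += 1
    return total / pairs


correlated = make_correlated_dataset(n=2000, dim=16, rank=4, noise=0.1)
rng = random.Random(1)
iid = [[rng.gauss(0.0, 1.0) for _ in range(16)] for _ in range(2000)]
correlated_score = mean_abs_correlation(correlated)
iid_score = mean_abs_correlation(iid)
```

Because most of the variance in such data lives in a `rank`-dimensional subspace, a dimensionality-reducing method like LeanVec has real structure to exploit, unlike with i.i.d. vectors.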

@rfsaliev rfsaliev force-pushed the rfsaliev/svs-tiered-flow-tests branch from 447333d to f581464 July 9, 2025 11:35
@rfsaliev rfsaliev force-pushed the rfsaliev/svs-tiered-flow-tests branch from 444a217 to b21ed77 July 11, 2025 12:16
@rfsaliev rfsaliev added this pull request to the merge queue Jul 21, 2025
Merged via the queue into main with commit 4135eff Jul 21, 2025
19 checks passed
@rfsaliev rfsaliev deleted the rfsaliev/svs-tiered-flow-tests branch July 21, 2025 16:24
@alonre24
Collaborator

/backport


Successfully created backport PR for 8.2:

github-actions bot pushed a commit that referenced this pull request Jul 21, 2025
* Draft SVS Tiered flow test

* Make tests pass with the new tiered flow

* Code review s1e1

* Code review s1e2

(cherry picked from commit 4135eff)
github-merge-queue bot pushed a commit that referenced this pull request Jul 21, 2025
[SVS] Add TieredSVSIndex flow tests. (#707)

* Draft SVS Tiered flow test

* Make tests pass with the new tiered flow

* Code review s1e1

* Code review s1e2

(cherry picked from commit 4135eff)

Co-authored-by: Rafik Saliev <[email protected]>