-
Notifications
You must be signed in to change notification settings - Fork 21
[SVS] Add TieredSVSIndex flow tests. #707
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #707 +/- ##
==========================================
+ Coverage 96.84% 96.86% +0.01%
==========================================
Files 122 122
Lines 7393 7393
==========================================
+ Hits 7160 7161 +1
+ Misses 233 232 -1 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR introduces a suite of Python flow tests for the TieredSVSIndex to verify its behavior under various operations such as insertion, deletion, batch querying, and range queries.
- Added comprehensive tests for single-label and FLOAT16 configurations.
- Updated helper functions in common.py to support SVS parameters and enhanced Python bindings in src/python_bindings/bindings.cpp.
Reviewed Changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.
File | Description |
---|---|
tests/flow/test_svs_tiered.py | Added multiple tests covering creation, insertion, search, deletion, batch iteration, and recall measurement. |
tests/flow/common.py | Introduced a helper to create SVS parameters with configurable options. |
src/python_bindings/bindings.cpp | Updated comments for clarity, added default num_threads logic, and exposed svs_label_count. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks very good, few comments and requests
tests/flow/test_svs_tiered.py
Outdated
# Wait for all repair jobs to be done. | ||
index.wait_for_index(5) | ||
test_logger.info(f"Done deleting half of the index") | ||
assert index.svs_label_count() >= (num_elements / 2) - indices_ctx.tiered_svs_params.updateTriggerThreshold |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is that the check here? When we are deleting, we are marking vector as deleted in svs and the number of labels is decreasing in place, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test is copied from HNSW tiered flow test, but according to SVS Tiered batch update logic, part of vectors can be kept in flat index, so the label count in backend index can be less than num_elements / 2
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, but to validate that vectors were deleted I would also chack that index.svs_label_count() <= (num_elements / 2) + indices_ctx.tiered_svs_params.updateTriggerThreshold
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added one more assert.
test_logger.info("Test create FLOAT16 tiered svs index") | ||
create_tiered_index(test_logger, is_multi=False, data_type=VecSimType_FLOAT16) | ||
|
||
def test_search_insert(test_logger): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we also cover test_search_insert
for compressed svs types in FP16 and FP32? These are the interesting cases to check with large random dataset, where we probably expect some lower recall
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added LVQ8, LVQ4 tests.
LeanVec makes no sense for randomized dataset: recall = 0.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's interesting, can you explain the reason why?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LeanVec compression is based on SVD Linear Dimensionality Reduction.
Dimensionality reduction, by nature, loses some information, just like image compression for instance. Consequently, it will often reduce the prediction quality of your model, but depending on the data it could also leave it unchanged or even improve it in some cases (very noisy data).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you think of an alternative test that will fit LeanVec? It does not have to use random vectors, but we do want to validate that the search is executed correctly during indexing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I will look for a method to generate better sample dataset for LeanVec
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added correlated data generator and LeanVec tests
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've created a dataset generator with correlated values to emulate real life.
And added LeanVec search tests.
447333d
to
f581464
Compare
444a217
to
b21ed77
Compare
/backport |
Successfully created backport PR for |
* Draft SVS Tiered flow test * Make tests pass with the new tiered flow * Code review s1e1 * Code review s1e2 (cherry picked from commit 4135eff)
[SVS] Add TieredSVSIndex flow tests. (#707) * Draft SVS Tiered flow test * Make tests pass with the new tiered flow * Code review s1e1 * Code review s1e2 (cherry picked from commit 4135eff) Co-authored-by: Rafik Saliev <[email protected]>
Describe the changes in the pull request
This PR adds python flow tests for TieredSVSIndex.
Which issues this PR fixes
Main objects this PR modified
Mark if applicable