Iteratively build graph index #612
base: branch-25.02
This PR is related to #610
/ok to test
/ok to test
Conflicts with branch-25.02 have been fixed.
/ok to test
@@ -104,7 +108,8 @@ struct index_params : cuvs::neighbors::index_params {
*/
std::variant<std::monostate,
graph_build_params::ivf_pq_params,
graph_build_params::nn_descent_params>
graph_build_params::nn_descent_params,
We have explicit docs for each of these arguments. I understand the iterative search params are experimental to start, but can we at least add them to the docs?
Thanks, I will add comments about iterative_search_params here.
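For readers following the hunk above, here is a minimal sketch of how a `std::variant` of build parameters can select a graph build path. The three parameter structs are empty, hypothetical stand-ins (the real cuVS structs carry fields not shown in this diff), `describe_graph_build` is an illustrative helper rather than a cuVS function, and treating `std::monostate` as "no explicit choice" is an assumption.

```cpp
#include <cassert>  // used by the usage checks mentioned below
#include <string>
#include <type_traits>
#include <variant>

// Hypothetical, empty stand-ins for the real cuVS parameter structs.
namespace graph_build_params {
struct ivf_pq_params {};
struct nn_descent_params {};
struct iterative_search_params {};  // name taken from the review thread
}  // namespace graph_build_params

using graph_build_variant = std::variant<std::monostate,
                                         graph_build_params::ivf_pq_params,
                                         graph_build_params::nn_descent_params,
                                         graph_build_params::iterative_search_params>;

// Illustrative helper (not part of cuVS): report which alternative is active.
inline std::string describe_graph_build(const graph_build_variant& params)
{
  return std::visit(
    [](const auto& p) -> std::string {
      using T = std::decay_t<decltype(p)>;
      if constexpr (std::is_same_v<T, std::monostate>) {
        return "unset";  // assumption: no explicit algorithm was requested
      } else if constexpr (std::is_same_v<T, graph_build_params::ivf_pq_params>) {
        return "ivf_pq";
      } else if constexpr (std::is_same_v<T, graph_build_params::nn_descent_params>) {
        return "nn_descent";
      } else {
        return "iterative_search";
      }
    },
    params);
}
```

A default-constructed `graph_build_variant` holds `std::monostate`, so callers that never set the field fall into the "unset" branch.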
Thanks Akira for this PR; it is very nice to see this feature in cuVS. The PR looks great overall. I just have a few small suggestions.
Thanks, the changes look very good and clean to me
// Determine initial graph size.
uint64_t final_graph_size = (uint64_t)dataset.extent(0);
uint64_t initial_graph_size = (final_graph_size + 1) / 2;
while (initial_graph_size > graph_degree * 64) {
A nitpick, but perhaps we could add a RAFT_EXPECTS(graph_degree > 0); assertion at the top of this function to make sure it doesn't cause an infinite loop if an invalid (zero) graph_degree is set?
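To make the concern concrete, here is a small standalone sketch of the size computation with the suggested guard. The loop body is not visible in the hunk, so the rounded-up halving below is an assumption. The function name `compute_initial_graph_size` is hypothetical, and a plain exception stands in for `RAFT_EXPECTS` so the snippet compiles without RAFT.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>

// Sketch only: mirrors the reviewed hunk, with an assumed loop body.
inline std::uint64_t compute_initial_graph_size(std::uint64_t n_rows, std::uint64_t graph_degree)
{
  // Guard suggested in the review (stand-in for RAFT_EXPECTS(graph_degree > 0)).
  if (graph_degree == 0) { throw std::invalid_argument("graph_degree must be > 0"); }

  std::uint64_t final_graph_size   = n_rows;
  std::uint64_t initial_graph_size = (final_graph_size + 1) / 2;
  while (initial_graph_size > graph_degree * 64) {
    // Assumed body: halve, rounding up, until the size is small enough.
    initial_graph_size = (initial_graph_size + 1) / 2;
  }

  // Sanity check: the starting graph never exceeds the full dataset.
  assert(initial_graph_size <= final_graph_size);
  return initial_graph_size;
}
```

Under that assumed body, a zero `graph_degree` makes the threshold 0, while rounded-up halving of a non-empty size bottoms out at 1 and never reaches 0, so without the guard the loop would not terminate. For one million rows and `graph_degree = 64`, the sketch settles at 3907 rows.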
Thanks Akira for the updates, the PR looks good to me.
This PR shows how CAGRA's search() and optimize() can be used to iteratively create and improve a graph index.
Currently, IVFPQ and NND are used to create the initial kNN graph, which is then optimized to produce the CAGRA search graph. So, for example, if you want to support a new data type in CAGRA, you need to create an initial kNN graph with that data type, which means IVFPQ or NND must also support it. This is a bit of a hassle.
This PR is one solution to that problem. With the functionality in this PR, once CAGRA search supports a new data type, it can be used to create a graph index for that type.
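As a rough illustration of the iterative idea, the sketch below lists the graph sizes such a build might pass through. It assumes the graph doubles in size each round and is capped at the full dataset size, which is inferred from the halving used to pick the initial size and is not confirmed by the description. `graph_size_schedule` is a hypothetical helper, and the search() and optimize() calls are represented only by a comment.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Sketch only: sizes of the intermediate graphs under an assumed doubling schedule.
inline std::vector<std::uint64_t> graph_size_schedule(std::uint64_t initial_size,
                                                      std::uint64_t final_size)
{
  assert(initial_size > 0 && initial_size <= final_size);

  std::vector<std::uint64_t> sizes{initial_size};
  while (sizes.back() < final_size) {
    // In a real build, each round would run search() on the current graph to
    // find neighbors for the newly added rows, then optimize() the result.
    sizes.push_back(std::min(sizes.back() * 2, final_size));
  }
  return sizes;
}
```

For example, starting from 3907 rows with a final size of one million, this schedule has nine entries and ends exactly at 1000000.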