Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Iteratively build graph index #612

Open
wants to merge 13 commits into
base: branch-25.02
Choose a base branch
from

Conversation

anaruse
Copy link
Contributor

@anaruse anaruse commented Jan 24, 2025

This PR is about how CAGRA's search() and optimize() can be used to iteratively create and improve graph index.

Currently, IVFPQ and NND are used to create the initial kNN graph, which is then optimized to create the CAGRA search graph. So, for example, if you want to support a new data type in CAGRA, you need to create an initial kNN graph with that data type, and IVFPQ or NND must also support that new data type. This is a bit of hassle.

This PR is one solution to that problem. With functionality of this PR, once the CAGRA search supports the new data type, it can be used to create a graph index with it.

@anaruse anaruse requested a review from a team as a code owner January 24, 2025 07:14
Copy link

copy-pr-bot bot commented Jan 24, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@github-actions github-actions bot added the cpp label Jan 24, 2025
@anaruse
Copy link
Contributor Author

anaruse commented Jan 24, 2025

This PR is related to #610

@cjnolet cjnolet added improvement Improves an existing functionality non-breaking Introduces a non-breaking change labels Jan 25, 2025
@cjnolet
Copy link
Member

cjnolet commented Jan 27, 2025

/ok to test

@anaruse anaruse force-pushed the iteratively_build_graph_index branch 2 times, most recently from cbf708b to bce7efa Compare January 27, 2025 11:23
@cjnolet
Copy link
Member

cjnolet commented Jan 27, 2025

/ok to test

@anaruse anaruse force-pushed the iteratively_build_graph_index branch from 0100b9c to ce17775 Compare January 28, 2025 05:45
@anaruse
Copy link
Contributor Author

anaruse commented Jan 28, 2025

Conflicts with branch-25.02 have been fixed.

@cjnolet
Copy link
Member

cjnolet commented Jan 28, 2025

/ok to test

@cjnolet
Copy link
Member

cjnolet commented Jan 29, 2025

@anaruse can you install precommit and run your changes through the hooks? That'll get your formatting passing the style checker.

Guide to install the commit hooks is here.

@anaruse
Copy link
Contributor Author

anaruse commented Jan 29, 2025

@anaruse can you install precommit and run your changes through the hooks? That'll get your formatting passing the style checker.

Guide to install the commit hooks is here.

I've already install and set up the precommit. I'm not sure how this happened. I'll fix it anyway.

@cjnolet
Copy link
Member

cjnolet commented Jan 29, 2025

/ok to test

@cjnolet
Copy link
Member

cjnolet commented Jan 30, 2025

/ok to test

@@ -104,7 +108,8 @@ struct index_params : cuvs::neighbors::index_params {
*/
std::variant<std::monostate,
graph_build_params::ivf_pq_params,
graph_build_params::nn_descent_params>
graph_build_params::nn_descent_params,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have explicit docs for each of these arguments. I understand the iterative search params are experimental to start, but can we at least add them to the docs?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I will add comments about iterative_search_params here.

Copy link
Contributor

@tfeher tfeher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Akira for this PR, it is very nice to see this feature in cuvs. The PR looks great overall, I just have a few smaller suggestion.

cpp/src/neighbors/detail/cagra/cagra_build.cuh Outdated Show resolved Hide resolved
cpp/src/neighbors/detail/cagra/cagra_build.cuh Outdated Show resolved Hide resolved
cpp/src/neighbors/detail/cagra/cagra_build.cuh Outdated Show resolved Hide resolved
cpp/src/neighbors/detail/cagra/cagra_build.cuh Outdated Show resolved Hide resolved
Copy link
Contributor

@achirkin achirkin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, the changes look very good and clean to me

// Determine initial graph size.
uint64_t final_graph_size = (uint64_t)dataset.extent(0);
uint64_t initial_graph_size = (final_graph_size + 1) / 2;
while (initial_graph_size > graph_degree * 64) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A nitpick, but perhaps we could add RAFT_EXPECTS(graph_degree > 0); assertion at the top of this function to make sure it doesn't cause an infinite loop if invalid (zero) graph_degree is set?

Copy link
Contributor

@tfeher tfeher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Akira for the updates, the PR looks good to me.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cpp improvement Improves an existing functionality non-breaking Introduces a non-breaking change
Projects
Status: In Progress
Development

Successfully merging this pull request may close these issues.

4 participants