Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

.Net MEVD: Consider making Top a mandatory parameter #10193

Open
roji opened this issue Jan 15, 2025 · 0 comments
Open

.Net MEVD: Consider making Top a mandatory parameter #10193

roji opened this issue Jan 15, 2025 · 0 comments
Assignees
Labels
msft.ext.vectordata Related to Microsoft.Extensions.VectorData .NET Issue or Pull requests regarding .NET code

Comments

@roji
Copy link
Member

roji commented Jan 15, 2025

The maximum number of results to return (termed "top" or K) is currently a represented by VectorSearchOptions.Top, which is an optional property that defaults to 3.

Most vector databases (AFAIK) require this value as a mandatory parameter in their API/SDK; this is important as similarity search always returns all elements in the database unless some limit (or filter) is specified (FWIW, it's almost always a good idea to set LIMIT even in relational databases as well). I think our general concept with the abstraction is to expose underlying databases/SDKs, and so we'd ideally avoid making decisions or determining defaults on the user's behalf. If that makes sense, we can move Top to be a mandatory parameter directly on the method signature.

Some vector databases exist which allow not specifying the value (and themselves have some built-in default or similar); for example, Azure Search seems to allow KNearestNeighborsCount to be unset, and defaults to 50 (docs). Our current code overrides that default, setting it to 3.

To support database-side defaults, we could change our Limit to a nullable int without a default, allowing us to not send the value to the database. Providers which do require the limit would throw if it's unset. However, despite Azure Search's default of 50, I personally don't believe that a database-side default makes sense here - each application has the number of results that makes sense for it, and developers should make a decision on that, rather than get e.g. 50 (or 3) by default. I also think Azure Search is rare in allowing the limit to be unset (but this would need to be confirmed).

/cc @westey-m

@markwallace-microsoft markwallace-microsoft added .NET Issue or Pull requests regarding .NET code triage labels Jan 15, 2025
@github-actions github-actions bot changed the title .NET MEVD: Consider making Top a mandatory parameter .Net MEVD: Consider making Top a mandatory parameter Jan 15, 2025
@markwallace-microsoft markwallace-microsoft added msft.ext.vectordata Related to Microsoft.Extensions.VectorData and removed triage labels Jan 16, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
msft.ext.vectordata Related to Microsoft.Extensions.VectorData .NET Issue or Pull requests regarding .NET code
Projects
None yet
Development

No branches or pull requests

2 participants