.Net MEVD: Consider making Top a mandatory parameter #10193
Labels
msft.ext.vectordata
Related to Microsoft.Extensions.VectorData
.NET
Issue or Pull requests regarding .NET code
The maximum number of results to return (termed "top" or K) is currently a represented by VectorSearchOptions.Top, which is an optional property that defaults to 3.
Most vector databases (AFAIK) require this value as a mandatory parameter in their API/SDK; this is important as similarity search always returns all elements in the database unless some limit (or filter) is specified (FWIW, it's almost always a good idea to set LIMIT even in relational databases as well). I think our general concept with the abstraction is to expose underlying databases/SDKs, and so we'd ideally avoid making decisions or determining defaults on the user's behalf. If that makes sense, we can move Top to be a mandatory parameter directly on the method signature.
Some vector databases exist which allow not specifying the value (and themselves have some built-in default or similar); for example, Azure Search seems to allow KNearestNeighborsCount to be unset, and defaults to 50 (docs). Our current code overrides that default, setting it to 3.
To support database-side defaults, we could change our Limit to a nullable int without a default, allowing us to not send the value to the database. Providers which do require the limit would throw if it's unset. However, despite Azure Search's default of 50, I personally don't believe that a database-side default makes sense here - each application has the number of results that makes sense for it, and developers should make a decision on that, rather than get e.g. 50 (or 3) by default. I also think Azure Search is rare in allowing the limit to be unset (but this would need to be confirmed).
/cc @westey-m
The text was updated successfully, but these errors were encountered: