-
Notifications
You must be signed in to change notification settings - Fork 1.3k
CSHARP-5717: Typed builders for Atlas indexes #1769
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
As discussed with Boris, these are used for index building in EF Core. There isn't any strongly typed API for vector indexes in the driver yet, but, when there is, then these enums will be used there as well as being used by EF.
3e80e4e
to
5c2e17e
Compare
@BorisDog Can you take a look at this and see if it is going in the right direction? If so, then I'll do the non-vector search indexes as well. (Put this on the wrong PR before.) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general looks fine. See comments.
public class CreateAtlasVectorIndexModel<TDocument> : CreateSearchIndexModel | ||
{ | ||
private readonly RenderArgs<TDocument> _renderArgs | ||
= new(BsonSerializer.LookupSerializer<TDocument>(), BsonSerializer.SerializerRegistry); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The documentSerializer
should come from the collection.DocumentSerializer
and not from the registry.
That's why we have Render
methods with RenderArgs
so that the document serializer can be passed in later when it is known.
=> SearchIndexType.VectorSearch; | ||
|
||
/// <inheritdoc/> | ||
public override BsonDocument Definition |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a bit odd to see a property getter doing so much work. Usually properties simply return an existing value.
But... making this a method would be a breaking change.
{ | ||
get | ||
{ | ||
if (base.Definition != null) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't really happen right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure what you mean. The idea here is that the type is immutable, and we don't want to re-create the document every time the property is accessed--as you mentioned elsewhere, ideally this should not be a property. So, the first time this is called, the definition stored in the base is null. It is then built and cached so it doesn't have to be built again if the property is accessed again.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can't cache the result of Render
because if Render
is called again with different RenderArgs
the result could be different.
I think you should throw an exception if the Document
property is accessed on this subclass. Probably the very existence of this property is dubious.
{ "type", BsonString.Create("vector") }, | ||
{ "path", Field.Render(_renderArgs).FieldName }, | ||
{ "numDimensions", BsonInt32.Create(Dimensions) }, | ||
{ "similarity", BsonString.Create(similarityValue) }, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need to call BsonString.Create
or BsonInt32.Create
.
There are implicit conversions to simplify code like this.
} | ||
} | ||
|
||
base.Definition = new BsonDocument { { "fields", BsonArray.Create(fieldDocuments) } }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would have made the fieldDocuments
a BsonArray
in the first place. No need for the intermediate List
.
var vectorField = new BsonDocument | ||
{ | ||
{ "type", BsonString.Create("vector") }, | ||
{ "path", Field.Render(_renderArgs).FieldName }, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not really possible to use a cached renderArgs
.
This implies that this functionality should really be in a Render
method that TAKES a renderArgs
.
How to do that given that this is a property is a conundrum.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How would someone have done this correctly (that is, create the document with the correct serializer) before my changes? What happens if they do it wrong?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rstam I've been pondering this some more. Given that we are building metadata for the server here, rather than user documents, shouldn't we be using a standard form of serialization always? In other words, what is the scenario where this metadata should be built with custom serializers?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't usually use serializers to build commands, just for data.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rstam Since we are not building user data here, but rather just metadata, then should we not use serializers here? I'm not sure what you mean by "commands" above? If we shouldn't be using serializers, then how to we go from a FieldDefinition to path in the metadata without using a serializer? Or, to ask the same thing another way, why was this originally written to use serializers to create metadata? How would use of those serializers make the metadata generated different?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A "command" is a BSON document we send to the server telling the server to do something. In this case the command is "createSearchIndexes", though in THIS file we are just creating a small part of that command, i.e. the "indexes" field of the "createSearchIndexes" command.
I didn't mean to say that we shouldn't use serializers AT ALL when creating a command, just that we don't use serializers to create the bulk of the command. Of course we have to consult the corresponding serializer when a user uses an Expression
to identify a field, so that we can ask the serializer what the element name should be. But that's actually handled by the appropriate overload of Render
.
And when creating a command that includes user defined POCOs as data to the command we use serializers to convert the POCOs to BSON inside the command, but that's not the case for this command.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or, to ask the same thing another way, why was this originally written to use serializers to create metadata
I don't understand this question.
What part of this was originally written to use serializers?
{ | ||
get | ||
{ | ||
if (base.Definition != null) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can't cache the result of Render
because if Render
is called again with different RenderArgs
the result could be different.
I think you should throw an exception if the Document
property is accessed on this subclass. Probably the very existence of this property is dubious.
var vectorField = new BsonDocument | ||
{ | ||
{ "type", BsonString.Create("vector") }, | ||
{ "path", Field.Render(_renderArgs).FieldName }, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't usually use serializers to build commands, just for data.
Again as discussed with Boris, this has been update to be a typed builder implementation for Atlas indexes.
Original:
As discussed with Boris, these are used for index building in EF Core. There isn't any strongly typed API for vector indexes in the driver yet, but, when there is, then these enums will be used there as well as being used by EF.