feat: add support for automatic embeddings for the insert many tool MCP-236 #688

gagik · 2025-10-24T11:59:22Z

Adds automatic generation using the Voyage API key, with a system to minimize requests being made in bulk.

src/tools/mongodb/create/insertMany.ts

coveralls · 2025-10-29T14:38:03Z

Pull Request Test Coverage Report for Build 18912204743

Details

92 of 151 (60.93%) changed or added relevant lines in 4 files are covered.
No unchanged relevant lines lost coverage.
Overall coverage decreased (-0.3%) to 79.885%

Changes Missing Coverage	Covered Lines	Changed/Added Lines	%
src/tools/mongodb/read/aggregate.ts	4	14	28.57%
src/tools/mongodb/create/insertMany.ts	45	94	47.87%

Totals
Change from base Build 18909867313:	-0.3%
Covered Lines:	6412
Relevant Lines:	7890

💛 - Coveralls

Copilot

Pull Request Overview

This PR introduces support for automatic vector embeddings generation in the insert-many tool, allowing users to generate embeddings for fields with vector search indexes during document insertion. The implementation adds new functionality to generate embeddings using external providers like Voyage AI while maintaining backward compatibility.

Key changes:

Enhanced the insert-many tool to accept optional embedding parameters for automatic embedding generation
Improved vector search validation with dedicated assertion methods for better error handling
Added comprehensive test coverage including integration tests with real embedding generation and accuracy tests

Reviewed Changes

Copilot reviewed 9 out of 10 changed files in this pull request and generated 5 comments.

Show a summary per file

File	Description
`src/tools/mongodb/create/insertMany.ts`	Core implementation of embedding generation logic with validation and document processing
`src/common/search/vectorSearchEmbeddingsManager.ts`	Refactored validation methods and added assertion utilities for vector search operations
`src/tools/mongodb/read/aggregate.ts`	Updated to use new assertion method for vector search index validation
`tests/integration/tools/mongodb/create/insertMany.test.ts`	Comprehensive integration tests for embedding generation scenarios
`tests/unit/common/search/vectorSearchEmbeddingsManager.test.ts`	Unit tests for new assertion functionality
`tests/accuracy/insertMany.embeddings.test.ts`	Accuracy tests to validate tool behavior with different AI models
`tests/accuracy/sdk/matcher.ts`	Added null matcher utility for test assertions
`src/common/errors.ts`	Added new error code for unexpected errors
`CONTRIBUTING.md`	Documentation update for colima Docker configuration

tests/integration/tools/mongodb/create/insertMany.test.ts

Copilot · 2025-10-29T14:40:09Z

src/tools/mongodb/create/insertMany.ts

+        await this.session.vectorSearchEmbeddingsManager.assertFieldsHaveCorrectEmbeddings(
+            { database, collection },
+            documents
+        );


This validation is performed after potentially expensive embedding generation. Consider moving this validation before the embedding generation step to fail fast and avoid unnecessary API calls when documents have invalid embeddings.

well the embeddings generation (when it happens) is relevant for asserting if the embeddings are correct, mr robot

…gagik/accuracy-embeddings

gagik · 2025-10-29T14:56:20Z

tests/integration/tools/mongodb/create/insertMany.test.ts

        beforeEach(async () => {
            await integration.connectMcpClient();
-            collection = await integration.mongoClient().db(integration.randomDbName()).createCollection("test");
+            database = integration.randomDbName();


I regret doing this because there's too many calls of this function but I'd not have expected integration.randomDbName() to always return the same value so switched to storing in a variable

gagik · 2025-10-29T14:59:11Z

CONTRIBUTING.md

 npm test -- path/to/directory
 ```

+#### Accuracy Tests and colima


I don't know if this is considered niche but I find colima to be a nice simple way to run docker on Mac and if this existed I'd not have lost many painful moments of figuring this bit out

gagik · 2025-10-29T15:02:45Z

src/tools/mongodb/create/insertMany.ts

    protected description = "Insert an array of documents into a MongoDB collection";
    protected argsShape = {
        ...DbOperationArgs,
        documents: z


okay so I did my best to prompt engineer it to generate embeddings by supplying plain text to the vector search indexed fields as strings inside documents.
However, in my experience models get confused about this idea of "the schema says it's a vector but I need to ignore this and supply a string" and I also think relying on the description of the documents field for this is less scalable.
So, the best way I found to force the model to provide these vector index raw strings is to put it as a required input as part of embeddingParameters. It seems to do a good job at mapping the 2 arrays and respective indexes. It does sometimes provide a string in place of the original document's field in the input, but this gets overriden so seems quite fine.

Open to suggestions if I may be overlooking something here.

I think this should work fine, we have the accuracy tests if something goes wary in the future and we can safely tweak the prompt.

gagik · 2025-10-29T15:07:17Z

I'll look into adding a few more tests I guess to keep coveralls happy

blva · 2025-10-29T15:22:25Z

src/tools/mongodb/create/insertMany.ts

+        await this.session.vectorSearchEmbeddingsManager.assertFieldsHaveCorrectEmbeddings(
+            { database, collection },
+            documents


[q] do we need to run this if embeddings are not enabled?

this was existing behavior so I kept it as we would want to prevent the model from adding artbirary data to a vector search indexed field.
that said, I'm wondering if the insert many call would have failed at DB-level anyhow; not sure if that's redundant then.
cc: @kmruiz

So @gagik is completely right. It's to prevent the agent adding a raw field in a place where the server expects an embedding. Currently, the server does not reject these invalid values (unless the user specifies a JSON Schema) and it would break the VS Index and the behaviour can be pretty inconsistent depending on how it breaks.

kmruiz · 2025-10-29T17:01:32Z

src/common/search/vectorSearchEmbeddingsManager.ts

        embeddingParameters: SupportedEmbeddingParameters;
        inputType: EmbeddingParameters["inputType"];
-    }): Promise<unknown[]> {
+    }): Promise<number[][]> {


Sadly, this is not correct. An Embedding can be also other types, like object[] (for different types of quantisations). That's why it was an unknown[].

github-advanced-security bot found potential problems Oct 24, 2025

View reviewed changes

src/tools/mongodb/create/insertMany.ts Fixed Show fixed Hide fixed

chore: simplify

3bf73da

gagik force-pushed the gagik/accuracy-embeddings branch from 9326ed4 to 3bf73da Compare October 27, 2025 12:50

github-advanced-security bot found potential problems Oct 27, 2025

View reviewed changes

src/tools/mongodb/create/insertMany.ts Fixed Show fixed Hide fixed

gagik added 4 commits October 28, 2025 13:50

chore: inside-field

5917de1

chore: simplify, cleanup, add tests

43c4d6c

chore: fix tests

ab71348

chore: fix more tests

746e6da

gagik marked this pull request as ready for review October 29, 2025 14:38

gagik requested a review from a team as a code owner October 29, 2025 14:38

Copilot AI review requested due to automatic review settings October 29, 2025 14:38

gagik changed the title ~~WIP - insert many with generated embeddings~~ feat: add support for automatic embeddings for the insert many tool Oct 29, 2025

Copilot AI reviewed Oct 29, 2025

View reviewed changes

gagik added 3 commits October 29, 2025 15:42

Merge branch 'main' of github.com:mongodb-js/mongodb-mcp-server into …

2bb10e8

…gagik/accuracy-embeddings

chore: conditional usage of vector search param

bffe2eb

chore: more usage of database instead of randomDbName

645ff86

gagik commented Oct 29, 2025

View reviewed changes

gagik changed the title ~~feat: add support for automatic embeddings for the insert many tool~~ feat: add support for automatic embeddings for the insert many tool MCP-236 Oct 29, 2025

blva reviewed Oct 29, 2025

View reviewed changes

kmruiz reviewed Oct 29, 2025

View reviewed changes

kmruiz approved these changes Oct 29, 2025

View reviewed changes

Uh oh!

feat: add support for automatic embeddings for the insert many tool MCP-236 #688

Are you sure you want to change the base?

feat: add support for automatic embeddings for the insert many tool MCP-236 #688

Conversation

gagik commented Oct 24, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coveralls commented Oct 29, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Pull Request Test Coverage Report for Build 18912204743

Details

💛 - Coveralls

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull Request Overview

Reviewed Changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI Oct 29, 2025

Choose a reason for hiding this comment

Uh oh!

gagik Oct 29, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

gagik Oct 29, 2025

Choose a reason for hiding this comment

Uh oh!

gagik Oct 29, 2025

Choose a reason for hiding this comment

Uh oh!

gagik Oct 29, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

kmruiz Oct 29, 2025

Choose a reason for hiding this comment

Uh oh!

gagik commented Oct 29, 2025

Uh oh!

blva Oct 29, 2025

Choose a reason for hiding this comment

Uh oh!

gagik Oct 29, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

kmruiz Oct 29, 2025

Choose a reason for hiding this comment

Uh oh!

kmruiz Oct 29, 2025

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

gagik commented Oct 24, 2025 •

edited

Loading

coveralls commented Oct 29, 2025 •

edited

Loading

gagik Oct 29, 2025 •

edited

Loading

gagik Oct 29, 2025 •

edited

Loading

gagik Oct 29, 2025 •

edited

Loading