A comprehensive MongoDB plugin for Genkit that provides vector search, text search, hybrid search, CRUD operations, and search index management capabilities.
- MongoDB 6.0+ with Atlas Search or local search indexes
- Node.js 18+
- Genkit framework
Note: The plugin itself only requires the above prerequisites. The Google Cloud Project and Google AI API access mentioned in the testapp examples are only needed for the multimodal processing examples (image and document flows) in the test application.
This repository contains two main components:
The MongoDB plugin library that can be installed and used in your Genkit applications.
What it is:
- A reusable library/plugin for Genkit
- Can be built but not run directly
- Designed to be imported and configured in applications
- Provides MongoDB integration capabilities
How to use:
# Install the plugin in your project
pnpm add genkitx-mongodb
# Build the plugin (for development)
cd plugin
pnpm run build
A comprehensive demonstration application that showcases all the plugin's capabilities.
What it is:
- A complete Genkit application that uses the plugin
- Demonstrates all features with working examples
- Can be run and interacted with via Genkit UI
- Includes sample data and workflows
How to use:
# Install dependencies
cd testapp
pnpm install
# Build the application
pnpm run build
# Run in development mode
pnpm run dev
# Start with Genkit UI
pnpm run start
pnpm add genkitx-mongodb
- Vector Search: Semantic search using embeddings with MongoDB's vector search capabilities
- Text Search: Full-text search with fuzzy matching and synonyms support
- Hybrid Search: Combine vector and text search using MongoDB's
$rankFusion
for enhanced results - CRUD Operations: Create, read, update, and delete documents by ID
- Search Index Management: Create, list, and drop search indexes
- Batch Indexing: Efficient document indexing with configurable batch sizes
- Retry Logic: Built-in retry mechanisms with configurable attempts, delays, and jitter
- Flexible Field Configuration: Customizable field names for data, metadata, and embeddings
- Multiple Connection Support: Configure multiple MongoDB connections with different settings
- Multimodal Support: Process images and documents with multimodal embeddings
- Pipeline Support: Custom aggregation pipelines for advanced querying
If you want to use the MongoDB plugin in your own Genkit application:
- Install the plugin:
pnpm add genkitx-mongodb
-
Follow the usage examples below to integrate it into your Genkit application
-
Optional: Check out the testapp for comprehensive examples and workflows
If you want to explore, test, or contribute to the plugin:
-
Clone the repository:
git clone <repository-url> cd genkitx-mongodb
-
Build the plugin:
cd plugin
pnpm install
pnpm run build
-
Run the test application:
cd testapp pnpm install pnpm run build pnpm run start
-
Explore the examples in the testapp to see all features in action
If you want to understand how the plugin works and test its capabilities:
- Start with the testapp - it's a complete working example
- Follow the testapp README for detailed setup and usage instructions
- Use the Genkit UI to interact with all the features
- Examine the code to understand how to integrate the plugin in your own applications
import { genkit } from "genkit";
import { mongodb } from "genkitx-mongodb";
import { googleAI } from "@genkit-ai/googleai";
const ai = genkit({
plugins: [
mongodb([
{
url: "mongodb://localhost:27017",
mongoClientOptions: {
// Optional MongoDB client options
},
indexer: {
id: "indexer",
retry: {
retryAttempts: 3,
baseDelay: 1000,
jitterFactor: 0.1,
},
},
retriever: {
id: "retriever",
retry: {
retryAttempts: 2,
baseDelay: 500,
},
},
crudTools: {
id: "crud",
},
searchIndexTools: {
id: "search-index",
},
},
]),
],
});
You can configure multiple MongoDB connections with different settings:
mongodb([
{
url: "mongodb://primary:27017",
indexer: {
id: "primary-indexer",
retry: {
retryAttempts: 3,
baseDelay: 1000,
},
},
retriever: {
id: "primary-retriever",
retry: {
retryAttempts: 2,
baseDelay: 500,
},
},
crudTools: { id: "primary-crud" },
searchIndexTools: { id: "primary-search" },
},
{
url: "mongodb://secondary:27017",
indexer: {
id: "secondary-indexer",
retry: {
retryAttempts: 5,
baseDelay: 2000,
jitterFactor: 0.2,
},
},
retriever: { id: "secondary-retriever" },
crudTools: { id: "secondary-crud" },
searchIndexTools: { id: "secondary-search" },
},
]);
import { Document } from "genkit";
import { mongoIndexerRef } from "genkitx-mongodb";
const documents = [
Document.fromText("Sample document content", { id: "1", category: "sample" }),
Document.fromText("Another document", { id: "2", category: "example" }),
];
await ai.index({
indexer: mongoIndexerRef("indexer"),
documents,
options: {
dbName: "myDatabase",
collectionName: "myCollection",
embedder: googleAI.embedder("text-embedding-004"),
embeddingField: "embedding",
batchSize: 100,
skipData: false, // Optional: Set to true to exclude original data from storage
dataField: "data", // Optional: Custom field name for document data
metadataField: "metadata", // Optional: Custom field name for metadata
dataTypeField: "dataType", // Optional: Custom field name for data type
},
});
import { mongoRetrieverRef } from "genkitx-mongodb";
const results = await ai.retrieve({
retriever: mongoRetrieverRef("retriever"),
query: "search query",
options: {
dbName: "myDatabase",
collectionName: "myCollection",
embedder: googleAI.embedder("text-embedding-004"),
vectorSearch: {
index: "embedding_index",
path: "embedding",
exact: false,
numCandidates: 100,
limit: 10,
filter: { category: "sample" },
},
},
});
const results = await ai.retrieve({
retriever: mongoRetrieverRef("retriever"),
query: "search query",
options: {
dbName: "myDatabase",
collectionName: "myCollection",
search: {
index: "text_index",
text: {
path: "content",
matchCriteria: "any",
fuzzy: {
maxEdits: 2,
prefixLength: 0,
maxExpansions: 50,
},
},
},
pipelines: [{ $limit: 10 }, { $sort: { score: -1 } }],
},
});
The plugin supports hybrid search using MongoDB's $rankFusion
aggregation, which combines vector and text search results for enhanced retrieval:
const results = await ai.retrieve({
retriever: mongoRetrieverRef("retriever"),
query: "search query",
options: {
dbName: "myDatabase",
collectionName: "myCollection",
embedder: googleAI.embedder("text-embedding-004"),
hybridSearch: {
search: {
index: "text_index",
text: {
path: "content",
fuzzy: {
maxEdits: 2,
prefixLength: 0,
maxExpansions: 50,
},
},
},
vectorSearch: {
index: "embedding_index",
path: "embedding",
exact: false,
numCandidates: 100,
limit: 10,
},
combination: {
weights: {
vectorPipeline: 0.7, // Weight for vector search results
fullTextPipeline: 0.3, // Weight for text search results
},
},
scoreDetails: true, // Include detailed scoring information
},
},
});
The hybrid search combines the strengths of both vector and text search:
- Vector Pipeline: Uses semantic similarity for finding conceptually related content
- Text Pipeline: Uses exact text matching with fuzzy search capabilities
- Rank Fusion: Combines results using configurable weights and scoring
- Score Details: Optional detailed scoring information for debugging
{
search: TextSearchOptions, // Text search configuration
vectorSearch: VectorSearchOptions, // Vector search configuration
combination?: {
weights?: {
vectorPipeline?: number, // Weight for vector results (0-1, default: 0.5)
fullTextPipeline?: number, // Weight for text results (0-1, default: 0.5)
},
},
scoreDetails?: boolean, // Include score details (default: false)
}
The plugin provides tools for basic CRUD operations by document ID:
// Create a document
await ai.runTool({
name: "mongodb/crud/create",
input: {
dbName: "myDatabase",
collectionName: "myCollection",
document: { name: "John", age: 30 },
},
});
// Read a document by ID
const result = await ai.runTool({
name: "mongodb/crud/read",
input: {
dbName: "myDatabase",
collectionName: "myCollection",
id: "507f1f77bcf86cd799439011",
},
});
// Update a document by ID
await ai.runTool({
name: "mongodb/crud/update",
input: {
dbName: "myDatabase",
collectionName: "myCollection",
id: "507f1f77bcf86cd799439011",
document: { age: 31 },
},
});
// Delete a document by ID
await ai.runTool({
name: "mongodb/crud/delete",
input: {
dbName: "myDatabase",
collectionName: "myCollection",
id: "507f1f77bcf86cd799439011",
},
});
// Create a search index
await ai.runTool({
name: "mongodb/search-index/create",
input: {
dbName: "myDatabase",
collectionName: "myCollection",
indexName: "text_index",
definition: {
mappings: {
dynamic: true,
fields: {
content: {
type: "string",
analyzer: "lucene.english",
},
},
},
},
},
});
// List search indexes
const indexes = await ai.runTool({
name: "mongodb/search-index/list",
input: {
dbName: "myDatabase",
collectionName: "myCollection",
},
});
// Drop a search index
await ai.runTool({
name: "mongodb/search-index/drop",
input: {
dbName: "myDatabase",
collectionName: "myCollection",
indexName: "text_index",
},
});
The plugin supports multimodal embeddings for processing images and documents:
import { multimodalEmbedding001 } from "@genkit-ai/vertexai";
// Index images with multimodal embeddings
await ai.index({
indexer: mongoIndexerRef("indexer"),
documents: imageDocuments,
options: {
dbName: "myDatabase",
collectionName: "imageCollection",
embedder: multimodalEmbedding001,
embeddingField: "imageEmbedding",
dataField: "imageData",
metadataField: "imageMetadata",
dataTypeField: "imageType",
},
});
// Retrieve similar images
const results = await ai.retrieve({
retriever: mongoRetrieverRef("retriever"),
query: "find images similar to a cat",
options: {
dbName: "myDatabase",
collectionName: "imageCollection",
embedder: multimodalEmbedding001,
vectorSearch: {
index: "image_embedding_index",
path: "imageEmbedding",
numCandidates: 50,
limit: 5,
},
},
});
{
url: string; // MongoDB connection string
mongoClientOptions?: object; // MongoDB client options
indexer?: BaseDefinition; // Indexer configuration
retriever?: BaseDefinition; // Retriever configuration
crudTools?: BaseDefinition; // CRUD tools configuration
searchIndexTools?: BaseDefinition; // Search index tools configuration
}
Each component (indexer, retriever, crudTools, searchIndexTools) uses a base definition:
{
id: string; // Unique identifier for the component
retry?: RetryOptions; // Optional retry options for this component
}
{
dbName: string; // Database name
dbOptions?: object; // Database options
collectionName: string; // Collection name
collectionOptions?: object; // Collection options
embedder: EmbedderArgument; // Embedder for generating vectors
embedderOptions?: object; // Optional embedder-specific options
embeddingField?: string; // Field name for embeddings (default: 'embedding')
batchSize?: number; // Batch size for indexing (default: 100)
skipData?: boolean; // Optional: Skip storing original data (default: false)
dataField?: string; // Field name for data (default: 'data')
metadataField?: string; // Field name for metadata (default: 'metadata')
dataTypeField?: string; // Field name for data type (default: 'dataType')
}
{
dbName: string; // Database name
dbOptions?: object; // Database options
collectionName: string; // Collection name
collectionOptions?: object; // Collection options
// For vector search:
embedder?: EmbedderArgument; // Embedder for query vectorization
embedderOptions?: object; // Optional embedder-specific options
vectorSearch?: {
index: string; // Vector search index name
path: string; // Field path for vectors
exact?: boolean; // Use exact search
numCandidates?: number; // Number of candidates (max: 10000)
limit?: number; // Result limit
filter?: object; // MongoDB filter
};
// For text search:
search?: {
index: string; // Text search index name
text: {
path: string; // Field path for text
matchCriteria?: 'any' | 'all';
fuzzy?: {
maxEdits?: number; // Maximum edit distance (1-2)
prefixLength?: number; // Prefix length
maxExpansions?: number; // Maximum expansions
};
score?: object; // Score configuration
synonyms?: string; // Synonyms collection
};
};
// For hybrid search:
hybridSearch?: {
search: TextSearchOptions; // Text search configuration
vectorSearch: VectorSearchOptions; // Vector search configuration
combination?: {
weights?: {
vectorPipeline?: number; // Weight for vector results (0-1, default: 0.5)
fullTextPipeline?: number; // Weight for text results (0-1, default: 0.5)
};
};
scoreDetails?: boolean; // Include score details (default: false)
};
pipelines?: array; // Aggregation pipeline stages
}
// Create
{
dbName: string; // Database name
dbOptions?: object; // Database options
collectionName: string; // Collection name
collectionOptions?: object; // Collection options
document: object; // Document to create
}
// Read
{
dbName: string; // Database name
dbOptions?: object; // Database options
collectionName: string; // Collection name
collectionOptions?: object; // Collection options
id: string; // Document ID (24-character hex string)
}
// Update
{
dbName: string; // Database name
dbOptions?: object; // Database options
collectionName: string; // Collection name
collectionOptions?: object; // Collection options
id: string; // Document ID (24-character hex string)
document: object; // Update document (use MongoDB operators like $set)
}
// Delete
{
dbName: string; // Database name
dbOptions?: object; // Database options
collectionName: string; // Collection name
collectionOptions?: object; // Collection options
id: string; // Document ID (24-character hex string)
}
// Create
{
dbName: string; // Database name
dbOptions?: object; // Database options
collectionName: string; // Collection name
collectionOptions?: object; // Collection options
indexName: string; // Index name
definition: object; // Index definition
}
// List
{
dbName: string; // Database name
dbOptions?: object; // Database options
collectionName: string; // Collection name
collectionOptions?: object; // Collection options
}
// Drop
{
dbName: string; // Database name
dbOptions?: object; // Database options
collectionName: string; // Collection name
collectionOptions?: object; // Collection options
indexName: string; // Index name to drop
}
Retry options can be configured for individual components (indexer, retriever, crudTools, searchIndexTools):
{
retryAttempts?: number; // Number of retry attempts (default: 0)
baseDelay?: number; // Base delay in milliseconds (default: 1000)
jitterFactor?: number; // Jitter factor for exponential backoff (default: 0.1)
}
Each component can have its own retry configuration, allowing fine-grained control over retry behavior for different operations.
The plugin provides helper functions to generate tool references:
import {
mongoCrudToolsRefArray,
mongoSearchIndexToolsRefArray,
} from "genkitx-mongodb";
// Get all CRUD tool references for a connection
const crudTools = mongoCrudToolsRefArray("my-connection-id");
// Returns: ['mongodb/my-connection-id/create', 'mongodb/my-connection-id/read', ...]
// Get all search index tool references for a connection
const searchIndexTools = mongoSearchIndexToolsRefArray("my-connection-id");
// Returns: ['mongodb/my-connection-id/create', 'mongodb/my-connection-id/list', ...]
// Configure hybrid search with custom pipeline weights
const results = await ai.retrieve({
retriever: mongoRetrieverRef("retriever"),
query: "find documents about machine learning",
options: {
dbName: "myDatabase",
collectionName: "myCollection",
embedder: googleAI.embedder("text-embedding-004"),
hybridSearch: {
search: {
index: "content_search_index",
text: {
path: "content",
fuzzy: { maxEdits: 1, maxExpansions: 20 },
},
},
vectorSearch: {
index: "content_vector_index",
path: "embedding",
numCandidates: 50,
limit: 20,
},
combination: {
weights: {
vectorPipeline: 0.8, // Prioritize semantic similarity
fullTextPipeline: 0.2, // Lower weight for exact matches
},
},
scoreDetails: true, // Enable detailed scoring for analysis
},
pipelines: [{ $limit: 10 }, { $sort: { score: -1 } }],
},
});
// Configure different connections for different use cases
mongodb([
{
url: "mongodb://primary:27017",
indexer: {
id: "primary-indexer",
retry: { retryAttempts: 5, baseDelay: 2000 },
},
retriever: {
id: "primary-retriever",
retry: { retryAttempts: 3, baseDelay: 1000 },
},
},
{
url: "mongodb://analytics:27017",
indexer: {
id: "analytics-indexer",
retry: { retryAttempts: 10, baseDelay: 5000 },
},
retriever: {
id: "analytics-retriever",
retry: { retryAttempts: 2, baseDelay: 500 },
},
},
]);
// Use custom field names for different data types
await ai.index({
indexer: mongoIndexerRef("indexer"),
documents: imageDocuments,
options: {
dbName: "myDatabase",
collectionName: "images",
embedder: multimodalEmbedding001,
embeddingField: "imageEmbedding",
dataField: "imageData",
metadataField: "imageMetadata",
dataTypeField: "imageType",
skipData: false, // Store original image data
},
});
// Retrieve with custom field mapping
const results = await ai.retrieve({
retriever: mongoRetrieverRef("retriever"),
query: "find similar images",
options: {
dbName: "myDatabase",
collectionName: "images",
embedder: multimodalEmbedding001,
dataField: "imageData",
metadataField: "imageMetadata",
dataTypeField: "imageType",
vectorSearch: {
index: "image_vector_index",
path: "imageEmbedding",
numCandidates: 20,
limit: 5,
},
},
});
The test application provides comprehensive, working examples of all plugin features:
- Menu Understanding: Restaurant menu analysis with vector, text, and hybrid search
- Image Processing: Multimodal image indexing and similarity search
- Document Processing: PDF document processing with text chunking and image extraction
- CRUD Operations: Create, read, update, and delete documents by ID
- Search Index Management: Create, list, and drop search indexes
- Interactive UI: Use Genkit UI to test all features
- Sample Data: Pre-configured examples for each feature
- Complete Workflows: End-to-end demonstrations
- Environment Setup: Detailed configuration examples
- Code Examples: Real implementation patterns you can adapt
- Quick Start: Follow the testapp README for setup
- Interactive Testing: Use
pnpm run start
to launch the Genkit UI - Code Study: Examine the source code to understand integration patterns
- Customization: Adapt the examples for your own use cases
The testapp serves as both a demonstration and a reference implementation for the plugin.
The test application requires different environment variables depending on the features you want to use:
MONGODB_URL=mongodb://localhost:27017
MONGODB_DB_NAME=your_database_name
MONGODB_COLLECTION_NAME=your_collection_name
MONGODB_IMAGE_COLLECTION_NAME=your_image_collection
MONGODB_DOCUMENT_COLLECTION_NAME=your_document_collection
PROJECT_ID=your_google_cloud_project_id
LOCATION=us-central1
GOOGLE_APPLICATION_CREDENTIALS=path/to/your/service-account-key.json
GEMINI_API_KEY=your_gemini_api_key
- Menu Analysis: Index menu items and perform semantic, text, and hybrid search
- Image Search: Index images with descriptions and find similar images
- Document Processing: Process PDF documents with text extraction and image extraction
- Database Management: Perform CRUD operations on documents
- Search Index Management: Create and manage search indexes
Apache 2.0
This is an independent MongoDB plugin for Genkit. Please file issues and pull requests against this repository.
Usage information and reference details can be found in Genkit documentation.