MongoDB plugin for Genkit

A comprehensive MongoDB plugin for Genkit that provides vector search, text search, hybrid search, CRUD operations, and search index management capabilities.

Prerequisites

MongoDB 6.0+ with Atlas Search or local search indexes
Node.js 18+
Genkit framework

Note: The plugin itself requires only the prerequisites above. The Google Cloud project and Google AI API access mentioned in the testapp examples are needed only by the test application: Google Cloud credentials for the multimodal examples (image and document flows) and a Gemini API key for the text flows (menu understanding).

Project Structure

This repository contains two main components:

1. Plugin (`plugin/`)

The MongoDB plugin library that can be installed and used in your Genkit applications.

What it is:

A reusable library/plugin for Genkit
Can be built but not run directly
Designed to be imported and configured in applications
Provides MongoDB integration capabilities

How to use:

# Install the plugin in your project
pnpm add genkitx-mongodb

# Build the plugin (for development)
cd plugin
pnpm run build

2. Test Application (`testapp/`)

A comprehensive demonstration application that showcases all the plugin's capabilities.

What it is:

A complete Genkit application that uses the plugin
Demonstrates all features with working examples
Can be run and interacted with via Genkit UI
Includes sample data and workflows

How to use:

# Install dependencies
cd testapp
pnpm install

# Build the application
pnpm run build

# Run in development mode
pnpm run dev

# Start with Genkit UI
pnpm run start

Installing the plugin

pnpm add genkitx-mongodb

Features

Vector Search: Semantic search using embeddings with MongoDB's vector search capabilities
Text Search: Full-text search with fuzzy matching and synonyms support
Hybrid Search: Combine vector and text search using MongoDB's $rankFusion for enhanced results
CRUD Operations: Create, read, update, and delete documents by ID
Search Index Management: Create, list, and drop search indexes
Batch Indexing: Efficient document indexing with configurable batch sizes
Retry Logic: Built-in retry mechanisms with configurable attempts, delays, and jitter
Flexible Field Configuration: Customizable field names for data, metadata, and embeddings
Multiple Connection Support: Configure multiple MongoDB connections with different settings
Multimodal Support: Process images and documents with multimodal embeddings
Pipeline Support: Custom aggregation pipelines for advanced querying

Getting Started

For Plugin Users (Most Common)

If you want to use the MongoDB plugin in your own Genkit application:

Install the plugin:

pnpm add genkitx-mongodb

Follow the usage examples below to integrate it into your Genkit application
Optional: Check out the testapp for comprehensive examples and workflows

For Developers/Contributors

If you want to explore, test, or contribute to the plugin:

Clone the repository:

git clone <repository-url>
cd genkitx-mongodb

Build the plugin:

cd plugin
pnpm install
pnpm run build

Run the test application:

cd ../testapp
pnpm install
pnpm run build
pnpm run start

Explore the examples in the testapp to see all features in action

For Learning/Testing

If you want to understand how the plugin works and test its capabilities:

Start with the testapp - it's a complete working example
Follow the testapp README for detailed setup and usage instructions
Use the Genkit UI to interact with all the features
Examine the code to understand how to integrate the plugin in your own applications

Using the plugin

Basic Setup

import { genkit } from "genkit";
import { mongodb } from "genkitx-mongodb";
import { googleAI } from "@genkit-ai/googleai";

const ai = genkit({
  plugins: [
    googleAI(), // Provides the embedders referenced in the examples
    mongodb([
      {
        url: "mongodb://localhost:27017",
        mongoClientOptions: {
          // Optional MongoDB client options
        },
        indexer: {
          id: "indexer",
          retry: {
            retryAttempts: 3,
            baseDelay: 1000,
            jitterFactor: 0.1,
          },
        },
        retriever: {
          id: "retriever",
          retry: {
            retryAttempts: 2,
            baseDelay: 500,
          },
        },
        crudTools: {
          id: "crud",
        },
        searchIndexTools: {
          id: "search-index",
        },
      },
    ]),
  ],
});

Multiple Connections

You can configure multiple MongoDB connections with different settings:

mongodb([
  {
    url: "mongodb://primary:27017",
    indexer: {
      id: "primary-indexer",
      retry: {
        retryAttempts: 3,
        baseDelay: 1000,
      },
    },
    retriever: {
      id: "primary-retriever",
      retry: {
        retryAttempts: 2,
        baseDelay: 500,
      },
    },
    crudTools: { id: "primary-crud" },
    searchIndexTools: { id: "primary-search" },
  },
  {
    url: "mongodb://secondary:27017",
    indexer: {
      id: "secondary-indexer",
      retry: {
        retryAttempts: 5,
        baseDelay: 2000,
        jitterFactor: 0.2,
      },
    },
    retriever: { id: "secondary-retriever" },
    crudTools: { id: "secondary-crud" },
    searchIndexTools: { id: "secondary-search" },
  },
]);

Indexing Documents

import { Document } from "genkit";
import { mongoIndexerRef } from "genkitx-mongodb";

const documents = [
  Document.fromText("Sample document content", { id: "1", category: "sample" }),
  Document.fromText("Another document", { id: "2", category: "example" }),
];

await ai.index({
  indexer: mongoIndexerRef("indexer"),
  documents,
  options: {
    dbName: "myDatabase",
    collectionName: "myCollection",
    embedder: googleAI.embedder("text-embedding-004"),
    embeddingField: "embedding",
    batchSize: 100,
    skipData: false, // Optional: Set to true to exclude original data from storage
    dataField: "data", // Optional: Custom field name for document data
    metadataField: "metadata", // Optional: Custom field name for metadata
    dataTypeField: "dataType", // Optional: Custom field name for data type
  },
});
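
The `batchSize` option controls how many documents are handled per batch. As a rough illustration only, batching can be sketched as below. The helper `chunk` is hypothetical and is not part of the plugin's API; how the plugin batches internally is not shown in this README.

```typescript
// Hypothetical helper, not exported by genkitx-mongodb: split a list into
// consecutive batches of at most `batchSize` items.
function chunk<T>(items: readonly T[], batchSize: number): T[][] {
  if (!Number.isInteger(batchSize) || batchSize <= 0) {
    throw new RangeError("batchSize must be a positive integer");
  }
  const batches: T[][] = [];
  for (let start = 0; start < items.length; start += batchSize) {
    batches.push(items.slice(start, start + batchSize));
  }
  return batches;
}

// 250 placeholder documents with a batch size of 100.
const placeholderDocs = Array.from({ length: 250 }, (_, i) => i);
const batchSizes = chunk(placeholderDocs, 100).map((batch) => batch.length);
```

A smaller batch size means more round trips but smaller individual requests to the embedder and to MongoDB.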

Vector Search

import { mongoRetrieverRef } from "genkitx-mongodb";

const results = await ai.retrieve({
  retriever: mongoRetrieverRef("retriever"),
  query: "search query",
  options: {
    dbName: "myDatabase",
    collectionName: "myCollection",
    embedder: googleAI.embedder("text-embedding-004"),
    vectorSearch: {
      index: "embedding_index",
      path: "embedding",
      exact: false,
      numCandidates: 100,
      limit: 10,
      filter: { category: "sample" },
    },
  },
});
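
The index named in `vectorSearch.index` must already exist on the collection. The sketch below builds a definition in MongoDB's vector search index format. The helper name `vectorIndexDefinition` is hypothetical, `768` is the output dimension of `text-embedding-004`, and the `cosine` similarity and the `category` filter path are assumptions chosen to mirror the example above; the actual filter path depends on how your documents are stored.

```typescript
// Field entries of a vector search index definition (MongoDB's format).
type VectorIndexField =
  | {
      type: "vector";
      path: string;
      numDimensions: number;
      similarity: "cosine" | "euclidean" | "dotProduct";
    }
  | { type: "filter"; path: string };

// Hypothetical helper, not part of genkitx-mongodb.
function vectorIndexDefinition(
  path: string,
  numDimensions: number,
  filterPaths: readonly string[] = [],
): { fields: VectorIndexField[] } {
  // Fields used in a $vectorSearch `filter` must be indexed as "filter".
  const filters = filterPaths.map((filterPath) => ({
    type: "filter" as const,
    path: filterPath,
  }));
  return {
    fields: [
      { type: "vector", path, numDimensions, similarity: "cosine" },
      ...filters,
    ],
  };
}

// Assumed values: "embedding" matches `embeddingField` in the indexing example.
const embeddingIndex = vectorIndexDefinition("embedding", 768, ["category"]);
```

With the MongoDB Node.js driver, a definition like this is typically passed to `collection.createSearchIndex` together with `type: "vectorSearch"`. Whether the plugin's search index tools accept vector definitions is not stated here, so treat that as something to verify.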

Text Search

const results = await ai.retrieve({
  retriever: mongoRetrieverRef("retriever"),
  query: "search query",
  options: {
    dbName: "myDatabase",
    collectionName: "myCollection",
    search: {
      index: "text_index",
      text: {
        path: "content",
        matchCriteria: "any",
        fuzzy: {
          maxEdits: 2,
          prefixLength: 0,
          maxExpansions: 50,
        },
      },
    },
    pipelines: [{ $limit: 10 }, { $sort: { score: -1 } }],
  },
});

Hybrid Search

The plugin supports hybrid search using MongoDB's $rankFusion aggregation stage, which fuses the ranked results of a vector search and a text search into a single result list:

const results = await ai.retrieve({
  retriever: mongoRetrieverRef("retriever"),
  query: "search query",
  options: {
    dbName: "myDatabase",
    collectionName: "myCollection",
    embedder: googleAI.embedder("text-embedding-004"),
    hybridSearch: {
      search: {
        index: "text_index",
        text: {
          path: "content",
          fuzzy: {
            maxEdits: 2,
            prefixLength: 0,
            maxExpansions: 50,
          },
        },
      },
      vectorSearch: {
        index: "embedding_index",
        path: "embedding",
        exact: false,
        numCandidates: 100,
        limit: 10,
      },
      combination: {
        weights: {
          vectorPipeline: 0.7, // Weight for vector search results
          fullTextPipeline: 0.3, // Weight for text search results
        },
      },
      scoreDetails: true, // Include detailed scoring information
    },
  },
});

Hybrid Search Configuration

The hybrid search combines the strengths of both vector and text search:

Vector Pipeline: Uses semantic similarity for finding conceptually related content
Text Pipeline: Uses full-text (keyword) matching, optionally with fuzzy matching
Rank Fusion: Combines results using configurable weights and scoring
Score Details: Optional detailed scoring information for debugging

Hybrid Search Options

{
  search: TextSearchOptions,           // Text search configuration
  vectorSearch: VectorSearchOptions,   // Vector search configuration
  combination?: {
    weights?: {
      vectorPipeline?: number,         // Weight for vector results (0-1, default: 0.5)
      fullTextPipeline?: number,       // Weight for text results (0-1, default: 0.5)
    },
  },
  scoreDetails?: boolean,              // Include score details (default: false)
}
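
To build intuition for how the weights influence the final order, here is a minimal sketch of weighted reciprocal rank fusion, the technique `$rankFusion` is based on. It is not the plugin's or the server's implementation: the function name `rankFusion` is hypothetical, the rank constant of 60 is an assumption (a commonly used value), and the default weight of 0.5 follows the defaults listed above.

```typescript
// Each pipeline contributes a list of document ids, best match first.
type RankedIds = readonly string[];

// Hypothetical illustration of weighted reciprocal rank fusion (RRF):
// score(doc) = sum over pipelines of weight / (rankConstant + rank).
function rankFusion(
  pipelines: Record<string, RankedIds>,
  weights: Record<string, number> = {},
  rankConstant = 60, // assumed constant
): Array<[string, number]> {
  const scores = new Map<string, number>();
  for (const [name, ids] of Object.entries(pipelines)) {
    const weight = weights[name] ?? 0.5;
    ids.forEach((id, index) => {
      const rank = index + 1; // ranks start at 1
      scores.set(id, (scores.get(id) ?? 0) + weight / (rankConstant + rank));
    });
  }
  // Highest fused score first.
  return [...scores.entries()].sort((a, b) => b[1] - a[1]);
}

const fused = rankFusion(
  { vectorPipeline: ["a", "b", "c"], fullTextPipeline: ["b", "d"] },
  { vectorPipeline: 0.7, fullTextPipeline: 0.3 },
);
const fusedOrder = fused.map(([id]) => id);
```

In this example document `b` ranks first because it appears in both result lists, even though `a` is the top vector match.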

CRUD Operations by Document ID

The plugin provides tools for basic CRUD operations by document ID:

// Create a document
await ai.runTool({
  name: "mongodb/crud/create",
  input: {
    dbName: "myDatabase",
    collectionName: "myCollection",
    document: { name: "John", age: 30 },
  },
});

// Read a document by ID
const result = await ai.runTool({
  name: "mongodb/crud/read",
  input: {
    dbName: "myDatabase",
    collectionName: "myCollection",
    id: "507f1f77bcf86cd799439011",
  },
});

// Update a document by ID
await ai.runTool({
  name: "mongodb/crud/update",
  input: {
    dbName: "myDatabase",
    collectionName: "myCollection",
    id: "507f1f77bcf86cd799439011",
    document: { $set: { age: 31 } }, // Use update operators such as $set
  },
});

// Delete a document by ID
await ai.runTool({
  name: "mongodb/crud/delete",
  input: {
    dbName: "myDatabase",
    collectionName: "myCollection",
    id: "507f1f77bcf86cd799439011",
  },
});

Search Index Management

// Create a search index
await ai.runTool({
  name: "mongodb/search-index/create",
  input: {
    dbName: "myDatabase",
    collectionName: "myCollection",
    indexName: "text_index",
    definition: {
      mappings: {
        dynamic: true,
        fields: {
          content: {
            type: "string",
            analyzer: "lucene.english",
          },
        },
      },
    },
  },
});

// List search indexes
const indexes = await ai.runTool({
  name: "mongodb/search-index/list",
  input: {
    dbName: "myDatabase",
    collectionName: "myCollection",
  },
});

// Drop a search index
await ai.runTool({
  name: "mongodb/search-index/drop",
  input: {
    dbName: "myDatabase",
    collectionName: "myCollection",
    indexName: "text_index",
  },
});

Multimodal Document Processing

The plugin supports multimodal embeddings for processing images and documents:

import { multimodalEmbedding001 } from "@genkit-ai/vertexai";

// Index images with multimodal embeddings
await ai.index({
  indexer: mongoIndexerRef("indexer"),
  documents: imageDocuments,
  options: {
    dbName: "myDatabase",
    collectionName: "imageCollection",
    embedder: multimodalEmbedding001,
    embeddingField: "imageEmbedding",
    dataField: "imageData",
    metadataField: "imageMetadata",
    dataTypeField: "imageType",
  },
});

// Retrieve similar images
const results = await ai.retrieve({
  retriever: mongoRetrieverRef("retriever"),
  query: "find images similar to a cat",
  options: {
    dbName: "myDatabase",
    collectionName: "imageCollection",
    embedder: multimodalEmbedding001,
    vectorSearch: {
      index: "image_embedding_index",
      path: "imageEmbedding",
      numCandidates: 50,
      limit: 5,
    },
  },
});

Configuration Options

Connection Configuration

{
  url: string;                   // MongoDB connection string
  mongoClientOptions?: object;   // MongoDB client options
  indexer?: BaseDefinition;      // Indexer configuration
  retriever?: BaseDefinition;    // Retriever configuration
  crudTools?: BaseDefinition;    // CRUD tools configuration
  searchIndexTools?: BaseDefinition; // Search index tools configuration
}

Base Definition Configuration

Each component (indexer, retriever, crudTools, searchIndexTools) uses a base definition:

{
  id: string;                    // Unique identifier for the component
  retry?: RetryOptions;          // Optional retry options for this component
}
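
Because each `id` is used to look a component up (for example `mongoIndexerRef("indexer")`), ids need to be unique. The sketch below checks a configuration for repeated ids before it is passed to the plugin. The helper `findDuplicateIds` is hypothetical, and checking uniqueness per component kind is an assumption about the scope of uniqueness.

```typescript
interface BaseDefinition {
  id: string;
}

interface ConnectionComponents {
  indexer?: BaseDefinition;
  retriever?: BaseDefinition;
  crudTools?: BaseDefinition;
  searchIndexTools?: BaseDefinition;
}

const COMPONENT_KINDS = [
  "indexer",
  "retriever",
  "crudTools",
  "searchIndexTools",
] as const;

// Hypothetical helper: report ids that occur more than once for the same
// component kind across all configured connections.
function findDuplicateIds(connections: readonly ConnectionComponents[]): string[] {
  const duplicates: string[] = [];
  for (const kind of COMPONENT_KINDS) {
    const seen = new Set<string>();
    for (const connection of connections) {
      const id = connection[kind]?.id;
      if (id === undefined) continue;
      if (seen.has(id)) duplicates.push(`${kind}:${id}`);
      seen.add(id);
    }
  }
  return duplicates;
}

const duplicateIds = findDuplicateIds([
  { indexer: { id: "indexer" }, retriever: { id: "primary-retriever" } },
  { indexer: { id: "indexer" }, retriever: { id: "secondary-retriever" } },
]);
```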

Indexer Options

{
  dbName: string;                // Database name
  dbOptions?: object;            // Database options
  collectionName: string;        // Collection name
  collectionOptions?: object;    // Collection options
  embedder: EmbedderArgument;    // Embedder for generating vectors
  embedderOptions?: object;      // Optional embedder-specific options
  embeddingField?: string;       // Field name for embeddings (default: 'embedding')
  batchSize?: number;            // Batch size for indexing (default: 100)
  skipData?: boolean;            // Optional: Skip storing original data (default: false)
  dataField?: string;            // Field name for data (default: 'data')
  metadataField?: string;        // Field name for metadata (default: 'metadata')
  dataTypeField?: string;        // Field name for data type (default: 'dataType')
}
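
The defaults listed above can be summarized in code. The following is only a sketch of how omitted options resolve to those documented defaults; `resolveFieldOptions` and `FieldOptions` are hypothetical names, not plugin exports.

```typescript
interface FieldOptions {
  embeddingField?: string;
  batchSize?: number;
  skipData?: boolean;
  dataField?: string;
  metadataField?: string;
  dataTypeField?: string;
}

// Defaults as documented in the Indexer Options listing.
const INDEXER_DEFAULTS: Required<FieldOptions> = {
  embeddingField: "embedding",
  batchSize: 100,
  skipData: false,
  dataField: "data",
  metadataField: "metadata",
  dataTypeField: "dataType",
};

// Hypothetical helper: fill every omitted option with its documented default.
function resolveFieldOptions(options: FieldOptions = {}): Required<FieldOptions> {
  return {
    embeddingField: options.embeddingField ?? INDEXER_DEFAULTS.embeddingField,
    batchSize: options.batchSize ?? INDEXER_DEFAULTS.batchSize,
    skipData: options.skipData ?? INDEXER_DEFAULTS.skipData,
    dataField: options.dataField ?? INDEXER_DEFAULTS.dataField,
    metadataField: options.metadataField ?? INDEXER_DEFAULTS.metadataField,
    dataTypeField: options.dataTypeField ?? INDEXER_DEFAULTS.dataTypeField,
  };
}

// Only the embedding field is customized; everything else uses defaults.
const resolved = resolveFieldOptions({ embeddingField: "imageEmbedding" });
```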

Retriever Options

{
  dbName: string;                // Database name
  dbOptions?: object;            // Database options
  collectionName: string;        // Collection name
  collectionOptions?: object;    // Collection options
  // For vector search:
  embedder?: EmbedderArgument;   // Embedder for query vectorization
  embedderOptions?: object;      // Optional embedder-specific options
  vectorSearch?: {
    index: string;               // Vector search index name
    path: string;                // Field path for vectors
    exact?: boolean;             // Use exact search
    numCandidates?: number;      // Number of candidates (max: 10000)
    limit?: number;              // Result limit
    filter?: object;             // MongoDB filter
  };
  // For text search:
  search?: {
    index: string;               // Text search index name
    text: {
      path: string;              // Field path for text
      matchCriteria?: 'any' | 'all';
      fuzzy?: {
        maxEdits?: number;       // Maximum edit distance (1-2)
        prefixLength?: number;   // Prefix length
        maxExpansions?: number;  // Maximum expansions
      };
      score?: object;            // Score configuration
      synonyms?: string;         // Synonyms collection
    };
  };
  // For hybrid search:
  hybridSearch?: {
    search: TextSearchOptions;   // Text search configuration
    vectorSearch: VectorSearchOptions; // Vector search configuration
    combination?: {
      weights?: {
        vectorPipeline?: number; // Weight for vector results (0-1, default: 0.5)
        fullTextPipeline?: number; // Weight for text results (0-1, default: 0.5)
      };
    };
    scoreDetails?: boolean;      // Include score details (default: false)
  };
  pipelines?: object[];          // Additional aggregation pipeline stages
}
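
Some `vectorSearch` values constrain each other. As a hedged sketch, the check below covers two rules: the `numCandidates` maximum of 10000 noted above, and MongoDB's requirement for approximate search that `numCandidates` is not smaller than `limit`. The helper `vectorSearchProblems` is hypothetical; the plugin may validate differently or apply its own defaults.

```typescript
interface VectorSearchLimits {
  exact?: boolean;
  numCandidates?: number;
  limit?: number;
}

const MAX_NUM_CANDIDATES = 10000;

// Hypothetical helper: return human-readable problems, empty when none found.
function vectorSearchProblems(options: VectorSearchLimits): string[] {
  const { exact = false, numCandidates, limit } = options;
  const problems: string[] = [];
  // Exact (ENN) search does not use numCandidates, so nothing to check.
  if (exact || numCandidates === undefined) return problems;
  if (numCandidates > MAX_NUM_CANDIDATES) {
    problems.push(`numCandidates must be at most ${MAX_NUM_CANDIDATES}`);
  }
  if (limit !== undefined && numCandidates < limit) {
    problems.push("numCandidates must be greater than or equal to limit");
  }
  return problems;
}

const okProblems = vectorSearchProblems({ numCandidates: 100, limit: 10 });
const tooFewCandidates = vectorSearchProblems({ numCandidates: 5, limit: 10 });
```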

CRUD Tool Options

// Create
{
  dbName: string;                // Database name
  dbOptions?: object;            // Database options
  collectionName: string;        // Collection name
  collectionOptions?: object;    // Collection options
  document: object;              // Document to create
}

// Read
{
  dbName: string;                // Database name
  dbOptions?: object;            // Database options
  collectionName: string;        // Collection name
  collectionOptions?: object;    // Collection options
  id: string;                    // Document ID (24-character hex string)
}

// Update
{
  dbName: string;                // Database name
  dbOptions?: object;            // Database options
  collectionName: string;        // Collection name
  collectionOptions?: object;    // Collection options
  id: string;                    // Document ID (24-character hex string)
  document: object;              // Update document (use MongoDB operators like $set)
}

// Delete
{
  dbName: string;                // Database name
  dbOptions?: object;            // Database options
  collectionName: string;        // Collection name
  collectionOptions?: object;    // Collection options
  id: string;                    // Document ID (24-character hex string)
}
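
Since the read, update, and delete tools expect `id` to be a 24-character hex string, it can help to validate ids before calling a tool. A minimal sketch, where `isObjectIdHex` is a hypothetical helper rather than a plugin export:

```typescript
// A MongoDB ObjectId rendered as text is exactly 24 hexadecimal characters.
const OBJECT_ID_HEX = /^[0-9a-fA-F]{24}$/;

// Hypothetical helper: true when `id` has the documented ObjectId string shape.
function isObjectIdHex(id: string): boolean {
  return OBJECT_ID_HEX.test(id);
}

const validId = isObjectIdHex("507f1f77bcf86cd799439011");
const invalidId = isObjectIdHex("not-an-object-id");
```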

Search Index Tool Options

// Create
{
  dbName: string;                // Database name
  dbOptions?: object;            // Database options
  collectionName: string;        // Collection name
  collectionOptions?: object;    // Collection options
  indexName: string;             // Index name
  definition: object;            // Index definition
}

// List
{
  dbName: string;                // Database name
  dbOptions?: object;            // Database options
  collectionName: string;        // Collection name
  collectionOptions?: object;    // Collection options
}

// Drop
{
  dbName: string;                // Database name
  dbOptions?: object;            // Database options
  collectionName: string;        // Collection name
  collectionOptions?: object;    // Collection options
  indexName: string;             // Index name to drop
}

Retry Options

Retry options can be configured for individual components (indexer, retriever, crudTools, searchIndexTools):

{
  retryAttempts?: number;        // Number of retry attempts (default: 0)
  baseDelay?: number;            // Base delay in milliseconds (default: 1000)
  jitterFactor?: number;         // Jitter factor for exponential backoff (default: 0.1)
}

Each component can have its own retry configuration, allowing fine-grained control over retry behavior for different operations.
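
To show how these three options could interact, here is a sketch of exponential backoff with jitter. The exact formula the plugin uses is not given in this README, so the doubling per attempt, the symmetric jitter, and the helper name `retryDelay` are all assumptions.

```typescript
interface RetryOptions {
  retryAttempts?: number;
  baseDelay?: number;
  jitterFactor?: number;
}

// Hypothetical helper: delay in milliseconds before retry number `attempt`
// (0-based). `random` is injectable so the result can be made deterministic.
function retryDelay(
  attempt: number,
  options: RetryOptions = {},
  random: () => number = Math.random,
): number {
  const baseDelay = options.baseDelay ?? 1000;
  const jitterFactor = options.jitterFactor ?? 0.1;
  // Assumed growth: the delay doubles with every attempt.
  const exponential = baseDelay * 2 ** attempt;
  // Map random() from [0, 1) to a jitter in [-jitterFactor, +jitterFactor).
  const jitter = exponential * jitterFactor * (random() * 2 - 1);
  return Math.round(exponential + jitter);
}

// With random() fixed at 0.5 the jitter is zero, exposing the pure backoff.
const noJitter = () => 0.5;
const delays = [0, 1, 2].map((attempt) => retryDelay(attempt, {}, noJitter));
```

Jitter spreads retries from many clients over time so they do not all hit the database at the same moment.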

Tool References

The plugin provides helper functions to generate tool references:

import {
  mongoCrudToolsRefArray,
  mongoSearchIndexToolsRefArray,
} from "genkitx-mongodb";

// Get all CRUD tool references for a crudTools id (here "crud", as in Basic Setup)
const crudTools = mongoCrudToolsRefArray("crud");
// Returns: ['mongodb/crud/create', 'mongodb/crud/read', ...]

// Get all search index tool references for a searchIndexTools id
const searchIndexTools = mongoSearchIndexToolsRefArray("search-index");
// Returns: ['mongodb/search-index/create', 'mongodb/search-index/list', ...]
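
The tool names used throughout this README follow the pattern `mongodb/<id>/<operation>`. The sketch below reproduces that pattern for illustration; `toolNames` is a hypothetical helper, and the operation lists are taken from the CRUD and search index examples above rather than from the plugin source.

```typescript
// Operations as they appear in the examples in this README.
const CRUD_OPERATIONS = ["create", "read", "update", "delete"] as const;
const SEARCH_INDEX_OPERATIONS = ["create", "list", "drop"] as const;

// Hypothetical helper: compose tool names for a configured component id.
function toolNames(id: string, operations: readonly string[]): string[] {
  return operations.map((operation) => `mongodb/${id}/${operation}`);
}

const crudToolNames = toolNames("crud", CRUD_OPERATIONS);
const searchIndexToolNames = toolNames("search-index", SEARCH_INDEX_OPERATIONS);
```

Because both tool groups include a `create` operation, giving `crudTools` and `searchIndexTools` distinct ids keeps the resulting names distinct.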

Advanced Usage Examples

Hybrid Search with Custom Weights

// Configure hybrid search with custom pipeline weights
const results = await ai.retrieve({
  retriever: mongoRetrieverRef("retriever"),
  query: "find documents about machine learning",
  options: {
    dbName: "myDatabase",
    collectionName: "myCollection",
    embedder: googleAI.embedder("text-embedding-004"),
    hybridSearch: {
      search: {
        index: "content_search_index",
        text: {
          path: "content",
          fuzzy: { maxEdits: 1, maxExpansions: 20 },
        },
      },
      vectorSearch: {
        index: "content_vector_index",
        path: "embedding",
        numCandidates: 50,
        limit: 20,
      },
      combination: {
        weights: {
          vectorPipeline: 0.8, // Prioritize semantic similarity
          fullTextPipeline: 0.2, // Lower weight for exact matches
        },
      },
      scoreDetails: true, // Enable detailed scoring for analysis
    },
    pipelines: [{ $limit: 10 }, { $sort: { score: -1 } }],
  },
});

Multiple Connection Strategy

// Configure different connections for different use cases
mongodb([
  {
    url: "mongodb://primary:27017",
    indexer: {
      id: "primary-indexer",
      retry: { retryAttempts: 5, baseDelay: 2000 },
    },
    retriever: {
      id: "primary-retriever",
      retry: { retryAttempts: 3, baseDelay: 1000 },
    },
  },
  {
    url: "mongodb://analytics:27017",
    indexer: {
      id: "analytics-indexer",
      retry: { retryAttempts: 10, baseDelay: 5000 },
    },
    retriever: {
      id: "analytics-retriever",
      retry: { retryAttempts: 2, baseDelay: 500 },
    },
  },
]);

Custom Field Configuration

// Use custom field names for different data types
await ai.index({
  indexer: mongoIndexerRef("indexer"),
  documents: imageDocuments,
  options: {
    dbName: "myDatabase",
    collectionName: "images",
    embedder: multimodalEmbedding001,
    embeddingField: "imageEmbedding",
    dataField: "imageData",
    metadataField: "imageMetadata",
    dataTypeField: "imageType",
    skipData: false, // Store original image data
  },
});

// Retrieve with custom field mapping
const results = await ai.retrieve({
  retriever: mongoRetrieverRef("retriever"),
  query: "find similar images",
  options: {
    dbName: "myDatabase",
    collectionName: "images",
    embedder: multimodalEmbedding001,
    dataField: "imageData",
    metadataField: "imageMetadata",
    dataTypeField: "imageType",
    vectorSearch: {
      index: "image_vector_index",
      path: "imageEmbedding",
      numCandidates: 20,
      limit: 5,
    },
  },
});

Examples and Test Application

Complete Working Examples

The test application provides comprehensive, working examples of all plugin features:

Core Features Demonstrated

Menu Understanding: Restaurant menu analysis with vector, text, and hybrid search
Image Processing: Multimodal image indexing and similarity search
Document Processing: PDF document processing with text chunking and image extraction
CRUD Operations: Create, read, update, and delete documents by ID
Search Index Management: Create, list, and drop search indexes

What the Test App Provides

Interactive UI: Use Genkit UI to test all features
Sample Data: Pre-configured examples for each feature
Complete Workflows: End-to-end demonstrations
Environment Setup: Detailed configuration examples
Code Examples: Real implementation patterns you can adapt

Getting Started with Examples

Quick Start: Follow the testapp README for setup
Interactive Testing: Run `pnpm run start` to launch the Genkit UI
Code Study: Examine the source code to understand integration patterns
Customization: Adapt the examples for your own use cases

The testapp serves as both a demonstration and a reference implementation for the plugin.

Environment Variables

The test application requires different environment variables depending on the features you want to use:

Required for all features:

MONGODB_URL=mongodb://localhost:27017
MONGODB_DB_NAME=your_database_name
MONGODB_COLLECTION_NAME=your_collection_name
MONGODB_IMAGE_COLLECTION_NAME=your_image_collection
MONGODB_DOCUMENT_COLLECTION_NAME=your_document_collection

Required for image and document flows (multimodal processing):

PROJECT_ID=your_google_cloud_project_id
LOCATION=us-central1
GOOGLE_APPLICATION_CREDENTIALS=path/to/your/service-account-key.json

Required for text flows (menu understanding):

GEMINI_API_KEY=your_gemini_api_key
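
A quick way to catch a missing variable before starting the test application is a small startup check. This is a sketch only: `missingEnv` and the feature labels `multimodal` and `text` are hypothetical names, while the variable names come from the lists above.

```typescript
type Env = Record<string, string | undefined>;
type Feature = "multimodal" | "text";

// Variable names grouped as in the lists above.
const REQUIRED_ENV = {
  all: [
    "MONGODB_URL",
    "MONGODB_DB_NAME",
    "MONGODB_COLLECTION_NAME",
    "MONGODB_IMAGE_COLLECTION_NAME",
    "MONGODB_DOCUMENT_COLLECTION_NAME",
  ],
  multimodal: ["PROJECT_ID", "LOCATION", "GOOGLE_APPLICATION_CREDENTIALS"],
  text: ["GEMINI_API_KEY"],
} as const;

// Hypothetical helper: list required variables that are unset or empty.
function missingEnv(env: Env, features: readonly Feature[] = []): string[] {
  const names: string[] = [...REQUIRED_ENV.all];
  for (const feature of features) {
    names.push(...REQUIRED_ENV[feature]);
  }
  return names.filter((name) => !env[name]);
}

// Example with a partial environment for the text flows.
const missing = missingEnv(
  { MONGODB_URL: "mongodb://localhost:27017", GEMINI_API_KEY: "" },
  ["text"],
);
```

In a Node.js application, pass `process.env` as the first argument and fail fast if the returned list is not empty.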

Example Workflows

Menu Analysis: Index menu items and perform semantic, text, and hybrid search
Image Search: Index images with descriptions and find similar images
Document Processing: Process PDF documents with text extraction and image extraction
Database Management: Perform CRUD operations on documents
Search Index Management: Create and manage search indexes

License

Apache 2.0

This is an independent MongoDB plugin for Genkit. Please file issues and pull requests against this repository.

Usage information and reference details can be found in the Genkit documentation.
