A lightweight vector database implementation with Model Context Protocol (MCP) server support, designed for local LLM applications. PyVector works with minimal dependencies and provides fallback implementations when optional dependencies are not available.
- Fast similarity search using FAISS indexing (with NumPy fallback)
- Text embedding generation with sentence transformers (with hash-based fallback)
- HTTP server for easy integration with local applications
- MCP server integration for seamless LLM tool use (optional)
- Multiple index types (flat, IVF, HNSW) for different performance needs
- Persistent storage with save/load functionality
- Metadata support for rich document storage
- Minimal dependencies - works with just NumPy and Pydantic
# Activate your conda environment
conda activate pyvector
# Install minimal version (NumPy + Pydantic only)
pip install -e .
# Install with full features (recommended)
pip install -e ".[full]"
# Install specific features
pip install -e ".[embeddings]" # Add sentence-transformers
pip install -e ".[faiss]" # Add FAISS indexing
Start PyVector as an MCP server for LLM integration:
# Start the MCP server
python start_mcp_server.py
# The script will display connection details like:
# Add this to your MCP client configuration:
# {
#   "mcpServers": {
#     "pyvector": {
#       "command": "python",
#       "args": ["start_mcp_server.py"],
#       "cwd": "/path/to/pyvector"
#     }
#   }
# }
from pyvector import VectorDatabase
# Create database (uses fallback implementations if needed)
db = VectorDatabase()
# Add some texts
db.add_text("The quick brown fox jumps over the lazy dog")
db.add_text("Machine learning is a subset of artificial intelligence")
db.add_text("Vector databases enable semantic search capabilities")
# Search for similar content
results = db.search_text("AI and machine learning", k=2)
for vector_id, score, metadata in results:
    print(f"Score: {score:.4f} - {metadata['text']}")
# Start MCP server (auto-detects capabilities)
python start_mcp_server.py
# Force HTTP server mode
python start_mcp_server.py --http
# Custom host/port
python start_mcp_server.py --http --host 0.0.0.0 --port 9000
The startup script will display connection details and available endpoints.
from pyvector.simple_server import PyVectorHTTPServer
# Start HTTP server
server = PyVectorHTTPServer("localhost", 8080)
server.start()
# Server provides REST API endpoints:
# GET /health - Health check
# GET /info - Database information
# POST /create_database - Create new database
# POST /add_text - Add text to database
# POST /search_text - Search for similar texts
# POST /save_database - Save database to disk
# POST /load_database - Load database from disk
PyVector gracefully handles missing optional dependencies:
- No FAISS: Uses pure NumPy implementation for vector indexing
- No sentence-transformers: Uses hash-based text embeddings
- Warnings displayed: Clear indication when fallbacks are used
This ensures PyVector works in minimal environments while providing better performance when full dependencies are available.
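The fallback behavior described above can be illustrated with a small capability check. This is a minimal sketch, not PyVector's actual code: the helper names `has_module` and `choose_backends` and the warning messages are hypothetical. It relies only on the standard library to detect whether the optional packages are importable.

```python
import importlib.util
import warnings


def has_module(name: str) -> bool:
    """Return True if a top-level module called `name` can be imported."""
    return importlib.util.find_spec(name) is not None


def choose_backends() -> dict:
    """Pick an index and an embedding backend, warning when a fallback is used.

    Hypothetical helper: the backend labels are illustrative only.
    """
    backends = {}

    if has_module("faiss"):
        backends["index"] = "faiss"
    else:
        warnings.warn("faiss not found; falling back to NumPy indexing")
        backends["index"] = "numpy"

    if has_module("sentence_transformers"):
        backends["embeddings"] = "sentence-transformers"
    else:
        warnings.warn(
            "sentence-transformers not found; falling back to hash-based embeddings"
        )
        backends["embeddings"] = "hash"

    return backends
```

Calling `choose_backends()` in a minimal environment would return the fallback labels and emit one warning per missing package, which mirrors the "warnings displayed" behavior listed above.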
Run the included examples:
# Basic usage example
python examples/basic_usage.py
# HTTP server example (requires requests)
pip install requests
python examples/http_server_example.py
# Start MCP server with connection details
python start_mcp_server.py --verbose
- flat - Exact search, best for small datasets (<10K vectors)
- ivf - Inverted file index, good balance of speed/accuracy (requires FAISS)
- hnsw - Hierarchical navigable small world, fastest approximate search (requires FAISS)
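To make the difference between exact and approximate search concrete, here is a minimal sketch of what a flat (exact) index does: it scores the query against every stored vector and keeps the top `k`. The function names are hypothetical, and cosine similarity is an assumption, since the similarity metric PyVector uses is not stated here. The sketch is written in plain Python to stay dependency-free; a NumPy version would vectorize the same computation.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def flat_search(
    query: Sequence[float], vectors: Sequence[Sequence[float]], k: int
) -> List[Tuple[int, float]]:
    """Exact search: score every stored vector, return the k best (index, score) pairs."""
    scored = [(i, cosine_similarity(query, v)) for i, v in enumerate(vectors)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

Because every vector is compared, the cost grows linearly with the collection size, which is why flat search suits small datasets while IVF and HNSW trade some accuracy for speed on larger ones.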
When sentence-transformers is available:
- all-MiniLM-L6-v2 - Default lightweight model (384 dimensions)
- all-mpnet-base-v2 - Higher quality (768 dimensions)
- paraphrase-multilingual-MiniLM-L12-v2 - Multilingual support
When using fallback: Hash-based embeddings (384 dimensions)
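For intuition about the fallback, the following sketch shows one common way to build a hash-based embedding: hash each token into one of 384 buckets with a sign, then normalize. This is an assumption about the general technique (feature hashing), not a description of PyVector's actual fallback. The function name `hash_embedding` is hypothetical.

```python
import hashlib
import math
from typing import List


def hash_embedding(text: str, dim: int = 384) -> List[float]:
    """Map text to a deterministic, L2-normalized vector via token hashing.

    Illustrative only: PyVector's own fallback may differ.
    """
    vec = [0.0] * dim
    for token in text.lower().split():
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        # The first four bytes select the bucket; the fifth byte selects the sign.
        bucket = int.from_bytes(digest[:4], "big") % dim
        sign = 1.0 if digest[4] % 2 == 0 else -1.0
        vec[bucket] += sign

    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return vec  # Empty input, or tokens that cancelled out.
    return [x / norm for x in vec]
```

Embeddings of this kind reflect shared tokens rather than meaning, so texts score as similar only when they share words. That is the main reason a sentence-transformers model gives better search quality when it is available.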
# Install development dependencies
pip install -e ".[dev]"
# Run tests
python test_basic.py
# Format code
black .
# Lint code
flake8 .
- Core: Vector database and embedding generation with fallbacks
- Server: MCP server implementation (when MCP is available)
- Simple Server: HTTP server for REST API access
- Utils: Validation and helper functions
- Storage: FAISS-based or NumPy-based indexing with metadata persistence
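As an illustration of the metadata persistence mentioned in the Storage component, the sketch below stores vectors and their metadata together in one JSON file and reads them back. The file layout and the names `save_store` and `load_store` are hypothetical. PyVector's real on-disk format is not documented here and may well differ, for example by using a FAISS index file.

```python
import json
from pathlib import Path
from typing import Dict, List, Tuple


def save_store(path: str, vectors: List[List[float]], metadata: List[Dict]) -> None:
    """Write vectors and their metadata to a single JSON file (hypothetical format)."""
    if len(vectors) != len(metadata):
        raise ValueError("vectors and metadata must have the same length")
    payload = {"vectors": vectors, "metadata": metadata}
    Path(path).write_text(json.dumps(payload), encoding="utf-8")


def load_store(path: str) -> Tuple[List[List[float]], List[Dict]]:
    """Read back the vectors and metadata written by save_store."""
    payload = json.loads(Path(path).read_text(encoding="utf-8"))
    return payload["vectors"], payload["metadata"]
```

Keeping each metadata entry at the same position as its vector is what lets a search result be mapped back to the original text.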
Required:
- numpy>=1.21.0 - Core numerical operations
- pydantic>=2.0.0 - Data validation
Optional:
- faiss-cpu>=1.7.0 - High-performance vector indexing
- sentence-transformers>=2.2.0 - Quality text embeddings
- mcp>=1.0.0 - Model Context Protocol server support
GNU General Public License v3.0 - see LICENSE file for details.
This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.