Production-grade retrieval system with vector search, hybrid ranking, and evaluation metrics. No external LLM required.
Features:

- Vector Store: In-memory database with cosine similarity search
- Hybrid Search: Semantic + keyword matching with tunable weights
- Evaluation: Precision@K, Recall@K, MRR, NDCG metrics
- Batch Embeddings: Efficient document encoding
Performance:

- Indexing: ~5ms per document (batch mode)
- Search: <10ms for 10K documents
- Memory: ~1KB per document + embeddings
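The indexing figure assumes one batched `encode` call rather than per-document calls. A minimal way to check the number on your own hardware (the timing harness is illustrative, not part of rag_system):

```python
# Illustrative benchmark for the per-document indexing cost.
import time
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")
docs = ["CUDA enables GPU acceleration"] * 1000

start = time.perf_counter()
model.encode(docs, batch_size=64)  # one batched call, not 1000 single calls
elapsed = time.perf_counter() - start
print(f"{elapsed / len(docs) * 1000:.2f} ms per document")
```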
Quick start:

```python
from rag_system import VectorStore
# Initialize
store = VectorStore(model_name="all-MiniLM-L6-v2")
# Add documents
docs = ["CUDA enables GPU acceleration", "Vector search uses embeddings"]
store.add_documents(docs)
# Search
results = store.search("GPU computing", k=5)
for doc, score in results:
    print(f"{score:.3f}: {doc.text}")
```

Hybrid search:

```python
query = "GPU computing"

# Semantic only (equivalent to alpha=1.0)
semantic = store.search(query, k=5)
# Balanced (alpha=0.7)
hybrid = store.hybrid_search(query, k=5, alpha=0.7)
# Keyword-heavy (alpha=0.3)
keyword_focused = store.hybrid_search(query, k=5, alpha=0.3)
```
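The alpha comments above suggest a linear blend of semantic and keyword scores. A sketch of that weighting, with a TF-IDF cosine standing in for whatever keyword scorer rag_system actually uses:

```python
# Hypothetical sketch of alpha-weighted hybrid scoring. The TF-IDF
# keyword scorer here is a stand-in, not rag_system's internals.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def hybrid_scores(semantic_scores, docs, query, alpha=0.7):
    vec = TfidfVectorizer().fit(docs)
    keyword = cosine_similarity(vec.transform([query]), vec.transform(docs))[0]
    # alpha=1.0 -> purely semantic, alpha=0.0 -> purely keyword
    return alpha * np.asarray(semantic_scores) + (1 - alpha) * keyword
```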
Evaluation:

```python
from rag_system import RAGEvaluator

evaluator = RAGEvaluator()
test_queries = [
    {"query": "GPU computing", "relevant_ids": ["doc_0", "doc_3"]},
]
metrics = evaluator.evaluate_retrieval(store, test_queries)
# {'precision@5': 0.85, 'recall@5': 0.90, 'mrr': 0.92, 'ndcg@5': 0.88}
```
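The metrics follow their standard definitions; a compact reference sketch (not the RAGEvaluator internals):

```python
# Standard retrieval metrics; `retrieved` is a list of doc IDs ranked
# best-first, `relevant` is the set of gold doc IDs.
import math

def precision_at_k(retrieved, relevant, k):
    return sum(1 for d in retrieved[:k] if d in relevant) / k

def recall_at_k(retrieved, relevant, k):
    return sum(1 for d in retrieved[:k] if d in relevant) / len(relevant)

def mrr(retrieved, relevant):
    # Reciprocal rank of the first relevant hit; 0.0 if none retrieved
    return next((1 / r for r, d in enumerate(retrieved, 1) if d in relevant), 0.0)

def ndcg_at_k(retrieved, relevant, k):
    dcg = sum(1 / math.log2(i + 2) for i, d in enumerate(retrieved[:k]) if d in relevant)
    ideal = sum(1 / math.log2(i + 2) for i in range(min(k, len(relevant))))
    return dcg / ideal if ideal else 0.0
```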
Architecture:

```
Documents
    │
    ▼
┌─────────────────┐
│    Sentence-    │
│   Transformer   │
└────────┬────────┘
         │
         ▼ embeddings
┌─────────────────┐
│  Vector Store   │
│ (NumPy matrix)  │
└────────┬────────┘
         │
Query ───┤
         │
         ▼
┌─────────────────┐
│     Cosine      │
│   Similarity    │
└────────┬────────┘
         │
         ▼ Top-K
      Results
```
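The whole pipeline fits in a few lines of NumPy; a minimal sketch of the search path (illustrative names, not the actual rag_system internals):

```python
# Encode documents, stack into a NumPy matrix, rank by cosine similarity.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")
docs = ["CUDA enables GPU acceleration", "Vector search uses embeddings"]

# L2-normalized embeddings make cosine similarity a plain dot product
matrix = model.encode(docs, normalize_embeddings=True)  # (n_docs, dim)
query = model.encode(["GPU computing"], normalize_embeddings=True)[0]

scores = matrix @ query                  # cosine similarity per document
for i in np.argsort(scores)[::-1][:5]:   # top-k, best first
    print(f"{scores[i]:.3f}: {docs[i]}")
```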
Installation:

```bash
pip install sentence-transformers scikit-learn numpy
```

Run the demo:

```bash
python rag_system.py
```

Output:

```
Indexing time: 234.56ms
Search time: 8.23ms
Query: How to accelerate computing with GPUs?
Top results:
1. [Score: 0.847] CUDA enables GPU-accelerated computing...
2. [Score: 0.623] Real-time systems require low-latency...
3. [Score: 0.501] Deep learning frameworks like PyTorch...
=== Retrieval Metrics ===
precision@5: 0.850
recall@5: 0.920
mrr: 0.875
ndcg@5: 0.892
```

Custom embedding model:

```python
store = VectorStore(model_name="all-mpnet-base-v2")  # higher quality than the MiniLM default
```

Filter by metadata:

```python
results = [
    (doc, score) for doc, score in store.search(query)
    if doc.metadata.get("category") == "research"
]
```
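The filter assumes documents carry metadata. One hypothetical way to attach it at indexing time (check the real add_documents signature; `metadatas` is an assumed parameter name):

```python
# Hypothetical: attaching metadata when indexing. The actual
# rag_system add_documents signature may differ.
store.add_documents(
    ["Transformer architectures for retrieval"],
    metadatas=[{"category": "research"}],
)
```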
Comparison with hosted vector databases:

| Feature | This Implementation | Pinecone | Weaviate |
|---|---|---|---|
| Setup time | <1 min | ~5 min | ~10 min |
| Dependencies | 3 packages | Cloud account | Docker |
| Cost | Free | $70+/mo | Free (self-hosted) |
| Use case | <100K docs | Production | Production |
License: MIT | Model: sentence-transformers