Skip to content

nbathreya/Mini-RAG-System

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

5 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Minimal RAG System

Production-grade retrieval system with vector search, hybrid ranking, and evaluation metrics. No external LLM required.

Python PyTorch License

Features

  • Vector Store: In-memory database with cosine similarity search
  • Hybrid Search: Semantic + keyword matching with tunable weights
  • Evaluation: Precision@K, Recall@K, MRR, NDCG metrics
  • Batch Embeddings: Efficient document encoding

Performance

  • Indexing: ~5ms per document (batch mode)
  • Search: <10ms for 10K documents
  • Memory: ~1KB per document + embeddings

Quick Start

from rag_system import VectorStore

# Initialize
store = VectorStore(model_name="all-MiniLM-L6-v2")

# Add documents
docs = ["CUDA enables GPU acceleration", "Vector search uses embeddings"]
store.add_documents(docs)

# Search
results = store.search("GPU computing", k=5)
for doc, score in results:
    print(f"{score:.3f}: {doc.text}")

Hybrid Search

# Semantic only (alpha=1.0)
semantic = store.search(query, k=5)

# Balanced (alpha=0.7)
hybrid = store.hybrid_search(query, k=5, alpha=0.7)

# Keyword-heavy (alpha=0.3)
keyword_focused = store.hybrid_search(query, k=5, alpha=0.3)

Evaluation

from rag_system import RAGEvaluator

evaluator = RAGEvaluator()
test_queries = [
    {"query": "GPU computing", "relevant_ids": ["doc_0", "doc_3"]}
]

metrics = evaluator.evaluate_retrieval(store, test_queries)
# {'precision@5': 0.85, 'recall@5': 0.90, 'mrr': 0.92, 'ndcg@5': 0.88}

Architecture

Documents
    │
    ▼
┌─────────────────┐
│ Sentence-       │
│ Transformer     │
└────────┬────────┘
         │
         ▼ embeddings
┌─────────────────┐
│ Vector Store    │
│ (NumPy matrix)  │
└────────┬────────┘
         │
Query ───┤
         │
         ▼
┌─────────────────┐
│ Cosine          │
│ Similarity      │
└────────┬────────┘
         │
         ▼ Top-K
     Results

Installation

pip install sentence-transformers scikit-learn numpy

Benchmark

python rag_system.py

Output:

Indexing time: 234.56ms
Search time: 8.23ms

Query: How to accelerate computing with GPUs?

Top results:
1. [Score: 0.847] CUDA enables GPU-accelerated computing...
2. [Score: 0.623] Real-time systems require low-latency...
3. [Score: 0.501] Deep learning frameworks like PyTorch...

=== Retrieval Metrics ===
precision@5: 0.850
recall@5: 0.920
mrr: 0.875
ndcg@5: 0.892

Advanced Usage

Custom embedding model:

store = VectorStore(model_name="all-mpnet-base-v2")  # Higher quality

Filter by metadata:

results = [
    (doc, score) for doc, score in store.search(query)
    if doc.metadata.get("category") == "research"
]

Comparison with Production Systems

Feature This Implementation Pinecone Weaviate
Setup time <1 min ~5 min ~10 min
Dependencies 3 packages Cloud account Docker
Cost Free $70+/mo Self-hosted
Use case <100K docs Production Production

License: MIT | Model: sentence-transformers

Releases

No releases published

Packages

No packages published

Languages