A Retrieval-Augmented Generation (RAG) system that answers cricket questions using Wikipedia knowledge, vector embeddings, and LLMs.
Ask any cricket-related question and get an answer grounded in passages retrieved from Wikipedia. It uses:
- Vector Search (Qdrant) to find relevant information
- BGE Embeddings to understand semantic meaning
- Groq LLM (Llama 3.1) to generate natural language answers
- Streamlit UI for easy interaction
- FastAPI for REST API access
Prerequisites:
- Python 3.10+
- Docker (for Qdrant database)
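The components listed above follow the usual retrieve-then-generate pattern: embed the question, find the closest stored chunks, and pass them to the LLM as context. The sketch below illustrates that flow using only the standard library. Word-count vectors stand in for BGE embeddings, an in-memory list stands in for Qdrant, and the prompt string is what would be sent to the Groq model. The function names and sample sentences are illustrative assumptions, not code or data from this repository.

```python
import math
import re
from collections import Counter

def embed(text):
    """Toy embedding: lowercase word counts (stand-in for a BGE vector)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(question, chunks, top_k=3):
    """Return the top_k chunks most similar to the question (stand-in for a Qdrant search)."""
    q = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:top_k]

def build_prompt(question, context_chunks):
    """Assemble the text that would be sent to the LLM."""
    context = "\n".join(f"- {c}" for c in context_chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

# Hand-written sample chunks, not taken from the project's data directory.
CHUNKS = [
    "A Test match is played over up to five days.",
    "The Ashes is a Test series between England and Australia.",
    "A Twenty20 innings lasts at most 20 overs.",
]

question = "Which teams play the Ashes?"
print(build_prompt(question, retrieve(question, CHUNKS, top_k=1)))
```

In the real system, semantic embeddings let a question match chunks that share meaning rather than exact words, which this word-count stand-in cannot do.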
1. Clone the repository
    git clone https://github.com/sanyamj-081/cricket-rag-system.git
    cd cricket-rag-system
2. Install dependencies
    pip install -r requirements.txt
3. Set up environment variables
    cp .env.example .env
Edit .env and add your Groq API key:
    GROQ_API_KEY=your_groq_api_key_here
Get a free API key from console.groq.com/keys
4. Start Qdrant database
    docker run -d -p 6333:6333 -v qdrant_storage:/qdrant/storage qdrant/qdrant
5. Load data into Qdrant
    python src/vector_storage.py
6. Run the application
Streamlit UI:
    streamlit run app.py
Visit http://localhost:8502
FastAPI:
    uvicorn api.main:app --reload --port 8000
Visit http://localhost:8000/docs
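After the setup steps above, it can be useful to confirm that the data load actually created the collection before opening the UI. The sketch below queries Qdrant's REST endpoint `GET /collections` on the port published by the docker command and checks for `cricket_chunks`, the collection name that appears in this README's error messages. The helper names are hypothetical and not part of the repository.

```python
import json
import urllib.request

QDRANT_URL = "http://localhost:6333"  # port published by the docker run command
COLLECTION = "cricket_chunks"         # collection name used in this README

def collection_names(payload):
    """Extract collection names from a GET /collections response body."""
    return [c["name"] for c in payload.get("result", {}).get("collections", [])]

def fetch_collections(base_url=QDRANT_URL, timeout=3.0):
    """Ask a running Qdrant instance which collections it holds."""
    with urllib.request.urlopen(f"{base_url}/collections", timeout=timeout) as resp:
        return collection_names(json.load(resp))

def is_loaded(names, collection=COLLECTION):
    """True if the expected collection is among the names returned by Qdrant."""
    return collection in names
```

With Qdrant running, `is_loaded(fetch_collections())` should return `True` once step 5 has finished; a connection error instead suggests the container is not up.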
cricket-rag-system/
├── app.py # Streamlit UI
├── api/main.py # FastAPI REST API
├── config/settings.py # Configuration
├── src/
│ ├── llm_layer.py # RAG core logic
│ ├── vector_storage.py # Qdrant data loader
│ ├── crawler/ # Wikipedia scraper
│ ├── processing/ # Data cleaning
│ ├── chunking/ # Text chunking
│ └── embedding/ # Embedding generation
├── data/ # Generated data (not in git)
└── qdrant_storage/ # Vector database (not in git)
Edit config/settings.py to change:
- Embedding model (default: BAAI/bge-small-en-v1.5)
- LLM model (default: llama-3.1-8b-instant)
- Number of retrieved chunks (default: TOP_K = 3)
- Temperature (default: 0.2)
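The options above could be grouped as in the following sketch. Only `TOP_K` and the default values come from this README; the `Settings` class, its field names, the `load_settings` helper, and the environment-variable overrides are hypothetical and may not match the actual layout of config/settings.py.

```python
import os
from dataclasses import dataclass

@dataclass(frozen=True)
class Settings:
    """Hypothetical container for the tunable values listed above."""
    embedding_model: str = "BAAI/bge-small-en-v1.5"
    llm_model: str = "llama-3.1-8b-instant"
    top_k: int = 3            # number of chunks retrieved per question
    temperature: float = 0.2  # lower values give more deterministic answers

    def __post_init__(self):
        # Reject values that cannot work before any request is made.
        if self.top_k < 1:
            raise ValueError("top_k must be at least 1")
        if self.temperature < 0:
            raise ValueError("temperature must not be negative")

def load_settings(env=None):
    """Build Settings, letting (assumed) environment variables override defaults."""
    env = os.environ if env is None else env
    defaults = Settings()
    return Settings(
        embedding_model=env.get("EMBEDDING_MODEL", defaults.embedding_model),
        llm_model=env.get("LLM_MODEL", defaults.llm_model),
        top_k=int(env.get("TOP_K", defaults.top_k)),
        temperature=float(env.get("TEMPERATURE", defaults.temperature)),
    )
```

Raising `top_k` gives the LLM more context at the cost of a longer prompt, while a low temperature keeps answers close to the retrieved text.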
"Collection 'cricket_chunks' doesn't exist"
    python src/vector_storage.py
"Qdrant connection failed"
    docker ps                                  # Check if Qdrant is running
    docker run -d -p 6333:6333 qdrant/qdrant   # Start if not running
"GROQ_API_KEY not found"
- Verify .env file exists in project root
- Check API key is correct
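The `.env` checks above can be partly automated. The sketch below parses the contents of a `.env` file and reports whether `GROQ_API_KEY` is missing or still set to the placeholder from the setup instructions. It cannot tell whether a key is actually valid with Groq, and both helper names are hypothetical rather than part of the repository.

```python
from typing import Dict, Optional

PLACEHOLDER = "your_groq_api_key_here"  # placeholder value from the setup instructions

def parse_env(text: str) -> Dict[str, str]:
    """Parse simple KEY=value lines, skipping blanks and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("'\"")
    return values

def groq_key_problem(values: Dict[str, str]) -> Optional[str]:
    """Return a description of the problem, or None if the key looks set."""
    key = values.get("GROQ_API_KEY", "")
    if not key:
        return "GROQ_API_KEY is missing or empty"
    if key == PLACEHOLDER:
        return "GROQ_API_KEY still holds the placeholder value"
    return None
```

From the project root, this could be called as `groq_key_problem(parse_env(open(".env").read()))`.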
Sanyam Jain
- GitHub: @sanyamj-081
- Email: sanyamj081@gmail.com
MIT License