🏏 Cricket RAG System

A Retrieval-Augmented Generation (RAG) system that answers cricket questions using Wikipedia knowledge, vector embeddings, and LLMs.

📋 What is this?

This system lets you ask cricket-related questions and get answers grounded in Wikipedia sources. It uses:

  • Vector Search (Qdrant) to find relevant information
  • BGE Embeddings to understand semantic meaning
  • Groq LLM (Llama 3.1) to generate natural language answers
  • Streamlit UI for easy interaction
  • FastAPI for REST API access
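
The retrieve-then-generate flow behind these components can be illustrated with a toy sketch. This is not the project's code: word-count vectors stand in for BGE embeddings, an in-memory list stands in for Qdrant, and the sketch stops at building the prompt rather than calling Groq. All function names and the sample chunks are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: lowercase word counts (a stand-in for BGE vectors)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[term] * b[term] for term in a if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, chunks, top_k=3):
    """Return the top_k chunks most similar to the question."""
    query_vector = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(query_vector, embed(c)), reverse=True)
    return ranked[:top_k]


def build_prompt(question, context_chunks):
    """Assemble the text that would be sent to the LLM."""
    context = "\n".join(f"- {chunk}" for chunk in context_chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


# Made-up sample chunks; the real system stores Wikipedia chunks in Qdrant.
chunks = [
    "A cricket team has eleven players on the field.",
    "An over consists of six legal deliveries.",
    "Test matches are played over up to five days.",
]
question = "How many deliveries are in an over?"
top = retrieve(question, chunks, top_k=1)
prompt = build_prompt(question, top)
print(prompt)
```

In the real pipeline, a dense embedding model would let the retriever match on meaning rather than shared words, which is the main reason to prefer it over a word-count approach like this one.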

🚀 Quick Setup

Prerequisites

  • Python 3.10+
  • Docker (for Qdrant database)

Installation

1. Clone the repository

git clone https://github.com/sanyamj-081/cricket-rag-system.git
cd cricket-rag-system

2. Install dependencies

pip install -r requirements.txt

3. Set up environment variables

cp .env.example .env

Edit .env and add your Groq API key:

GROQ_API_KEY=your_groq_api_key_here

Get a free API key from console.groq.com/keys
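
Before moving on, it can help to confirm the key is actually set. The following is a minimal standard-library sketch; the helper names are hypothetical, and it assumes the .env file uses plain KEY=VALUE lines. How the project itself loads .env is not shown in this README.

```python
import os
from pathlib import Path


def load_env_file(path=".env"):
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    env_path = Path(path)
    if not env_path.exists():
        return values
    for line in env_path.read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def groq_key_configured(path=".env"):
    """True if GROQ_API_KEY is set to something other than the placeholder."""
    key = os.environ.get("GROQ_API_KEY") or load_env_file(path).get("GROQ_API_KEY", "")
    return bool(key) and key != "your_groq_api_key_here"


if __name__ == "__main__":
    print("GROQ_API_KEY configured:", groq_key_configured())
```

Run it from the project root so the relative .env path resolves.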

4. Start Qdrant database

docker run -d -p 6333:6333 -v qdrant_storage:/qdrant/storage qdrant/qdrant

5. Load data into Qdrant

python src/vector_storage.py

6. Run the application

Streamlit UI:

streamlit run app.py

Visit the URL Streamlit prints on startup (http://localhost:8501 by default; 8502 if port 8501 is already in use)

FastAPI:

uvicorn api.main:app --reload --port 8000

Visit http://localhost:8000/docs
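
This README does not list the API's routes. One way to discover them from a script is to read the OpenAPI schema that FastAPI serves at /openapi.json by default. The sketch below uses only the standard library; the function names are hypothetical, and it assumes the server is running on port 8000 with the default schema URL.

```python
import json
from urllib.request import urlopen


def endpoints_from_schema(schema):
    """Return sorted (METHOD, path) pairs from an OpenAPI schema dict."""
    http_methods = {"get", "post", "put", "patch", "delete", "head", "options"}
    pairs = []
    for path, operations in schema.get("paths", {}).items():
        for method in operations:
            # Path items can also hold non-method keys such as "parameters".
            if method.lower() in http_methods:
                pairs.append((method.upper(), path))
    return sorted(pairs)


def fetch_endpoints(base_url="http://localhost:8000"):
    """Download the schema from a running server and list its endpoints."""
    with urlopen(f"{base_url}/openapi.json", timeout=5) as response:
        return endpoints_from_schema(json.load(response))


if __name__ == "__main__":
    for method, path in fetch_endpoints():
        print(method, path)
```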


📁 Project Structure

cricket-rag-system/
├── app.py                  # Streamlit UI
├── api/main.py             # FastAPI REST API
├── config/settings.py      # Configuration
├── src/
│   ├── llm_layer.py        # RAG core logic
│   ├── vector_storage.py   # Qdrant data loader
│   ├── crawler/            # Wikipedia scraper
│   ├── processing/         # Data cleaning
│   ├── chunking/           # Text chunking
│   └── embedding/          # Embedding generation
├── data/                   # Generated data (not in git)
└── qdrant_storage/         # Vector database (not in git)

🔧 Configuration

Edit config/settings.py to change:

  • Embedding model (default: BAAI/bge-small-en-v1.5)
  • LLM model (default: llama-3.1-8b-instant)
  • Number of retrieved chunks (default: TOP_K = 3)
  • Temperature (default: 0.2)
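
As a rough illustration, the four defaults might appear in config/settings.py along these lines. Only TOP_K is a name taken from this README; the other variable names are assumptions, so check the actual file before editing.

```python
# Hypothetical layout of config/settings.py; only TOP_K is named in this README.
EMBEDDING_MODEL = "BAAI/bge-small-en-v1.5"  # assumed variable name
LLM_MODEL = "llama-3.1-8b-instant"          # assumed variable name
TOP_K = 3           # number of chunks retrieved per question
TEMPERATURE = 0.2   # assumed variable name; lower values give more deterministic answers
```

If you change the embedding model, the stored vectors were produced by the old model, so you would need to reload the data (step 5) for retrieval to stay consistent.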

🆘 Troubleshooting

"Collection 'cricket_chunks' doesn't exist"

python src/vector_storage.py

"Qdrant connection failed"

docker ps  # Check if Qdrant is running
docker run -d -p 6333:6333 -v qdrant_storage:/qdrant/storage qdrant/qdrant  # Start if not running (same volume as step 4, so loaded data persists)
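
If the container is running but the app still cannot connect, a quick check from Python can confirm whether anything is listening on the Qdrant port. This is a minimal sketch using only the standard library; the function name is hypothetical, and it only tests that the TCP port accepts connections, not that Qdrant itself is healthy.

```python
import socket


def qdrant_port_open(host="localhost", port=6333, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name resolution failures.
        return False


if __name__ == "__main__":
    print("Qdrant port reachable:", qdrant_port_open())
```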

"GROQ_API_KEY not found"

  • Verify .env file exists in project root
  • Check API key is correct

👨‍💻 Author

Sanyam Jain


📄 License

MIT License
