🏏 Cricket RAG System

A Retrieval-Augmented Generation (RAG) system that answers cricket questions using Wikipedia knowledge, vector embeddings, and LLMs.

📋 What is this?

This system lets you ask cricket-related questions and get answers grounded in Wikipedia sources. It uses:

Vector Search (Qdrant) to find relevant information
BGE Embeddings to understand semantic meaning
Groq LLM (Llama 3.1) to generate natural language answers
Streamlit UI for easy interaction
FastAPI for REST API access
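
To make the retrieve-then-generate flow concrete, here is a minimal, dependency-free sketch. It is not the code in src/llm_layer.py: a bag-of-words vector stands in for the BGE embedding, an in-memory list stands in for the Qdrant collection, and the function names (embed, retrieve, build_prompt) and sample chunks are assumptions made for illustration. The final prompt is what would be sent to the LLM.

```python
import math
from collections import Counter

# Toy corpus standing in for Wikipedia chunks stored in Qdrant (illustrative text).
CHUNKS = [
    "A cricket team has eleven players on the field.",
    "An over consists of six legal deliveries bowled from one end.",
    "Test matches are played over a maximum of five days.",
]

def embed(text: str) -> Counter:
    """Stand-in for a BGE embedding: a bag-of-words count vector."""
    cleaned = text.lower().replace(".", "").replace("?", "")
    return Counter(cleaned.split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(question: str, top_k: int = 3) -> list[str]:
    """Return the top_k chunks most similar to the question."""
    q = embed(question)
    ranked = sorted(CHUNKS, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:top_k]

def build_prompt(question: str, context: list[str]) -> str:
    """Assemble the text that would be sent to the LLM."""
    joined = "\n".join(f"- {c}" for c in context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {question}"

question = "How many deliveries are in an over?"
prompt = build_prompt(question, retrieve(question, top_k=1))
print(prompt)
```

In the real system the same three steps apply, with semantic embeddings replacing word counts so that questions can match chunks that use different wording.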

🚀 Quick Setup

Prerequisites

Python 3.10+
Docker (for Qdrant database)

Installation

1. Clone the repository

git clone https://github.com/sanyamj-081/cricket-rag-system.git
cd cricket-rag-system

2. Install dependencies

pip install -r requirements.txt

3. Set up environment variables

cp .env.example .env

Edit .env and add your Groq API key:

GROQ_API_KEY=your_groq_api_key_here

Get a free API key from console.groq.com/keys
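
If you want to confirm the key was saved before starting the app, a small check like the one below can help. It is a sketch that parses simple KEY=VALUE lines only; the function name groq_key_status is an assumption and not part of this repository, and it does not verify that the key is accepted by Groq.

```python
from pathlib import Path

PLACEHOLDER = "your_groq_api_key_here"  # placeholder value from .env.example

def groq_key_status(env_text: str) -> str:
    """Classify the GROQ_API_KEY entry in .env-style text.

    Returns "ok", "placeholder", or "missing".
    """
    for line in env_text.splitlines():
        line = line.strip()
        if line.startswith("#") or "=" not in line:
            continue  # skip comments and lines that are not KEY=VALUE
        name, value = line.split("=", 1)
        if name.strip() == "GROQ_API_KEY":
            value = value.strip().strip("'\"")
            if not value:
                return "missing"
            return "placeholder" if value == PLACEHOLDER else "ok"
    return "missing"

env_file = Path(".env")
if env_file.exists():
    print(groq_key_status(env_file.read_text()))
else:
    print("missing (.env not found in the current directory)")
```

Run it from the project root. A result of "placeholder" means .env still contains the example value.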

4. Start Qdrant database

docker run -d -p 6333:6333 -v qdrant_storage:/qdrant/storage qdrant/qdrant
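
To check that the container is answering, you can query Qdrant's REST endpoint for the list of collections. The sketch below uses only the standard library; the helper names (collection_names, list_collections) are assumptions for illustration. After the data-loading step has run, the cricket_chunks collection mentioned under Troubleshooting should appear in the output.

```python
import json
import urllib.request

def collection_names(payload: dict) -> list[str]:
    """Extract collection names from the JSON body of GET /collections."""
    collections = payload.get("result", {}).get("collections", [])
    return [c["name"] for c in collections]

def list_collections(base_url: str = "http://localhost:6333", timeout: float = 3.0):
    """Return collection names, or None if Qdrant cannot be reached."""
    try:
        with urllib.request.urlopen(f"{base_url}/collections", timeout=timeout) as resp:
            return collection_names(json.load(resp))
    except (OSError, ValueError):
        # OSError covers connection errors and timeouts; ValueError covers bad JSON.
        return None

names = list_collections()
if names is None:
    print("Qdrant is not reachable on port 6333")
else:
    print("Collections:", names)
```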

5. Load data into Qdrant

python src/vector_storage.py

6. Run the application

Streamlit UI:

streamlit run app.py

Visit http://localhost:8502 (if that port does not respond, use the URL Streamlit prints in the terminal on startup)

FastAPI:

uvicorn api.main:app --reload --port 8000

Visit http://localhost:8000/docs
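
This README does not list the API's endpoints, so the sketch below avoids guessing them. FastAPI serves its schema at /openapi.json by default, and the script reads that schema to print the available routes. The helper names (list_routes, fetch_routes) are assumptions for illustration, and the script assumes the server is running on port 8000 as started above.

```python
import json
import urllib.request

HTTP_METHODS = {"get", "post", "put", "patch", "delete", "head", "options"}

def list_routes(openapi: dict) -> list[str]:
    """Return 'METHOD /path' strings from an OpenAPI schema dict."""
    routes = []
    for path, operations in sorted(openapi.get("paths", {}).items()):
        for method in sorted(operations):
            if method in HTTP_METHODS:  # skip non-method keys such as "parameters"
                routes.append(f"{method.upper()} {path}")
    return routes

def fetch_routes(base_url: str = "http://localhost:8000", timeout: float = 5.0):
    """Fetch the schema from a running server; return None if unreachable."""
    try:
        with urllib.request.urlopen(f"{base_url}/openapi.json", timeout=timeout) as resp:
            return list_routes(json.load(resp))
    except (OSError, ValueError):
        return None

routes = fetch_routes()
if routes is None:
    print("API is not reachable on port 8000")
else:
    print("\n".join(routes))
```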

📁 Project Structure

cricket-rag-system/
├── app.py                  # Streamlit UI
├── api/main.py             # FastAPI REST API
├── config/settings.py      # Configuration
├── src/
│   ├── llm_layer.py        # RAG core logic
│   ├── vector_storage.py   # Qdrant data loader
│   ├── crawler/            # Wikipedia scraper
│   ├── processing/         # Data cleaning
│   ├── chunking/           # Text chunking
│   └── embedding/          # Embedding generation
├── data/                   # Generated data (not in git)
└── qdrant_storage/         # Vector database (not in git)

🔧 Configuration

Edit config/settings.py to change:

Embedding model (default: BAAI/bge-small-en-v1.5)
LLM model (default: llama-3.1-8b-instant)
Number of retrieved chunks (default: TOP_K = 3)
Temperature (default: 0.2)
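
As a rough picture of what these settings could look like in code, here is a sketch using the documented defaults. Only TOP_K is a name confirmed by this README; EMBEDDING_MODEL, LLM_MODEL, TEMPERATURE, and validate_settings are assumed names, so check config/settings.py for the actual ones.

```python
# Sketch of config/settings.py values (variable names other than TOP_K are assumptions).
EMBEDDING_MODEL = "BAAI/bge-small-en-v1.5"  # documented default embedding model
LLM_MODEL = "llama-3.1-8b-instant"          # documented default LLM
TOP_K = 3                                   # number of chunks retrieved per question
TEMPERATURE = 0.2                           # lower values give more deterministic answers

def validate_settings(top_k: int, temperature: float) -> None:
    """Raise ValueError on values that cannot work (hypothetical helper)."""
    if top_k < 1:
        raise ValueError("TOP_K must be at least 1")
    if temperature < 0:
        raise ValueError("TEMPERATURE must not be negative")

validate_settings(TOP_K, TEMPERATURE)
```

Raising TOP_K gives the LLM more context per question at the cost of a longer prompt.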

🆘 Troubleshooting

"Collection 'cricket_chunks' doesn't exist"

python src/vector_storage.py

"Qdrant connection failed"

docker ps  # Check if Qdrant is running
docker run -d -p 6333:6333 -v qdrant_storage:/qdrant/storage qdrant/qdrant  # Start if not running (same volume as step 4, so loaded data is reused)

"GROQ_API_KEY not found"

Verify .env file exists in project root
Check API key is correct

👨‍💻 Author

Sanyam Jain

GitHub: @sanyamj-081
Email: sanyamj081@gmail.com

📄 License

MIT License
