Local-first RAG platform — Fast, private, no cloud required.
RAGEve is a local-first RAG (Retrieval-Augmented Generation) platform built for developers and teams who want the power of RAG workflows without depending on external cloud services.
It combines Ollama for local LLM inference and embeddings, Qdrant as a high-performance vector database, and FastAPI + Next.js for a full-featured web interface. Everything runs on your own machine — no API keys, no data leaves your network.
Backend Architecture:
- FastAPI for high-performance async APIs
- Peewee ORM with a 27-table schema for persistent storage
- MySQL (or SQLite for single-node) via Peewee + connection pooling (900 connections)
- SQLAlchemy (temporary) for legacy chat history storage during migration
- Qdrant for vector search with hybrid retrieval (dense + sparse)
- Ollama for embeddings and LLM inference
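Hybrid retrieval (dense + sparse) can be pictured as a weighted blend of two score lists. The sketch below is an illustration only, not RAGEve's or Qdrant's actual fusion code: the function names, the min-max normalization step, and the default weight are all assumptions chosen to keep the example small.

```python
from typing import Dict, List, Tuple


def min_max_normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores to [0, 1] so dense and sparse values are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores are equal; treat every hit as equally strong.
        return {cid: 1.0 for cid in scores}
    return {cid: (s - lo) / (hi - lo) for cid, s in scores.items()}


def fuse_scores(
    dense: Dict[str, float],
    sparse: Dict[str, float],
    dense_weight: float = 0.7,  # hypothetical default, not a RAGEve setting
) -> List[Tuple[str, float]]:
    """Weighted-sum fusion: dense_weight * dense + (1 - dense_weight) * sparse."""
    d = min_max_normalize(dense)
    s = min_max_normalize(sparse)
    fused = {
        cid: dense_weight * d.get(cid, 0.0) + (1.0 - dense_weight) * s.get(cid, 0.0)
        for cid in set(d) | set(s)
    }
    # Highest score first; break ties by chunk id so the order is stable.
    return sorted(fused.items(), key=lambda kv: (-kv[1], kv[0]))
```

A chunk that appears in only one of the two result lists contributes zero for the missing side, so raising `dense_weight` favors semantic matches and lowering it favors exact keyword matches.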
RAGEve is designed for two audiences:
| User | Experience |
|---|---|
| Non-technical users | git clone && ./scripts/run.sh — everything starts automatically |
| Developers | ./scripts/backend.sh or manual uvicorn / npm run dev for full control |
Start RAGEve locally and open http://localhost:3000:
git clone https://github.com/bazzi24/RAGEve.git
cd RAGEve
./scripts/run.sh
Tip: On first run, install.sh automatically installs uv and Ollama, pulls the required models (~8 GB), and starts Docker services. This takes about 5–10 minutes once, then subsequent starts are instant.
- 2026-04-27 Peewee migration complete: 27-table schema, new /dialogs and /knowledgebases APIs, transitional support for legacy routes
- 2026-04-22 Enhanced PDF parsing: column detection, structured table extraction, hierarchical chunking, reading order optimization
- 2026-04-03 Evaluation matrix (16-cell benchmark) + Qdrant hybrid search fix
- 2026-04-01 9 production fixes: structured 500 handler, health checks, rate limiter proxy safety, request timeouts, streaming 404 fix, file upload limits, paginated datasets API
- 2026-04-01 Chat history with MySQL/SQLite, session panel, per-agent conversations
- 2026-03-28 Background HF dataset ingest with live progress tracking
- 2026-03-26 Real-time streaming upload with per-batch progress stages
- 2026-03-26 Cross-encoder reranking (sentence-transformers)
- 2026-03-25 E2E test suite, conversation persistence
- Ingest PDFs, Word docs, Excel, CSV, images, and more
- Enhanced PDF parsing: column detection, structured table extraction (markdown), heading hierarchy, reading order optimization
- Adaptive chunking with quality scoring per profile (clean text, OCR noisy, table-heavy, code)
- Intelligent text column selection for multi-column datasets
- Hierarchical chunking preserves section context for better semantic search
- Exact chunk references from source documents
- Quality scores exposed to the LLM via enriched context
- Session history-aware chat with up to 6 prior turns in context
- Dense vector search via Ollama embeddings
- Sparse keyword search
- Hybrid fusion combining both with configurable weights
- Cross-encoder reranking for improved precision
- Any Ollama model as the chat backend
- Any Ollama embedding model
- Configurable temperature, top-k, top-p, and context window size per dialog (agent)
- Browse, preview, and search HuggingFace datasets directly from the UI
- Download datasets to local storage
- Background ingest with real-time progress
- Multi-config and multi-split support
- Sessions stored in MySQL via Peewee ORM (or SQLite for single-node)
- Full conversation history per dialog (agent)
- Thumbs up/down feedback on individual messages
- Conversation context automatically injected into subsequent turns
- Request ID tracing and structured error responses
- CORS and API key authentication
- Circuit breaker and retry logic for Ollama calls
- Dependency health checks (/health pings Ollama and Qdrant)
- scripts/run.sh: everything in one command
- scripts/backend.sh: backend only for technical users
- Docker Compose for infrastructure (Qdrant + MySQL)
- Full E2E and stress test suites
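The "circuit breaker and retry logic" item in the list above describes a general resilience pattern. The following is a minimal sketch of that pattern, not RAGEve's implementation: the class name, thresholds, and backoff schedule are hypothetical, and `fn` stands in for any call to a local service such as Ollama.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(RuntimeError):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    """Hypothetical breaker: opens after `max_failures` consecutive failures."""

    def __init__(
        self,
        max_failures: int = 3,
        reset_after: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None

    def call(self, fn: Callable[[], T]) -> T:
        # While open, reject immediately until the cool-down has elapsed.
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                raise CircuitOpenError("circuit open; skipping call")
            self.opened_at = None  # half-open: let one trial call through
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()
            raise
        self.failures = 0  # any success closes the circuit again
        return result


def with_retry(
    fn: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry `fn` with exponential backoff; re-raise the last error."""
    for attempt in range(attempts):
        try:
            return fn()
        except CircuitOpenError:
            raise  # retrying cannot help while the circuit is open
        except Exception:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)
    raise ValueError("attempts must be at least 1")
```

The two pieces compose: wrapping a call as `with_retry(lambda: breaker.call(fn))` retries transient failures, while the breaker stops retries from hammering a service that is clearly down.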
| Requirement | Version | Notes |
|---|---|---|
| Docker | >= 24.0.0 | Install Docker |
| Docker Compose | >= v2.26.1 | Usually bundled with Docker Desktop |
| macOS / Linux / WSL2 | — | Windows native not supported; use WSL2 |
| Disk | >= 50 GB | For models (~8 GB) and data |
| RAM | >= 16 GB | Recommended; CPU fallback is slower |
Windows: Enable WSL2 and run all commands from inside the WSL shell. Do not run scripts from PowerShell or CMD.
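The version and disk requirements in the table can be checked before installing. The helper below is a small sketch under stated assumptions: the function names are hypothetical, it is not part of RAGEve's scripts, and it treats 1 GB as 10^9 bytes.

```python
import re
import shutil
from typing import Tuple

# Minimums copied from the requirements table above.
MIN_DOCKER = "24.0.0"
MIN_COMPOSE = "v2.26.1"
MIN_DISK_GB = 50


def parse_version(text: str) -> Tuple[int, int, int]:
    """Extract the first x.y.z version found in `text` as a tuple of ints."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no x.y.z version found in {text!r}")
    major, minor, patch = (int(part) for part in match.groups())
    return major, minor, patch


def meets_minimum(found: str, required: str) -> bool:
    """True when `found` is at least `required` (tuple comparison)."""
    return parse_version(found) >= parse_version(required)


def free_disk_gb(path: str = ".") -> float:
    """Free space on the filesystem holding `path`, in decimal gigabytes."""
    return shutil.disk_usage(path).free / 1e9


if __name__ == "__main__":
    free = free_disk_gb(".")
    status = "ok" if free >= MIN_DISK_GB else "below the 50 GB minimum"
    print(f"Free disk: {free:.1f} GB ({status})")
```

Because `parse_version` searches for the first `x.y.z` pattern, it should also accept the full line printed by `docker --version`, which can then be compared against `MIN_DOCKER` with `meets_minimum`.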
One command for everything — auto-installs if needed:
git clone https://github.com/bazzi24/RAGEve.git
cd RAGEve
./scripts/run.sh
The first run will:
- Install uv (Python package manager)
- Install Ollama and pull models (nomic-embed-text + llama3.2)
- Start Docker containers (Qdrant + MySQL)
- Start the FastAPI backend and Next.js frontend
Open http://localhost:3000 when you see:
██████╗ █████╗ ██████╗ ███████╗██╗ ██╗███████╗
██╔══██╗██╔══██╗██╔════╝ ██╔════╝██║ ██║██╔════╝
██████╔╝███████║██║ ███╗█████╗ ██║ ██║█████╗
██╔══██╗██╔══██║██║ ██║██╔══╝ ╚██╗ ██╔╝██╔══╝
██║ ██║██║ ██║╚██████╔╝███████╗ ╚████╔╝ ███████╗
╚═╝ ╚═╝╚═╝ ╚═╝ ╚═════╝ ╚══════╝ ╚═══╝ ╚══════╝
AI-powered RAG platform — Ollama · Qdrant · FastAPI · Next.js
https://github.com/bazzi24/RAGEve
[*] Starting FastAPI backend...
[*] Starting Next.js frontend...
[✓] RAGEve is running!
Frontend http://localhost:3000
Backend http://localhost:8000
API docs http://localhost:8000/docs
Press Ctrl+C to stop all services cleanly.
RAGEve uses environment variables for configuration. Copy the example and customize:
# From the project root:
cp docker/.env.example .env # Recommended for Docker deployments
# OR
cp .env.example .env # If .env.example exists (legacy location)
For developers who want full control over startup and debugging.
# 1. Start infrastructure
docker compose -f docker/docker-compose.yml up -d qdrant mysql
# 2. Start Ollama (keep running in a terminal)
ollama serve
# 3. Pull required models (first time only)
ollama pull nomic-embed-text
ollama pull llama3.2:latest
# 4. Install Python dependencies
uv sync
# 5. Install frontend dependencies
cd frontend && npm install && cd ..
# 6. Start FastAPI backend (port 8000)
# Do NOT use --reload — it crashes in-flight uploads
uv run uvicorn backend.main:app --host 0.0.0.0 --port 8000
# 7. Start Next.js frontend (port 3000) — in another terminal
cd frontend && npm run dev
Open:
- Frontend: http://localhost:3000
- API Docs: http://localhost:8000/docs
- Qdrant Dashboard: http://localhost:6333/dashboard
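After a manual start it can help to confirm that all three endpoints respond. The script below is a sketch that uses only the Python standard library. The URLs are the ones listed above; the helper names are hypothetical and the script is not shipped with RAGEve.

```python
import urllib.error
import urllib.request
from typing import Callable, Dict

# URLs taken from the list above; adjust them if you changed any ports.
SERVICES = {
    "frontend": "http://localhost:3000",
    "api docs": "http://localhost:8000/docs",
    "qdrant dashboard": "http://localhost:6333/dashboard",
}


def http_ok(url: str, timeout: float = 3.0) -> bool:
    """Return True if `url` answers with a 2xx/3xx status within `timeout`."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, HTTP error status, or malformed URL.
        return False


def check_services(
    services: Dict[str, str],
    probe: Callable[[str], bool] = http_ok,
) -> Dict[str, bool]:
    """Probe every service and map its name to an up/down flag."""
    return {name: probe(url) for name, url in services.items()}
```

Calling `check_services(SERVICES)` returns a dictionary such as `{"frontend": True, ...}`. The `probe` parameter exists so the check can be exercised in tests without a running stack.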
For developers who run the frontend manually (e.g. in an IDE with hot reload):
./scripts/backend.sh
Starts: Docker (Qdrant + MySQL) → Ollama → FastAPI. No frontend.
- Bug Reports — report issues with clear reproduction steps
- Feature Requests — open a discussion or issue
- Contributing — see below
RAGEve grows through open-source collaboration. Contributions of all kinds are welcome — bug fixes, features, docs, tests, and feedback.
Before contributing:
- Fork the repository and create a feature branch from main
- Make your changes; all code must pass bash -n scripts/*.sh (shell scripts) and cd frontend && npx tsc --noEmit (TypeScript)
- Run the E2E test suite: uv run python test/_test_e2e.py
- Submit a pull request with a clear description of what changed and why
Development setup:
git clone https://github.com/bazzi24/RAGEve.git
cd RAGEve
cp .env.example .env # optional: fill in HF_TOKEN, API_KEY, etc.
./scripts/install.sh # one-time setup
./scripts/backend.sh # backend only for iterative development
Built with ❤️ for local-first AI — RAGEve

