A Retrieval-Augmented Generation (RAG) system that uses OpenAI embeddings and ChromaDB for document question-answering.

## Features
- Document processing and chunking
- OpenAI embeddings for semantic search
- ChromaDB for vector storage
- Caching system for embeddings
- Question answering with context
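The embedding cache mentioned above can be sketched roughly as follows. The function names and the SHA-256 keying are assumptions for illustration; the actual logic in `app.py` may differ:

```python
import hashlib
import json
import os

def load_cache(path="embedding_cache.json"):
    """Load the JSON embedding cache from disk, or start empty."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {}

def cached_embedding(text, embed_fn, cache, path="embedding_cache.json"):
    """Return the embedding for `text`, calling `embed_fn` (e.g. a thin
    wrapper around the OpenAI embeddings API) only on a cache miss.
    Keys are SHA-256 hashes of the text, so long documents do not bloat
    the cache file's keys."""
    key = hashlib.sha256(text.encode("utf-8")).hexdigest()
    if key not in cache:
        cache[key] = embed_fn(text)
        with open(path, "w") as f:
            json.dump(cache, f)
    return cache[key]
```

Because the cache is keyed by content, re-running the app over unchanged documents costs no API calls.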
## Installation

- Clone the repository:

  ```bash
  git clone <your-repo-url>
  cd <repo-name>
  ```

- Create and activate a virtual environment:

  ```bash
  python -m venv venv
  source venv/bin/activate  # On macOS/Linux
  ```

- Install dependencies:

  ```bash
  pip install -r requirements.txt
  ```

- Create a `.env` file with your OpenAI API key:

  ```
  OPENAI_API_KEY=your_api_key_here
  ```
- Place your text documents in the `news_articles` directory
- Run the application:

  ```bash
  python app.py
  ```

## Project Structure

- `app.py`: Main application code
- `requirements.txt`: Python dependencies
- `news_articles/`: Directory for text documents
- `chroma_db/`: ChromaDB storage (gitignored)
- `embedding_cache.json`: Embedding cache (gitignored)
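The `OPENAI_API_KEY` from the `.env` file created during setup has to reach the process environment. A minimal stdlib-only loader, sketching what a library like `python-dotenv` does (how `app.py` actually loads it is an assumption here), looks like:

```python
import os

def load_env_file(path=".env"):
    """Read KEY=value lines from a .env file into os.environ,
    skipping blank lines and comments. Variables already set in the
    environment take precedence over the file."""
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            os.environ.setdefault(key.strip(), value.strip())
```

Keeping the key in `.env` (and the file gitignored) avoids committing credentials to the repository.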
## Usage

- Add your text documents to the `news_articles` directory
- Run the application
- The system will:
  - Process and chunk documents
  - Generate embeddings
  - Store in ChromaDB
  - Allow question answering
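The chunking step above splits each document into overlapping pieces before embedding, so sentences cut at a chunk boundary still appear whole in at least one chunk. A simple character-based chunker might look like this (the chunk size and overlap are illustrative defaults, not necessarily the values `app.py` uses):

```python
def chunk_text(text, chunk_size=500, overlap=50):
    """Split `text` into chunks of at most `chunk_size` characters,
    with `overlap` characters shared between consecutive chunks."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks
```

Each chunk is then embedded (via the cache) and stored in ChromaDB alongside an ID, so a question's embedding can be matched against chunks rather than whole documents.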
## Requirements

- Python 3.10+
- OpenAI API key
- ChromaDB
- Other dependencies listed in `requirements.txt`
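A `requirements.txt` covering the stack above might contain the following. The exact package set is an assumption (in particular, `python-dotenv` for `.env` loading); pin versions as appropriate for your environment:

```
openai
chromadb
python-dotenv
```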