A document-based question-answering system using vector embeddings and retrieval-augmented generation (RAG), powered by OpenAI and ChromaDB.
- Document ingestion from a directory
- Smart text chunking with overlap
- Semantic search with ChromaDB
- AI-powered answers using GPT-3.5
- Persistent vector storage
- Configurable chunking parameters
- Python 3.8+
- OpenAI API key
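The "smart text chunking with overlap" feature above can be illustrated with a minimal sliding-window splitter. This is a sketch only, with assumed parameter names matching the configuration constants; the actual splitter in `main.py` may differ:

```python
def chunk_text(text, chunk_size=1000, overlap=20):
    """Split text into fixed-size chunks that share `overlap` characters.

    The overlap keeps a sentence that straddles a chunk boundary visible
    in both neighboring chunks, so retrieval does not lose it.
    """
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break
    return chunks


# With toy sizes the overlap is easy to see:
print(chunk_text("abcdefghij", chunk_size=4, overlap=2))
# ['abcd', 'cdef', 'efgh', 'ghij']
```

Each chunk repeats the last two characters of its predecessor, which is exactly what `CHUNK_OVERLAP` controls at full scale.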
- Clone the repository:
  ```bash
  git clone https://github.com/yourusername/AskDocs_AI.git
  cd AskDocs_AI
  ```
- Set up a virtual environment:
  ```bash
  python -m venv venv
  source venv/bin/activate   # Linux/Mac
  venv\Scripts\activate      # Windows
  ```
- Install dependencies:
  ```bash
  pip install -r requirements.txt
  ```
- Create a `.env` file with your API key:
  ```bash
  echo "OPENAI_API_KEY=your_api_key_here" > .env
  ```
- Add your text documents to the `./news_articles` directory
- Run the system:
  ```bash
  python main.py
  ```
- Enter your questions when prompted
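When you enter a question, the retrieved chunks are stitched into a prompt for GPT-3.5. A minimal sketch of that assembly step, using a hypothetical helper name `build_prompt` (the actual function names in `main.py` may differ):

```python
def build_prompt(question, retrieved_chunks):
    """Assemble a chat-completion message list from retrieved context.

    The system message instructs the model to answer only from the
    supplied context; the user message carries the question itself.
    """
    context = "\n\n---\n\n".join(retrieved_chunks)
    return [
        {
            "role": "system",
            "content": (
                "Answer the question using only the context below. "
                "If the context is insufficient, say so.\n\n"
                f"Context:\n{context}"
            ),
        },
        {"role": "user", "content": question},
    ]


messages = build_prompt("Who won the match?", ["Chunk one text.", "Chunk two text."])
```

In the OpenAI Python SDK, these messages would then be passed to `client.chat.completions.create(model=LLM_MODEL, messages=messages)` to generate the final answer.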
Modify these constants in `main.py`:

```python
# Document processing
CHUNK_SIZE = 1000      # Characters per chunk
CHUNK_OVERLAP = 20     # Characters shared between consecutive chunks

# Query settings
N_RESULTS = 2          # Number of chunks to retrieve per question

# Model settings
EMBEDDING_MODEL = "text-embedding-3-small"
LLM_MODEL = "gpt-3.5-turbo"
```
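As a sizing rule of thumb, each chunk after the first advances by `CHUNK_SIZE - CHUNK_OVERLAP` characters, so a document of `n` characters yields roughly `ceil((n - CHUNK_OVERLAP) / (CHUNK_SIZE - CHUNK_OVERLAP))` chunks. A quick sketch of that estimate, assuming the simple sliding-window chunking scheme (the exact splitter in `main.py` may differ):

```python
import math


def estimated_chunks(n_chars, chunk_size=1000, overlap=20):
    """Approximate chunk count for a sliding window with overlap."""
    step = chunk_size - overlap
    return max(1, math.ceil((n_chars - overlap) / step))


print(estimated_chunks(10_000))  # a 10k-character article -> 11 chunks
```

Raising `CHUNK_OVERLAP` increases the chunk count (and embedding cost) for the same document, so keep it small relative to `CHUNK_SIZE`.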