This repository demonstrates a hybrid RAG (Retrieval-Augmented Generation) setup that pairs local Ollama embeddings with Azure OpenAI for chat.
The application scans files and folders, builds a FAISS index of document chunks from their vector embeddings, and enables fast semantic search and chat over large local document collections. The index uses approximate nearest-neighbor (ANN) search for efficient similarity lookups. The typical workflow is:
- Index: load documents, split them into chunks, compute vector embeddings, and store them in a FAISS index.
- Retrieve: given a user question, retrieve the most relevant chunks using ANN similarity search.
- Generate: pass retrieved context to a chat model to produce a concise answer, optionally returning source file paths.
This project uses nomic-embed-text (running locally via Ollama) to compute the embeddings and an Azure OpenAI chat deployment to generate the responses.
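A minimal query-time sketch of the retrieve-and-generate steps, assuming a LangChain-style setup (langchain-community and langchain-openai packages), an index already saved under a local folder, and Azure credentials exported as environment variables (AZURE_OPENAI_ENDPOINT / AZURE_OPENAI_API_KEY). The index path and deployment name below are illustrative, not the repository's actual configuration:

```python
# Sketch: retrieve relevant chunks from a saved FAISS index and answer with Azure OpenAI.
# Assumes the index was built with the same embedding model (nomic-embed-text via Ollama)
# and that the Azure OpenAI endpoint/key are set in the environment.
from langchain_community.embeddings import OllamaEmbeddings
from langchain_community.vectorstores import FAISS
from langchain_openai import AzureChatOpenAI

embeddings = OllamaEmbeddings(model="nomic-embed-text")  # local Ollama must be running
vectorstore = FAISS.load_local(
    "faiss_index",                         # illustrative path, not necessarily the repo's
    embeddings,
    allow_dangerous_deserialization=True,  # required by recent LangChain when loading a local index
)

llm = AzureChatOpenAI(
    azure_deployment="gpt-4o-mini",   # hypothetical deployment name
    api_version="2024-02-15-preview",
)

question = "How is authentication configured?"
docs = vectorstore.similarity_search(question, k=4)  # vector similarity search over the index
context = "\n\n".join(d.page_content for d in docs)

answer = llm.invoke(
    f"Answer concisely using only this context:\n{context}\n\nQuestion: {question}"
)
print(answer.content)
print("Sources:", sorted({d.metadata.get("source", "?") for d in docs}))
```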
- Build a FAISS index from a folder of documents for fast similarity search (ANN).
- Chat with files to quickly extract information from large folders.
- Return source file paths for traceability.
- Language: Python
- Vector store: FAISS (local)
- Embeddings: nomic-embed-text (local via Ollama)
- Chat / response generation: Azure OpenAI (chat deployment)
- Orchestration / helper libraries: LangChain and related extensions
- API: FastAPI (lightweight server example)
- UI: Streamlit (simple demo)
- Copy .env.example to .env and fill in your Azure credentials. Do NOT commit .env.
- Install dependencies: pip install -r requirements.txt
- Ensure Ollama is running locally (with the nomic-embed-text model pulled, e.g. ollama pull nomic-embed-text) if you use local embeddings.
- Build the index: python indexing/build_index.py (an illustrative sketch of this step follows the list)
- Run the API: uvicorn core.api_server:app --reload (a minimal endpoint sketch also follows the list)
- Run the Streamlit demo: streamlit run app/streamlit_app.py
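The indexing step referenced above could look roughly like the sketch below. The folder path, glob pattern, chunk sizes, and output directory are placeholders, not the actual values used by indexing/build_index.py:

```python
# Sketch: load documents, split into chunks, embed with nomic-embed-text, store in FAISS.
from langchain_community.document_loaders import DirectoryLoader, TextLoader
from langchain_community.embeddings import OllamaEmbeddings
from langchain_community.vectorstores import FAISS
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Load every matching file under ./docs (glob and loader class are illustrative choices).
loader = DirectoryLoader("docs", glob="**/*.txt", loader_cls=TextLoader, show_progress=True)
documents = loader.load()

# Split long documents into overlapping chunks so each embedding stays focused.
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=150)
chunks = splitter.split_documents(documents)

# Embed locally via Ollama and persist the FAISS index to disk.
embeddings = OllamaEmbeddings(model="nomic-embed-text")
vectorstore = FAISS.from_documents(chunks, embeddings)
vectorstore.save_local("faiss_index")
print(f"Indexed {len(chunks)} chunks from {len(documents)} documents.")
```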
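Similarly, here is a minimal sketch of how the API server could expose the pipeline. The route name, request model, deployment name, and resource loading are assumptions and may differ from core/api_server.py:

```python
# Sketch: a single /ask endpoint that retrieves context and asks the chat model.
from fastapi import FastAPI
from pydantic import BaseModel
from langchain_community.embeddings import OllamaEmbeddings
from langchain_community.vectorstores import FAISS
from langchain_openai import AzureChatOpenAI

app = FastAPI(title="Local RAG demo")

# Load heavyweight resources once at startup, not per request.
embeddings = OllamaEmbeddings(model="nomic-embed-text")
vectorstore = FAISS.load_local("faiss_index", embeddings, allow_dangerous_deserialization=True)
llm = AzureChatOpenAI(azure_deployment="gpt-4o-mini", api_version="2024-02-15-preview")

class AskRequest(BaseModel):
    question: str
    k: int = 4  # number of chunks to retrieve

@app.post("/ask")
def ask(req: AskRequest):
    docs = vectorstore.similarity_search(req.question, k=req.k)
    context = "\n\n".join(d.page_content for d in docs)
    answer = llm.invoke(
        f"Answer concisely using only this context:\n{context}\n\nQuestion: {req.question}"
    )
    return {
        "answer": answer.content,
        "sources": sorted({d.metadata.get("source", "?") for d in docs}),
    }
```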
- This repo uses environment variables for secrets.
- The FAISS index and the indexed documents are excluded from version control via .gitignore.