Enterprise RAG System

Backend-only Retrieval-Augmented Generation system. Answers questions strictly from ingested documents. Does not hallucinate.

Architecture

┌─────────────────────────────────────────────────────────────────────────┐
│                              FastAPI                                     │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐                  │
│  │ POST /ingest │  │ POST /query  │  │ GET /health  │                  │
│  └──────┬───────┘  └──────┬───────┘  └──────────────┘                  │
└─────────┼─────────────────┼─────────────────────────────────────────────┘
          │                 │
          ▼                 ▼
┌─────────────────┐  ┌─────────────────────────────────────────────────┐
│ IngestionService│  │              RetrieverService                    │
│                 │  │  ┌─────────┐  ┌─────────┐  ┌─────────┐          │
│ ┌─────────────┐ │  │  │Embedding│─▶│ FAISS   │─▶│  LLM    │          │
│ │DocumentLoader│ │  │  │ Service │  │ Search  │  │ Service │          │
│ ├─────────────┤ │  │  └─────────┘  └─────────┘  └─────────┘          │
│ │  Chunker    │ │  └─────────────────────────────────────────────────┘
│ └─────────────┘ │
└─────────────────┘
          │                 │
          ▼                 ▼
┌─────────────────────────────────────────────────────────────────────────┐
│                         VectorStoreService                               │
│                      (FAISS + JSON metadata)                             │
│                       Persisted to ./data/                               │
└─────────────────────────────────────────────────────────────────────────┘

RAG Flow

Ingestion (`POST /ingest`)

Document (PDF/DOCX/MD)
        │
        ▼
┌───────────────┐
│ DocumentLoader│  Extract text with metadata (page, section)
└───────┬───────┘
        │
        ▼
┌───────────────┐
│SemanticChunker│  Split into chunks (512 tokens, 50 overlap)
└───────┬───────┘
        │
        ▼
┌───────────────┐
│EmbeddingService│  Generate embeddings (text-embedding-3-small)
└───────┬───────┘
        │
        ▼
┌───────────────┐
│ VectorStore   │  Store in FAISS + persist to disk
└───────────────┘

Query (`POST /query`)

Question
    │
    ▼
┌──────────────────┐
│ Embed Question   │
└────────┬─────────┘
         │
         ▼
┌──────────────────┐
│ FAISS Similarity │  top_k=5, threshold=0.7
│     Search       │
└────────┬─────────┘
         │
         ├── NO RESULTS ──▶ Return: "Answer not found in documents."
         │                  confidence=0.0, sources=[]
         │
         ▼ (has results)
┌──────────────────┐
│   LLM Generate   │  System prompt: use ONLY context
└────────┬─────────┘
         │
         ▼
┌──────────────────┐
│ Return answer +  │  Mandatory citations in sources[]
│    citations     │
└──────────────────┘

Hallucination Prevention

The system prevents hallucination through multiple mechanisms:

1. Retrieval Gate

If similarity search returns no results above threshold (default 0.7), generation is skipped entirely. Response:

{
  "answer": "Answer not found in documents.",
  "sources": [],
  "confidence": 0.0
}

2. System Prompt Constraint

The LLM receives this system prompt:

ABSOLUTE RULES - VIOLATION IS FORBIDDEN:
1. You may ONLY use information explicitly stated in the provided CONTEXT.
2. You must NEVER use your training data, prior knowledge, or make assumptions.
3. If the CONTEXT does not contain enough information to answer, you MUST respond:
   "I don't know based on the provided documents."
4. If the CONTEXT is empty or completely irrelevant, you MUST respond:
   "Answer not found in documents."

3. Context Isolation

The LLM only receives retrieved chunks, not the full document corpus. It cannot access information that wasn't retrieved.

4. Mandatory Citations

Every response must include source citations. Empty sources = no answer.

Setup

Requirements

Python 3.10+
OpenAI API key

Installation

cd RAG
python -m venv venv

# Windows
venv\Scripts\activate
# Linux/Mac
source venv/bin/activate

pip install -r requirements.txt

Configuration

Create .env file:

cp .env.example .env

Edit .env:

OPENAI_API_KEY=sk-your-key-here

Run

uvicorn app.main:app --reload --port 8000

Environment Variables

Variable	Required	Default	Description
`OPENAI_API_KEY`	Yes	-	OpenAI API key
`APP_ENV`	No	development	Environment (development/production)
`LOG_LEVEL`	No	INFO	Logging level
`EMBEDDING_MODEL`	No	text-embedding-3-small	OpenAI embedding model
`EMBEDDING_DIMENSION`	No	1536	Embedding vector size
`LLM_MODEL`	No	gpt-4-turbo-preview	OpenAI chat model
`LLM_TEMPERATURE`	No	0.0	LLM temperature (0=deterministic)
`SIMILARITY_THRESHOLD`	No	0.7	Min similarity for retrieval
`TOP_K`	No	5	Max documents to retrieve
`CHUNK_SIZE`	No	512	Chunk size in tokens
`CHUNK_OVERLAP`	No	50	Overlap between chunks
`FAISS_INDEX_PATH`	No	./data/faiss_index	FAISS persistence path
`DOCUMENT_STORE_PATH`	No	./data/documents	Metadata storage path

API

Health Check

curl http://localhost:8000/health

Response:

{
  "status": "healthy",
  "version": "1.0.0",
  "timestamp": "2024-01-15T10:30:00Z",
  "components": {
    "embeddings": "healthy",
    "vector_store": "healthy",
    "llm": "healthy"
  }
}

Ingest Document

PDF:

curl -X POST http://localhost:8000/ingest \
  -H "Content-Type: application/json" \
  -d '{
    "document_type": "pdf",
    "file_path": "C:/path/to/document.pdf",
    "metadata": {"department": "HR"}
  }'

Markdown content:

curl -X POST http://localhost:8000/ingest \
  -H "Content-Type: application/json" \
  -d '{
    "document_type": "markdown",
    "content": "# Policy\n\nEmployees get 20 vacation days per year.",
    "metadata": {"source": "hr_policy"}
  }'

Response:

{
  "success": true,
  "message": "Successfully ingested with 5 chunks",
  "documents": [
    {
      "document_id": "doc_a1b2c3d4e5f6",
      "source": "document.pdf",
      "document_type": "pdf",
      "chunk_count": 5,
      "ingested_at": "2024-01-15T10:30:00Z",
      "metadata": {"department": "HR"}
    }
  ],
  "total_chunks": 5,
  "processing_time_ms": 1234.56
}

Query

curl -X POST http://localhost:8000/query \
  -H "Content-Type: application/json" \
  -d '{
    "question": "How many vacation days do employees get?"
  }'

Response (answer found):

{
  "success": true,
  "answer": "Employees get 20 vacation days per year. [Source 1]",
  "sources": [
    {
      "document_id": "doc_a1b2c3d4e5f6",
      "source": "hr_policy",
      "page_number": null,
      "section": "Policy",
      "relevance_score": 0.89,
      "chunk_text": "Employees get 20 vacation days per year."
    }
  ],
  "confidence": 0.85,
  "query_time_ms": 1456.78,
  "retrieval_time_ms": 45.23,
  "generation_time_ms": 1411.55
}

Response (no answer):

{
  "success": true,
  "answer": "Answer not found in documents.",
  "sources": [],
  "confidence": 0.0,
  "query_time_ms": 45.23,
  "retrieval_time_ms": 45.23,
  "generation_time_ms": 0.0
}

Query with Options

curl -X POST http://localhost:8000/query \
  -H "Content-Type: application/json" \
  -d '{
    "question": "What is the remote work policy?",
    "top_k": 3,
    "similarity_threshold": 0.8,
    "include_context": true,
    "metadata_filter": {"department": "HR"}
  }'

Project Structure

RAG/
├── app/
│   ├── __init__.py
│   ├── config.py              # Pydantic Settings
│   ├── main.py                # FastAPI app
│   ├── api/
│   │   ├── dependencies.py    # DI singletons
│   │   └── routes/
│   │       ├── health.py      # GET /health
│   │       ├── ingest.py      # POST /ingest
│   │       └── query.py       # POST /query
│   ├── core/
│   │   ├── embeddings.py      # OpenAI embeddings + cache
│   │   ├── llm.py             # OpenAI chat + no-hallucination prompt
│   │   ├── retriever.py       # RAG orchestration
│   │   └── vector_store.py    # FAISS + persistence
│   ├── ingestion/
│   │   ├── loader.py          # PDF/DOCX/MD loaders
│   │   ├── chunker.py         # Semantic chunking
│   │   └── pipeline.py        # Ingestion orchestration
│   └── schemas/
│       ├── common.py          # HealthResponse, ErrorResponse
│       ├── documents.py       # IngestRequest/Response
│       └── query.py           # QueryRequest/Response, SourceCitation
├── data/
│   ├── documents/             # Metadata JSON
│   └── faiss_index/           # FAISS index files
├── .env.example
├── requirements.txt
└── README.md

Data Persistence

FAISS index: ./data/faiss_index/faiss.index
Chunk metadata: ./data/faiss_index/chunks.json
Document metadata: ./data/faiss_index/documents.json
Embedding cache: ./data/documents/embedding_cache/

All data survives restarts. Delete ./data/ to reset.

Name		Name	Last commit message	Last commit date
Latest commit History 7 Commits
app		app
docs		docs
examples		examples
.env.example		.env.example
.gitignore		.gitignore
RAG_System_Documentation.pdf		RAG_System_Documentation.pdf
README.md		README.md
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Enterprise RAG System

Architecture

RAG Flow

Ingestion (`POST /ingest`)

Query (`POST /query`)

Hallucination Prevention

1. Retrieval Gate

2. System Prompt Constraint

3. Context Isolation

4. Mandatory Citations

Setup

Requirements

Installation

Configuration

Run

Environment Variables

API

Health Check

Ingest Document

Query

Query with Options

Project Structure

Data Persistence

About

Uh oh!

Releases 1

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Enterprise RAG System

Architecture

RAG Flow

Ingestion (POST /ingest)

Query (POST /query)

Hallucination Prevention

1. Retrieval Gate

2. System Prompt Constraint

3. Context Isolation

4. Mandatory Citations

Setup

Requirements

Installation

Configuration

Run

Environment Variables

API

Health Check

Ingest Document

Query

Query with Options

Project Structure

Data Persistence

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases 1

Contributors

Uh oh!

Languages

Ingestion (`POST /ingest`)

Query (`POST /query`)