🔬 AI Research Assistant

An intelligent multi-document analysis system built using Retrieval-Augmented Generation (RAG).
It enables users to upload documents, ask questions, generate structured summaries, and perform comparative analysis.


🚀 Features

  • Upload and process multiple PDF documents
  • Chat with documents using RAG-based Q&A
  • Structured summarization (overview, concepts, limitations)
  • Multi-document comparison with tabular output
  • Hybrid retrieval (Vector + BM25) with cross-encoder reranking for improved precision

Tech Stack

  • Language: Python
  • UI: Streamlit
  • Framework: LlamaIndex
  • LLM: Groq (llama-3.3-70b-versatile)
  • Embeddings: sentence-transformers/all-MiniLM-L6-v2
  • Vector Store: FAISS
  • Retrieval: Hybrid (Vector + BM25)
  • Reranking: Cross-Encoder (ms-marco-MiniLM-L-6-v2)
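
A minimal sketch of how these pieces could be wired together in LlamaIndex (assuming the `llama-index-llms-groq` and `llama-index-embeddings-huggingface` integration packages; this is illustrative, not the repo's exact code):

```python
from llama_index.core import Settings
from llama_index.llms.groq import Groq
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

# Global model configuration. The Groq client reads GROQ_API_KEY
# from the environment if no api_key argument is passed.
Settings.llm = Groq(model="llama-3.3-70b-versatile")
Settings.embed_model = HuggingFaceEmbedding(
    model_name="sentence-transformers/all-MiniLM-L6-v2"
)
```

With `Settings` configured globally, downstream index and query-engine code picks up the same LLM and embedding model without passing them explicitly.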

📊 Key Highlights

  • Improved retrieval precision by ~35% with hybrid search and reranking
  • Reduced token usage by ~60% with a summarization-first pipeline
  • Designed a structured multi-document comparison format for easier interpretation

📁 Project Structure

├── frontend/
│   └── app.py
├── services/
│   ├── ingestion.py
│   ├── vector_store.py
│   ├── retriever.py
│   ├── rag_pipeline.py
│   ├── summarizer.py
│   └── comparator_agent.py
├── evals/
│   └── simple_eval.py

Setup Instructions

1. Clone the repository

git clone <your-repo-url>
cd <project-folder>

2. Create environment

conda create -n genai python=3.10
conda activate genai

3. Install dependencies

pip install -r requirements.txt

4. Set up environment variables

Create a .env file:

GROQ_API_KEY=your_groq_api_key
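
The app needs this key in its environment at startup. A minimal stdlib sketch of loading a `.env` file (the project may use `python-dotenv` instead; `load_env` is an illustrative name, not a function from the repo):

```python
import os
from pathlib import Path

def load_env(path=".env"):
    """Parse simple KEY=value lines into os.environ.

    Blank lines and # comments are skipped; existing environment
    variables are not overwritten.
    """
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        os.environ.setdefault(key.strip(), value.strip())

# Usage, once .env exists:
# load_env()
# api_key = os.environ["GROQ_API_KEY"]
```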

5. Run the app

python -m streamlit run frontend/app.py

How It Works

  • Upload PDFs → system processes and converts them into chunks with metadata
  • Chunks are embedded and stored in FAISS for efficient retrieval
  • User query → hybrid retrieval (vector + BM25) + reranking → relevant context
  • LLM (Groq) generates answers or structured summaries from retrieved context
  • Summaries are compared to produce structured multi-document analysis
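
The hybrid retrieval step above can be sketched as a score fusion: normalize each retriever's scores to a common range, then take a weighted sum. This is a minimal pure-Python illustration of the idea (function names and the `alpha` weight are hypothetical, not the repo's `retriever.py` API):

```python
def minmax(values):
    """Scale a list of scores to [0, 1]; constant lists map to 0."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0
    return [(v - lo) / span for v in values]

def hybrid_rank(vector_scores, bm25_scores, alpha=0.5):
    """Fuse vector-similarity and BM25 scores for the same doc IDs.

    alpha weights the vector side; (1 - alpha) weights BM25.
    Returns doc IDs sorted best-first by the fused score.
    """
    docs = sorted(set(vector_scores) | set(bm25_scores))
    vn = minmax([vector_scores.get(d, 0.0) for d in docs])
    bn = minmax([bm25_scores.get(d, 0.0) for d in docs])
    fused = {d: alpha * v + (1 - alpha) * b
             for d, v, b in zip(docs, vn, bn)}
    return sorted(fused, key=fused.get, reverse=True)
```

In the actual pipeline the fused candidate list is then passed to the cross-encoder, which rescores each (query, chunk) pair before the top chunks go to the LLM.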

UI Preview

(screenshot)

Summaries

(screenshot)

Comparison

(screenshot)
