RAG-based-on-PDF is a Retrieval-Augmented Generation (RAG) system that lets users ask questions and receive document-grounded answers. In this version, the knowledge base is a single web page scraped from the VS Code documentation.
It uses FAISS vector search, HuggingFace embeddings, and an LLM accessed through OpenRouter in a Python backend, with a Streamlit frontend for interactive use.
- Ask natural language questions from web-scraped documents
- Efficient retrieval using FAISS vector search over embedded chunks
- Context-aware answers using OpenRouter LLM
- Streamlit frontend for interactive Q&A
- Fully Python-based and modular architecture
- Python: FAISS, LangChain, HuggingFace Transformers, Streamlit
- OpenRouter LLM (Cloud-based)
- Pickle for embeddings storage
```
RAG-based-on-PDF/
├─ .venv/                  # Python virtual environment
├─ scraped_docs/           # Scraped text files
│  └─ getting_started.txt  # VS Code Getting Started page
├─ .env                    # Environment variables
├─ app.py                  # (Optional backend for future extensions)
├─ App_streamlit.py        # Streamlit frontend for Q&A
├─ faiss_index.index       # FAISS index
├─ faiss_index.pkl         # FAISS metadata
├─ rag_engine.py           # RAG engine implementation
├─ web_scraping.py         # Script to scrape the VS Code page
├─ README.md
├─ requirements.txt
└─ other project files
```
```
python -m venv .venv
.venv\Scripts\activate          # Windows
# source .venv/bin/activate     # Linux/Mac
pip install -r requirements.txt
```

Add your OpenRouter API key to `.env`:

```
OPENROUTER_API_KEY=your_openrouter_api_key_here
```

Run the Streamlit app:

```
streamlit run App_streamlit.py
```

Open in browser: http://localhost:8501
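As a minimal sketch, the key can be loaded from `.env` at startup. The project may well use a library such as python-dotenv; this stdlib-only parser is an assumption for illustration:

```python
import os

def load_env(path: str = ".env") -> None:
    """Read simple KEY=VALUE lines from a .env file into os.environ.
    Stdlib-only stand-in for python-dotenv (an assumption, not
    necessarily the project's actual loader)."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue  # skip blanks, comments, and malformed lines
            key, _, value = line.partition("=")
            # Existing environment variables take precedence
            os.environ.setdefault(key.strip(), value.strip())

if os.path.exists(".env"):
    load_env()
    api_key = os.environ.get("OPENROUTER_API_KEY")
```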
- Open the Streamlit app in your browser
- Type a question in the input box
- The system retrieves relevant content from the scraped VS Code page and generates an answer using OpenRouter LLM
Optionally, the app can display the retrieved context so you can see which text chunks were used.
Watch the demo video here: 👉 https://drive.google.com/drive/folders/1hQ1H5QJlVAmvm9sM1MyERI-o9GXKp5-z?usp=drive_link
- The VS Code page is scraped and saved as text (`getting_started.txt`)
- The text is split into smaller chunks
- Chunks are embedded using HuggingFace embeddings and stored in FAISS
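The chunking step above can be sketched with a simple overlapping character splitter. The real project likely uses a LangChain text splitter, and the sizes here are illustrative assumptions:

```python
def split_into_chunks(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into fixed-size character chunks with overlap, so a
    sentence cut at one boundary still appears intact in a neighbour chunk."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks, step = [], chunk_size - overlap
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk:
            chunks.append(chunk)
    return chunks
```

Each chunk is then embedded and added to the FAISS index.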
- User question is embedded
- FAISS retrieves the most relevant chunks
- Retrieved chunks are passed to OpenRouter LLM
- LLM generates a factual, document-grounded answer
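The retrieval step above can be sketched with plain NumPy cosine similarity standing in for the FAISS search. The function name and index layout are assumptions; the real project embeds with HuggingFace models and queries a FAISS index:

```python
import numpy as np

def top_k_chunks(question_vec: np.ndarray, chunk_vecs: np.ndarray,
                 chunks: list[str], k: int = 3) -> list[str]:
    """Return the k chunks whose embeddings are most cosine-similar to
    the question embedding. NumPy stand-in for a FAISS similarity search."""
    q = question_vec / np.linalg.norm(question_vec)
    m = chunk_vecs / np.linalg.norm(chunk_vecs, axis=1, keepdims=True)
    scores = m @ q                        # cosine similarity per chunk
    best = np.argsort(scores)[::-1][:k]   # indices of the top-k scores
    return [chunks[i] for i in best]
```

The retrieved chunks are concatenated into the prompt that is sent to the OpenRouter LLM.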
- Answers are strictly based on the scraped page content
- If no relevant context is found, it returns “Not found in document”
- No additional internet or external APIs are required during inference (except for OpenRouter LLM calls)
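The "Not found in document" behaviour can be sketched as a similarity-threshold check before the LLM is called. The threshold value and function name here are assumptions, not the project's actual implementation:

```python
NOT_FOUND = "Not found in document"

def answer_or_fallback(best_score: float, context: str,
                       threshold: float = 0.3) -> str:
    """Only proceed when retrieval found sufficiently similar context;
    otherwise return the fallback string without calling the LLM."""
    if best_score < threshold:
        return NOT_FOUND
    # In the real pipeline, `context` plus the question would be sent to
    # the OpenRouter LLM here; this sketch just returns the context.
    return context
```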
Iqra Khan, AI Engineer | RAG Systems | LLM Applications | Streamlit
For educational and demonstration purposes only.