📘 PDF Insight Engine

Built with 💡 by Kavya Bhardwaj

The PDF Insight Engine is an AI-powered tool for asking questions about PDF documents. It combines sentence embeddings, a Pinecone vector store, and Cohere's text-generation API to return detailed, interactive answers grounded in the documents you upload.

🚀 What This Tool Does

Given any PDF document, the engine:

📄 Extracts text from PDFs, with optional OCR for scanned documents.
🔖 Splits the text into chunks suitable for semantic search.
🧠 Generates embeddings for each chunk using Sentence Transformers.
🔍 Stores the embeddings in a Pinecone vector index and retrieves the most relevant chunks for each query.
✨ Generates detailed answers from the retrieved chunks using Cohere's generative models.
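The chunking step can be pictured with a minimal, dependency-free sketch. It assumes a fixed-size character window with overlap, similar in spirit to the `chunk_size`/`chunk_overlap` settings of LangChain's text splitters; the helper name `chunk_text` and its default sizes are hypothetical and not taken from the notebook.

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into overlapping fixed-size character windows (hypothetical helper)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    step = chunk_size - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


if __name__ == "__main__":
    print(chunk_text("abcdefghij", chunk_size=4, overlap=1))  # ['abcd', 'defg', 'ghij']
```

The overlap keeps a sentence that straddles a boundary visible in both neighbouring chunks, at the cost of storing a little duplicated text.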

🧠 Technologies Used

Component	Technology Used
PDF Text Extraction	✅ LangChain (PyPDFLoader)
Semantic Embeddings	✅ SentenceTransformers (all-MiniLM-L6-v2)
Vector Database	✅ Pinecone Vectorstore
Enhanced Responses	✅ Cohere API (command-xlarge)
Text Processing	✅ LangChain Text Splitter, NLTK
OCR (Optional)	✅ Poppler, Tesseract
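all-MiniLM-L6-v2 maps each chunk to a 384-dimensional vector, so a Pinecone index used with it needs dimension 384. The ranking a similarity query performs can be sketched in pure Python. This assumes cosine similarity as the metric (the README does not state which metric the notebook configures), and `top_k_chunks` is a hypothetical helper, not part of Pinecone's API. A managed index such as Pinecone is designed to avoid this brute-force scan at scale; the loop here is only for illustration.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k_chunks(query: list[float], index: dict[str, list[float]], k: int = 3) -> list[tuple[str, float]]:
    """Score every stored vector against the query and return the k best (id, score) pairs."""
    scored = [(chunk_id, cosine(query, vec)) for chunk_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)  # highest similarity first
    return scored[:k]


if __name__ == "__main__":
    # Toy 2-D vectors stand in for real 384-dimensional embeddings.
    index = {"intro": [1.0, 0.0], "methods": [3.0, 4.0], "refs": [0.0, 2.0]}
    print(top_k_chunks([1.0, 0.0], index, k=2))  # [('intro', 1.0), ('methods', 0.6)]
```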

🧩 Smart Features

✅ Semantic chunking of the extracted text.
✅ Fast similarity search over document chunks for querying.
✅ Cohere API integration for context-grounded answers.
✅ Fallback and retry logic for more robust processing.
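The README does not describe how the notebook's fallback and retry logic works. One common pattern is exponential backoff followed by an optional fallback, sketched here for calls such as API requests. The helper `call_with_retry`, its parameters, and the choice to treat every exception as retryable are assumptions for illustration.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def call_with_retry(
    fn: Callable[[], T],
    retries: int = 3,
    base_delay: float = 1.0,
    fallback: Optional[Callable[[], T]] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying with exponential backoff; use fallback if every attempt fails."""
    if retries < 1:
        raise ValueError("retries must be at least 1")
    last_error: Optional[Exception] = None
    for attempt in range(retries):
        try:
            return fn()
        except Exception as err:  # assumption: any exception counts as transient
            last_error = err
            if attempt < retries - 1:
                sleep(base_delay * (2 ** attempt))  # waits base_delay, 2x, 4x, ...
    if fallback is not None:
        return fallback()
    raise RuntimeError(f"all {retries} attempts failed") from last_error
```

Passing `sleep` as a parameter keeps the helper testable without real waiting; in a notebook you would simply wrap a request, e.g. `call_with_retry(lambda: some_api_call(), retries=3)`, where `some_api_call` stands for whatever call needs protecting.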

📦 How to Run

1. Clone the Repository

git clone https://github.com/Kavya071/PDF-Insight-Engine.git
cd PDF-Insight-Engine

2. Install Dependencies

pip install openai==0.27.2 langchain-community sentence_transformers pinecone-client cohere nltk unstructured
sudo apt-get install poppler-utils tesseract-ocr
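Because the OCR path depends on system binaries rather than pip packages, checking that they are on `PATH` before running the notebook can save a confusing failure later. This sketch assumes the standard executable names installed by `poppler-utils` (`pdftoppm`, `pdftotext`) and `tesseract-ocr` (`tesseract`); `missing_tools` is a hypothetical helper.

```python
import shutil

# Executables provided by poppler-utils and tesseract-ocr on typical Linux systems.
REQUIRED_TOOLS = ("pdftoppm", "pdftotext", "tesseract")


def missing_tools(tools=REQUIRED_TOOLS) -> list[str]:
    """Return the names of command-line tools that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing:", ", ".join(missing), "(OCR for scanned PDFs will not work)")
    else:
        print("Poppler and Tesseract are available.")
```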

3. Set Up API Keys

Replace the placeholders with your actual API keys:

PINECONE_API_KEY = "your_pinecone_api_key"
COHERE_API_KEY = "your_cohere_api_key"
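As an optional variation (not something the notebook is known to do), the keys can be read from environment variables so they never end up in a committed notebook. The variable names mirror the placeholders above; `load_api_keys` is a hypothetical helper.

```python
import os


def load_api_keys(names=("PINECONE_API_KEY", "COHERE_API_KEY")) -> dict[str, str]:
    """Read API keys from environment variables, failing early if any are unset or empty."""
    keys = {name: os.environ.get(name, "") for name in names}
    missing = [name for name, value in keys.items() if not value]
    if missing:
        raise RuntimeError(f"Set these environment variables first: {', '.join(missing)}")
    return keys


if __name__ == "__main__":
    keys = load_api_keys()
    PINECONE_API_KEY = keys["PINECONE_API_KEY"]
    COHERE_API_KEY = keys["COHERE_API_KEY"]
```

Set the variables in your shell (for example `export COHERE_API_KEY=...`) before launching Jupyter so the kernel inherits them.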

4. Run the Application

Launch the notebook (PDF_Insight.ipynb) in Google Colab or locally:

jupyter notebook PDF_Insight.ipynb

Then follow the prompts to upload PDFs and enter queries interactively.
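Each query follows a retrieve-then-generate pattern: the question is embedded, the closest chunks come back from Pinecone, and Cohere writes the answer from them. The prompt-assembly step in between can be sketched as plain string building. The template wording, the `build_prompt` name, and the character budget are assumptions, since the README does not show the notebook's actual prompt.

```python
def build_prompt(question: str, chunks: list[str], max_chars: int = 4000) -> str:
    """Assemble a grounded question-answering prompt from retrieved chunks (hypothetical)."""
    context_parts = []
    used = 0
    for i, chunk in enumerate(chunks, start=1):
        block = f"[{i}] {chunk.strip()}"  # number chunks so answers can cite them
        if used + len(block) > max_chars:
            break  # stop before exceeding the context budget
        context_parts.append(block)
        used += len(block)
    context = "\n\n".join(context_parts)
    return (
        "Answer the question using only the context below. "
        "If the context does not contain the answer, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


if __name__ == "__main__":
    print(build_prompt("What does the engine store?", ["Embeddings are stored in Pinecone."]))
```

The resulting string is what would be sent to Cohere's generation endpoint. Telling the model to admit when the context lacks an answer tends to reduce answers drawn from outside the uploaded document.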

📧 Contact

Feel free to connect:

✉️ bhardwajkavya099@gmail.com
