PDF and Image Text Analyzer

This Streamlit application lets users upload a PDF file or an image, extract its text, generate a summary, split the text into paragraphs, and create questions and answers for a selected paragraph.

Features

- PDF and image text extraction
- Text summarization using a T5 model
- Paragraph segmentation
- Question and answer generation for a selected paragraph
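
This README does not describe how paragraph segmentation works. The sketch below assumes that paragraphs are separated by blank lines, which may not match the app's actual rule. It also groups paragraphs into word-limited chunks. T5 checkpoints are typically used with a fixed maximum input length (512 tokens for the original models), so long documents usually have to be summarized piece by piece. The function names and the 400-word default are hypothetical, and word counts are only a rough stand-in for tokens.

```python
import re
from typing import List


def split_paragraphs(text: str) -> List[str]:
    """Split text into paragraphs on one or more blank lines (assumed rule).

    Line breaks inside a paragraph are collapsed to single spaces, which helps
    with PDF/OCR output where lines wrap at the page width.
    """
    blocks = re.split(r"\n\s*\n", text.strip())
    paragraphs = []
    for block in blocks:
        cleaned = " ".join(block.split())  # collapse newlines and repeated spaces
        if cleaned:
            paragraphs.append(cleaned)
    return paragraphs


def chunk_paragraphs(paragraphs: List[str], max_words: int = 400) -> List[str]:
    """Group consecutive paragraphs into chunks of at most max_words words.

    The 400-word default is a hypothetical budget, not a value from this project.
    A single paragraph longer than max_words becomes its own chunk unchanged.
    """
    chunks: List[str] = []
    current: List[str] = []
    count = 0
    for para in paragraphs:
        n = len(para.split())
        if current and count + n > max_words:
            chunks.append("\n\n".join(current))  # flush the full chunk
            current, count = [], 0
        current.append(para)
        count += n
    if current:
        chunks.append("\n\n".join(current))
    return chunks


if __name__ == "__main__":
    paras = split_paragraphs("First line\nwraps here.\n\nSecond paragraph.\n\n\nThird.")
    print(paras)
    print(chunk_paragraphs(paras, max_words=5))
```

With the original T5 checkpoints, each chunk would be passed to the model with the task prefix "summarize: ". This README does not show how app.py actually calls the model.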

Requirements

- Python 3.7+
- Tesseract OCR, which must be installed for image text extraction

Installation

Clone this repository or download the files.
Install the required Python dependencies:

pip install -r requirements.txt

Install Tesseract OCR:
- Windows: download and install from https://github.com/UB-Mannheim/tesseract/wiki
- macOS (Homebrew): brew install tesseract
- Linux (Debian/Ubuntu): sudo apt install tesseract-ocr
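
You can run a small check like the following to confirm the requirements before launching the app. It is a sketch and not part of the repository. It uses only the Python standard library, looks for a tesseract executable on PATH, and prints the first line of its version output. The function names are hypothetical. If Tesseract is installed but reported as missing, which can happen on Windows, you may need to add its install directory to PATH.

```python
import shutil
import subprocess
import sys
from typing import Optional, Tuple


def python_ok(minimum: Tuple[int, int] = (3, 7)) -> bool:
    """Return True if the running interpreter meets the minimum version."""
    return sys.version_info[:2] >= minimum


def tesseract_version() -> Optional[str]:
    """Return the first line of `tesseract --version`, or None if unavailable."""
    exe = shutil.which("tesseract")  # searches PATH only
    if exe is None:
        return None
    try:
        result = subprocess.run(
            [exe, "--version"], capture_output=True, text=True, timeout=10
        )
    except (OSError, subprocess.TimeoutExpired):
        return None
    # Some Tesseract releases print the version to stderr instead of stdout.
    output = (result.stdout or result.stderr).strip()
    return output.splitlines()[0] if output else None


if __name__ == "__main__":
    print("Python >= 3.7:", "OK" if python_ok() else "too old")
    print("Tesseract:", tesseract_version() or "not found on PATH")
```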

Usage

Run the Streamlit app:

streamlit run app.py

- Upload a PDF or image file.
- View the extracted text and summary.
- Explore the paragraphs.
- Select a paragraph number and click "Generate Q&A" to create questions and answers based on that paragraph.
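
The upload and selection steps can be sketched as two small helpers. The first routes a file to PDF text extraction or to OCR based on its extension. The second maps the 1-based paragraph number a user picks to a list index. Both are hypothetical illustrations, and the accepted extensions listed here are an assumption. The app's own handling, including its Q&A model call, may differ.

```python
from pathlib import Path
from typing import List

# Hypothetical extension sets; the app's accepted file types may differ.
PDF_EXTENSIONS = {".pdf"}
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".tif", ".tiff", ".bmp"}


def extraction_method(filename: str) -> str:
    """Return 'pdf' for PDF text extraction or 'ocr' for image files."""
    suffix = Path(filename).suffix.lower()  # case-insensitive match
    if suffix in PDF_EXTENSIONS:
        return "pdf"
    if suffix in IMAGE_EXTENSIONS:
        return "ocr"
    raise ValueError(f"Unsupported file type: {suffix or '(none)'}")


def select_paragraph(paragraphs: List[str], number: int) -> str:
    """Return the paragraph for a 1-based number as shown to the user."""
    if not 1 <= number <= len(paragraphs):
        raise IndexError(f"Paragraph number must be between 1 and {len(paragraphs)}")
    return paragraphs[number - 1]


if __name__ == "__main__":
    print(extraction_method("report.PDF"))   # pdf
    print(extraction_method("scan.jpeg"))    # ocr
    print(select_paragraph(["a", "b", "c"], 2))  # b
```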

Important Notes

- Processing large files can take a while.
- The quality of text extracted from images depends on how clear the image is.
- For best results, use PDFs that contain selectable text rather than scanned page images.
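
The note about selectable text points to a possible fallback. If a PDF page yields little or no text from a PDF text extractor, it is probably a scanned image and could be sent to OCR instead. Below is a minimal sketch of that check. The function name and the 25-character threshold are assumptions. The commented example uses the third-party pypdf package, which this project may not depend on.

```python
from typing import List


def pages_needing_ocr(page_texts: List[str], min_chars: int = 25) -> List[int]:
    """Return 0-based indices of pages whose extracted text looks empty.

    Pages from a scanned PDF usually yield little or no text from a PDF text
    extractor, so they are candidates for OCR. min_chars is an assumed threshold.
    """
    return [i for i, text in enumerate(page_texts) if len(text.strip()) < min_chars]


# Example (assumes the third-party pypdf package, which this project may not use):
#   from pypdf import PdfReader
#   texts = [page.extract_text() or "" for page in PdfReader("doc.pdf").pages]
#   print(pages_needing_ocr(texts))
```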

Repository: pushkar2510/Doc_Sum