PDF & Images Lab

A project to explore libraries for extracting text from PDFs, such as:
- PyPDF2
- pdfminer
- pdfplumber
- PyMuPDF
- pypdfium2
- Mathpix
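To compare the PDF libraries explored here (PyPDF2, pdfplumber, PyMuPDF, pypdfium2, pdfminer), it helps to put each one behind the same function signature. The sketch below is a minimal, hypothetical harness, not the notebooks' actual code: the names `BACKENDS`, `available_backends` and `extract_text` are illustrative, each extractor imports its library lazily, and it assumes PyPDF2 >= 2.0 (`PdfReader`), PyMuPDF's classic `fitz` import name, pypdfium2 v4 and pdfminer.six's `high_level.extract_text`. Mathpix is left out because it is a hosted API rather than a local library.

```python
import importlib.util
from typing import Callable, Dict, List, Tuple

# Hypothetical helpers: each wraps one library's text-extraction call and
# imports that library only when the backend is actually used.


def _with_pypdf2(path: str) -> str:
    from PyPDF2 import PdfReader  # assumes PyPDF2 >= 2.0

    reader = PdfReader(path)
    return "\n".join(page.extract_text() or "" for page in reader.pages)


def _with_pdfplumber(path: str) -> str:
    import pdfplumber

    with pdfplumber.open(path) as pdf:
        return "\n".join(page.extract_text() or "" for page in pdf.pages)


def _with_pymupdf(path: str) -> str:
    import fitz  # PyMuPDF's classic import name

    with fitz.open(path) as doc:
        return "\n".join(page.get_text() for page in doc)


def _with_pypdfium2(path: str) -> str:
    import pypdfium2 as pdfium  # assumes the v4 API

    pdf = pdfium.PdfDocument(path)
    try:
        return "\n".join(
            pdf[i].get_textpage().get_text_range() for i in range(len(pdf))
        )
    finally:
        pdf.close()


def _with_pdfminer(path: str) -> str:
    from pdfminer.high_level import extract_text as pdfminer_extract

    return pdfminer_extract(path)


# Backend name -> (top-level module to look for, extractor function).
BACKENDS: Dict[str, Tuple[str, Callable[[str], str]]] = {
    "PyPDF2": ("PyPDF2", _with_pypdf2),
    "pdfplumber": ("pdfplumber", _with_pdfplumber),
    "PyMuPDF": ("fitz", _with_pymupdf),
    "pypdfium2": ("pypdfium2", _with_pypdfium2),
    "pdfminer": ("pdfminer", _with_pdfminer),
}


def available_backends() -> List[str]:
    """Return the backends whose library is installed, without importing them."""
    return [
        name
        for name, (module, _) in BACKENDS.items()
        if importlib.util.find_spec(module) is not None
    ]


def extract_text(path: str, backend: str) -> str:
    """Extract all text from the PDF at `path` using the named backend."""
    if backend not in BACKENDS:
        raise ValueError(f"unknown backend {backend!r}; choose from {sorted(BACKENDS)}")
    _, extractor = BACKENDS[backend]
    return extractor(path)
```

For example, looping over `available_backends()` and calling `extract_text("AutoGen_LLM_agent.pdf", name)` (assuming that sample PDF sits at the repository root) lets you compare output length and layout across libraries on the same file.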

I also explore tools for extracting text from images, such as:
- pytesseract
- EasyOCR
- Hugging Face Transformers
- Donut
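OCR engines such as EasyOCR return individual detections rather than running text: `readtext` yields `(bbox, text, confidence)` entries, where `bbox` holds the four corner points of the box. The sketch below shows one simple, assumed way to turn those into reading-order lines (drop low-confidence hits, group boxes whose vertical centers are close, then sort each line left to right). The function name and the `min_conf` and `line_tol` defaults are hypothetical and would need tuning per image.

```python
from typing import List, Sequence, Tuple

# One EasyOCR-style detection: (four corner points, recognized text, confidence).
Detection = Tuple[Sequence[Sequence[float]], str, float]


def to_reading_order(
    detections: List[Detection], min_conf: float = 0.5, line_tol: float = 10.0
) -> List[str]:
    """Group detections into lines, top-to-bottom, then left-to-right.

    `min_conf` and `line_tol` (in pixels) are hypothetical defaults.
    """
    boxes = []
    for corners, text, conf in detections:
        if conf < min_conf:
            continue  # drop low-confidence fragments
        xs = [p[0] for p in corners]
        ys = [p[1] for p in corners]
        # Keep (vertical center, left edge, text) for sorting.
        boxes.append(((min(ys) + max(ys)) / 2, min(xs), text))

    lines: List[List[Tuple[float, float, str]]] = []
    for box in sorted(boxes):  # sorted by vertical center first
        # Same line if the center is close to the center of the line's first box.
        if lines and abs(box[0] - lines[-1][0][0]) <= line_tol:
            lines[-1].append(box)
        else:
            lines.append([box])

    # Within each line, order the words by their left edge.
    return [" ".join(t for _, _, t in sorted(line, key=lambda b: b[1])) for line in lines]
```

With EasyOCR installed, `easyocr.Reader(["en"]).readtext("page.png")` (where `page.png` is a placeholder image path) returns detections in the shape this function expects. A fixed pixel tolerance is the simplest grouping rule; it can break on skewed scans or mixed font sizes.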

Additionally, I explore extracting text from PDFs with LLM-based tools such as LlamaParse.
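LLM-based parsers such as LlamaParse run as a hosted service, so the client needs an API key and network access. The sketch below is a hedged example, not the repository's notebook code. It assumes the `llama-parse` package (imported as `llama_parse`), its `LlamaParse(api_key=..., result_type=...)` constructor and `load_data` method, and a key in the `LLAMA_CLOUD_API_KEY` environment variable. The wrapper name `parse_pdf_with_llamaparse` is hypothetical.

```python
import os
from typing import List


def parse_pdf_with_llamaparse(path: str, result_type: str = "markdown") -> List[str]:
    """Send a PDF to LlamaParse and return the text of each returned document.

    Assumes `llama-parse` is installed and LLAMA_CLOUD_API_KEY is set.
    """
    api_key = os.environ.get("LLAMA_CLOUD_API_KEY")
    if not api_key:
        # Fail fast, before importing the client or calling the hosted API.
        raise RuntimeError("set LLAMA_CLOUD_API_KEY before calling LlamaParse")

    from llama_parse import LlamaParse  # optional dependency, imported lazily

    parser = LlamaParse(api_key=api_key, result_type=result_type)
    documents = parser.load_data(path)  # network call to the hosted service
    return [doc.text for doc in documents]
```

Requesting `"markdown"` output tends to preserve headings and tables better than plain `"text"`, which is one reason to try an LLM-based parser alongside the classic libraries.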

Setup

Step 1. Navigate to the root directory of the repository and create a new virtual environment with uv:

uv venv .venv

Step 2. Activate the environment. On Windows (Git Bash):

source .venv/Scripts/activate

On macOS or Linux:

source .venv/bin/activate

Step 3. Install the dependencies:

uv pip install -e .

Step 4. Open a notebook and select the .venv environment as its kernel to run the cells.
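A quick way to confirm that a notebook is really running on the project's environment is to check the interpreter from the first cell. This is a minimal sketch that assumes the environment was created as `.venv`, as in Step 1. The helper name `kernel_uses_venv` is hypothetical. It relies on the standard-library fact that `sys.prefix` differs from `sys.base_prefix` inside a virtual environment.

```python
import sys
from pathlib import Path


def kernel_uses_venv(expected: str = ".venv") -> bool:
    """Return True if this interpreter runs inside a venv directory named `expected`."""
    in_venv = sys.prefix != sys.base_prefix  # True inside any virtual environment
    return in_venv and Path(sys.prefix).name == expected


print("Python executable:", sys.executable)
print("Running in .venv:", kernel_uses_venv())
```

If the second line prints `False`, switch the notebook kernel to the interpreter inside `.venv` before running the remaining cells.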
