PDF-data-extraction

Working with different Python packages to extract data from PDF files

There are many Python packages for working with PDF files, including:

PyPDF2
PDFMiner
Slate (a wrapper of PDFMiner)
Tabula

PyPDF2

A Python library built as a PDF toolkit. It is capable of:

Extracting document information (title, author, …)
Splitting documents page by page
Merging documents page by page
Cropping pages
Merging multiple pages into a single page
Encrypting and decrypting PDF files

Check Working with PyPDF2

PDFMiner

Check Working with PDFMiner

Using PyPDF2 with local and online PDFs and saving the text in a .txt file

Converting data from PDF files (both local and online PDFs) to txt, JSON and HTML files using PDFMiner
