
PDF-data-extraction

Exploring different Python packages for extracting data from PDF files.

There are many Python packages for working with PDF files, including:

PyPDF2
PDFMiner
Slate
Tabula

PyPDF2

A Python library built as a PDF toolkit. It is capable of:

Extracting document information (title, author, …)
Splitting documents page by page
Merging documents page by page
Cropping pages
Merging multiple pages into a single page
Encrypting and decrypting PDF files

Check Working with PyPDF2

PDFMiner

Check Working with PDFMiner

Using PyPDF2 with local and online PDFs and saving the extracted text to a .txt file

Converting PDF files (both local and online) to .txt, .json and .html files using PDFMiner