PDF-data-extraction

Exploring different Python packages for extracting data from PDF files.

There are many Python packages for working with PDF files, including:

  1. PyPDF2
  2. PDFMiner
  3. Slate
  4. Tabula

PyPDF2

A Python library built as a PDF toolkit. It is capable of:
  • Extracting document information (title, author, …)
  • Splitting documents page by page
  • Merging documents page by page
  • Cropping pages
  • Merging multiple pages into a single page
  • Encrypting and decrypting PDF files

Check Working with PyPDF2

PDFMiner

Check Working with PDFMiner