PDF craft

This repo is a wrapper around the pdf-craft project of the same name.

About PDF craft

Introduction

PDF craft can convert PDF files into various other formats. The project focuses on processing scanned books in PDF form and is still at an early stage. If you encounter any problems or have suggestions, please submit an issue.

This project reads PDF pages one by one and uses DocLayout-YOLO combined with a custom algorithm to extract the text of each book page, filtering out elements such as headers, footers, footnotes, and page numbers. When text crosses a page boundary, the algorithm joins the fragments from the previous and next pages so that the output stays semantically coherent. Text is recognized with OnnxOCR, and layoutreader determines a reading order that matches human reading habits.
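The filtering and cross-page joining steps can be pictured with a small, self-contained sketch. The Block type, the NON_BODY set, and the end-of-sentence heuristic below are illustrative assumptions only, not the library's actual data structures or algorithm; the real pipeline relies on the DocLayout-YOLO, OnnxOCR, and layoutreader models mentioned above.

```python
# Conceptual sketch: drop non-body layout blocks, then stitch page texts so
# that sentences split across a page break are re-joined. Not the pdf-craft API.
from dataclasses import dataclass

@dataclass
class Block:
    kind: str   # e.g. "body", "header", "footer", "footnote", "page_number"
    text: str   # OCR-recognized text, already in reading order

NON_BODY = {"header", "footer", "footnote", "page_number"}

def page_body_text(blocks: list[Block]) -> str:
    """Keep only body blocks and concatenate their text."""
    return " ".join(b.text for b in blocks if b.kind not in NON_BODY)

def join_pages(pages: list[list[Block]]) -> str:
    """Heuristic cross-page join: if a page does not end with sentence-final
    punctuation, the next page is assumed to continue the same sentence."""
    paragraphs: list[str] = []
    carry = ""
    for blocks in pages:
        text = page_body_text(blocks)
        if not text:
            continue
        text = f"{carry} {text}".strip() if carry else text
        if text[-1] in ".!?\u3002\uff01\uff1f":  # includes CJK full stops
            paragraphs.append(text)
            carry = ""
        else:
            carry = text  # sentence continues on the next page
    if carry:
        paragraphs.append(carry)
    return "\n\n".join(paragraphs)

# A sentence split across two pages is re-joined; header and page number are dropped.
page1 = [Block("header", "Chapter 1"), Block("body", "The story begins on a")]
page2 = [Block("page_number", "2"), Block("body", "cold winter night.")]
print(join_pages([page1, page2]))  # -> "The story begins on a cold winter night."
```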

Using only these AI models, all of which run locally (with a local GPU for acceleration), PDF files can be converted to Markdown. This works well for papers or small books.
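For the PDF-to-Markdown path, the upstream pdf-craft project exposes a Python API roughly along the following lines. This is a sketch under assumptions: the PDFPageExtractor and MarkDownWriter names and their parameters are taken from the upstream project and may differ between versions, so check that project's README for the exact API.

```python
# Minimal PDF -> Markdown sketch; class names and parameters are assumptions
# based on the upstream pdf-craft project and may differ between versions.
from pdf_craft import PDFPageExtractor, MarkDownWriter

extractor = PDFPageExtractor(
    device="cpu",                      # "cuda" to accelerate with a local GPU
    model_dir_path="/path/to/models",  # where the layout/OCR models are downloaded
)

with MarkDownWriter("./output/book.md", "images", "utf-8") as md:
    for block in extractor.extract(pdf="/path/to/scanned-book.pdf"):
        md.write(block)
```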

However, for whole books (generally more than 100 pages), converting to EPUB is recommended. During conversion, the library passes the data recognized by the local OCR to an LLM, which uses specific information (such as the table of contents) to build the book's structure and finally generates an EPUB file with a table of contents and chapters. During this parsing and construction process, the annotations and citations on each page are read by the LLM and presented in a suitable form in the EPUB file. The LLM can also correct OCR errors to a certain extent. This step cannot be performed entirely locally: you need to configure an LLM service. DeepSeek is recommended, as this library's prompts were tested against the V3 model.
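The EPUB path runs in two stages: an analysis stage that sends the local OCR output to the LLM, and an assembly stage that builds the EPUB from the analysed structure. The sketch below assumes the analyse, LLM, and generate_epub_file names and parameters from the upstream pdf-craft project; treat them as assumptions and consult that project's documentation before relying on them.

```python
# PDF -> EPUB sketch; function names and parameters are assumptions based on
# the upstream pdf-craft project and may differ between versions.
from pdf_craft import PDFPageExtractor, LLM, analyse, generate_epub_file

analyse(
    llm=LLM(
        key="sk-...",                    # your DeepSeek API key
        url="https://api.deepseek.com",  # prompts were tested against the V3 model
        model="deepseek-chat",
    ),
    pdf_page_extractor=PDFPageExtractor(device="cpu", model_dir_path="/path/to/models"),
    pdf_path="/path/to/scanned-book.pdf",
    analysing_dir_path="./analysing",    # intermediate analysis results
    output_dir_path="./output",
)

generate_epub_file(
    from_dir_path="./output",            # analysed structure, ToC, chapters
    epub_file_path="./book.epub",
)
```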
