Add PDF fragment loader plugin to directory #954
Hi! After building this feature for my other plugin (for arXiv), I thought it could be useful for generic PDFs too, so I made llm-plugin-pdf. It provides a `-f pdf:` loader that can load local or remote PDF files as fragments. It's a small wrapper around PyMuPDF that tries to parse a PDF's text and images into Markdown, so the file's contents are provided as a fragment.
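Since the loader accepts either a local path or a remote URL, the first step is deciding where to read the bytes from. A minimal standard-library sketch of that dispatch might look like this; the function name `load_pdf_bytes` and the 30-second timeout are hypothetical and not taken from the plugin's code:

```python
import urllib.request
from pathlib import Path


def load_pdf_bytes(argument: str, timeout: float = 30.0) -> bytes:
    """Return raw PDF bytes from a URL or a local file path (hypothetical helper)."""
    if argument.startswith(("http://", "https://")):
        # Remote PDF: download the whole response body.
        with urllib.request.urlopen(argument, timeout=timeout) as resp:
            return resp.read()
    # Local PDF: expand "~" and read the file from disk.
    return Path(argument).expanduser().read_bytes()
```

The returned bytes could then be opened with PyMuPDF for parsing; in this sketch, anything that does not start with `http://` or `https://` is treated as a file path.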
This should use far fewer tokens than feeding a full PDF to a model directly. Most papers are generated from source, so their PDFs carry a real text layer that can be extracted without relying on clunky OCR. (I explored using GROBID for other projects, but it requires a running server, which ruled it out here.) PyMuPDF worked well in my tests, and I was also able to extract the PDF's images as base64-encoded data, so everything, not only the text, is passed to the model as fragments.
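To illustrate how text and base64-encoded images can end up in a single Markdown fragment, here is a minimal sketch. It assumes the per-page text and image bytes have already been extracted (with PyMuPDF that would typically be `page.get_text()` for text, and `page.get_images()` plus `doc.extract_image(xref)`, whose result includes `"image"` bytes and an `"ext"` extension). The `PageContent` class, the helper names, and the "## Page N" layout are hypothetical, not the plugin's actual output format:

```python
import base64
from dataclasses import dataclass, field


@dataclass
class PageContent:
    """Hypothetical container for what was extracted from one PDF page."""
    text: str
    # Each image is (raw bytes, file extension such as "png" or "jpeg").
    images: list[tuple[bytes, str]] = field(default_factory=list)


# Common image extensions mapped to MIME types; unknown ones fall back to a generic type.
_MIME_TYPES = {"png": "image/png", "jpg": "image/jpeg", "jpeg": "image/jpeg", "gif": "image/gif"}


def image_to_data_uri(data: bytes, ext: str) -> str:
    """Encode raw image bytes as a base64 data URI."""
    mime = _MIME_TYPES.get(ext.lower(), "application/octet-stream")
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime};base64,{encoded}"


def pdf_pages_to_markdown(pages: list[PageContent]) -> str:
    """Join per-page text and inline base64 images into one Markdown string."""
    parts: list[str] = []
    for page_no, page in enumerate(pages, start=1):
        parts.append(f"## Page {page_no}\n\n{page.text.strip()}")
        for img_no, (data, ext) in enumerate(page.images, start=1):
            uri = image_to_data_uri(data, ext)
            parts.append(f"![page {page_no} image {img_no}]({uri})")
    return "\n\n".join(parts)
```

Embedding images as data URIs keeps the whole document in one string, which fits a text fragment; the trade-off is that base64 inflates image size by roughly a third.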