Update to xpdf 4.05 #173

Draft
wants to merge 18 commits into
base: master

lfoppiano
Collaborator

@lfoppiano lfoppiano commented Jan 11, 2025

This PR:

  • updates to xpdf 4.05
  • enables font remapping

@lfoppiano lfoppiano linked an issue Jan 11, 2025 that may be closed by this pull request
@lfoppiano lfoppiano force-pushed the feature/update-xpdf-4.05 branch from 0697454 to f521a02 Compare February 22, 2025 22:40
@lfoppiano lfoppiano force-pushed the feature/update-xpdf-4.05 branch from f521a02 to 2ac4671 Compare February 22, 2025 22:40
@lfoppiano
Collaborator Author

lfoppiano commented Feb 22, 2025

I ran some tests on around a thousand PDF documents, and there seem to be a few regressions compared with version 0.5.
Here is the number of documents whose output matches when processed with versions 0.5 and 0.6 on Linux (compared at character level and token level):
image
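
For reference, a comparison of this kind can be sketched as follows. This is a minimal sketch and not the script used for the numbers above: it assumes "character level" means comparing the text with all whitespace removed, "token level" means comparing whitespace-separated tokens, and that the two versions wrote same-named `.txt` files into two directories. The function names and the directory layout are hypothetical.

```python
from pathlib import Path


def chars(text: str) -> str:
    """Character-level view: drop all whitespace so pure layout changes are ignored."""
    return "".join(text.split())


def tokens(text: str) -> list[str]:
    """Token-level view: split on any run of whitespace."""
    return text.split()


def count_matches(pairs):
    """Count the (old, new) text pairs that are identical at each level."""
    pairs = list(pairs)  # materialise so a generator can be scanned twice
    char_hits = sum(chars(a) == chars(b) for a, b in pairs)
    token_hits = sum(tokens(a) == tokens(b) for a, b in pairs)
    return char_hits, token_hits


def load_pairs(dir_old: str, dir_new: str):
    """Yield the text of same-named .txt files from two output directories (assumed layout)."""
    for old in sorted(Path(dir_old).glob("*.txt")):
        new = Path(dir_new) / old.name
        if new.exists():
            yield (
                old.read_text(encoding="utf-8", errors="replace"),
                new.read_text(encoding="utf-8", errors="replace"),
            )
```

With this definition a document can match at character level but not at token level, for example when one version merges or splits words differently.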

One example: elife-78526-v1.pdf

From a certain point on, it starts to leak text:
image

Here are the text files for both versions:

elife-78526-v1-0.5.txt
elife-78526-v1-0.6.txt
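
To locate the point where the two outputs start to differ, something like the following could be used. It is only a sketch: it assumes both files have been downloaded locally and compares whitespace-separated tokens; `first_divergence` and its `context` parameter are hypothetical names, not part of this project.

```python
from itertools import zip_longest


def first_divergence(old_text: str, new_text: str, context: int = 5):
    """Return (index, old_window, new_window) for the first differing token, or None."""
    old_tokens = old_text.split()
    new_tokens = new_text.split()
    # zip_longest pads the shorter list with None, so a length mismatch also counts
    for i, (a, b) in enumerate(zip_longest(old_tokens, new_tokens)):
        if a != b:
            start = max(0, i - context)
            end = i + context
            return i, old_tokens[start:end], new_tokens[start:end]
    return None


if __name__ == "__main__":
    # File names taken from the attachments above; adjust paths as needed.
    with open("elife-78526-v1-0.5.txt", encoding="utf-8", errors="replace") as f:
        old = f.read()
    with open("elife-78526-v1-0.6.txt", encoding="utf-8", errors="replace") as f:
        new = f.read()
    print(first_divergence(old, new))
```

This only reports the first mismatch; for a full alignment of the two files, Python's `difflib` would be a better fit.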

@lfoppiano
Collaborator Author

On a Mac M2, I got far fewer discrepancies:

image

Most discrepancies are stray text from figures that version 0.6 extracts, for example:
image

Development

Successfully merging this pull request may close these issues.

xpdf version 4.04