pdfload: control region to be rendered via page_box (pdfium only) #4605

Merged Jul 11, 2025 (1 commit)

lovell (Member) commented Jul 10, 2025

Each page in a PDF document can specify up to five optional page boxes: media, crop, trim, bleed and art. By default, a PDF renderer should use the crop box if it exists and otherwise use the page dimensions; this behaviour is unchanged.

This PR adds a new page_box parameter to the pdfium-powered pdfload to allow control over which page box defines the region to render. If the requested page box is not present, it will fall back to the crop box (which by default is the page dimensions).
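The fallback order described above can be sketched in plain C. This is only an illustration under stated assumptions: `PageBox`, `Rect`, `PageBoxes` and `resolve_page_box` are hypothetical names invented for the example, not the libvips or pdfium API, and the sketch assumes the presence of each box has already been read from the document.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names throughout; not the libvips or pdfium API. */
typedef enum {
    PAGE_BOX_MEDIA,
    PAGE_BOX_CROP,
    PAGE_BOX_TRIM,
    PAGE_BOX_BLEED,
    PAGE_BOX_ART,
    PAGE_BOX_LAST
} PageBox;

/* A rectangle in PDF page coordinates. */
typedef struct {
    float left, bottom, right, top;
} Rect;

/* The boxes one page declares; present[i] is false when box i is absent. */
typedef struct {
    bool present[PAGE_BOX_LAST];
    Rect box[PAGE_BOX_LAST];
    Rect page; /* full page dimensions, assumed always known */
} PageBoxes;

/* Pick the region to render: the requested box if the page declares it,
 * otherwise the crop box, otherwise the page dimensions.
 */
static Rect
resolve_page_box(const PageBoxes *boxes, PageBox requested)
{
    assert(boxes != NULL);
    assert((int) requested >= 0 && requested < PAGE_BOX_LAST);

    if (boxes->present[requested])
        return boxes->box[requested];
    if (boxes->present[PAGE_BOX_CROP])
        return boxes->box[PAGE_BOX_CROP];

    return boxes->page;
}
```

With this shape, requesting a box the page does not declare never fails; it quietly yields the same region the default behaviour would have rendered.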

As pdfium uses two different coordinate systems, this change ensures the page box calculations occur using the page coordinate system before we start to request dimensions in the device-specific client coordinate system. (When I started work on this I added some logic to libvips to convert between the two systems, but the final approach here ends up being much simpler.)

From what I can tell, the pdfium API calls to read/write page box info are not thread-safe, so these are wrapped in the mutex.
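The general pattern of serialising a non-thread-safe call behind a mutex looks roughly like this. The sketch uses POSIX threads only to stay dependency-free (libvips itself builds on GLib), and `unsafe_get_box` is a hypothetical stand-in for a library call that touches shared state, not a real pdfium function.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    float left, bottom, right, top;
} Box;

/* One process-wide lock guarding every call into the unsafe library. */
static pthread_mutex_t pdf_lock = PTHREAD_MUTEX_INITIALIZER;

/* Shared scratch state, standing in for the library's internal state. */
static Box scratch;

/* Hypothetical stand-in for a call that is not thread-safe because it
 * writes to shared state before copying the result out.
 */
static bool
unsafe_get_box(int page_index, Box *out)
{
    if (page_index < 0)
        return false;

    scratch.left = 0.0f;
    scratch.bottom = 0.0f;
    scratch.right = 612.0f;
    scratch.top = 792.0f;
    *out = scratch;

    return true;
}

/* Thread-safe wrapper: hold the lock for the whole read so that no
 * other thread can modify the shared state mid-call.
 */
static bool
locked_get_box(int page_index, Box *out)
{
    bool ok;

    assert(out != NULL);

    pthread_mutex_lock(&pdf_lock);
    ok = unsafe_get_box(page_index, out);
    pthread_mutex_unlock(&pdf_lock);

    return ok;
}
```

The lock is released on both the success and failure paths, so a failed lookup cannot leave other threads blocked.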

I've updated the pdfium-based operation descriptions to include a "pdfium" suffix to expose (in a rather brittle manner) which PDF library is being used as this is required by the tests. I'd be very happy to improve this if there's a better way.

The existing PDF tests are expanded to cover this new logic, including a new test fixture.

The fuzz test environment already includes pdfium and therefore includes coverage for pdfiumload so I've not added anything further here.

I investigated how we might implement this via Poppler, but its (GLib-based) C API does not expose any page box information. To implement page box rendering with Poppler we would have to either switch to its C++ API or modify Poppler itself, both of which are beyond the scope of this feature.

[Images: media, crop, bleed, trim, art]

Closes #4597

jcupitt (Member) commented Jul 11, 2025

How about adding this param to the poppler loader as well, but making it do nothing? Or maybe issue a warning? We could possibly wire it up to something in the future if poppler expose this feature or we switch to the poppler C++ API.

I think this is what we do for other operations which can change their behaviour with compile time flags.
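A minimal sketch of the "accept the parameter but warn" idea, assuming a loader that knows at compile time or run time whether its backend can honour page boxes. Everything here (`BoxKind`, `LoaderOptions`, `effective_page_box`, the warning text) is hypothetical and only illustrates the shape of the suggestion; it is not how libvips implements it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical names for illustration only. */
typedef enum {
    BOX_MEDIA,
    BOX_CROP,
    BOX_TRIM,
    BOX_BLEED,
    BOX_ART
} BoxKind;

typedef struct {
    BoxKind page_box; /* what the caller asked for */
    bool warned;      /* true once the warning has been issued */
} LoaderOptions;

/* Return the box the backend will actually use. A backend without page
 * box support accepts the option, warns once if a non-default box was
 * requested, and carries on with the crop box.
 */
static BoxKind
effective_page_box(LoaderOptions *options, bool backend_supports_boxes)
{
    assert(options != NULL);

    if (backend_supports_boxes)
        return options->page_box;

    if (options->page_box != BOX_CROP && !options->warned) {
        fprintf(stderr,
            "pdfload: page_box is ignored by this PDF backend\n");
        options->warned = true;
    }

    return BOX_CROP;
}
```

Keeping the argument on both loaders means calling code does not need to change depending on which PDF library was compiled in.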

I suppose we could have an abstract base PDF class with the args and then two implementation subclasses, but that sounds like a PITA.

@lovell lovell force-pushed the pdfload-add-page-box branch from e1c2bf9 to 94e116d Compare July 11, 2025 13:19
jcupitt (Member) commented Jul 11, 2025

LGTM!

@lovell lovell merged commit e5e5c20 into libvips:master Jul 11, 2025
6 checks passed
@lovell lovell deleted the pdfload-add-page-box branch July 11, 2025 14:08
Linked issue: Support for PDF boxes