pdfload: control region to be rendered via page_box (pdfium only) #4605
+159
−6
Each page in a PDF document can specify up to five optional page boxes: media, crop, trim, bleed and art. By default, a PDF renderer should use the crop box if it exists and otherwise the page dimensions; this PR leaves that behaviour unchanged.
This PR adds a new page_box parameter to the pdfium-powered pdfload to allow control over which page box should define the region to render. If the requested page box is not present, it will fall back to using the crop box (which by default is the page dimensions).
As pdfium uses two different coordinate systems, this change ensures the page box calculations occur using the page coordinate system before we start to request dimensions in the device-specific client coordinate system. (When I started work on this I added some logic to libvips to convert between the two systems, but the final approach here ends up being much simpler.)
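To illustrate the fallback order and the "page coordinates first, device dimensions second" ordering described above, here is a minimal sketch in Python. It is not the code in this PR: the function names, the dictionary of boxes and the scale parameter are all hypothetical, and it assumes boxes are given as (left, bottom, right, top) in PDF points.

```python
from typing import Dict, Tuple

# A box is (left, bottom, right, top) in page coordinates
# (PDF points, origin at the bottom-left). Assumed layout, for illustration.
Box = Tuple[float, float, float, float]


def select_page_box(boxes: Dict[str, Box], requested: str,
                    page_width: float, page_height: float) -> Box:
    """Pick the box that defines the render region, in page coordinates."""
    if requested in boxes:
        return boxes[requested]
    # Requested box absent: fall back to the crop box.
    if "crop" in boxes:
        return boxes["crop"]
    # No crop box either: use the full page dimensions.
    return (0.0, 0.0, page_width, page_height)


def device_size(box: Box, scale: float = 1.0) -> Tuple[int, int]:
    """Convert a page-coordinate box to a pixel width and height."""
    left, bottom, right, top = box
    width = max(1, round(abs(right - left) * scale))
    height = max(1, round(abs(top - bottom) * scale))
    return width, height


# Hypothetical page: 612 x 792 point page with only a trim box defined.
boxes = {"trim": (36.0, 36.0, 576.0, 756.0)}
print(device_size(select_page_box(boxes, "trim", 612.0, 792.0)))
# Bleed box is absent and there is no crop box, so the page size is used.
print(device_size(select_page_box(boxes, "bleed", 612.0, 792.0), scale=2.0))
```

All box selection happens in page units; the conversion to pixels is a single final step, which is the simplification the paragraph above refers to.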
From what I can tell, the pdfium API calls to read/write page box info are not thread-safe, so these are wrapped in the mutex.
I've updated the pdfium-based operation descriptions to include a "pdfium" suffix to expose (in a rather brittle manner) which PDF library is being used as this is required by the tests. I'd be very happy to improve this if there's a better way.
The existing PDF tests are expanded to cover this new logic, including a new test fixture.
The fuzz test environment already includes pdfium and therefore includes coverage for pdfiumload so I've not added anything further here.
I investigated how we might implement this via Poppler; however, its (glib-based) C API does not expose any page box information. To implement page box rendering with Poppler we would have to either switch to using its C++ API or modify Poppler itself, both of which are beyond the scope of this feature.
Closes #4597