Connection between images #228

Meyer93 · 2025-03-05T17:14:46Z


After extracting content from a PDF with multiple images per page, I receive the "images" list of objects with bounding boxes, block number, and so on. Separately, the text contains the file name of each image. Is there a way to know which image object belongs to which image file name?

Thanks

Answered by JorjMcKie


JorjMcKie · 2025-03-06T00:45:51Z · Maintainer

The images extracted during Markdown creation are "photos" of rectangular page areas, not the actual images embedded in the PDF. Such a page area may enclose the bounding box of a displayed image or of vector graphics, and there is no way to tell these two origins apart.
The only available information is the sequence number in the generated image file name: it follows a top-left to bottom-right order on the page.
BTW: when you look at images displayed on a PDF page, you never know the file name of the original image embedded in the PDF. PDF does not store this information.
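A minimal Python sketch of the ordering heuristic described in the answer follows. It rests on several assumptions not confirmed in this thread. Generated file names are assumed to end in `-<page>-<index>.<ext>` (for example `doc.pdf-0-1.png`). Bounding boxes are assumed to use a top-left origin with y growing downward. "Top-left to bottom-right" is taken to mean sorting by top edge, then left edge. Because a generated file may also come from a vector-graphics area, the files and the `images` entries need not correspond one-to-one. For that reason the sketch refuses to pair lists of different lengths, and even equal counts give only a best guess. The helper names (`sequence_number`, `reading_order`, `pair_files_with_bboxes`) are hypothetical.

```python
import re
from typing import List, Sequence, Tuple

# (x0, y0, x1, y1), assuming a top-left origin with y growing downward.
BBox = Tuple[float, float, float, float]

# Assumed naming pattern: "<name>-<page>-<index>.<ext>", e.g. "doc.pdf-0-1.png".
_SEQ_RE = re.compile(r"-(\d+)-(\d+)\.\w+$")


def sequence_number(filename: str) -> int:
    """Return the trailing image index from a generated file name (hypothetical helper)."""
    m = _SEQ_RE.search(filename)
    if m is None:
        raise ValueError(f"unexpected file name: {filename!r}")
    return int(m.group(2))  # numeric, so "10" sorts after "2"


def reading_order(bbox: BBox) -> Tuple[float, float]:
    """Sort key approximating top-left to bottom-right: top edge first, then left edge."""
    x0, y0, _, _ = bbox
    return (y0, x0)


def pair_files_with_bboxes(
    filenames: Sequence[str], bboxes: Sequence[BBox]
) -> List[Tuple[str, BBox]]:
    """Pair the i-th file (by sequence number) with the i-th box (by reading order)."""
    if len(filenames) != len(bboxes):
        # Extra files may stem from vector graphics, so no safe pairing exists.
        raise ValueError("counts differ; a one-to-one pairing is not possible")
    files = sorted(filenames, key=sequence_number)
    boxes = sorted(bboxes, key=reading_order)
    return list(zip(files, boxes))


if __name__ == "__main__":
    files = ["doc.pdf-0-1.png", "doc.pdf-0-0.png"]
    boxes = [(300, 400, 500, 600), (50, 100, 250, 300)]
    print(pair_files_with_bboxes(files, boxes))
    # prints [('doc.pdf-0-0.png', (50, 100, 250, 300)), ('doc.pdf-0-1.png', (300, 400, 500, 600))]
```

The `bboxes` argument could be filled from the bounding boxes in a page's "images" list, assuming each entry provides them as a 4-tuple. If the library orders areas differently (for example by comparing bottom edges, or by treating boxes on roughly the same line as one row), only `reading_order` needs to change.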
