After extracting a PDF with multiple images per page, I receive the "images" list of objects with bounding boxes, block numbers, and so on. Separately, I get text that contains the file names of the images. Is there a way to know which image object corresponds to which image file name? Thanks
Replies: 1 comment
The images extracted by markdown creation are "photos" of rectangular page areas, not actually extracted images. These page areas may wrap the bounding boxes of displayed images or of vector graphics, and there is no way to tell these origin types apart.
The only available information is the sequence number in the generated image file name: it follows a top-left to bottom-right order.
BTW: if you look at images displayed on a PDF page, you never know the file name of the original image embedded in the PDF. This information is not stored in the PDF.
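
If you still want a best-effort pairing, the ordering rule above can be sketched roughly like this. This is only an illustration, not something the library provides: the `"bbox"` key layout `(x0, y0, x1, y1)`, the example file names, and both helper functions are assumptions of mine. It also interprets "top-left to bottom-right" as "sort by top edge, then by left edge", and it only makes sense for one page at a time and when the number of generated files equals the number of image objects, which is not guaranteed because the files are snapshots of page areas.

```python
import re


def sequence_number(filename):
    """Return the last integer found in the file name (extension ignored)."""
    stem = filename.rsplit(".", 1)[0]
    numbers = re.findall(r"\d+", stem)
    if not numbers:
        raise ValueError(f"no sequence number in {filename!r}")
    return int(numbers[-1])


def pair_by_reading_order(images, filenames):
    """Pair image objects with file names by assumed top-left to bottom-right order.

    images: list of dicts with a "bbox" entry (x0, y0, x1, y1) -- assumed layout.
    filenames: generated image file names for the same page.
    """
    if len(images) != len(filenames):
        # Page areas and image objects need not correspond one-to-one.
        raise ValueError("counts differ, cannot pair by order alone")
    # Sort by top edge first, then by left edge (assumed meaning of the order).
    ordered_images = sorted(images, key=lambda im: (im["bbox"][1], im["bbox"][0]))
    ordered_files = sorted(filenames, key=sequence_number)
    return list(zip(ordered_files, ordered_images))


# Hypothetical data for a single page.
images = [
    {"number": 7, "bbox": (300, 50, 500, 200)},
    {"number": 3, "bbox": (40, 50, 250, 200)},
    {"number": 9, "bbox": (40, 400, 500, 600)},
]
files = ["doc.pdf-0-1.png", "doc.pdf-0-0.png", "doc.pdf-0-2.png"]

for name, image in pair_by_reading_order(files and images, files):
    print(name, "->", image["number"], image["bbox"])
```

Treat the result as a guess and check it visually: two areas whose top edges differ by a fraction of a point would be ordered by that tiny difference here, which may not match the order actually used when the files were written.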