Description
Since the update fixed the memory issue, I've decided to hop back to this library for analysis (not rendering yet, since I'm still having issues with rendering speed and with bold fonts, but I might come back to those).
Now I am facing another issue with MuPDF: it uses a different coordinate system than pdf-lib/pdf.js. At first I thought only the Y-axis was flipped (in MuPDF, y = 0 is the top of the page and larger values are further down, the opposite of pdf.js). So I tried converting the bounding box rects I get from toStructuredText (using getWords) with this formula: pdfLibY = pageHeight - muPdfY. But this yields a weird offset that can't be hardcoded, since it seems to depend on the font size (and maybe on the font itself as well?).

For the sake of completeness, here is the function I use for converting the rects. Note that I also have to swap rect[1] and rect[3], since the minimum and maximum Y values trade places after the flip.
const viewport = page.getBounds();
const convrect = (rect: Rect) => {
  [rect[3], rect[1]] = [
    viewport[3] - rect[1],
    viewport[3] - rect[3]
  ];
  return rect;
};
// convrect([0, 0, 0, 100]) yields [0, viewport[3] - 100, 0, viewport[3]]
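
One side note on the function above: it mutates the rect it is given, so calling it twice on the same rect would flip it back. Below is a minimal non-mutating sketch of the same flip. The name flipRectY and the local Rect type are hypothetical (not part of MuPDF or pdf-lib), and it assumes rects are [x0, y0, x1, y1] with the page bounds passed in explicitly. It only reproduces the plain Y flip and does not address the font-size-dependent offset.

```typescript
// Assumed rect layout: [x0, y0, x1, y1], matching the snippet above.
type Rect = [number, number, number, number];

// Hypothetical helper: converts a rect from a top-left-origin space (y grows
// downward) to a bottom-left-origin space (y grows upward).
// `bounds` is the page rect in the same top-left-origin space.
function flipRectY(rect: Rect, bounds: Rect): Rect {
  const [x0, y0, x1, y1] = rect;
  const pageBottom = bounds[3];
  // The old lower edge (y1) becomes the new minimum Y, and the old upper
  // edge (y0) becomes the new maximum Y, so the result stays ordered.
  return [x0, pageBottom - y1, x1, pageBottom - y0];
}

// Usage with made-up page bounds of 612 x 792:
const original: Rect = [0, 0, 0, 100];
const flipped = flipRectY(original, [0, 0, 612, 792]);
console.log(flipped, original); // `original` is left untouched
```

Returning a new array keeps the MuPDF-space rect available for comparison while debugging the offset.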