
Converting MuPDF coordinates to pdf-lib/pdf.js coordinates #151

@lbirkert

Description


Since the update fixed the memory issue, I've decided to hop back to this library for analysis. (Not for rendering yet: I'm still having issues with rendering speed and with bold fonts, but I might come back to those.)

Now I'm facing another issue: MuPDF uses a different coordinate system from pdf-lib/pdf.js. At first I thought only the y-axis was flipped. In MuPDF, y = 0 is the top of the page and y grows downward, which is the opposite of pdf-lib/pdf.js. So I tried converting the bounding-box rects I get from toStructuredText (via getWords) with the formula pdfLibY = pageHeight - muPdfY. This yields a strange offset that I can't hardcode, because it seems to depend on the font size (and possibly on the font itself).
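The flip described above, as a minimal sketch that assumes the page bounds start at y = 0 (`flipY` is a hypothetical helper name, not a MuPDF or pdf-lib function). On a 792-point-tall page, a point 100 points below the top lands 692 points above the bottom, and applying the flip twice returns the original value.

```ts
// Hypothetical helper: flip a y-coordinate between MuPDF's top-left origin
// and the bottom-left origin used by pdf-lib (assumes bounds start at y = 0).
function flipY(y: number, pageHeight: number): number {
  return pageHeight - y;
}

// A US Letter page is 792 points tall.
console.log(flipY(100, 792)); // 692

// The flip is its own inverse.
console.log(flipY(flipY(100, 792), 792)); // 100
```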

[Image]

For the sake of completeness, here is the function I use to convert the rects. Note that I also have to swap rect[1] and rect[3], because flipping the axis turns the old maximum y into the new minimum and vice versa.

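```ts
import type { Rect } from "mupdf";

// `page` is a MuPDF Page; bounds are [x0, y0, x1, y1] with a top-left origin
const viewport = page.getBounds();

// Flip a rect vertically into bottom-left-origin (pdf-lib/pdf.js) coordinates.
// y0 and y1 swap places: the old bottom edge becomes the new lower y-value.
// Note: this mutates the rect that is passed in and returns it.
const convrect = (rect: Rect): Rect => {
    [rect[3], rect[1]] = [
        viewport[3] - rect[1],
        viewport[3] - rect[3]
    ];
    return rect;
};

// convrect([0, 0, 0, 100]) yields [0, viewport[3] - 100, 0, viewport[3]]
```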
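Because `convrect` writes into the rect it receives, any caller that keeps the original MuPDF rect around will see it change. Here is a minimal non-mutating sketch of the same flip, assuming the page bounds are `[0, 0, width, height]`. `toBottomLeft` and the local `Rect` tuple type (mirroring the four-number rect used above) are hypothetical names introduced for illustration.

```ts
// Assumed shape of a rect: [x0, y0, x1, y1]
type Rect = [number, number, number, number];

// Sketch: convert a top-left-origin rect into bottom-left-origin coordinates
// without mutating the input. Assumes the page bounds start at (0, 0).
function toBottomLeft(rect: Rect, pageHeight: number): Rect {
  const [x0, y0, x1, y1] = rect;
  // The old bottom edge (y1) becomes the new lower y-value,
  // and the old top edge (y0) becomes the new upper y-value.
  return [x0, pageHeight - y1, x1, pageHeight - y0];
}

const input: Rect = [0, 0, 0, 100];
const output = toBottomLeft(input, 792);
console.log(output); // [ 0, 692, 0, 792 ]
console.log(input);  // [ 0, 0, 0, 100 ] (unchanged)
```

This performs exactly the same flip as the formula above, so on its own it does not account for the font-size-dependent offset.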
