V65 fonts break if document contains named page with <img> tag only #2421

Open
LukasKlement opened this issue Mar 26, 2025 · 1 comment

When a document contains a named page whose only content is a single <img> tag, the fonts break throughout the document: the text is displayed as a series of numbers or symbols. This happens in the Google Chrome PDF viewer only.

Example HTML [broken]:

<html lang="pt">
<head>
  <title>Test</title>
  <meta content="text/html; charset=utf-8" http-equiv="Content-type">
  <style>
    .coverpage+.coverpage {
      page-break-before: always;
    }
  </style>
</head>

<body class="">
  <div class="coverpage">
    <img src="https://api.menutech.com/media/inline_images/Berlin_NZDC6WZ.jpg">
  </div>
  <div class="coverpage">
    <p>TEST STRING</p>
  </div>
</body>
</html>

When I add one more tag to the container holding the image, it works:

Example HTML [working]:

<html lang="pt">
<head>
  <title>Test</title>
  <meta content="text/html; charset=utf-8" http-equiv="Content-type">
  <style>
    .coverpage+.coverpage {
      page-break-before: always;
    }
  </style>
</head>

<body class="">
  <div class="coverpage">
    <p>TEST</p>
    <img src="https://api.menutech.com/media/inline_images/Berlin_NZDC6WZ.jpg">
  </div>
  <div class="coverpage">
    <p>TEST STRING</p>
  </div>
</body>
</html>

The bug occurs in Google Chrome only, with WeasyPrint v65 (and also with the previous version, v64). pdf.js (e.g. in Firefox) and Adobe Acrobat display the text correctly. The bug affects the PDF as generated by WeasyPrint; if I re-save it in another application (e.g. Apple Preview), the document displays the correct text again.
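
To compare the two variants side by side, a small script along the following lines may help. It is only a sketch: the helper names (`build_html`, `render`) and the output file names are made up for illustration, and it assumes the `weasyprint` command-line tool is installed and on the PATH (rendering is skipped otherwise).

```python
import shutil
import subprocess
from pathlib import Path

# Image URL taken from the examples above.
IMG = "https://api.menutech.com/media/inline_images/Berlin_NZDC6WZ.jpg"

# Same markup as the examples above; braces in the CSS are doubled for str.format.
TEMPLATE = """<html lang="pt">
<head>
  <title>Test</title>
  <meta content="text/html; charset=utf-8" http-equiv="Content-type">
  <style>
    .coverpage+.coverpage {{
      page-break-before: always;
    }}
  </style>
</head>

<body class="">
  <div class="coverpage">
{extra}    <img src="{img}">
  </div>
  <div class="coverpage">
    <p>TEST STRING</p>
  </div>
</body>
</html>
"""


def build_html(extra_tag: bool) -> str:
    """Return the broken variant (image only) or the working one (extra <p>)."""
    extra = "    <p>TEST</p>\n" if extra_tag else ""
    return TEMPLATE.format(extra=extra, img=IMG)


def render(name: str, html: str, out_dir: Path = Path(".")) -> bool:
    """Write <name>.html and convert it to <name>.pdf with the weasyprint CLI.

    Returns False (after writing the HTML file) if the CLI is not installed.
    """
    html_path = out_dir / f"{name}.html"
    html_path.write_text(html, encoding="utf-8")
    if shutil.which("weasyprint") is None:
        return False
    subprocess.run(
        ["weasyprint", str(html_path), str(out_dir / f"{name}.pdf")],
        check=True,
    )
    return True
```

Calling `render("broken", build_html(extra_tag=False))` and `render("working", build_html(extra_tag=True))` should produce two PDFs that can then be opened in Chrome and in another viewer for comparison.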

liZe (Member) commented Mar 26, 2025

Hi!

Thanks for the report. We discussed this bug earlier with other users.

The problem is caused by a bug in Chrome’s OCR feature, and it doesn’t happen only with WeasyPrint PDFs.

Until Chrome developers fix the bug, the known possibilities are:

  • to disable the "Make the text in PDF images interactable" flag in Chrome, or
  • to transform WeasyPrint’s PDF into another PDF with a third-party tool such as Ghostscript (for example, using gs -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -sOutputFile=out.pdf in.pdf) or Apple Preview, as you proposed.

As we don’t know where this bug comes from, the second solution may not work in some cases. I suppose (but I’m not sure) that it comes from the ToUnicode table that we always use in WeasyPrint, while other tools tend to use one of the default PDF encodings.
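
If the Ghostscript workaround has to run as part of an automated pipeline, it could be wrapped roughly as follows. This is a minimal sketch, not part of WeasyPrint: the function names are hypothetical, the flags are exactly those quoted in the command above, and it assumes the `gs` executable is on the PATH.

```python
import shutil
import subprocess


def ghostscript_command(src: str, dst: str) -> list[str]:
    """Build the pdfwrite command quoted above for the given input and output."""
    return [
        "gs",
        "-dNOPAUSE",
        "-dBATCH",
        "-sDEVICE=pdfwrite",
        f"-sOutputFile={dst}",
        src,
    ]


def rewrite_pdf(src: str, dst: str) -> bool:
    """Re-save src as dst through Ghostscript; return False if gs is missing."""
    if shutil.which("gs") is None:
        return False
    # check=True raises CalledProcessError if Ghostscript exits with an error.
    subprocess.run(ghostscript_command(src, dst), check=True)
    return True
```

For example, `rewrite_pdf("in.pdf", "out.pdf")` runs the same conversion as the command line above. As noted, there is no guarantee that the rewritten file avoids the Chrome bug in every case.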
