NVDA ignores line breaks in PDFs, making some types of text like source code unreadable with Adobe Reader #17313

Open
Neurrone opened this issue Oct 20, 2024 · 2 comments
Labels: app/adobe/acrobat, p3, triaged


@Neurrone

Continuation of #7275

Steps to reproduce:

Download this file and try to read it with NVDA in Adobe Reader

Actual behavior:

This is NVDA's speech output on the third line:

class Node: def __init__(self, value): 

Expected behavior:

Visually (and obviously from context), this should be two separate lines, like so:

class Node:
  def __init__(self, value): 

These are two separate lines visually in the file.

From #7275 (comment)

PDF has semantic tags for paragraphs, lists, tables and the like. However, it does not differentiate author-inserted line breaks (as in source code or poetry, sometimes known as hard line breaks) from line breaks used to wrap text that cannot fit on a single line (sometimes known as soft line breaks). Because NVDA splits text into lines itself (according to the "Maximum number of characters on one line" Browse Mode setting), we strip line break characters; otherwise, you end up with a lot of long lines followed by short lines (as I recall happened in JAWS when I used it years ago). Having spoken to someone involved in PDF accessibility specification writing, my understanding is that the correct way to author such content is to tag each line as a separate list item or paragraph. Unfortunately, it seems no one actually does this in the wild.
I think the only way we could reasonably solve this is to ignore NVDA's own settings for splitting lines and instead use only the line breaks in the PDF. That would also require us not to treat line breaks as paragraphs for PDF. This would be somewhat inconsistent with browse mode everywhere else, but I think consistency is probably outweighed by usability here.
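
To make the difference between the two approaches concrete, here is a minimal sketch in plain Python. It is not NVDA's actual code: the function names `rewrap_ignoring_breaks` and `split_on_pdf_breaks` are hypothetical, and the width of 100 is only an assumed stand-in for the "Maximum number of characters on one line" setting.

```python
import textwrap


def rewrap_ignoring_breaks(text: str, max_line_length: int = 100) -> list[str]:
    """Illustrates the behavior described above: drop every line break,
    then split the text into lines of at most max_line_length characters."""
    # str.split() with no arguments splits on any whitespace, including "\n",
    # so hard and soft line breaks are both lost here.
    collapsed = " ".join(text.split())
    return textwrap.wrap(collapsed, width=max_line_length)


def split_on_pdf_breaks(text: str) -> list[str]:
    """Illustrates the proposed alternative: trust the line breaks found
    in the PDF text and do no additional wrapping."""
    return text.splitlines()


# Hypothetical text as it might be extracted from the PDF in this issue.
sample = (
    "class Node:\n"
    "  def __init__(self, value):\n"
    "    self.value = value\n"
    "    self.next = None"
)

for line in rewrap_ignoring_breaks(sample):
    print(repr(line))
for line in split_on_pdf_breaks(sample):
    print(repr(line))
```

With the first function, the whole class definition comes out as a single line, which matches the speech output reported above. With the second, each source line stays separate, at the cost of also keeping any soft line breaks that a PDF producer inserted purely for wrapping.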

NVDA logs, crash dumps and other attachments:

System configuration

NVDA installed/portable/running from source:

Installed

NVDA version:

alpha-34198,67f6cb99 (2025.1.0.34198)

Windows version:

Windows 11 23H2 (OS Build 22631.4317)

Name and version of other software in use when reproducing the issue:

Adobe Reader 2024.003.20180

Other information about your system:

Other questions

Does the issue still occur after restarting your computer?

Yes

Have you tried any other versions of NVDA? If so, please report their behaviors.

Yes, this has been an issue since 2017

If NVDA add-ons are disabled, is your problem still occurring?

Yes

Does the issue still occur after you run the COM Registration Fixing Tool in NVDA's tools menu?

Yes

@Neurrone changed the title from "NVDA ignores line breaks in PDFs, making some types of text like source code unreadable" to "NVDA ignores line breaks in PDFs, making some types of text like source code unreadable with Adobe Reader" on Oct 20, 2024
@seanbudd added the p3, app/adobe/acrobat and triaged labels on Oct 21, 2024
@Neurrone
Author

Should this be P2 instead? I would imagine that reading PDFs with Adobe Reader is somewhat common.

@u-fischer

Well, the PDF is not tagged at all. If I produce a tagged version (attached) with a current LuaLaTeX, then line breaks are inserted. In the speech viewer I then get:

Here is a linked list node for the following questions. The empty linked list will be represented as
None.
class Node:
def __init__(self, value):
self.value = value
self.next = None

test-verbatim.pdf

In the PDF, the code lines start with real space characters, but sadly they are ignored, so the indentation (which can be meaningful in code) is lost.
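
As a rough illustration of what keeping those spaces would allow, the sketch below separates each line into an indentation width and its remaining text, so that the indentation could be reported rather than dropped. The helper name `describe_indentation` is hypothetical and this is not an NVDA API; it assumes the leading characters are ordinary spaces, as in the attached PDF.

```python
def describe_indentation(line: str) -> tuple[int, str]:
    """Return (number of leading spaces, text without those spaces)."""
    stripped = line.lstrip(" ")
    return len(line) - len(stripped), stripped


# Hypothetical lines as they would look if the leading spaces survived.
lines = [
    "class Node:",
    "  def __init__(self, value):",
    "    self.value = value",
    "    self.next = None",
]

for line in lines:
    indent, text = describe_indentation(line)
    print(f"{indent} spaces: {text}")
```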

The tagging is not the final version LaTeX will use; we are waiting on a veraPDF update that would allow us to use Code for the code part and Sub for the single lines.
