Fix handling of docstrings with tokenization errors (Fixes #18388) #18575

gareth-cross · 2025-01-31T03:49:37Z

This change fixes #18388.

When parsing docstrings, stubgen bails on a docstring as soon as the first tokenization error is encountered. This behavior is brittle because docstrings need not be entirely valid Python and can contain characters that cause early failure. Consider the following example:

def thing():
  """
  thing(*args, **kwargs)
  Overloaded function.

  1. thing(x: int) -> None

  .. math::
    \mathbf{x} = 3 \cdot \mathbf{y}

  2. thing(x: int, y: int) -> str

  This signature will never get parsed due to TokenError.
  """

The LaTeX code causes a TokenError, so the second overload is never parsed.

This change causes mypy to resume parsing after an error has occurred, so that later overloads can still be discovered. The new behavior is somewhat more robust to failures of this kind. I also added two tests with example docstrings that previously failed.
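To illustrate the general idea of recovering from a tokenization error, here is a minimal standalone sketch. It is not the code in this PR: the helper name `names_per_line` is hypothetical, and tokenizing each line separately is just one simple way to keep an error on one line from hiding what comes after it. It uses only the standard `tokenize` module.

```python
import io
import tokenize


def names_per_line(docstring: str) -> list[str]:
    """Collect NAME tokens line by line, skipping past tokenization errors.

    Hypothetical helper for illustration; stubgen's real parser is more involved.
    """
    names: list[str] = []
    for line in docstring.splitlines():
        reader = io.BytesIO(line.strip().encode("utf-8")).readline
        try:
            for tok in tokenize.tokenize(reader):
                if tok.type == tokenize.NAME:
                    names.append(tok.string)
        except (tokenize.TokenError, SyntaxError):
            # Give up on this line only; later lines are still tokenized.
            continue
    return names


doc = r"""
1. thing(x: int) -> None

.. math::
  \mathbf{x} = 3 \cdot \mathbf{y}

2. thing(x: int, y: int) -> str
"""
print(names_per_line(doc))
```

Whether the LaTeX line actually raises depends on the Python version's tokenizer, but in either case both `thing` signatures are reached because a failure is confined to the line that caused it.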

github-actions · 2025-01-31T04:18:48Z

According to mypy_primer, this change doesn't affect type check results on a corpus of open source code. ✅

gareth-cross added 2 commits January 30, 2025 19:44

Fix handling of docstrings with tokenization errors

568b3ff

Fix type checking error in teststubgen.py

8cdb58c
