Cache parenthesized expression boundaries in the formatter by charliermarsh · Pull Request #26344 · astral-sh/ruff

charliermarsh · 2026-06-24T23:45:19Z

Summary

Formatting frequently asks whether an expression was parenthesized in the source. Prior to this change, each check re-tokenized the surrounding source and skipped trivia independently, even though the parser token stream is already available.

This PR builds `TriviaRanges` in the same token traversal that collects comment ranges. It stores those comment ranges alongside a `ParenthesizedExpressions` index backed by a single `FxHashSet<TextRange>`, so comment placement and `PyFormatContext` share the same source-boundary data. Hot-path checks become a single set lookup, with no additional token traversal or source-scanning fallback.
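The single-pass idea can be sketched as follows. This is a minimal illustration under simplifying assumptions, not the PR's code: `Kind`, `Token`, and `Range` are hypothetical stand-ins for ruff's `TokenKind`, `Token`, and `TextRange`, and `std::collections::HashSet` replaces `FxHashSet`. The sketch assumes the index stores the range from the first non-trivia token after a `(` to the last non-trivia token before the matching `)`, i.e. the range of the enclosed expression.

```rust
use std::collections::HashSet;

/// Hypothetical, simplified token kinds (ruff's real `TokenKind` has many more).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Kind {
    Lpar,
    Rpar,
    Comment,
    NonLogicalNewline,
    Other,
}

/// Hypothetical token: a kind plus a half-open byte range `start..end`.
#[derive(Clone, Copy, Debug)]
struct Token {
    kind: Kind,
    start: u32,
    end: u32,
}

/// Stand-in for `TextRange` as `(start, end)`.
type Range = (u32, u32);

#[derive(Debug, Default)]
struct TriviaRanges {
    comments: Vec<Range>,
    parenthesized: HashSet<Range>,
}

impl TriviaRanges {
    /// Collects comment ranges and parenthesized-expression ranges in one pass.
    fn from_tokens(tokens: &[Token]) -> Self {
        let mut trivia = TriviaRanges::default();
        // One entry per open `(`: start of the first non-trivia token inside it, once seen.
        let mut stack: Vec<Option<u32>> = Vec::new();
        // End of the most recent non-trivia token.
        let mut previous_end: Option<u32> = None;

        for token in tokens {
            match token.kind {
                // Trivia: comments are recorded, but neither kind starts or ends a range.
                Kind::Comment => trivia.comments.push((token.start, token.end)),
                Kind::NonLogicalNewline => {}
                Kind::Lpar => {
                    // A `(` may itself be the first token inside an enclosing `(`.
                    mark_start(&mut stack, token.start);
                    stack.push(None);
                    previous_end = Some(token.end);
                }
                Kind::Rpar => {
                    // Empty `()` or an unmatched `)` records nothing.
                    if let (Some(Some(start)), Some(end)) = (stack.pop(), previous_end) {
                        trivia.parenthesized.insert((start, end));
                    }
                    previous_end = Some(token.end);
                }
                Kind::Other => {
                    mark_start(&mut stack, token.start);
                    previous_end = Some(token.end);
                }
            }
        }
        trivia
    }

    /// The hot-path check: one set lookup for an expression's range.
    fn is_parenthesized(&self, range: Range) -> bool {
        self.parenthesized.contains(&range)
    }
}

/// Records `start` for the innermost open `(` if nothing inside it has been seen yet.
fn mark_start(stack: &mut [Option<u32>], start: u32) {
    if let Some(slot) = stack.last_mut() {
        if slot.is_none() {
            *slot = Some(start);
        }
    }
}
```

Comments inside the parentheses are collected but never start a range, so for `(  # c` followed by `  a)` the sketch still records exactly the range of `a`. Like any purely token-based check, it also records the argument of a call such as `f(x)`; it makes no attempt to tell call parentheses from grouping parentheses.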

Formatter paths that need the full parenthesized range continue to use the parsed-token `parentheses_iterator`.

astral-sh-bot · 2026-06-24T23:51:09Z

`ruff-ecosystem` results

Linter (stable)

✅ ecosystem check detected no linter changes.

Linter (preview)

✅ ecosystem check detected no linter changes.

Formatter (stable)

✅ ecosystem check detected no format changes.

Formatter (preview)

✅ ecosystem check detected no format changes.

codspeed-hq · 2026-06-24T23:51:18Z

Merging this PR will improve performance by 6.4%

⚠️ Different runtime environments detected: some benchmarks with significant performance changes were compared across different runtime environments, which may affect the accuracy of the results.


⚡ 4 improved benchmarks
✅ 143 untouched benchmarks
⏩ 4 skipped benchmarks¹

Performance Changes

| Mode | Benchmark | `BASE` | `HEAD` | Efficiency |
|---|---|---|---|---|
| ⚡ Simulation | `formatter[large/dataset.py]` | 9.2 ms | 8.4 ms | +9.92% |
| ⚡ Simulation | `formatter[numpy/ctypeslib.py]` | 1.8 ms | 1.7 ms | +5.52% |
| ⚡ Simulation | `formatter[pydantic/types.py]` | 3.5 ms | 3.3 ms | +5.37% |
| ⚡ Simulation | `formatter[unicode/pypinyin.py]` | 669.8 µs | 638.9 µs | +4.85% |


Comparing `charlie/codex-formatter-parentheses-index` (cc9584b) with `main` (645dca3)

¹ 4 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, archive them in CodSpeed to remove them from the performance reports.

MichaReiser

Nice

MichaReiser · 2026-06-26T07:05:18Z

+        comments: &'a Comments<'a>,
+        context: &PyFormatContext,


Do we need to pass both? `context` already contains `comments`.

MichaReiser · 2026-06-26T07:06:49Z

+/// this index once avoids re-tokenizing the source for every expression.
+#[derive(Debug)]
+pub(crate) struct ParenthesesIndex {
+    ranges: FxHashSet<TextRange>,


Have you compared this version against a sorted `Vec<TextRange>` with binary search?
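For comparison, the alternative raised in this question might look like the sketch below. `Range` stands in for `TextRange` and `SortedParenthesized` is a hypothetical name. In a stack-based token pass, inner parentheses close before outer ones, so the ranges do not come out ordered by start and must be sorted once before searching. Lookups become O(log n) instead of an expected O(1) hash probe, in exchange for a compact, hash-free `Vec`.

```rust
/// Stand-in for `TextRange` as `(start, end)`. Tuples compare lexicographically,
/// so sorting orders ranges by start offset, then by end offset.
type Range = (u32, u32);

/// Hypothetical alternative to the `FxHashSet<TextRange>` index.
struct SortedParenthesized {
    ranges: Vec<Range>,
}

impl SortedParenthesized {
    /// Sorts and deduplicates once, after the token pass has collected all ranges.
    fn new(mut ranges: Vec<Range>) -> Self {
        ranges.sort_unstable();
        ranges.dedup();
        Self { ranges }
    }

    /// Exact-match lookup via binary search: O(log n) comparisons, no hashing.
    fn contains(&self, range: Range) -> bool {
        self.ranges.binary_search(&range).is_ok()
    }
}
```

The commit list for this PR shows that a binary-search variant was tried and then reverted in favor of the set.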

MichaReiser · 2026-06-26T07:08:23Z

+pub(crate) fn is_expression_parenthesized(expr: ExprRef, context: &PyFormatContext) -> bool {
+    context.is_expression_parenthesized(expr)
+}


Two alternatives:

a) Make it a method on `context`. IMO, it reads slightly nicer: `context.is_expression_parenthesized(expr)`
b) Define a new trait, implemented for `ExprRef` and `Expr`, with an `is_parenthesized(context)` method
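Option b) could look roughly like the following. The `Expr`, `ExprRef`, and `PyFormatContext` types are simplified stand-ins (ruff's real `ExprRef` is an enum over expression nodes), and `IsParenthesized` is a hypothetical trait name; option a) appears as the inherent method that the trait delegates to.

```rust
use std::collections::HashSet;

/// Stand-in for `TextRange` as `(start, end)`.
type Range = (u32, u32);

/// Simplified stand-ins for ruff's expression types.
struct Expr {
    range: Range,
}

#[derive(Clone, Copy)]
struct ExprRef<'a>(&'a Expr);

/// Simplified stand-in: only the parenthesized-range index is modelled.
struct PyFormatContext {
    parenthesized: HashSet<Range>,
}

impl PyFormatContext {
    /// Option a): an inherent method on the context.
    fn is_expression_parenthesized(&self, expr: ExprRef) -> bool {
        self.parenthesized.contains(&expr.0.range)
    }
}

/// Option b): an extension trait so call sites read `expr.is_parenthesized(context)`.
trait IsParenthesized {
    fn is_parenthesized(&self, context: &PyFormatContext) -> bool;
}

impl IsParenthesized for ExprRef<'_> {
    fn is_parenthesized(&self, context: &PyFormatContext) -> bool {
        context.is_expression_parenthesized(*self)
    }
}

impl IsParenthesized for Expr {
    fn is_parenthesized(&self, context: &PyFormatContext) -> bool {
        ExprRef(self).is_parenthesized(context)
    }
}
```

Either way, the free-function wrapper disappears and the lookup lives next to the data it reads.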

MichaReiser · 2026-06-26T07:09:24Z

+
 /// Returns `true` if the [`ExprRef`] is enclosed by parentheses in the source code.
-pub(crate) fn is_expression_parenthesized(
+pub(crate) fn is_expression_parenthesized(expr: ExprRef, context: &PyFormatContext) -> bool {


Can we delete this function? It looks like Codex got tired of rewriting all call sites, but I'd rather update them than keep this wrapper.

MichaReiser · 2026-06-26T07:13:45Z

+
+/// Returns `true` if the [`ExprRef`] is enclosed by parentheses by re-tokenizing the surrounding
+/// source. Prefer [`is_expression_parenthesized`] when a formatting context is available.
+pub(crate) fn is_expression_parenthesized_in_source(


It's unfortunate that we still need this function just for comment placement. Any chance we could compute the `ParenthesizedExpressions` index earlier and pass it to the context instead? That would also eliminate the need for `AssertEquivalent` (which Codex claims isn't actually equivalent: the new version is stricter because it matches pairs where the old implementation did not).

MichaReiser · 2026-06-26T07:16:18Z

+        let mut stack = Vec::<Option<TextSize>>::new();
+        let mut previous_end = None;
+
+        for token in tokens {


It's a bit unfortunate that this requires a full token traversal.

I wonder if we could build it as part of `Comments`/`CommentRanges` (passed in to `format_module_ast`), because building `CommentRanges` also requires a full token pass. Building the struct earlier would also allow us to use it in `CommentsBuilder`.

astral-sh-bot · 2026-06-26T12:35:36Z

Typing conformance results

No changes detected ✅

Current numbers

The percentage of diagnostics emitted that were expected errors held steady at 94.47%. The percentage of expected errors that received a diagnostic held steady at 89.19%. The number of fully passing files held steady at 95/134.

astral-sh-bot · 2026-06-26T12:36:30Z

Memory usage report

Memory usage unchanged ✅

astral-sh-bot · 2026-06-26T12:37:10Z

`ecosystem-analyzer` results

No diagnostic changes detected ✅

Flaky changes detected. This PR summary excludes flaky changes; see the HTML report for details.


MichaReiser

wow, this is huge

MichaReiser · 2026-06-26T17:56:12Z

    }
 }

+impl From<&Tokens> for TriviaRanges {


Sorry for expanding the scope; it's completely fine not to do this in this PR. Should we also switch the function we use in the linter over to the new trivia ranges?

Good question. I think that should be a separate change.

MichaReiser · 2026-06-26T17:58:20Z

+                        parenthesized.insert(TextRange::new(start, end));
+                    }
+                }
+                _ => {


This opens an interesting question. Is it intentional that we set `start` if the current token is a comment?

Edit: We actually don't do this... I should not review code this late :)

MichaReiser · 2026-06-26T18:07:07Z

Oh, Codex is right here:

[P3] `TriviaRanges` should not dereference to `CommentRanges`. Methods like `trivia_ranges.is_empty()` silently inspect only comments, despite the type also containing parenthesized-expression data. This ambiguity already appears in `Comments::from_ast` and could cause future misuse. Prefer explicit `.comments()` access and remove the `Deref` implementation.

I think the accessors should be `trivia.comments()` and `trivia.parenthesized()`.
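A minimal sketch of the accessor-based shape, assuming simplified stand-ins for `CommentRanges` and the parenthesized-expression index (the field and method bodies here are illustrative, not ruff's). Without a `Deref<Target = CommentRanges>` impl, a call such as `trivia.is_empty()` no longer compiles, so every call site has to say which half of the trivia it means.

```rust
use std::collections::HashSet;

/// Stand-in for `TextRange` as `(start, end)`.
type Range = (u32, u32);

/// Hypothetical stand-in for ruff's `CommentRanges`.
#[derive(Debug, Default)]
struct CommentRanges {
    ranges: Vec<Range>,
}

impl CommentRanges {
    fn is_empty(&self) -> bool {
        self.ranges.is_empty()
    }
}

/// Hypothetical stand-in for the parenthesized-expression index.
#[derive(Debug, Default)]
struct ParenthesizedExpressions {
    ranges: HashSet<Range>,
}

impl ParenthesizedExpressions {
    fn contains(&self, range: Range) -> bool {
        self.ranges.contains(&range)
    }
}

/// Deliberately no `impl Deref<Target = CommentRanges>`: callers must
/// name the half of the trivia they are asking about.
#[derive(Debug, Default)]
struct TriviaRanges {
    comments: CommentRanges,
    parenthesized: ParenthesizedExpressions,
}

impl TriviaRanges {
    fn comments(&self) -> &CommentRanges {
        &self.comments
    }

    fn parenthesized(&self) -> &ParenthesizedExpressions {
        &self.parenthesized
    }
}
```

With `Deref`, a `TriviaRanges` holding no comments but one parenthesized range would report `is_empty() == true`; with explicit accessors, `trivia.comments().is_empty()` states exactly what is being checked.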


charliermarsh · 2026-06-26T19:04:45Z

<3

charliermarsh force-pushed the charlie/codex-formatter-parentheses-index branch from 6531bd4 to 3670e4a June 25, 2026 17:50

charliermarsh added the performance (Potential performance improvement) and formatter (Related to the formatter) labels Jun 25, 2026

charliermarsh marked this pull request as ready for review June 25, 2026 18:49

charliermarsh requested a review from MichaReiser as a code owner June 25, 2026 18:49


MichaReiser reviewed Jun 26, 2026


charliermarsh marked this pull request as draft June 26, 2026 12:31

charliermarsh force-pushed the charlie/codex-formatter-parentheses-index branch from 74d960a to 045e89b June 26, 2026 12:33

charliermarsh force-pushed the charlie/codex-formatter-parentheses-index branch from 045e89b to 27fb258 June 26, 2026 14:22

charliermarsh marked this pull request as ready for review June 26, 2026 14:59

charliermarsh requested a review from MichaReiser June 26, 2026 14:59

MichaReiser approved these changes Jun 26, 2026


charliermarsh added 8 commits June 26, 2026 14:40

Cache parenthesized expression boundaries in formatter · 0a1c825

Use parsed tokens for parenthesized ranges · 017969c

Store parenthesized ranges in a single set · ac5d548

Avoid copying tokens in parentheses index · dac066c

Build formatter trivia ranges in one token pass · c8c7637

Use binary search for parenthesized ranges · a47a229

Revert "Use binary search for parenthesized ranges" · d81b427 (This reverts commit 27fb258.)

Move formatter trivia to context · cc9584b

charliermarsh force-pushed the charlie/codex-formatter-parentheses-index branch from 2539a56 to cc9584b June 26, 2026 18:53

charliermarsh merged commit 2cd74f0 into main Jun 26, 2026
61 of 62 checks passed

charliermarsh deleted the charlie/codex-formatter-parentheses-index branch June 26, 2026 19:04
