Parse optionals and repeats without regexes #1826

Merged 1 commit on May 15, 2025

@traviscross traviscross commented May 15, 2025

Rather than parsing optionals and repeats fully in the recursive descent style, we were using regular expressions to do part of the matching and parsing. That's fine for what it is, but as we think about extending the grammar language surrounding repeats further, it might be more straightforward for this to be parsed in the more usual way. So let's do that. Doing this also results in better and more targeted errors when parsing malformed syntax.

We had been supporting a space between an expression and the optional and repeat sigils ?, *, and + (but not between an expression and the {a..b} ranged repeat syntax). In making this change, we drop this support and adjust the affected productions. We were only using this in a handful of places, and the clarity of the productions seems the same or better with these spaces removed.
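
For illustration, here is a minimal sketch in Rust of what parsing these postfix forms in the recursive descent style might look like. It is not the code from this change: the names `Expr`, `Parser`, and `parse` are hypothetical, expressions are limited to bare names, and sequence items are assumed to be separated by a single space. It shows that a sigil must directly follow its expression, and that a stray space produces an error pointing at the offending byte.

```rust
// Hypothetical sketch; names and structure are assumptions, not the real parser.
#[derive(Debug, PartialEq)]
enum Expr {
    Nt(String),                                       // a bare name, e.g. `Foo`
    Optional(Box<Expr>),                              // `e?`
    Repeat(Box<Expr>),                                // `e*`
    RepeatPlus(Box<Expr>),                            // `e+`
    RepeatRange(Box<Expr>, Option<u32>, Option<u32>), // `e{a..b}`
}

struct Parser<'a> {
    input: &'a str,
    pos: usize,
}

impl<'a> Parser<'a> {
    fn peek(&self) -> Option<u8> {
        self.input.as_bytes().get(self.pos).copied()
    }

    /// Consumes `b` if it is the next byte.
    fn eat(&mut self, b: u8) -> bool {
        if self.peek() == Some(b) { self.pos += 1; true } else { false }
    }

    /// Consumes bytes while `f` holds and returns the consumed slice.
    fn take_while(&mut self, f: impl Fn(u8) -> bool) -> &'a str {
        let (input, start) = (self.input, self.pos);
        while self.peek().map_or(false, &f) { self.pos += 1; }
        &input[start..self.pos]
    }

    fn parse_name(&mut self) -> Result<Expr, String> {
        let name = self.take_while(|b| b.is_ascii_alphanumeric() || b == b'_');
        if name.is_empty() {
            return Err(format!("expected a name at byte {}", self.pos));
        }
        Ok(Expr::Nt(name.to_string()))
    }

    /// Parses an optional integer bound inside `{a..b}`.
    fn parse_bound(&mut self) -> Result<Option<u32>, String> {
        let digits = self.take_while(|b| b.is_ascii_digit());
        if digits.is_empty() { return Ok(None); }
        digits.parse::<u32>().map(Some)
            .map_err(|e| format!("bad bound `{}`: {}", digits, e))
    }

    /// Parses a name followed by any number of directly adjacent sigils.
    fn parse_postfix(&mut self) -> Result<Expr, String> {
        let mut e = self.parse_name()?;
        loop {
            e = match self.peek() {
                Some(b'?') => { self.pos += 1; Expr::Optional(Box::new(e)) }
                Some(b'*') => { self.pos += 1; Expr::Repeat(Box::new(e)) }
                Some(b'+') => { self.pos += 1; Expr::RepeatPlus(Box::new(e)) }
                Some(b'{') => {
                    self.pos += 1;
                    let min = self.parse_bound()?;
                    if !(self.eat(b'.') && self.eat(b'.')) {
                        return Err(format!("expected `..` at byte {}", self.pos));
                    }
                    let max = self.parse_bound()?;
                    if !self.eat(b'}') {
                        return Err(format!("expected `}}` at byte {}", self.pos));
                    }
                    Expr::RepeatRange(Box::new(e), min, max)
                }
                // Anything else, including a space, ends this expression.
                _ => return Ok(e),
            };
        }
    }

    /// Parses space-separated postfix expressions up to the end of input.
    fn parse_sequence(&mut self) -> Result<Vec<Expr>, String> {
        let mut items = vec![self.parse_postfix()?];
        while self.eat(b' ') {
            items.push(self.parse_postfix()?);
        }
        if self.pos != self.input.len() {
            return Err(format!("unexpected character at byte {}", self.pos));
        }
        Ok(items)
    }
}

fn parse(input: &str) -> Result<Vec<Expr>, String> {
    Parser { input, pos: 0 }.parse_sequence()
}
```

In this sketch, `parse("Foo?")` yields a single `Optional` wrapping `Foo`, while `parse("Foo ?")` fails with `expected a name at byte 4`, because the space ends the `Foo` expression and `?` cannot start a new one.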

We verified that, setting aside the removal of these spaces, the rendered output of the Reference is byte identical before and after this change.

cc @ehuss

@traviscross

The motivation for this refactoring is to next add repeats with separators.

@traviscross traviscross marked this pull request as ready for review May 15, 2025 13:33
@rustbot rustbot added the S-waiting-on-review label (Status: The marked PR is awaiting review from a maintainer) May 15, 2025
@traviscross traviscross force-pushed the TC/parse-repeats-without-regex branch from 747ca2f to 7e7e973 Compare May 15, 2025 13:36
@traviscross traviscross force-pushed the TC/parse-repeats-without-regex branch from 7e7e973 to 3570070 Compare May 15, 2025 14:57
@ehuss ehuss left a comment

Thanks!

@ehuss ehuss added this pull request to the merge queue May 15, 2025
Merged via the queue into master with commit c703c8d May 15, 2025
5 checks passed