Skip to content

Fix trailing ) from interfering with extraction in Clojure keywords #18345

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

eneroth
Copy link
Contributor

@eneroth eneroth commented Jun 19, 2025

Summary

In a form like,

(if condition :bg-white :bg-black)

:bg-black will fail to extract, while :bg-white is extracted as expected. This PR fixes this case, implements more comprehensive candidate filtering, and supersedes a previous PR.

Having recently submitted a PR for handling another special case with Clojure keywords (the presence of : inside of keywords), I thought it best to invert the previous strategy: Instead of handling special cases one by one, consume keywords according to the Clojure reader spec. Consume nothing else, other than strings.

Because of this, this PR is a tad more invasive rather than additive, for which I apologize. The strategy is this:

  • Strings begin with a " and ends with an unescaped ". Consume everything between these delimiters (existing case).
  • Keywords begin with :, and end with whitespace, or one out of a small set of specific reserved characters. Everything else is a valid character in a keyword. Consume everything between these delimiters, and apply the class splitting previously contained in the outer loop. My previous special case handling of : inside of keywords in Extract candidates with variants in Clojure/ClojureScript keywords #18338 is now redundant (and is removed), as this is a more general solution.
  • Discard everything else.

I'm hoping that a strategy that is based on Clojure's definition of strings and keywords will pre-empt any further issues with edge cases.

Closes #18344.

Test plan

  • Added failing tests.
  • cargo test -> failure
  • Added fix
  • cargo test -> success

@eneroth eneroth requested a review from a team as a code owner June 19, 2025 07:37
@thecrypticace thecrypticace self-assigned this Jun 27, 2025
@thecrypticace
Copy link
Contributor

@eneroth Can you see if the checkbox to allow maintainers to edit the PR is checked? I've got a few tweaks I want to make.

Comment on lines 9 to 14
fn is_keyword_terminator(byte: u8) -> bool {
matches!(
byte,
b'"' | b';' | b'@' | b'^' | b'`' | b'~' | b'(' | b')' | b'[' | b']' | b'{' | b'}' | b'\\'
) || byte.is_ascii_whitespace()
}
Copy link
Contributor

@thecrypticace thecrypticace Jun 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
fn is_keyword_terminator(byte: u8) -> bool {
matches!(
byte,
b'"' | b';' | b'@' | b'^' | b'`' | b'~' | b'(' | b')' | b'[' | b']' | b'{' | b'}' | b'\\'
) || byte.is_ascii_whitespace()
}
fn is_keyword_character(byte: u8) -> bool {
matches!(
byte,
b'+' | b'-' | b'/' | b'*' | b'_' | b'#' | b'.' | b':' | b'?'
) | byte.is_ascii_alphanumeric()
}

I'd go for a positive check versus a negative one. Using | instead of || produces better, branchless assembly too: https://godbolt.org/z/94WqeEz1n

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright, but this excludes a lot of characters that are valid in keywords. Are you explicitly selecting the ones valid in Tailwind?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Btw, the assembly inspector is cool!

@thecrypticace
Copy link
Contributor

Also could use a changelog entry

In order to pre-empt any further problems from Clojure keywords, this commit inverts the previous logic: Instead of consuming everything and handling special cases, consume characters as defined by the Clojure keyword specification, and nothing else.

In addition, leave string consumption as is, but—again—drop the `"`s, in order to align with the strategy of throwing away everything that is not the content of a string or a keyword, including delimiters (`"`, `:` respectively).
@eneroth eneroth requested a review from thecrypticace June 28, 2025 06:28
@eneroth
Copy link
Contributor Author

eneroth commented Jun 28, 2025

Sorry @thecrypticace, from what I can see on Github docs, there's supposed to be a checkbox for me to check here:

Screenshot 2025-06-28 at 08 32 05

But I can't see any. In lieu of that, rebased on main and amended with your changes, and added a change log entry.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Trailing ) interferes with candidate extraction from Clojure/Script keywords
2 participants