"Empty expression" error when using \A inside group #439

Open
alvin55531 opened this issue Nov 14, 2024 · 4 comments
Labels
question A question that has or needs further clarification

Comments

@alvin55531

If I put \A inside a capture group or non-capture group, it'll give an error:
Command:

ugrep -P '(?:\A|.*Testword5)'

(I have a file that contains "Testword5" for this test search)

Output:

ugrep: error: error at position 9
(?m)(?:\A|.*Testword5)
         \___empty expression

I've looked through the man page. The only thing that seems relevant is the --empty flag, but using it gives the same results.

@genivia-inc
Member

genivia-inc commented Nov 14, 2024

Indeed, \A can only be used as an anchor when it is followed by some non-empty pattern. For example, \A.* will work. Anchors are not boundaries: boundaries can be used anywhere, while anchors are more restrictive, because they anchor a match (or, more typically, all matches).
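To illustrate the difference between a bare \A and \A.*, here is a minimal sketch using Python's standard re module. This is only a stand-in: it is not ugrep's regex engine, and unlike ugrep it accepts a bare \A. The sample text is hypothetical.

```python
import re

text = "first line\nTestword5 here\n"

# A bare \A asserts the start of input but consumes nothing, so the
# only match it can produce is an empty string at offset 0.
bare = [m.span() for m in re.finditer(r"\A", text)]

# Following \A with a non-empty pattern gives the anchor some "context":
# here .* consumes the first line (. does not match \n by default).
anchored = [m.group() for m in re.finditer(r"\A.*", text)]

print(bare)      # [(0, 0)]
print(anchored)  # ['first line']
```

The zero-length match at offset 0 is what an "empty expression" amounts to: there is no matched text to report.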

@genivia-inc genivia-inc added the question A question that has or needs further clarification label Nov 14, 2024
@genivia-inc
Member

I should add that matching a single \A is not possible. Only the ^ and $ anchors can be used to match without a pattern; the \A and \Z anchors need a pattern as "context" to assert a begin-of-file or end-of-file match, respectively.

@alvin55531
Author

alvin55531 commented Nov 14, 2024

Thank you for the clarification!

So the non-empty expression must be inside the capture group/non-capture group with the \A? So it has to be (?:\A.*|.*Testword) and not (?:\A|.*Testword)someotherpattern?

Is this a limitation set by ugrep (perhaps for performance?) or a limitation of the regex engine ugrep uses? I tried the original regex on Regex101 with PCRE2, and it matched successfully.

@genivia-inc
Member

genivia-inc commented Nov 14, 2024

This is to avoid confusion and problems, e.g. when \A does not match anything because it isn't followed by a pattern. The check only applies to the regex syntax, not to its meaning, since a regex can be arbitrarily complex. An accurate check can only be done in the DFA, but that would not help to find the location in the regex that caused the problem, and it would not work with option -P (PCRE2), which also does not produce the expected output for a sole \A without a pattern following it.

For a regex like (\A|aaa)bbb, one can instead write (\Abbb|aaabbb), which is equivalent.
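The rewrite distributes the shared suffix into each alternative so that \A is always followed by a non-empty pattern. A quick way to sanity-check such an equivalence is to compare both forms on a few inputs. The sketch below uses Python's standard re module purely as a stand-in (it is not ugrep's engine, and it accepts the bare \A form that ugrep rejects); the sample strings and the helper name matches are hypothetical.

```python
import re

# Original form: a bare \A alternative followed by a shared suffix.
# Per this thread, ugrep's syntax check rejects this; Python's re accepts it.
factored = re.compile(r"(?:\A|aaa)bbb")

# Rewritten form: the suffix is distributed into each alternative,
# so \A is always followed by a non-empty pattern.
distributed = re.compile(r"(?:\Abbb|aaabbb)")


def matches(pattern, text):
    """Return the (start, end) spans of all non-overlapping matches."""
    return [m.span() for m in pattern.finditer(text)]


samples = ["bbb", "xbbb", "aaabbb", "bbbaaabbb", "xaaabbbx", "aabbb"]
for s in samples:
    # Both forms should report exactly the same match spans.
    assert matches(factored, s) == matches(distributed, s)
    print(f"{s!r:14} {matches(distributed, s)}")
```

Agreement on a handful of samples is evidence rather than proof; the equivalence itself follows from distributing the concatenation over the alternation.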

2 participants