Support capture groups with the RE/flex regex matcher? #95
Thank you for your feedback! Note that Flex does not support group captures and backreferences. I have group capture on the list of things to add to RE/flex. The RE/flex matcher does identify which pattern among alternations matched, kind of like a "global group capture".
There are a few ways to support group captures in POSIX matching, which is a more recent research topic. It is not trivial, because POSIX DFAs translated from regex patterns do not encode position information of the original regex, so capturing parentheses get lost. My student and I researched alternative ways to implement capturing groups and we came up with a new method called staDFA, which we compared to TDFA. Eventually I will use one of these methods and will work on this soon.
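To illustrate the "global group capture" idea mentioned above (knowing which top-level alternative matched), here is a minimal sketch. It deliberately uses `std::regex` from the C++ standard library rather than the RE/flex API, and the helper name `tag_alternatives` and the two-alternative pattern are assumptions made up for this example. Each alternative is wrapped in its own group, and the index of the group that participated in the match plays the role of the alternation index.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper (not part of RE/flex): scans the input and returns,
// for every match, the 1-based index of the top-level alternative that
// matched together with the matched text.
std::vector<std::pair<int, std::string>> tag_alternatives(const std::string& input) {
    // Alternative 1: identifier, alternative 2: unsigned integer.
    static const std::regex pattern("([A-Za-z_][A-Za-z0-9_]*)|([0-9]+)");
    std::vector<std::pair<int, std::string>> result;
    for (std::sregex_iterator it(input.begin(), input.end(), pattern), end; it != end; ++it) {
        const std::smatch& m = *it;
        // Exactly one of the top-level groups participates in each match.
        for (std::size_t i = 1; i < m.size(); ++i) {
            if (m[i].matched) {
                result.emplace_back(static_cast<int>(i), m.str());
                break;
            }
        }
    }
    return result;
}
```

Calling `tag_alternatives("x1 42 foo")` tags `x1` and `foo` with alternative 1 and `42` with alternative 2. This only tells you which alternative matched as a whole; it says nothing about sub-matches inside an alternative, which is what capture groups would add.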
Would it be possible to more easily add capture groups without POSIX support for those of us who do not need it? Or even just in ugrep?
This needs some clarification. With the PCRE matcher in RE/flex you can use capturing groups. Use named captures
IMHO there is not a significant need (or any need whatsoever) to use group captures in lexical analyzers. Same for ugrep. You can use group captures (both numeric and named) with option
If you're replying to me, I mentioned that I do not want to use PCRE to get captures. I opened this issue in the hope of encouraging native support for captures. I agree with your view that lexical analyzers do not need captures, but most of my use cases are not in lexical analyzers, so I need captures.
Hello,
First off, I'm a huge fan of RE/flex for my projects and ugrep is by far my most valuable and most used tool at my workplace. I deal with a lot of text.
Out of all that usage, I consistently miss the capture group feature. I know it is supported with other regex libraries that RE/flex (and therefore ugrep) can use, but that is not the same as RE/flex supporting it directly. I love ugrep and RE/flex's speed and features, but it always hurts me that I have to compile with PCRE2 mode (-P) to get capture groups with ugrep and use Boost.Regex when directly using RE/flex. They're such a common thing to need--why does RE/flex not support them? I'd like to use it and only it so that I do not have to bother with Boost or PCRE2, and also for the performance benefits.
I know POSIX and other compatibility things are at play, but I don't see how supporting capture groups would violate them. The docs say that RE/flex supports lazy quantifiers, which are not part of POSIX, and it also supports non-capturing groups, but not capturing groups. Why is this? It seems like it would be easy to add support for capturing groups considering that non-capturing groups are supported. It makes me think there is some sort of design decision that has already been made which does not allow them, but I don't understand enough of the library to know what it is.
Thanks in advance for your patience if I am making an incorrect assumption or unaware of some functionality that would do this. I'm also sorry if this has already been addressed in another issue or in the docs, but I could not find it here, in ugrep's GitHub project, or in the docs.
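One way to see why non-capturing groups come for free in a DFA-based matcher while capturing groups do not: the patterns `a+b`, `(a*)(ab)` and `(a)(a*b)` all describe the same language, so they can be compiled to the same minimal DFA. The sketch below is a hand-built DFA for that language, not code taken from RE/flex, and the function names `dfa_step` and `dfa_accepts` are made up for this illustration. While it runs, the only information available is the current state, so nothing records where a parenthesis opened or closed.

```cpp
#include <string>

// Hand-built DFA for the language a+b (illustrative only).
// States: 0 = start, 1 = one or more 'a' seen, 2 = accepting, -1 = dead.
int dfa_step(int state, char c) {
    switch (state) {
        case 0:  return c == 'a' ? 1 : -1;
        case 1:  return c == 'a' ? 1 : (c == 'b' ? 2 : -1);
        default: return -1;  // no transitions leave the accepting state
    }
}

// Returns true if the DFA accepts the whole input. Note that the run
// produces only a final state: there is no trace of group boundaries.
bool dfa_accepts(const std::string& input) {
    int state = 0;
    for (char c : input) {
        state = dfa_step(state, c);
        if (state < 0) return false;
    }
    return state == 2;
}
```

For `aaab` the function returns true, but the run alone cannot say whether the first group of `(a*)(ab)` ended after two characters or the first group of `(a)(a*b)` ended after one. Recovering that requires extra bookkeeping on top of the plain DFA, which is the kind of problem the TDFA and staDFA methods mentioned in the maintainer's reply address.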