Subroutines breaking capture tokenizing inside of referenced capture group #164
Workaround for microsoft/vscode-textmate#164 and similar issues
More work towards #16. If we wanted to capture the `{` `}` delimiters with some scopes, I think we might run into microsoft/vscode-textmate#164 and related issues. But for now we aren't highlighting them, so it's fine. Co-authored-by: RedCMD
The above line is a nice concise description, but the meaning is subtle and probably not obvious to all readers, since...
Also, it might help to describe the behavior outside of the context of TM grammars, and just focus on the subpattern results (and what they should be).
Context: Unlike the subroutines from PCRE/Perl/Regex+ (which are arguably easier to reason about and more useful), Oniguruma subroutines replace the captured values of groups they reference (and captures created within the contents of groups they reference) if the subroutine occurs to the right of the referenced group. In other words,
Thus you can think of any captures formed directly or indirectly by any number of subroutines as sharing the capture slots of the original capturing groups. Whichever capture in a set with subroutines last participated in the match overwrites the captured value in the shared slots. (In fact it gets hairier than this in edge cases with references to indirectly created duplicate named groups, but let's leave that aside.)
@RedCMD, the behavior you're describing would make perfect sense (and not be a bug) if the more-recently-participating subroutine always overwrote the captured values for all captures within its contents. E.g., when
But it sounds like you're saying that although subroutines should continue to overwrite values for captures within their contents that participate in the match (you didn't state that part, but it is required to be correct), captures that don't participate via the subroutine match should not overwrite previously captured values from captures that did participate.
Let me show an example of an expected match result in JS I just tested via the |
When calling a subroutine on a capture group via `\\g<1>`, the call removes all previous tokens from capture groups that aren't rechecked in the subroutine.
Create a syntax highlighting extension with this code
The expected outcome is that it highlights all text in the format `[abcd]-[abcd]`.
Like so:
[image: screenshot of the expected highlighting]
But instead, all tokens connected to capture groups that don't get rematched (and fail) in the subroutine call get purged.
[image: screenshot of the actual highlighting]
(capture groups 2 to 5)
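To make the difference between the reported and the expected result concrete, here is a toy model of the shared capture slots in Python. It does not run Oniguruma or vscode-textmate. The pattern shape `(([a])|([b])|([c])|([d]))-\\g<1>`, the input `a-b`, and the helper name `merge_after_call` are all assumptions made up for illustration, not the grammar from this report.

```python
from typing import Dict, Optional

# Capture slots keyed by group number; None means "did not participate".
Captures = Dict[int, Optional[str]]


def merge_after_call(before: Captures, call: Captures, keep_previous: bool) -> Captures:
    """Combine the slots filled before a subroutine call with the call's own captures.

    Hypothetical helper: it only models the two policies discussed in this issue.
    """
    merged = dict(before)
    for group, value in call.items():
        if value is not None:
            # A capture that participated in the call overwrites the shared slot.
            merged[group] = value
        elif not keep_previous:
            # Reported behavior: a capture that did not participate clears the slot.
            merged[group] = None
    return merged


# Assumed input "a-b": the group itself matches "a", then the call matches "b".
before = {1: "a", 2: "a", 3: None, 4: None, 5: None}
call = {1: "b", 2: None, 3: "b", 4: None, 5: None}

observed = merge_after_call(before, call, keep_previous=False)
expected = merge_after_call(before, call, keep_previous=True)

print("observed:", observed)  # group 2 is cleared, so "a" loses its token
print("expected:", expected)  # group 2 still holds "a", group 3 holds "b"
```

Under this model, the only difference between the two policies is group 2: the reported behavior drops its earlier capture, while the expected behavior keeps it because the subroutine call never matched that group.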