Hi guys, I've been trying to learn Langium by replicating Mermaid's syntax for sequence diagrams... I can't figure out what I need to do to fix this grammar:
When compiling I get:
If I understand Chevrotain's documentation correctly, this error is caused by specificity issues. Terminals are evaluated from top to bottom, and if a 'higher' terminal is more general, it will prevent a 'lower' terminal from being reached. So terminals should be defined in order from most specific to least specific.

What I don't understand here is that Chevrotain fails, saying that the literal keywords are declared AFTER the MESSAGE token type in its lexer definition. But there doesn't seem to be any way for me to order the grammar that makes a difference to this. Even if I replace the literal keywords with their own terminals and declare them first in my grammar, I get the same issue. Shouldn't literal keywords always end up at the top of Chevrotain's lexer definition?

The problematic terminal, MESSAGE, is meant to just capture the remaining text on the line (up to \n\r).

Can someone tell me whether I'm doing something wrong or whether this is a bug?
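To make the ordering problem concrete, here is a minimal first-match lexer sketch in plain TypeScript. It does not use Chevrotain; the `TokenDef` shape and the `tokenize` helper are hypothetical, the `participant` keyword is just an example Mermaid keyword, and the MESSAGE pattern is an assumption based on the description above, since the actual grammar is not shown.

```typescript
// Hypothetical token shape for illustration; not Chevrotain's TokenType.
interface TokenDef {
    name: string;
    pattern: RegExp;
}

// Try each definition in array order at the current offset and
// accept the first one that matches (first match wins, not longest).
function tokenize(input: string, defs: TokenDef[]): string[] {
    const names: string[] = [];
    let offset = 0;
    while (offset < input.length) {
        let matched = false;
        for (const def of defs) {
            // The sticky flag anchors the match at lastIndex.
            const sticky = new RegExp(def.pattern.source, 'y');
            sticky.lastIndex = offset;
            const m = sticky.exec(input);
            if (m && m[0].length > 0) {
                names.push(def.name);
                offset += m[0].length;
                matched = true;
                break;
            }
        }
        if (!matched) {
            throw new Error(`Unexpected character at offset ${offset}`);
        }
    }
    return names;
}

// Assumed definitions: a literal keyword and a "rest of the line" terminal.
const keyword: TokenDef = { name: 'participant', pattern: /participant/ };
const message: TokenDef = { name: 'MESSAGE', pattern: /[^\n\r]+/ };

// Keyword first: the keyword is reachable.
const keywordFirst = tokenize('participant', [keyword, message]);
// MESSAGE first: the general terminal swallows the keyword.
const messageFirst = tokenize('participant', [message, keyword]);
```

With the keyword listed first, the input lexes as the keyword; with MESSAGE listed first, the same input lexes as MESSAGE and the keyword can never be produced, which is the situation the error complains about.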
Replies: 2 comments
Hey @martaver, in theory it's a feature, in practice it's a bug. We perform an optimization where we put whitespace terminals at the front of the lexer array, as that improves performance, see here. However, the way we identify such tokens is by checking whether they are able to match a single whitespace character (' '). Your MESSAGE terminal can match a single space, so it is treated as a whitespace terminal and moved ahead of the keywords.

You can work around this by either overriding the token builder and moving the token manually to the back of the array, or modifying the terminal regex so that it is not able to match a single whitespace character.
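As a rough illustration of the first workaround, the sketch below models the reordering in plain TypeScript. It is not Langium's token builder: `TokenDef`, `orderTokens`, and `moveToBack` are hypothetical names, the exact placement of keywords relative to other terminals is an assumption, and the token patterns are made up for the example.

```typescript
// Hypothetical token shape for illustration; not a Langium or Chevrotain type.
interface TokenDef {
    name: string;
    pattern: RegExp;
}

// Assumed heuristic, paraphrasing the reply above: a terminal counts as
// "whitespace" if its pattern can match a single space character.
function canMatchSingleSpace(def: TokenDef): boolean {
    return def.pattern.test(' ');
}

// Model of the optimization: whitespace-like terminals go to the front,
// then keywords, then the remaining terminals (ordering is an assumption).
function orderTokens(keywords: TokenDef[], terminals: TokenDef[]): TokenDef[] {
    const front = terminals.filter(canMatchSingleSpace);
    const rest = terminals.filter(t => !canMatchSingleSpace(t));
    return [...front, ...keywords, ...rest];
}

// Workaround sketch: move one named token to the end of the array.
function moveToBack(tokens: TokenDef[], name: string): TokenDef[] {
    const kept = tokens.filter(t => t.name !== name);
    const moved = tokens.filter(t => t.name === name);
    return [...kept, ...moved];
}

const keywords: TokenDef[] = [{ name: 'participant', pattern: /participant/ }];
const terminals: TokenDef[] = [
    { name: 'WS', pattern: /\s+/ },
    { name: 'ID', pattern: /[a-zA-Z_]\w*/ },
    { name: 'MESSAGE', pattern: /[^\n\r]+/ } // can match ' ', so it is misclassified
];

const ordered = orderTokens(keywords, terminals).map(t => t.name);
const fixed = moveToBack(orderTokens(keywords, terminals), 'MESSAGE').map(t => t.name);
```

In this model MESSAGE lands ahead of the keyword after the optimization, and moving it to the back afterwards restores an order in which the keyword is reachable. In a real project the same reordering would be done inside an overridden token builder service; check the Langium source for the actual method to override.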
In this case, I can just ensure the leading character of that token is always non-whitespace... but I'm sure other developers will encounter this issue too! Might want to think carefully about features that mess with the explicitly declared order of lexer terminals.

Thanks for the quick reply :)
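For anyone landing here later, this is roughly what that regex change looks like. Both patterns are assumptions, since the original MESSAGE terminal isn't shown in the thread; the point is only that the adjusted pattern requires a non-whitespace first character and therefore can no longer match a single space.

```typescript
// Assumed original: everything up to the end of the line.
const originalMessage = /[^\n\r]+/;
// Adjusted: first character must be non-whitespace, then the rest of the line.
const adjustedMessage = /\S[^\n\r]*/;

// The check described in the reply: can the pattern match a single space?
const originalMatchesSpace = originalMessage.test(' ');
const adjustedMatchesSpace = adjustedMessage.test(' ');

// The adjusted pattern still captures a message up to the line break.
const sample = 'Hello Bob, how are you?\nnext line';
const captured = adjustedMessage.exec(sample)?.[0];
```

One side effect to keep in mind: a message can no longer start with whitespace, so any leading spaces have to be consumed by a separate (hidden) whitespace terminal.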