Help with 'Token ___ can never be matched.' error? #765

martaver · 2022-11-08T18:34:57Z

Hi guys,

I've been trying to learn Langium by replicating Mermaid's syntax for sequence diagrams...

I can't figure out what I need to do to fix this grammar:

grammar ServiceMessages

entry Model: ( services+=Service | messages+=Message )*;

terminal ID: /[_a-zA-Z\-][\w_]*/;

Service: 'service' name=ID;

Message: from=[Service:ID] '-->' to=[Service:ID] ':' message=MESSAGE;

terminal MESSAGE: /[^\n\r]/;

When compiling I get:

Error: Errors detected in definition of Lexer:
Token: ->service<- can never be matched.
Because it appears AFTER the Token Type ->MESSAGE<-in the lexer's definition.
See https://chevrotain.io/docs/guide/resolving_lexer_errors.html#UNREACHABLE-----------------------
Token: ->--><- can never be matched.
Because it appears AFTER the Token Type ->MESSAGE<-in the lexer's definition.
See https://chevrotain.io/docs/guide/resolving_lexer_errors.html#UNREACHABLE-----------------------
Token: ->:<- can never be matched.
Because it appears AFTER the Token Type ->MESSAGE<-in the lexer's definition.
See https://chevrotain.io/docs/guide/resolving_lexer_errors.html#UNREACHABLE

If I understand Chevrotain's documentation correctly, this error is caused by specificity issues. Terminals are evaluated from top to bottom, and if a 'higher' terminal is more general, it will prevent a 'lower' terminal from ever being reached. So terminals should be defined in order from most specific to least specific.
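To make the ordering problem concrete, here is a minimal sketch of a first-match lexer step. It is not Chevrotain's implementation; `SketchToken` and `firstMatch` are hypothetical names made up for illustration, and the keyword is modelled as a plain regex.

```typescript
// Hypothetical token definition for illustration; not Chevrotain's actual types.
interface SketchToken {
    label: string;
    pattern: RegExp;
}

// Return the label of the first definition that matches at the start of `text`.
function firstMatch(defs: SketchToken[], text: string): string | undefined {
    for (const def of defs) {
        // Anchor a copy of the pattern so it has to match at position 0.
        const anchored = new RegExp(`^(?:${def.pattern.source})`);
        if (anchored.test(text)) {
            return def.label;
        }
    }
    return undefined;
}

// The general MESSAGE pattern listed before the keyword...
const generalFirst: SketchToken[] = [
    { label: 'MESSAGE', pattern: /[^\n\r]/ },
    { label: 'service', pattern: /service/ },
];
// ...and the same two definitions with the keyword listed first.
const specificFirst: SketchToken[] = [
    { label: 'service', pattern: /service/ },
    { label: 'MESSAGE', pattern: /[^\n\r]/ },
];

const shadowedResult = firstMatch(generalFirst, 'service a');
const reachableResult = firstMatch(specificFirst, 'service a');
console.log(shadowedResult, reachableResult); // prints "MESSAGE service"
```

With MESSAGE first, the keyword can never win, because MESSAGE accepts any non-newline character, including the `s` of `service`.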

What I don't understand here is that Chevrotain fails, saying that the literal keywords are declared AFTER the MESSAGE token type in its lexer definition. But there doesn't seem to be any way for me to order the grammar that makes a difference to this. Even if I replace the literal keywords with their own terminals and declare them first in my grammar, I get the same errors.

Shouldn't literal keywords always end up at the top of Chevrotain's grammar?

The problematic terminal, MESSAGE, is meant to just capture the remaining text on the line (up to the next \n or \r).

Can someone tell me whether I'm doing something wrong or whether this is a bug?

Answered by msujew

msujew (Maintainer) · 2022-11-08T20:47:38Z

Hey @martaver,

in theory it's a feature, in practice it's a bug. We perform an optimization where we put whitespace terminals at the front of the lexer array, as that improves performance, see here. However, we identify such tokens by checking whether they are able to match a single whitespace character (' '). Your MESSAGE terminal matches ' ', so it is treated as whitespace and ends up ahead of the keywords.

You can work around this by either overriding the token builder and moving the token manually to the back of the array, or by modifying the terminal regex so that it cannot match a single whitespace character.
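As a rough illustration of the ordering described above and of the first workaround, here is a self-contained sketch. It does not use Langium's token builder API; `LexerEntry`, `looksLikeWhitespace`, `orderTokens` and `moveToBack` are hypothetical names, and the "keywords first, then terminals" order is an assumption based on the error output in the question.

```typescript
// Hypothetical stand-in for a lexer token definition; not Langium's or Chevrotain's types.
interface LexerEntry {
    label: string;
    pattern: RegExp;
}

// Assumed heuristic: a terminal counts as "whitespace" if it can match a single space.
function looksLikeWhitespace(pattern: RegExp): boolean {
    return pattern.test(' ');
}

// Keywords come first, then terminals; whitespace-like terminals jump to the front.
function orderTokens(keywords: LexerEntry[], terminals: LexerEntry[]): LexerEntry[] {
    const ordered = [...keywords];
    for (const terminal of terminals) {
        if (looksLikeWhitespace(terminal.pattern)) {
            ordered.unshift(terminal);
        } else {
            ordered.push(terminal);
        }
    }
    return ordered;
}

// Workaround sketch: move the entry with the given label to the back of the array.
function moveToBack(entries: LexerEntry[], label: string): LexerEntry[] {
    const rest = entries.filter(entry => entry.label !== label);
    const moved = entries.filter(entry => entry.label === label);
    return [...rest, ...moved];
}

const keywordEntries: LexerEntry[] = [
    { label: 'service', pattern: /service/ },
    { label: '-->', pattern: /-->/ },
    { label: ':', pattern: /:/ },
];
const terminalEntries: LexerEntry[] = [
    { label: 'ID', pattern: /[_a-zA-Z\-][\w_]*/ },
    { label: 'MESSAGE', pattern: /[^\n\r]/ },
];

const orderedLabels = orderTokens(keywordEntries, terminalEntries).map(entry => entry.label);
const fixedLabels = moveToBack(orderTokens(keywordEntries, terminalEntries), 'MESSAGE')
    .map(entry => entry.label);

console.log(orderedLabels.join(' ')); // prints "MESSAGE service --> : ID"
console.log(fixedLabels.join(' '));   // prints "service --> : ID MESSAGE"
```

In a real project the reordering would happen inside a custom token builder registered with the language services; the exact class and method to override depend on the Langium version, so check its documentation rather than relying on this sketch.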

martaver (Author) · 2022-11-08T21:22:57Z

In this case, I can just ensure that the leading character of that token is always non-whitespace... but I'm sure other developers will run into this issue too! You might want to think carefully about features that change the explicitly declared order of lexer terminals.
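For anyone landing here later, a quick way to sanity-check a revised terminal against the single-space test is sketched below. The revised pattern `/\S[^\n\r]*/` is only my assumption of what "leading character is non-whitespace" could look like; adjust it to your own syntax.

```typescript
// The MESSAGE pattern from the original grammar.
const originalMessage = /[^\n\r]/;
// Hypothetical revision: require a non-whitespace first character, then take the rest of the line.
const revisedMessage = /\S[^\n\r]*/;

// The single-space check described in the answer above.
const originalMatchesSpace = originalMessage.test(' ');
const revisedMatchesSpace = revisedMessage.test(' ');

// The revised pattern still captures text up to the end of the line.
const sampleLine = 'hello world\nnext line';
const captured = revisedMessage.exec(sampleLine)?.[0];

console.log(originalMatchesSpace, revisedMatchesSpace, captured);
// prints: true false hello world
```

This only checks the whitespace heuristic; it does not show that the rest of the grammar lexes the way you intend with the new terminal.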

Thanks for the quick reply :)
