Parser
While digging into the Rocket.Chat codebase, I found that all message parsing logic lives inside the packages/message-parser folder. This package is responsible for converting raw message text into a structured AST (Abstract Syntax Tree), which is later used by the UI to render formatting like bold, italic, emojis, etc.
At a high level, this is how I understand the flow:
Raw text → Parser → AST → UI rendering
Here, the parser part is handled using PeggyJS.
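To make the last arrow (AST → UI rendering) concrete, here is a minimal sketch of how a renderer could walk such a tree. The node shapes (`type`, `value`) and the `render` and `escapeHtml` functions are my own assumptions for illustration; the real types and UI components in Rocket.Chat may look different.

```javascript
// Hypothetical AST for the message "hello *world*".
// These node shapes are assumptions, not the actual message-parser types.
const ast = [
  {
    type: 'PARAGRAPH',
    value: [
      { type: 'PLAIN_TEXT', value: 'hello ' },
      { type: 'BOLD', value: [{ type: 'PLAIN_TEXT', value: 'world' }] },
    ],
  },
];

// Escape characters that would otherwise be read as HTML.
function escapeHtml(text) {
  return text.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;');
}

// Walk the tree and build an HTML string (a stand-in for real UI components).
function render(node) {
  if (Array.isArray(node)) return node.map(render).join('');
  switch (node.type) {
    case 'PARAGRAPH': return `<p>${render(node.value)}</p>`;
    case 'BOLD': return `<strong>${render(node.value)}</strong>`;
    case 'ITALIC': return `<em>${render(node.value)}</em>`;
    case 'PLAIN_TEXT': return escapeHtml(node.value);
    default: throw new Error(`Unknown node type: ${node.type}`);
  }
}

console.log(render(ast)); // <p>hello <strong>world</strong></p>
```

The point is that the renderer never looks at the raw text again: everything it needs is already in the tree.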
packages/message-parser/src/grammar.pegjs
This file defines the grammar rules using PeggyJS. It describes how the parser should recognise patterns in text, such as:
- italic
- bold
- emoji, etc.
The grammar does two main things:
- Matches patterns in the input text (for example, text surrounded by *)
- Calls JavaScript actions when a pattern matches
Those JavaScript actions are what actually build the AST.
For example:
*hello*
The grammar matches:
* → start
hello → content
* → end
When this pattern matches, it calls a helper function to create a BOLD AST node.
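The "match, then run an action" idea can be sketched in plain JavaScript. This is hand-written code, not what PeggyJS generates, and the `plain`, `bold`, and `matchBold` names are hypothetical stand-ins for the real helpers.

```javascript
// Hypothetical node helpers; the package's real helpers may differ.
const plain = (value) => ({ type: 'PLAIN_TEXT', value });
const bold = (value) => ({ type: 'BOLD', value });

// Try to match *content* starting at `pos`.
// Returns { node, end } on success, or null if the pattern does not match.
function matchBold(input, pos = 0) {
  if (input[pos] !== '*') return null;                 // start marker
  const close = input.indexOf('*', pos + 1);           // end marker
  if (close === -1 || close === pos + 1) return null;  // unclosed or empty
  const content = input.slice(pos + 1, close);         // content
  // The "action": runs only once the whole pattern matched, and builds the node.
  return { node: bold([plain(content)]), end: close + 1 };
}

console.log(JSON.stringify(matchBold('*hello*')));
// {"node":{"type":"BOLD","value":[{"type":"PLAIN_TEXT","value":"hello"}]},"end":7}
```

Returning `null` on failure is what lets the parser fall back and try a different rule at the same position.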
Okay, so I looked closer at grammar.pegjs and found something... interesting.
I was expecting just simple pattern matching, but I saw this block at the top of the file:
let skipBold = false;
let skipItalic = false;
let skipStrikethrough = false;
// ... and more skips
Wait, global variables in a parser? :\
It turns out PeggyJS is being used in a stateful way. When the parser enters a Bold block, it sets skipBold = true.
Why? To prevent "Bold inside Bold". If I type **bold **bold** bold**, the parser needs to know "I am already inside a bold block, so don't start another one".
The rule looks kind of like this:
MaybeBold =
  // 1. Check if we are allowed to parse bold
  & { return !skipBold; }
  // 2. Set the flag to TRUE (We are entering bold!)
  & { skipBold = true; return true; }
  // 3. Actually parse the content
  (
    text:Bold {
      skipBold = false; // 4. Reset flag when done
      return text;
    }
  )
// Note: as written, this simplified sketch only resets the flag when Bold
// matches; a failed match would also have to reset it.
The Catch: This "hack" makes the grammar context-sensitive. PEG parsers usually love "Memoization" (caching the result of a rule at a given input position so they don't do the work twice). But the result of MaybeBold depends on the invisible skipBold variable, not just on the position, so a cached result can't be safely reused. Whenever the parser backtracks, it may have to re-parse the same text multiple times. This is likely a big performance bottleneck! :)
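Here is a small sketch of why hidden state and caching don't mix. It is a toy model, not PeggyJS internals: `maybeBold`, `memoMaybeBold`, and the position-keyed `cache` are all hypothetical names I made up to show the effect.

```javascript
// Hidden parser state, like skipBold in the grammar.
let skipBold = false;
let calls = 0;

// A rule whose result depends on the position AND on the hidden flag.
function maybeBold(input, pos) {
  calls += 1;
  if (skipBold) return null;
  return input[pos] === '*' ? { type: 'BOLD_START', pos } : null;
}

// Packrat-style cache keyed only by position (simplified on purpose).
const cache = new Map();
function memoMaybeBold(input, pos) {
  if (!cache.has(pos)) cache.set(pos, maybeBold(input, pos));
  return cache.get(pos);
}

const input = '*hi*';

skipBold = true;                         // pretend we are inside bold
const first = memoMaybeBold(input, 0);   // null, and that null gets cached

skipBold = false;                        // the flag changed, the position did not
const cached = memoMaybeBold(input, 0);  // still null: a stale cache hit
const fresh = maybeBold(input, 0);       // { type: 'BOLD_START', pos: 0 }

console.log(first, cached, fresh, calls);
```

The cached answer and the fresh answer disagree for the same position. To stay correct, a cache would have to key on the flags as well as the position, or skip caching for these rules entirely.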