Parser

Parser inside Rocket.Chat

While digging into the Rocket.Chat codebase, I found that all message parsing logic lives inside the packages/message-parser folder. This package is responsible for converting raw message text into a structured AST (Abstract Syntax Tree), which is later used by the UI to render formatting like bold, italic, emojis, etc.

At a high level, this is how I understand the flow:

Raw text  →  Parser  →  AST  →  UI rendering
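Here is a minimal sketch of the last two stages. The AST shape below is an assumption modeled loosely on the parser's output: `PARAGRAPH`, `PLAIN_TEXT` and `BOLD` nodes, each with a `value` field. `render` and `renderNode` are hypothetical stand-ins for the real UI layer.

```javascript
// Hypothetical AST for the input "hello *world*".
// The exact node shape used by packages/message-parser may differ.
const ast = [
  {
    type: 'PARAGRAPH',
    value: [
      { type: 'PLAIN_TEXT', value: 'hello ' },
      { type: 'BOLD', value: [{ type: 'PLAIN_TEXT', value: 'world' }] },
    ],
  },
];

// Toy "UI rendering" stage: walk the tree and emit HTML.
// No HTML escaping is done here; this is for illustration only.
function renderNode(node) {
  switch (node.type) {
    case 'PARAGRAPH':
      return `<p>${node.value.map(renderNode).join('')}</p>`;
    case 'BOLD':
      return `<strong>${node.value.map(renderNode).join('')}</strong>`;
    case 'ITALIC':
      return `<em>${node.value.map(renderNode).join('')}</em>`;
    case 'PLAIN_TEXT':
      return node.value;
    default:
      throw new Error(`Unknown node type: ${node.type}`);
  }
}

function render(tree) {
  return tree.map(renderNode).join('');
}

console.log(render(ast)); // <p>hello <strong>world</strong></p>
```

Because rendering is a separate walk over the tree, the UI never has to look at the raw `*` characters again. It only switches on node types.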

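The parser step is handled by PeggyJS (Peggy). Peggy is a parser generator: you describe the syntax in a grammar file, and Peggy turns it into JavaScript parser code.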

packages/message-parser/src/grammar.pegjs

This file defines the grammar rules using PeggyJS. It describes how the parser should recognise patterns in text, such as:

italic
bold
emoji, etc.

The grammar does two main things:

Matches patterns in the input text (for example, text surrounded by *)
Calls JavaScript actions when a pattern matches

Those JavaScript actions are what actually build the AST.
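Here is a sketch of what such actions might call: tiny factory functions that wrap matched content in a typed node. `generate`, `plain`, `bold` and `italic` are hypothetical names chosen for illustration. The package's real helpers may be named and shaped differently.

```javascript
// Hypothetical node factory: returns a function that wraps a value
// in an AST node of the given type.
const generate = (type) => (value) => ({ type, value });

// Hypothetical helpers a grammar action could call.
const plain = generate('PLAIN_TEXT');
const bold = generate('BOLD');
const italic = generate('ITALIC');

// In a Peggy grammar, an action such as
//   Bold = "*" content:BoldContent "*" { return bold(content); }
// runs only after the pattern matched, and its return value becomes
// the rule's result. Here the helpers are called directly instead:
const node = bold([plain('hello'), italic([plain('world')])]);

console.log(JSON.stringify(node));
// {"type":"BOLD","value":[{"type":"PLAIN_TEXT","value":"hello"},{"type":"ITALIC","value":[{"type":"PLAIN_TEXT","value":"world"}]}]}
```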

For example:

*hello*

The grammar matches:

* → start
hello → content
* → end

When this pattern matches, it calls a helper function to create a BOLD AST node.
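The start / content / end steps above can be written out by hand to show what the generated parser does for `*hello*`. This is a simplified sketch, not Peggy output. `parseBold` is a hypothetical function that handles only one `*...*` span with plain-text content. It returns `null` when the pattern does not match, which is roughly how a failed PEG rule lets the parser try something else.

```javascript
// Hypothetical node helpers (same idea as the grammar's AST helpers).
const generate = (type) => (value) => ({ type, value });
const plain = generate('PLAIN_TEXT');
const bold = generate('BOLD');

// Try to match "*" content "*" starting at position `pos`.
// Returns { node, next } on success, or null if the pattern does not match.
function parseBold(input, pos) {
  if (input[pos] !== '*') return null;            // start
  const end = input.indexOf('*', pos + 1);        // end
  if (end === -1 || end === pos + 1) return null; // no closer, or empty "**"
  const content = input.slice(pos + 1, end);      // content
  return { node: bold([plain(content)]), next: end + 1 };
}

console.log(JSON.stringify(parseBold('*hello*', 0)));
// {"node":{"type":"BOLD","value":[{"type":"PLAIN_TEXT","value":"hello"}]},"next":7}
console.log(parseBold('*hello', 0)); // null
```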

Digging Deeper: The "Secret" Global State

Okay, so I looked closer at grammar.pegjs and found something... interesting. I was expecting just simple pattern matching, but I saw this block at the top of the file:

let skipBold = false;
let skipItalic = false;
let skipStrikethrough = false;
// ... and more skips

Wait, global variables in a parser? :\

It turns out PeggyJS is being used in a stateful way. When the parser enters a Bold block, it sets skipBold = true.

Why? To prevent "Bold inside Bold". If I type **bold **bold** bold**, the parser needs to know "I am already inside a bold block, so don't start another one".
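Here is a hand-written sketch of the same idea. It assumes single-character delimiters (`*` for bold, `_` for italic) and only plain text inside them. A module-level `skipBold` flag is set while bold content is being parsed. A `*` found deeper inside is therefore kept as literal text instead of opening a second bold node. In the example input, that `*` sits inside an italic span that is itself inside bold. `parseInline` and `parse` are hypothetical: they mimic the behavior described above, not the grammar's actual logic.

```javascript
// Module-level flag, like the `let skipBold = false;` at the top of the grammar.
let skipBold = false;

const text = (value) => ({ type: 'PLAIN_TEXT', value });

// Parse inline content from `pos` until the `stop` character (or end of input).
// Returns { nodes, pos, closed }, where `closed` says whether `stop` was found.
function parseInline(input, pos, stop) {
  const nodes = [];
  let buf = '';
  const flush = () => { if (buf) { nodes.push(text(buf)); buf = ''; } };

  while (pos < input.length) {
    const ch = input[pos];
    if (ch === stop) {
      flush();
      return { nodes, pos: pos + 1, closed: true };
    }
    const type = ch === '*' ? 'BOLD' : ch === '_' ? 'ITALIC' : null;
    // Only try to open bold when we are NOT already inside bold.
    if (type && !(type === 'BOLD' && skipBold)) {
      const saved = skipBold;
      if (type === 'BOLD') skipBold = true; // entering bold
      let inner;
      try {
        inner = parseInline(input, pos + 1, ch);
      } finally {
        skipBold = saved;                    // always restore on exit
      }
      if (inner.closed) {
        flush();
        nodes.push({ type, value: inner.nodes });
        pos = inner.pos;
        continue;
      }
    }
    buf += ch; // plain character, or a delimiter we may not open here
    pos += 1;
  }
  flush();
  return { nodes, pos, closed: false };
}

const parse = (input) => parseInline(input, 0, null).nodes;

console.log(JSON.stringify(parse('*a _b *c* d_ e*')));
// [{"type":"BOLD","value":[{"type":"PLAIN_TEXT","value":"a "},{"type":"ITALIC","value":[{"type":"PLAIN_TEXT","value":"b *c* d"}]},{"type":"PLAIN_TEXT","value":" e"}]}]
```

If the `skipBold` condition is removed from the `if`, the same input produces a `BOLD` node nested inside the `ITALIC` one.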

In the grammar itself, the rule looks roughly like this (simplified):

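MaybeBold =
  // 1. Check if we are allowed to parse bold
  & { return !skipBold; }
  // 2. Set the flag to TRUE (we are entering bold!)
  & { skipBold = true; return true; }
  // 3. Actually parse the content (`@` returns only this part, not the predicates)
  @(
     text:Bold {
        skipBold = false; // 4. Reset flag on success
        return text;
     }
     // 5. If Bold fails, reset the flag anyway and fail
     / & { skipBold = false; return false; }
  )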
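Whatever the exact grammar looks like, a rule of this shape has to reset the flag on every exit path, not only when `Bold` succeeds. The JavaScript model below is hypothetical: `matchBold`, `tryBoldLeaky` and `tryBoldSafe` are made up for illustration. It shows what happens if the flag is cleared only on success. One failed attempt, such as an unclosed `*oops`, leaves `skipBold` stuck at `true`, and every later bold span is silently skipped.

```javascript
let skipBold = false;

// Toy matcher: succeeds only for "*...*" with non-empty content.
const matchBold = (s) =>
  s.length > 2 && s.startsWith('*') && s.endsWith('*')
    ? { type: 'BOLD', value: s.slice(1, -1) }
    : null;

// Flag cleared only on success (mirrors steps 1-4 of the simplified rule).
function tryBoldLeaky(s) {
  if (skipBold) return null;  // 1. not allowed to parse bold
  skipBold = true;            // 2. entering bold
  const node = matchBold(s);  // 3. parse
  if (node) skipBold = false; // 4. reset ONLY on success
  return node;
}

// Flag cleared on every exit path.
function tryBoldSafe(s) {
  if (skipBold) return null;
  skipBold = true;
  try {
    return matchBold(s);
  } finally {
    skipBold = false;
  }
}

skipBold = false;
const leaky = [tryBoldLeaky('*oops'), tryBoldLeaky('*fine*')];
skipBold = false;
const safe = [tryBoldSafe('*oops'), tryBoldSafe('*fine*')];

console.log(JSON.stringify(leaky)); // [null,null]  (second bold lost, flag stuck)
console.log(JSON.stringify(safe));  // [null,{"type":"BOLD","value":"fine"}]
```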

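The Catch: this "hack" makes the grammar context-sensitive. PEG parsers often rely on memoization (packrat parsing), which caches the result of each rule at each input position so the same work is never done twice. But the result of MaybeBold no longer depends only on the rule and the position. It also depends on the invisible skipBold variable, so a cached result can't safely be reused. Without that cache, the parser often has to re-parse the same text multiple times while backtracking. This is likely a big performance bottleneck! :)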
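Here is a hypothetical packrat-style cache that shows why hidden state and memoization clash. It is keyed only on (rule, position), which is the usual scheme for PEG parsers, and Peggy's `cache` option works along similar lines. `boldAt` and `memo` are illustrative names, not Rocket.Chat code. Because `boldAt` also reads `skipBold`, the answer cached while "inside bold" is wrongly reused at the top level.

```javascript
let skipBold = false;
let evaluations = 0;

// Hypothetical rule: "does a bold span start at `pos`?"
// Its answer depends on `pos` AND on the hidden flag `skipBold`.
function boldAt(input, pos) {
  evaluations += 1;
  if (skipBold) return false;
  return input[pos] === '*' && input.indexOf('*', pos + 2) !== -1;
}

// Packrat-style memoization: results cached by (rule name, position) only.
const cache = new Map();
function memo(ruleName, rule) {
  return (input, pos) => {
    const key = `${ruleName}@${pos}`;
    if (!cache.has(key)) cache.set(key, rule(input, pos));
    return cache.get(key);
  };
}

const boldAtMemo = memo('boldAt', boldAt);
const input = '*hi*';

skipBold = true;                       // first call happens "inside bold"
const insideBold = boldAtMemo(input, 0);
skipBold = false;                      // second call happens at top level
const topLevel = boldAtMemo(input, 0); // cache hit: stale answer
const fresh = boldAt(input, 0);        // what the rule really says now

console.log(JSON.stringify({ insideBold, topLevel, fresh, evaluations }));
// {"insideBold":false,"topLevel":false,"fresh":true,"evaluations":2}
```

In general there are two ways out. You can add the flag to the cache key, which multiplies the number of cache entries. Or you can skip caching for state-dependent rules, which brings back the repeated re-parsing.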
