AsciiDoc Parsing Lab

This project is a laboratory for researching and developing a formal grammar (i.e., grammar formalism) for the AsciiDoc Language. The source code in this repository is highly experimental and incomplete. It should not be considered a standalone AsciiDoc processor implementation. Rather, it’s meant to serve as a reference for proofs. The grammars and grammar-related helpers formulated in this repository will be contributed to the AsciiDoc Language Specification and TCK.

What’s this about?

The development of the AsciiDoc Language Specification is now in full swing. The center piece of this specification is the normative definition of the AsciiDoc Language.

Up until now, the AsciiDoc syntax rules have only been informally described through pre-spec implementation code (Asciidoctor) and user-oriented documentation. Developing a specification for the AsciiDoc Language necessitates formalizing the syntax into a grammar.

A formal grammar describes the sequences of characters (i.e., markup) that are valid according to the syntax using a set of rules. Establishing a formal grammar is a major step forward for the AsciiDoc Language and its specification. It will help root out well-known inconsistencies, ambiguities, and idiosyncrasies. However, bridging this gap while retaining reasonable compatibility is a major challenge of the specification that requires substantial and open-ended research, hence the need for this project.

This lab is primarily focused on exploring a PEG grammar for AsciiDoc.

How do I run it?

This repository is structured as a Node.js project with a Mocha test suite. Thus, in order to work with it, you first need to have Node.js installed.

The best way to install Node.js is to use nvm (Node Version Manager).

$ nvm install 20

Once Node.js 20 is installed, switch to it:

$ nvm use

Next, install the dependencies of the project using npm:

$ npm i

At the core of this repository is a collection of parsers for AsciiDoc. The parsers are generated from grammar files located in the grammar folder. The grammars, which end in .pegjs, are written for peggy. peggy is the parser generator used by this project to the generate the parsers.

The code in this repository is intended to be run by way of the test suite. But in order to run the tests, you first need to use the npm script gen to generate the parsers.

$ npm run gen

Now you can run the tests using the npm test script:

$ npm t

Most of the tests are data-driven. These tests are located in the test/tests folder. Each test consists of at least an input file (ending in -input.adoc) and an output file (ending in -output.json). The input file (i.e., the test file) is an AsciiDoc file. The output file is the expected ASG that should be produced from it by a compliance AsciiDoc processor. Some tests also have a configuration file that ends in -config.json).

These data-driven tests are a blueprint of the tests that will be included in the AsciiDoc TCK. Once the tests are contributed to the AsciiDoc TCK, this test suite will be updated to use the AsciiDoc TCK directly.

Unmapped syntax

Below is a rough list of the AsciiDoc syntax that has not yet been mapped, or not mapped fully, in the grammar. There’s likely more syntax and edge cases to cover, so this is only what we know to be outstanding.

open block
quote/verse block
pass/stem block
thematic and page breaks
intrinsic document attributes
attribute entry with multiline value
attribute entry with value enclosed in pass macro
revision info line
tables
ifeval directive
variable-length delimiter lines and block nesting in preprocessor
line and block comments (likely in preprocessor)
inline replacements
backslash escaping for blocks
more thorough backslash escaping for inlines (need to consider all ASCII symbols)
lines, tags, and leveloffset attributes on include directive
optional option on include directive
automatic ID generation for headings
anchor on dlist term
anchor on list item
inline double and single quotes
inline image macro
inline footnote macro
inline UI macros (btn, kbd, menu)
inline xref macro
inline index term macro and shorthands
block video and audio macros
block toc macro
callouts in verbatim blocks
use Unicode Alpha property when matching letters in block grammar
…

In general, the line preprocessor is woefully incomplete. We’ve been focusing on sorting out the block grammar first. Then we’ll go back and map it correctly in the line preprocessor (or fold the two grammars into one).

Copyright and License

Use of this software is granted under the terms of the Eclipse Public License v 2.0 (EPL-2.0) License.

Trademarks

AsciiDoc® and AsciiDoc Language™ are trademarks of the Eclipse Foundation, Inc.

Name		Name	Last commit message	Last commit date
Latest commit History 416 Commits
.github/workflows		.github/workflows
bin		bin
grammar		grammar
lib		lib
test		test
.eslintrc		.eslintrc
.gitignore		.gitignore
LICENSE		LICENSE
README.adoc		README.adoc
package.json		package.json

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

AsciiDoc Parsing Lab

What’s this about?

How do I run it?

Unmapped syntax

Copyright and License

Trademarks

About

Releases

Packages

Languages

License

opendevise/asciidoc-parsing-lab

Folders and files

Latest commit

History

Repository files navigation

AsciiDoc Parsing Lab

What’s this about?

How do I run it?

Unmapped syntax

Copyright and License

Trademarks

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages