BuildIt RegEx

This repo contains an implementation of a regular expression library using BuildIt.

We currently support the following types of matches:

Full match: checks whether the regex matches the entire text. Example code is in ./samples/sample1.cpp.
Partial match: reports whether the regex matches anywhere in the text (true or false), with an option to extract the first match (./samples/sample2.cpp).
All partial matches: returns every match as a list of strings (./samples/sample3.cpp). The output is the same as that of repeatedly applying the PCRE or RE2 FindAndConsume function, which yields non-overlapping leftmost-longest matches.
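The difference between the three match types can be sketched with the C++ standard library. This is only an illustration of the semantics and does not use the BuildIt RegEx API: `full_match`, `partial_match`, `all_matches`, and `check_match_kinds` are hypothetical names, and `std::regex` is a stand-in engine. Note that `std::regex` in its default ECMAScript mode is not guaranteed to pick the leftmost-longest match for every pattern (alternations in particular), so the pattern here is deliberately simple.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <vector>

// Full match: the pattern must cover the whole text.
bool full_match(const std::string& text, const std::string& pattern) {
    return std::regex_match(text, std::regex(pattern));
}

// Partial match: the pattern may match anywhere in the text.
// If `first` is non-null, it receives the first match found.
bool partial_match(const std::string& text, const std::string& pattern,
                   std::string* first = nullptr) {
    const std::regex re(pattern);
    std::smatch m;
    if (!std::regex_search(text, m, re)) return false;
    if (first != nullptr) *first = m.str();
    return true;
}

// All partial matches: scan left to right, resuming after the end of
// each match, so the returned matches never overlap.
std::vector<std::string> all_matches(const std::string& text,
                                     const std::string& pattern) {
    const std::regex re(pattern);
    std::vector<std::string> out;
    for (std::sregex_iterator it(text.begin(), text.end(), re), end;
         it != end; ++it) {
        out.push_back(it->str());
    }
    return out;
}

// Small self-check that exercises all three helpers with the pattern "ab+".
void check_match_kinds() {
    assert(full_match("abbb", "ab+"));
    assert(!full_match("xabbb", "ab+"));  // leading 'x' is not covered

    std::string first;
    assert(partial_match("xabbby", "ab+", &first));
    assert(first == "abbb");

    const std::vector<std::string> expected = {"ab", "abb", "abbb"};
    assert(all_matches("ab abb a abbb", "ab+") == expected);
}
```

Calling `check_match_kinds()` from a `main` function runs the checks. The lone `a` in the last input is skipped because `ab+` requires at least one `b`.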

We support the following operators and expressions.

| Expression | Description |
| --- | --- |
| `.` | any character |
| `[xyz]`, `[^xyz]` | character class |
| `[a-z]`, `[^a-z]` | character range |
| `x?` | zero or one `x` |
| `x+` | one or more `x` |
| `x*` | zero or more `x` |
| `(x\|y)` | `x` or `y` |
| `x{n}` | `x` repeated `n` times |
| `x{n,m}` | `x` repeated between `n` and `m` times inclusive |
| `\d`, `\w`, `\s`, `\D`, `\W` | escaped character classes |
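As a quick reference for what each operator accepts, the sketch below checks one or two full-match cases per table row. It uses `std::regex` as a stand-in engine under the assumption that these operators behave the same way there. It is not the BuildIt RegEx API, and `OperatorCase`, `operator_cases`, and `count_operator_mismatches` are hypothetical names.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <vector>

// One full-match expectation for an operator from the table.
struct OperatorCase {
    std::string pattern;
    std::string text;
    bool expected;  // should `pattern` match all of `text`?
};

// One or two cases per row of the operator table.
std::vector<OperatorCase> operator_cases() {
    return {
        {"a.c", "abc", true},      {"a.c", "ac", false},       // .
        {"[xyz]", "y", true},      {"[^xyz]", "y", false},     // character class
        {"[a-z]", "q", true},      {"[^a-z]", "q", false},     // character range
        {"ab?", "a", true},        {"ab?", "abb", false},      // x?
        {"ab+", "abb", true},      {"ab+", "a", false},        // x+
        {"ab*", "a", true},        {"ab*", "abbb", true},      // x*
        {"(cat|dog)", "dog", true},{"(cat|dog)", "cow", false},// (x|y)
        {"a{3}", "aaa", true},     {"a{3}", "aa", false},      // x{n}
        {"a{2,3}", "aaa", true},   {"a{2,3}", "aaaa", false},  // x{n,m}
        {R"(\d\w\s)", "7_ ", true},{R"(\D)", "7", false},      // \d \w \s \D
        {R"(\W)", "!", true},      {R"(\W)", "_", false},      // \W
    };
}

// Returns how many cases disagree with std::regex_match.
int count_operator_mismatches() {
    int mismatches = 0;
    for (const OperatorCase& c : operator_cases()) {
        const bool got = std::regex_match(c.text, std::regex(c.pattern));
        if (got != c.expected) ++mismatches;
    }
    return mismatches;
}

// Self-check: every expectation above should hold.
void check_operator_cases() {
    assert(count_operator_mismatches() == 0);
}
```

The `\w` class includes the underscore, which is why `\d\w\s` accepts `7_ ` and `\W` rejects `_`.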

Several options affect the way the code is generated:

the number of interleaving parts for partial matches
splitting the code generation on | characters
grouping multiple consecutive states into one
ignore_case: match both uppercase and lowercase letters
greedy: set to true to prefer longer partial matches

These options can be set using the RegexOptions struct as shown in ./samples/sample2.cpp.
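The effect of the ignore_case and greedy options on match results can be illustrated with `std::regex`. This sketch does not use the `RegexOptions` struct and says nothing about how BuildIt RegEx generates code. It assumes that ignore_case corresponds to the standard `icase` flag and that turning greedy off is comparable to using a lazy quantifier such as `a+?`. `first_match` and `check_option_semantics` are hypothetical names.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Returns the first partial match of `pattern` in `text`, or an empty
// string if there is no match. `ignore_case` maps to std::regex::icase.
std::string first_match(const std::string& text, const std::string& pattern,
                        bool ignore_case = false) {
    std::regex::flag_type flags = std::regex::ECMAScript;
    if (ignore_case) flags = flags | std::regex::icase;
    const std::regex re(pattern, flags);
    std::smatch m;
    return std::regex_search(text, m, re) ? m.str() : std::string();
}

// Self-check covering both options.
void check_option_semantics() {
    // Case sensitivity: "hello" only matches "Hello" when case is ignored.
    assert(first_match("Hello", "hello").empty());
    assert(first_match("Hello", "hello", true) == "Hello");

    // Greedy quantifier: takes the longest run of 'a'.
    assert(first_match("xaaay", "a+") == "aaa");
    // Lazy quantifier (assumed analogue of greedy = false): stops early.
    assert(first_match("xaaay", "a+?") == "a");
}
```

Calling `check_option_semantics()` from a `main` function runs the checks.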

To compile the code, run make from the root directory. The sample binaries are placed in ./build; for example, to run sample1, run ./build/sample1.

Code structure

The main code is in ./src and ./include.
Testing code is in ./test.
Code for measuring performance is in ./benchmarks.

Setting up the benchmarks

Hyperscan

To build Hyperscan, follow steps 2 and 3 from here.
Use one of the scripts in ./benchmarks/hyperscan/tools/hsbench/scripts to create a corpus SQLite database.
Add the regex patterns to a file following this format.
From the Hyperscan build directory, run build/bin/hsbench -e <pattern_file> -c <corpus.db>. More directions are available here.

RE2

To build RE2, run make in the ./benchmarks/re2/ directory.

To run the timing experiments on the Twain dataset, run ./build/preformance from the ./benchmarks directory.

Datasets

Twain

Corpus: Project Gutenberg: Complete Works of Mark Twain
Patterns: available in ./benchmarks/data/twain_patterns.txt; taken from this paper
