rhaeguard/rgx

Latest commit

460d7fd · Oct 13, 2023

rgx

A very simple regex engine written in Go. This library is experimental; use it at your own risk!

read the article.

to add the dependency:

go get github.com/rhaeguard/rgx

how to use:

import "github.com/rhaeguard/rgx"

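// compile the regex; err is non-nil if regexString could not be compiled
pattern, err := rgx.Compile(regexString)
if err != nil {
	// error handling
}
// run the compiled pattern against the input text
results := pattern.FindMatches(content)

if results.Matches {
	// look up the text captured by a named group
	groupMatchString := results.Groups["group-name"]
}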

todo

  • ^ beginning of the string
  • $ end of the string
  • . any single character/wildcard
  • bracket notation
    • [ ] bracket notation/ranges
    • [^ ] bracket negation notation
    • better handling of the bracket expressions: e.g., [ab-exy12]
    • special characters in the bracket
      • support escape character
  • quantifiers
    • * zero or more times
    • + one or more times
    • ? optional
    • {m,n} at least m and at most n times
  • capturing group
    • ( ) capturing group or subexpression
    • \n backreference, e.g., (dog)\1, where n is in [0, 9]
    • \k<name> named backreference, e.g., (?<animal>dog)\k<animal>
    • extracting the string that matches with the regex
  • \ escape character
    • support special characters (context-dependent)
  • better error handling in the API
  • ability to work on multi-line strings (tested on Alice in Wonderland text corpus)
    • . should not match the newline - \n
    • $ should match the newline - \n
    • multiple full matches

notes

  • the \ escape turns the character that follows it into a literal; special combinations such as \d for digits, \b for backspace, etc. are not supported
  • numeric backreferences (\n) only support single-digit references, so \10 is interpreted as a reference to the first capture group followed by a literal 0
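
The two notes above can be pictured as a small tokenizing step. The following is a sketch under stated assumptions, not rgx's actual parser: the `token` type and `tokenizeEscapes` function are hypothetical names. It only handles `\` escapes, treats every unescaped character as a literal, and leaves out named backreferences (\k<name>).

```go
package main

import (
	"errors"
	"fmt"
)

type tokenKind int

const (
	literal tokenKind = iota
	backreference
)

// token is a hypothetical representation used only in this sketch.
type token struct {
	kind  tokenKind
	char  rune // set when kind == literal
	group int  // set when kind == backreference
}

// tokenizeEscapes is an illustrative sketch (not rgx's parser). After a '\',
// a single digit becomes a numeric backreference; any other character
// becomes a literal, so `\d` is simply the letter 'd'.
func tokenizeEscapes(pattern string) ([]token, error) {
	runes := []rune(pattern)
	var tokens []token
	for i := 0; i < len(runes); i++ {
		if runes[i] != '\\' {
			tokens = append(tokens, token{kind: literal, char: runes[i]})
			continue
		}
		if i+1 >= len(runes) {
			return nil, errors.New("dangling escape at end of pattern")
		}
		i++ // consume the character after the backslash
		next := runes[i]
		if next >= '0' && next <= '9' {
			// Only one digit is consumed, so `\10` is group 1 then '0'.
			tokens = append(tokens, token{kind: backreference, group: int(next - '0')})
		} else {
			tokens = append(tokens, token{kind: literal, char: next})
		}
	}
	return tokens, nil
}

func main() {
	tokens, err := tokenizeEscapes(`\10`)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, t := range tokens {
		if t.kind == backreference {
			fmt.Printf("backreference to group %d\n", t.group)
		} else {
			fmt.Printf("literal %q\n", t.char)
		}
	}
}
```

Consuming exactly one digit after the backslash is what makes `\10` split into a reference to group 1 and a literal 0; supporting multi-digit references would require a rule for where the number ends.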

credits