3F/regXwild


⏱ Superfast ^Advanced wildcards++? *,|,?,^,$,+,#,>,++??,##??,>c in addition to slow regex engines and more.

✔ regex-like quantifiers, amazing meta symbols, and speed...

Unique algorithms implemented in native unmanaged C++ but easily accessible from .NET through Conari (recommended due to its caching of 0x29 opcodes and related optimizations), and from other environments such as Python.


Samples                regXwild filter       n
number = '1271';       number = '????';      0 - 4
year = '2020';         '##'|'####'           2 | 4
year = '20';           = '##??'              2 | 4
number = 888;          number = +??;         1 - 3

Samples                regXwild filter
everything is ok       ^everything*ok$
systems                system?
systems                sys###s
A new 'X1' project     ^A*'+' pro?ect
professional system    pro*system
regXwild in action     pro?ect$|open*source+act|^regXwild

Why regXwild ?

regXwild was designed to be faster than just fast for features that usually go beyond typical wildcards. Seriously, we love regex: I love it, you love it. 2013 is far behind us, but regXwild is still relevant for its speed and powerful wildcard-like features, such as ##?? (which means 2 or 4 characters) ...

🔍 Easy to start

Unmanaged native C++ or managed .NET project. It doesn't matter, just use it:

C++

#include <regXwild.h>
using namespace net::r_eg::regXwild;
...
EssRxW rxw;
if(rxw.match(_T("regXwild"), _T("reg?wild"))) {
    // ...
}

C# with Conari

using dynamic l = new ConariX("regXwild.dll");
...
if(l.match<bool>("regXwild", "reg?wild")) {
    // ...
}

🏄 Amazing meta symbols

ESS version (advanced EXT version)

metasymbol   meaning
*            {0, ~}
|            str1 or str2 or ...
?            {0, 1}, ??? {0, 3}, ...
^            [str... or [str1...
$            ...str] or ...str1]
+            {1, ~}, +++ {3, ~}, ...
#            {1}, ## {2}, ### {3}, ...
>            Legacy > (F_LEGACY_ANYSP = 0x008) as [^/]*str | [^/]*$
>c           1.4+ Modern > as [^c]*str | [^c]*$

EXT version (more simplified than ESS)

metasymbol   meaning
*            {0, ~}
>            as [^/\\]+
|            str1 or str2 or ...
?            {0, 1}, ??? {0, 3}, ...
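
To make the meaning of the simplest metasymbols concrete, here is a minimal sketch that rewrites a restricted filter into an equivalent std::regex pattern, following the equivalences listed in the tables. It is not regXwild's algorithm and not part of its API: the names essSubsetToRegex and matchSubset are hypothetical, the `>` metasymbol is rejected, and mixed quantifier runs such as ##?? or ++?? are rejected as well, because their meaning (2 | 4, 2 - 4) cannot be produced by a symbol-by-symbol rewrite.

```cpp
#include <regex>
#include <stdexcept>
#include <string>

// Hypothetical helper (not part of regXwild): converts a restricted ESS-style
// filter into an ECMAScript regex string, one metasymbol at a time.
std::string essSubsetToRegex(const std::string& filter)
{
    const std::string quantifiers = "+?#";
    const std::string mustEscape  = ".\\()[]{}";
    std::string out;

    for(std::size_t i = 0; i < filter.size(); ++i)
    {
        const char c = filter[i];

        if(quantifiers.find(c) != std::string::npos)
        {
            // A run that mixes different quantifier symbols has its own meaning
            // and is deliberately not handled by this sketch.
            if(i + 1 < filter.size()
                && filter[i + 1] != c
                && quantifiers.find(filter[i + 1]) != std::string::npos)
            {
                throw std::invalid_argument("mixed quantifier runs are not supported");
            }
            out += (c == '+') ? ".+" : (c == '?') ? ".?" : ".";
        }
        else if(c == '*') {
            out += ".*";
        }
        else if(c == '^' || c == '$' || c == '|') {
            out += c; // same role in both notations
        }
        else if(c == '>') {
            throw std::invalid_argument("'>' is not supported");
        }
        else
        {
            // Everything else is a literal; escape regex-special characters.
            if(mustEscape.find(c) != std::string::npos) out += '\\';
            out += c;
        }
    }
    return out;
}

// Searches the input for the filter (unanchored unless ^ or $ is used).
bool matchSubset(const std::string& input, const std::string& filter)
{
    return std::regex_search(input, std::regex(essSubsetToRegex(filter)));
}
```

For example, matchSubset("systems", "sys###s") goes through the regex sys...s. Going through std::regex gives up the speed advantage regXwild is built for, so this is only useful for reasoning about what a filter means.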

🧮 Quantifiers

1.3+ ++??; ##??

regex            regXwild   n
.*               *          0+
.+               +          1+
.?               ?          0 | 1
.{1}             #          1
.{2}             ##         2
.{2, }           ++         2+
.{0, 2}          ??         0 - 2
.{2, 4}          ++??       2 - 4
(?:.{2}|.{4})    ##??       2 | 4
.{3, 4}          +++?       3 - 4
(?:.{1}|.{3})    #??        1 | 3

and similar ...
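
The n column can be checked against the regex column with standard C++ alone. The sketch below uses a hypothetical helper, matchedLengths, which lists every input length that a regex matches in full. It exercises only std::regex, not regXwild itself, and the patterns are written without spaces inside the braces, since ECMAScript syntax does not accept them.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: returns every length n in [0, maxLen] for which a
// string of n characters is matched in full by the given ECMAScript regex.
std::vector<int> matchedLengths(const std::string& pattern, int maxLen)
{
    const std::regex re(pattern);
    std::vector<int> lengths;

    for(int n = 0; n <= maxLen; ++n)
    {
        const std::string input(static_cast<std::size_t>(n), 'x');
        if(std::regex_match(input, re)) {
            lengths.push_back(n);
        }
    }
    return lengths;
}
```

With maxLen = 6, the regex equivalent of ##??, (?:.{2}|.{4}), yields the lengths 2 and 4, while the equivalent of ++??, .{2,4}, yields 2, 3 and 4.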

Play with our actual Unit-Tests.

🚀 Awesome speed

  • About 2000 times faster than the C++11 regex engine when used from native C++ (x64 results in the Speed section).
  • For .NET (including modern .NET Core), Conari provides optional caching of 0x29 opcodes (Calli) and more, to get results as close to native C++ as possible.

Match result and Replacements

1.4+

EssRxW::MatchResult m;
rxw.match
(
    _T("number = '8888'; //TODO: up"),
    _T("'+'"),
    EssRxW::EngineOptions::F_MATCH_RESULT,
    &m
);
//m.start = 9
//m.end = 15
...
input.replace(m.start, m.end - m.start, _T("'9777'"));

tstring str = _T("year = 2021; dd = 17;");
...
if(rxw.replace(str, _T(" ##;"), _T(" 00;"))) {
    // year = 2021; dd = 00;
}
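
The start/end values in the first snippet behave like a half-open range: start is the first matched position and end is one past the last, which is why the length passed to replace is m.end - m.start. A minimal sketch of that arithmetic with plain std::string follows; the Range struct and replaceRange function are hypothetical stand-ins, not regXwild types.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-in for a match result: [start, end) positions.
struct Range
{
    std::size_t start;
    std::size_t end;
};

// Replaces the characters in [r.start, r.end) and returns the new string.
std::string replaceRange(std::string input, const Range& r, const std::string& replacement)
{
    input.replace(r.start, r.end - r.start, replacement);
    return input;
}
```

Applied to "number = '8888'; //TODO: up" with start = 9 and end = 15, the six characters '8888' (quotes included) are replaced, giving "number = '9777'; //TODO: up".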

🍰 Open and Free

Open Source project; MIT License, Enjoy 🎉

License

The MIT License (MIT)

Copyright (c) 2013-2021  Denis Kuzmin <[email protected]> github/3F

[ ☕ Make a donation ]

regXwild contributors: https://github.com/3F/regXwild/graphs/contributors

We're waiting for your awesome contributions!

Speed

Procedure of testing

  • Use the algo subproject as a tester of the main algorithms (Release cfg - x32 & x64).
  • In general, the calculation is simple and uses an average: i = (t2 - t1); average = sum(i) / n, where:
    • i - one iteration of searching by filter, measured as the time delta t2 - t1
    • n - the number of repetitions of the matching used to get the average.

e.g.:

{
    Meter meter;
    int results = 0;

    for(int total = 0; total < average; ++total)
    {
        meter.start();
        for(int i = 0; i < iterations; ++i)
        {
            if((alg.*method)(data, filter)) {
                //...
            }
        }
        results += meter.delta();
    }

    TRACE((results / average) << "ms");
}
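
The Meter type used above is not shown here. A minimal stand-in, assuming it wraps a monotonic clock and reports whole milliseconds, might look like the sketch below; the averageMs wrapper is likewise hypothetical and only mirrors the loop structure of the snippet.

```cpp
#include <chrono>

// Hypothetical stand-in for the Meter used by the algo tester.
class Meter
{
public:
    void start() { begin = std::chrono::steady_clock::now(); }

    // Milliseconds elapsed since the last start().
    long long delta() const
    {
        const auto elapsed = std::chrono::steady_clock::now() - begin;
        return std::chrono::duration_cast<std::chrono::milliseconds>(elapsed).count();
    }

private:
    std::chrono::steady_clock::time_point begin{};
};

// Runs `action` `iterations` times per measurement and returns the mean
// duration in milliseconds over `average` measurements (average must be > 0).
template<typename F>
long long averageMs(int average, int iterations, F&& action)
{
    Meter meter;
    long long results = 0;

    for(int total = 0; total < average; ++total)
    {
        meter.start();
        for(int i = 0; i < iterations; ++i) {
            action();
        }
        results += meter.delta();
    }
    return results / average;
}
```

A steady clock is used so that system time adjustments cannot produce negative deltas.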

For the regex results, it also prepares an additional basic_regex from the filter, but of course only once for all iterations:

meter.start();

auto rfilter = tregex(
    filter,
    regex_constants::icase | regex_constants::optimize
);

results += meter.delta();
...

Please note:

  • +icase means case is ignored when matching the filter (pattern) within the searched string, i.e. ignoreCase = true. Without it, everything will of course be much faster; icase always adds complexity.
  • Below, MultiByte can be faster than Unicode (for the same platform and the same way of using the module), but this depends on the specific architecture: it can be about 2 times faster in native C++ and about 4 times faster in .NET + Conari and related.
  • The results below can differ between machines. You only need to look at the difference (in milliseconds) between algorithms for a specific target.
  • To calculate the data shown in the tables below, you need to execute algo.exe

Sample of speed for Unicode

340 Unicode Symbols and 10^4 iterations (340 x 10000); Filter: L"nime**haru*02*Magica"

algorithms (see impl. from algo)              +icase [x32]   +icase [x64]
Find + Find                                   ~58ms          ~44ms
Iterator + Find                               ~57ms          ~46ms
Getline + Find                                ~59ms          ~54ms
Iterator + Substr                             ~165ms         ~132ms
Iterator + Iterator                           ~136ms         ~118ms
main :: based on Iterator + Find              ~53ms          ~45ms

Final algorithm - EXT version:                ~50ms          ~26ms
Final algorithm - ESS version:                ~50ms          ~27ms

regexp-c++11(regex_search)                    ~59309ms       ~53334ms
regexp-c++11(only as ^match$ like a '==')     ~12ms          ~5ms
regexp-c++11(regex_match with endings .*)     ~59503ms       ~53817ms
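
To read the table as a ratio rather than as raw milliseconds, divide the regex time by the regXwild time for the same target. A trivial sketch of that arithmetic (the function name speedup is illustrative):

```cpp
// Ratio of a baseline duration to a candidate duration, both in milliseconds.
// A result of 2000 means the candidate is 2000 times faster than the baseline.
double speedup(double baselineMs, double candidateMs)
{
    return baselineMs / candidateMs;
}
```

With the x64 figures above, speedup(53334, 26) for regex_search against the EXT version is a little over 2000, and with the x32 figures speedup(59309, 50) is a little under 1200.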

ESS vs EXT

350 Unicode Symbols and 10^4 iterations (350 x 10000);

Operation (+icase)   EXT [x32]   ESS [x32]   EXT [x64]   ESS [x64]
ANY                  ~54ms       ~55ms       ~32ms       ~34ms
ANYSP                ~60ms       ~59ms       ~37ms       ~38ms
ONE                  ~56ms       ~56ms       ~33ms       ~35ms
SPLIT                ~92ms       ~94ms       ~58ms       ~63ms
BEGIN                ---         ~38ms       ---         ~19ms
END                  ---         ~39ms       ---         ~21ms
MORE                 ---         ~44ms       ---         ~23ms
SINGLE               ---         ~43ms       ---         ~22ms

For .NET users through the Conari engine:

Same test Data & Filter: 10^4 iterations

Release cfg; x32 or x64 regXwild (Unicode)

Attention: for more speed, you need to upgrade to Conari 1.3 or higher!

algorithms (see impl. from snet)              +icase [x32]   +icase [x64]
regXwild via Conari v1.2 (Lambda) - ESS       ~1032ms        ~1418ms        x
regXwild via Conari v1.2 (DLR) - ESS          ~1238ms        ~1609ms        x
regXwild via Conari v1.2 (Lambda) - EXT       ~1117ms        ~1457ms        x
regXwild via Conari v1.2 (DLR) - EXT          ~1246ms        ~1601ms        x

regXwild via Conari v1.3 (Lambda) - ESS       ~58ms          ~42ms          <<
regXwild via Conari v1.3 (DLR) - ESS          ~218ms         ~234ms
regXwild via Conari v1.3 (Lambda) - EXT       ~54ms          ~35ms          <<
regXwild via Conari v1.3 (DLR) - EXT          ~214ms         ~226ms

.NET Regex engine [Compiled]                  ~38310ms       ~37242ms
.NET Regex engine [Compiled]{only ^match$}    < 1ms          ~3ms
.NET Regex engine                             ~31565ms       ~30975ms
.NET Regex engine {only ^match$}              < 1ms          ~1ms

How to get regXwild

regXwild v1.1+ can also be installed through NuGet, in the same way for both unmanaged and managed projects.

For .NET, it will put the x32 & x64 regXwild modules into $(TargetDir). Use them with your .NET modules through Conari and so on.

x64 + x32 Unicode + MultiByte modules;

Please note: modern regXwild packages are no longer distributed together with Conari. Please consider using it separately, via the Conari NuGet packages.