The following tables compare these regular expression libraries:
GRETA.
The Boost regex library.
Henry Spencer's regular expression library, provided for comparison as a typical non-backtracking implementation (the POSIX column in the tables).
Philip Hazel's PCRE library.
Machine: Intel Pentium 4 2.8GHz PC.
Compiler: Microsoft Visual C++ version 7.1.
C++ Standard Library: Dinkumware standard library version 313.
OS: Win32.
Boost version: 1.31.0.
PCRE version: 3.9.
As ever, care should be taken in interpreting these results. Only sensible regular expressions (rather than pathological cases) are given; most are taken from the Boost regex examples or from the Library of Regular Expressions. In addition, the relative performance of these libraries can be expected to vary somewhat on other machines, because memory-access and processor-caching effects can be quite large for most finite state machine algorithms.
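The following are the average relative scores across all the tests. For each test, a library's relative score is its time divided by the fastest time any library achieved on that test, so the perfect regular expression library would score 1; in practice any small number (say, less than 4 or 5) is pretty good. In the per-test tables, each entry gives the relative score followed by the measured time in parentheses.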
GRETA | GRETA (non-recursive mode) | Boost | Boost + C++ locale | POSIX | PCRE
2.31619 | 6.14203 | 2.30668 | 1.94363 | 124.752 | 2.09365
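The relative score can be made concrete with a short sketch. The C++ below is illustrative only and is not the benchmark's own code: `relative_scores` and `average_score` are hypothetical names, and the plain arithmetic mean in `average_score` is an assumption, since the page does not say how the averages were computed.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Relative scores for one test: each library's time divided by the fastest
// time recorded for that test, so the fastest library scores exactly 1.
std::vector<double> relative_scores(const std::vector<double>& seconds)
{
    assert(!seconds.empty());
    const double best = *std::min_element(seconds.begin(), seconds.end());
    std::vector<double> scores;
    scores.reserve(seconds.size());
    for (double t : seconds)
        scores.push_back(t / best);
    return scores;
}

// Average relative score for one library across several tests.
// Assumption: a plain arithmetic mean; the averaging method is not stated.
double average_score(const std::vector<double>& per_test_scores)
{
    assert(!per_test_scores.empty());
    double sum = 0.0;
    for (double s : per_test_scores)
        sum += s;
    return sum / static_cast<double>(per_test_scores.size());
}
```

Applied to the times in the Twain row of the long-text table (0.0407, 0.0407, 0.17, 0.17, 5.48 and 0.0557 seconds), `relative_scores` gives approximately 1, 1, 4.18, 4.18, 135 and 1.37, which agrees with the published scores.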
For each of the following regular expressions, the time taken to find all occurrences of the expression within a long English-language text (mtent12.txt from Project Gutenberg, 19 MB) was measured.
Expression | GRETA | GRETA (non-recursive mode) | Boost | Boost + C++ locale | POSIX | PCRE
Twain | 1 (0.0407s) | 1 (0.0407s) | 4.18 (0.17s) | 4.18 (0.17s) | 135 (5.48s) | 1.37 (0.0557s)
Huck[[:alpha:]]+ | 1.02 (0.0381s) | 1 (0.0375s) | 4.53 (0.17s) | 4.54 (0.17s) | 166 (6.23s) | 1.34 (0.0501s)
[[:alpha:]]+ing | 4.3 (4.18s) | 9.93 (9.65s) | 1.15 (1.12s) | 1 (0.972s) | 8.15 (7.92s) | 5.85 (5.69s)
^[^ ]*?Twain | 6.25 (1.84s) | 20.9 (6.16s) | 1.56 (0.461s) | 1 (0.295s) | NA | 2.58 (0.761s)
Tom|Sawyer|Huckleberry|Finn | 6.53 (0.711s) | 11.5 (1.25s) | 2.3 (0.251s) | 1 (0.109s) | 196 (21.4s) | 1.77 (0.193s)
(Tom|Sawyer|Huckleberry|Finn).{0,30}river|river.{0,30}(Tom|Sawyer|Huckleberry|Finn) | 3.88 (0.972s) | 6.48 (1.62s) | 1.66 (0.416s) | 1 (0.251s) | NA | 2.48 (0.62s)
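The "find all occurrences" measurement can be sketched as follows. This sketch uses std::regex (ECMAScript grammar) as a stand-in engine, which is not one of the libraries in the tables; `time_find_all` and `FindAllResult` are hypothetical names, and compiling the expression before the clock starts is an assumption about how the timings were taken.

```cpp
#include <chrono>
#include <cstddef>
#include <iterator>
#include <regex>
#include <string>

// Hypothetical result type: number of matches found and elapsed time.
struct FindAllResult {
    std::size_t matches;  // non-overlapping occurrences found
    double seconds;       // wall-clock time for the scan
};

// Find every occurrence of `pattern` in `text` and time the scan.
// The regex is compiled before timing starts, so only searching is measured.
FindAllResult time_find_all(const std::string& pattern, const std::string& text)
{
    const std::regex re(pattern);
    const auto start = std::chrono::steady_clock::now();
    std::sregex_iterator it(text.begin(), text.end(), re);
    const std::sregex_iterator end;
    const auto count = std::distance(it, end);  // walks every match
    const auto stop = std::chrono::steady_clock::now();
    return {static_cast<std::size_t>(count),
            std::chrono::duration<double>(stop - start).count()};
}
```

For example, `time_find_all("Tom|Sawyer|Huckleberry|Finn", text)` counts every character name in `text`; loading mtent12.txt into `text` would approximate the long-text test, though absolute times from std::regex are not comparable with the figures above.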
For each of the following regular expressions, the time taken to find all occurrences of the expression within a medium-sized English-language text (the first 50K of mtent12.txt) was measured.
Expression | GRETA | GRETA (non-recursive mode) | Boost | Boost + C++ locale | POSIX | PCRE
Twain | 1 (9.05e-005s) | 1.03 (9.29e-005s) | 4.92 (0.000445s) | 4.92 (0.000445s) | 43.2 (0.00391s) | 3.18 (0.000288s)
Huck[[:alpha:]]+ | 1 (8.56e-005s) | 1 (8.56e-005s) | 4.97 (0.000425s) | 4.98 (0.000426s) | 2.8 (0.000239s) | 2.2 (0.000188s)
[[:alpha:]]+ing | 5.29 (0.011s) | 11.8 (0.0244s) | 1.19 (0.00246s) | 1 (0.00207s) | 8.77 (0.0182s) | 6.88 (0.0142s)
^[^ ]*?Twain | 5.98 (0.00462s) | 20.2 (0.0156s) | 1.54 (0.00119s) | 1 (0.000772s) | NA | 2.53 (0.00195s)
Tom|Sawyer|Huckleberry|Finn | 3.42 (0.00207s) | 6.31 (0.00383s) | 1.71 (0.00104s) | 1 (0.000606s) | 81.5 (0.0494s) | 1.96 (0.00119s)
(Tom|Sawyer|Huckleberry|Finn).{0,30}river|river.{0,30}(Tom|Sawyer|Huckleberry|Finn) | 1.97 (0.00266s) | 3.77 (0.00509s) | 1.38 (0.00186s) | 1 (0.00135s) | 297 (0.401s) | 1.77 (0.00238s)
For each of the following regular expressions, the time taken to find all occurrences of the expression within the C++ source file boost/crc.hpp was measured.
Expression | GRETA | GRETA (non-recursive mode) | Boost | Boost + C++ locale | POSIX | PCRE
^(template[[:space:]]*<[^;:{]+>[[:space:]]*)?(class|struct)[[:space:]]*(\<\w+\>([
]*\([^)]*\))?[[:space:]]*)*(\<\w*\>)[[:space:]]*(<[^;:{]+>[[:space:]]*)?(\{|:[^;\{()]*\{) | 6.67 (0.00147s) | 36.9 (0.00813s) | 1.03 (0.000227s) | 1 (0.00022s) | 557 (0.123s) | 2.57 (0.000566s)
(^[
]*#(?:[^\\\n]|\\[^\n_[:punct:][:alnum:]]*[\n[:punct:][:word:]])*)|(//[^\n]*|/\*.*?\*/)|\<([+-]?(?:(?:0x[[:xdigit:]]+)|(?:(?:[[:digit:]]*\.)?[[:digit:]]+(?:[eE][+-]?[[:digit:]]+)?))u?(?:(?:int(?:8|16|32|64))|L)?)\>|('(?:[^\\']|\\.)*'|"(?:[^\\"]|\\.)*")|\<(__asm|__cdecl|__declspec|__export|__far16|__fastcall|__fortran|__import|__pascal|__rtti|__stdcall|_asm|_cdecl|__except|_export|_far16|_fastcall|__finally|_fortran|_import|_pascal|_stdcall|__thread|__try|asm|auto|bool|break|case|catch|cdecl|char|class|const|const_cast|continue|default|delete|do|double|dynamic_cast|else|enum|explicit|extern|false|float|for|friend|goto|if|inline|int|long|mutable|namespace|new|operator|pascal|private|protected|public|register|reinterpret_cast|return|short|signed|sizeof|static|static_cast|struct|switch|template|this|throw|true|try|typedef|typeid|typename|union|unsigned|using|virtual|void|volatile|wchar_t|while)\> | 1 (0.00555s) | 3.32 (0.0185s) | 2.53 (0.0141s) | 1.94 (0.0108s) | NA | 3.38 (0.0188s)
^[ ]*#[ ]*include[ ]+("[^"]+"|<[^>]+>) | 4.77 (0.00156s) | 24.8 (0.00814s) | 1.13 (0.000372s) | 1 (0.000328s) | 120 (0.0394s) | 1.58 (0.000518s)
^[ ]*#[ ]*include[ ]+("boost/[^"]+"|<boost/[^>]+>) | 4.72 (0.00154s) | 24.8 (0.00813s) | 1.12 (0.000367s) | 1 (0.000328s) | 143 (0.0469s) | 1.58 (0.000518s)
For each of the following regular expressions, the time taken to find all occurrences of the expression within the HTML file libs/libraries.htm was measured.
Expression | GRETA | GRETA (non-recursive mode) | Boost | Boost + C++ locale | POSIX | PCRE
beman|john|dave | 4.07 (0.00111s) | 7.14 (0.00195s) | 1.75 (0.000479s) | 1 (0.000273s) | 54.3 (0.0149s) | 1.83 (0.000499s)
<p>.*?</p> | 1 (6.59e-005s) | 1.04 (6.84e-005s) | 4.15 (0.000273s) | 4.23 (0.000279s) | NA | 4.23 (0.000279s)
<a[^>]+href=("[^"]*"|[^[:space:]]+)[^>]*> | 1.39 (0.000626s) | 1.83 (0.000821s) | 1.41 (0.000636s) | 1 (0.00045s) | 351 (0.158s) | 1.13 (0.000509s)
<h[12345678][^>]*>.*?</h[12345678]> | 1 (0.000142s) | 1.21 (0.000171s) | 2.62 (0.000372s) | 1.48 (0.00021s) | NA | 1.73 (0.000245s)
<img[^>]+src=("[^"]*"|[^[:space:]]+)[^>]*> | 1 (5.38e-005s) | 1.05 (5.63e-005s) | 5 (0.000269s) | 5.18 (0.000278s) | 604 (0.0325s) | 4.05 (0.000218s)
<font[^>]+face=("[^"]*"|[^[:space:]]+)[^>]*>.*?</font> | 1 (6.05e-005s) | 1.09 (6.59e-005s) | 4.45 (0.000269s) | 4.69 (0.000284s) | NA | 3.64 (0.00022s)
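The anchor-tag expression above can be used to pull href values out of an HTML page. A minimal sketch, assuming std::regex as the engine rather than any of the benchmarked libraries; `find_hrefs` is a hypothetical name:

```cpp
#include <regex>
#include <string>
#include <vector>

// Collect the href value of each <a ...> tag using the anchor expression
// from the table above. Group 1 holds the value, including its surrounding
// quotes when the attribute is quoted.
std::vector<std::string> find_hrefs(const std::string& html)
{
    static const std::regex anchor_re(
        R"(<a[^>]+href=("[^"]*"|[^[:space:]]+)[^>]*>)");
    std::vector<std::string> hrefs;
    for (std::sregex_iterator it(html.begin(), html.end(), anchor_re), end;
         it != end; ++it)
        hrefs.push_back((*it)[1].str());
    return hrefs;
}
```

Because `[^[:space:]]+` also accepts `>` and `<`, an unquoted value written directly before the closing `>` (as in `<a href=x.htm>Text</a> more`) is captured together with the following text up to the next whitespace; quoted values do not have this problem.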
For each of the following regular expressions, the time taken to match the expression against the indicated text was measured.
Expression | Text | GRETA | GRETA (non-recursive mode) | Boost | Boost + C++ locale | POSIX | PCRE
abc | abc | 1.32 (2.24e-007s) | 1.86 (3.15e-007s) | 1.25 (2.12e-007s) | 1.24 (2.1e-007s) | 2.98 (5.05e-007s) | 1 (1.7e-007s)
^([0-9]+)(\-| |$)(.*)$ | 100- this is a line of ftp response which contains a message string | 1.32 (5.91e-007s) | 1.96 (8.78e-007s) | 2.68 (1.2e-006s) | 1.53 (6.88e-007s) | 332 (0.000149s) | 1 (4.49e-007s)
([[:digit:]]{4}[- ]){3}[[:digit:]]{3,4} | 1234-5678-1234-456 | 1.44 (7.16e-007s) | 2.04 (1.01e-006s) | 3.35 (1.66e-006s) | 2.15 (1.07e-006s) | 31.4 (1.56e-005s) | 1 (4.96e-007s)
^([a-zA-Z0-9_\-\.]+)@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)|(([a-zA-Z0-9\-]+\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})(\]?)$ | [email protected] | 1 (1.18e-006s) | 1.42 (1.68e-006s) | 2.06 (2.44e-006s) | 1.35 (1.6e-006s) | 165 (0.000196s) | 1.06 (1.26e-006s)
^([a-zA-Z0-9_\-\.]+)@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)|(([a-zA-Z0-9\-]+\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})(\]?)$ | [email protected] | 1 (1.09e-006s) | 1.44 (1.57e-006s) | 2.21 (2.4e-006s) | 1.41 (1.53e-006s) | 108 (0.000117s) | 1.04 (1.13e-006s)
^([a-zA-Z0-9_\-\.]+)@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)|(([a-zA-Z0-9\-]+\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})(\]?)$ | [email protected] | 1 (1.07e-006s) | 1.43 (1.53e-006s) | 2.21 (2.37e-006s) | 1.45 (1.55e-006s) | 123 (0.000132s) | 1.05 (1.13e-006s)
^[a-zA-Z]{1,2}[0-9][0-9A-Za-z]{0,1} {0,1}[0-9][A-Za-z]{2}$ | EH10 2QQ | 1 (3.19e-007s) | 1.67 (5.34e-007s) | 1.58 (5.05e-007s) | 1.4 (4.49e-007s) | 10.4 (3.32e-006s) | 1.15 (3.68e-007s)
^[a-zA-Z]{1,2}[0-9][0-9A-Za-z]{0,1} {0,1}[0-9][A-Za-z]{2}$ | G1 1AA | 1 (3.29e-007s) | 1.65 (5.44e-007s) | 1.51 (4.96e-007s) | 1.36 (4.49e-007s) | 8.46 (2.79e-006s) | 1.1 (3.63e-007s)
^[a-zA-Z]{1,2}[0-9][0-9A-Za-z]{0,1} {0,1}[0-9][A-Za-z]{2}$ | SW1 1ZZ | 1 (3.25e-007s) | 1.64 (5.34e-007s) | 1.56 (5.05e-007s) | 1.38 (4.49e-007s) | 9.29 (3.02e-006s) | 1.13 (3.68e-007s)
^[[:digit:]]{1,2}/[[:digit:]]{1,2}/[[:digit:]]{4}$ | 4/1/2001 | 1 (3.44e-007s) | 1.55 (5.34e-007s) | 2.36 (8.12e-007s) | 2.2 (7.55e-007s) | 19.6 (6.72e-006s) | 1.81 (6.21e-007s)
^[[:digit:]]{1,2}/[[:digit:]]{1,2}/[[:digit:]]{4}$ | 12/12/2001 | 1.05 (6.59e-007s) | 1.66 (1.05e-006s) | 1.44 (9.07e-007s) | 1.23 (7.73e-007s) | 11.6 (7.34e-006s) | 1 (6.3e-007s)
^[-+]?[[:digit:]]*\.?[[:digit:]]*$ | 123 | 1 (5.72e-007s) | 1.59 (9.07e-007s) | 1.6 (9.16e-007s) | 1.49 (8.5e-007s) | 6.14 (3.51e-006s) | 1.22 (6.97e-007s)
^[-+]?[[:digit:]]*\.?[[:digit:]]*$ | +3.14159 | 1 (6.78e-007s) | 1.52 (1.03e-006s) | 1.47 (9.94e-007s) | 1.31 (8.88e-007s) | 10.8 (7.34e-006s) | 1.08 (7.35e-007s)
^[-+]?[[:digit:]]*\.?[[:digit:]]*$ | -3.14159 | 1 (6.78e-007s) | 1.52 (1.03e-006s) | 1.46 (9.92e-007s) | 1.32 (8.98e-007s) | 10.5 (7.11e-006s) | 1.11 (7.54e-007s)
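The match tests differ from the search tests in that the whole text must match the expression. With std::regex this corresponds to `std::regex_match` rather than `std::regex_search`. The sketch below uses std::regex as a stand-in for the benchmarked libraries, and the function names are hypothetical; it checks three of the expressions above against their sample texts.

```cpp
#include <regex>
#include <string>

// Whole-string checks: std::regex_match succeeds only if the entire text
// matches, so the ^ and $ anchors are redundant here but harmless.
bool looks_like_postcode(const std::string& s)
{
    static const std::regex re(
        "^[a-zA-Z]{1,2}[0-9][0-9A-Za-z]{0,1} {0,1}[0-9][A-Za-z]{2}$");
    return std::regex_match(s, re);
}

bool looks_like_date(const std::string& s)
{
    static const std::regex re(
        "^[[:digit:]]{1,2}/[[:digit:]]{1,2}/[[:digit:]]{4}$");
    return std::regex_match(s, re);
}

bool looks_like_decimal(const std::string& s)
{
    static const std::regex re(R"(^[-+]?[[:digit:]]*\.?[[:digit:]]*$)");
    return std::regex_match(s, re);
}
```

Every part of the decimal expression is optional, so it also accepts the empty string, a lone sign, and a lone ".", which may or may not be intended in a real validator.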
Copyright John Maddock April 2003, all rights reserved.