|
Boost.Regexsyntax_option_type |
|
Type syntax_option type is an implementation specific bitmask type that controls how a regular expression string is to be interpreted. For convenience note that all the constants listed here, are also duplicated within the scope of class template basic_regex.
namespace std{ namespace regex_constants{ typedef implementation-specific-bitmask-type syntax_option_type;
// these flags are standardized: static const syntax_option_type normal; static const syntax_option_type ECMAScript = normal; static const syntax_option_type JavaScript = normal; static const syntax_option_type JScript = normal; static const syntax_option_type perl = normal;
static const syntax_option_type basic; static const syntax_option_type sed = basic; static const syntax_option_type extended; static const syntax_option_type awk; static const syntax_option_type grep; static const syntax_option_type egrep; static const syntax_option_type icase; static const syntax_option_type nosubs; static const syntax_option_type optimize; static const syntax_option_type collate; // other boost.regex specific options are listed below
} // namespace regex_constants } // namespace std
The type syntax_option_type
is an implementation specific bitmask
type (17.3.2.1.2). Setting its elements has the effects listed in the table
below, a valid value of type syntax_option_type
will always have
exactly one of the elements normal, basic, extended, awk, grep, egrep, sed,
literal or perl
set.
Note that for convenience all the constants listed here are duplicated within the scope of class template basic_regex, so you can use any of:
boost::regex_constants::constant_name
or
boost::regex::constant_name
or
boost::wregex::constant_name
in an interchangeable manner.
One of the following must always be set for perl regular expressions:
Element | Standardized | Effect when set |
ECMAScript |
Yes |
Specifies that the grammar recognized by the regular expression engine uses its normal semantics: that is the same as that given in the ECMA-262, ECMAScript Language Specification, Chapter 15 part 10, RegExp (Regular Expression) Objects (FWD.1). boost.regex also recognizes all of the perl-compatible (?...) extensions in this mode. |
perl | No | As above. |
normal | No | As above. |
JavaScript | No | As above. |
JScript | No | As above. |
The following options may also be set when using perl-style regular expressions:
Element | Standardized | Effect when set |
icase | Yes |
Specifies that matching of regular expressions against a character container sequence shall be performed without regard to case. |
nosubs | Yes |
Specifies that when a regular expression is matched against a character container sequence, then no sub-expression matches are to be stored in the supplied match_results structure. |
optimize | Yes |
Specifies that the regular expression engine should pay more attention to the speed with which regular expressions are matched, and less to the speed with which regular expression objects are constructed. Otherwise it has no detectable effect on the program output. This currently has no effect for Boost.Regex. |
collate | Yes |
Specifies that character ranges of the form "[a-b]" should be locale sensitive. |
newline_alt | No | Specifies that the \n character has the same effect as the alternation operator |. Allows newline separated lists to be used as a list of alternatives. |
no_mod_m | No | Normally Boost.Regex behaves as if the Perl m-modifier is on: so the assertions ^ and $ match after and before embedded newlines respectively, setting this flags is equivalent to prefixing the expression with (?-m). |
no_mod_s | No | Normally whether Boost.Regex will match "." against a newline character is determined by the match flag match_dot_not_newline. Specifying this flag is equivalent to prefixing the expression with (?-s) and therefore causes "." not to match a newline character regardless of whether match_not_dot_newline is set in the match flags. |
mod_s | No | Normally whether Boost.Regex will match "." against a newline character is determined by the match flag match_dot_not_newline. Specifying this flag is equivalent to prefixing the expression with (?s) and therefore causes "." to match a newline character regardless of whether match_not_dot_newline is set in the match flags. |
mod_x | No | Turns on the perl x-modifier: causes unescaped whitespace in the expression to be ignored. |
Exactly one of the following must always be set for POSIX extended regular expressions:
Element | Standardized | Effect when set |
extended | Yes |
Specifies that the grammar recognized by the regular expression engine is the same as that used by POSIX extended regular expressions in IEEE Std 1003.1-2001, Portable Operating System Interface (POSIX ), Base Definitions and Headers, Section 9, Regular Expressions (FWD.1). In addition some perl-style escape sequences are supported (The POSIX standard specifies that only "special" characters may be escaped, all other escape sequences result in undefined behavior). |
egrep | Yes |
Specifies that the grammar recognized by the regular expression engine is the same as that used by POSIX utility grep when given the -E option in IEEE Std 1003.1-2001, Portable Operating System Interface (POSIX ), Shells and Utilities, Section 4, Utilities, grep (FWD.1). That is to say, the same as POSIX extended syntax, but with the newline character acting as an alternation character in addition to "|". |
awk | Yes |
Specifies that the grammar recognized by the regular expression engine is the same as that used by POSIX utility awk in IEEE Std 1003.1-2001, Portable Operating System Interface (POSIX ), Shells and Utilities, Section 4, awk (FWD.1). That is to say: the same as POSIX extended syntax, but with escape sequences in character classes permitted. In addition some perl-style escape sequences are supported (actually the awk syntax only requires \a \b \t \v \f \n and \r to be recognised, all other Perl-style escape sequences invoke undefined behavior according to the POSIX standard, but are in fact recognised by Boost.Regex). |
The following options may also be set when using POSIX extended regular expressions:
Element | Standardized | Effect when set |
icase | Yes |
Specifies that matching of regular expressions against a character container sequence shall be performed without regard to case. |
nosubs | Yes |
Specifies that when a regular expression is matched against a character container sequence, then no sub-expression matches are to be stored in the supplied match_results structure. |
optimize | Yes |
Specifies that the regular expression engine should pay more attention to the speed with which regular expressions are matched, and less to the speed with which regular expression objects are constructed. Otherwise it has no detectable effect on the program output. This currently has no effect for boost.regex. |
collate | Yes |
Specifies that character ranges of the form "[a-b]" should be locale sensitive. This bit is on by default for POSIX-Extended regular expressions, but can be unset to force ranges to be compared by code point only. |
newline_alt | No | Specifies that the \n character has the same effect as the alternation operator |. Allows newline separated lists to be used as a list of alternatives. |
no_escape_in_lists | No | When set this makes the escape character ordinary inside lists, so that [\b] would match either '\' or 'b'. This bit is one by default for POSIX-Extended regular expressions, but can be unset to force escapes to be recognised inside lists. |
no_bk_refs | No | When set then backreferences are disabled. This bit is on by default for POSIX-Extended regular expressions, but can be unset to support for backreferences on. |
Exactly one of the following must always be set for POSIX basic regular expressions:
Element | Standardized | Effect When Set |
basic | Yes |
Specifies that the grammar recognized by the regular expression engine is the same as that used by POSIX basic regular expressions in IEEE Std 1003.1-2001, Portable Operating System Interface (POSIX ), Base Definitions and Headers, Section 9, Regular Expressions (FWD.1). |
sed | No | As Above. |
grep | Yes |
Specifies that the grammar recognized by the regular expression engine is the same as that used by POSIX utility grep in IEEE Std 1003.1-2001, Portable Operating System Interface (POSIX ), Shells and Utilities, Section 4, Utilities, grep (FWD.1). That is to say, the same as POSIX basic syntax, but with the newline character acting as an alternation character; the expression is treated as a newline separated list of alternatives. |
emacs | No | Specifies that the grammar recognised is the superset of the POSIX-Basic syntax used by the emacs program. |
The following options may also be set when using POSIX basic regular expressions:
Element | Standardized | Effect when set |
icase | Yes |
Specifies that matching of regular expressions against a character container sequence shall be performed without regard to case. |
nosubs | Yes |
Specifies that when a regular expression is matched against a character container sequence, then no sub-expression matches are to be stored in the supplied match_results structure. |
optimize | Yes |
Specifies that the regular expression engine should pay more attention to the speed with which regular expressions are matched, and less to the speed with which regular expression objects are constructed. Otherwise it has no detectable effect on the program output. This currently has no effect for boost.regex. |
collate | Yes |
Specifies that character ranges of the form "[a-b]" should be locale sensitive. This bit is on by default for POSIX-Basic regular expressions, but can be unset to force ranges to be compared by code point only. |
newline_alt | No | Specifies that the \n character has the same effect as the alternation operator |. Allows newline separated lists to be used as a list of alternatives. This bit is already set, if you use the grep option. |
no_char_classes | No | When set then character classes such as [[:alnum:]] are not allowed. |
no_escape_in_lists | No | When set this makes the escape character ordinary inside lists, so that [\b] would match either '\' or 'b'. This bit is one by default for POSIX-basic regular expressions, but can be unset to force escapes to be recognised inside lists. |
no_intervals | No | When set then bounded repeats such as a{2,3} are not permitted. |
bk_plus_qm | No | When set then \? acts as a zero-or-one repeat operator, and \+ acts as a one-or-more repeat operator. |
bk_vbar | No | When set then \| acts as the alternation operator. |
The following must always be set to interpret the expression as a string literal:
Element | Standardized | Effect when set |
literal | Yes | Treat the string as a literal (no special characters). |
The following options may also be combined with the literal flag:
Element | Standardized | Effect when set |
icase | Yes |
Specifies that matching of regular expressions against a character container sequence shall be performed without regard to case. |
optimize | Yes |
Specifies that the regular expression engine should pay more attention to the speed with which regular expressions are matched, and less to the speed with which regular expression objects are constructed. Otherwise it has no detectable effect on the program output. This currently has no effect for boost.regex. |
Revised 23 June 2004
© Copyright John Maddock 1998- 2004
Use, modification and distribution are subject to the Boost Software License, Version 1.0. (See accompanying file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)