Boost.Regex: syntax_option

syntax_option_type

Options for Perl Regular Expressions
Options for POSIX Extended Regular Expressions
Options for POSIX Basic Regular Expressions
Options for String Literals

Synopsis

Type syntax_option_type is an implementation-specific bitmask type that controls how a regular expression string is interpreted. For convenience, all the constants listed here are also duplicated within the scope of class template basic_regex.

namespace std{ namespace regex_constants{

typedef implementation-specific-bitmask-type syntax_option_type;

// these flags are standardized:
static const syntax_option_type normal;
static const syntax_option_type ECMAScript = normal;
static const syntax_option_type JavaScript = normal;
static const syntax_option_type JScript = normal;
static const syntax_option_type perl = normal;
static const syntax_option_type basic;
static const syntax_option_type sed = basic;
static const syntax_option_type extended;
static const syntax_option_type awk;
static const syntax_option_type grep;
static const syntax_option_type egrep;
static const syntax_option_type icase;
static const syntax_option_type nosubs;
static const syntax_option_type optimize;
static const syntax_option_type collate;
// other boost.regex specific options are listed below

} // namespace regex_constants
} // namespace std

Description

The type syntax_option_type is an implementation-specific bitmask type (17.3.2.1.2). Setting its elements has the effects listed in the tables below. A valid value of type syntax_option_type will always have exactly one of the elements normal, basic, extended, awk, grep, egrep, sed, literal or perl set.
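As a rough illustration of the bitmask rule above, the sketch below combines one grammar element with one modifier using bitwise OR. It uses std::regex from the standard library rather than Boost.Regex so that it compiles without Boost; extended and icase are both marked as standardized on this page, and the Boost.Regex equivalents are assumed to behave the same way. The function name is hypothetical.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: exactly one grammar flag (extended) is combined with
// an optional modifier flag (icase) using bitwise OR.
inline bool matches_extended_icase(const std::string& text)
{
    const std::regex::flag_type flags =
        std::regex::extended | std::regex::icase;

    // POSIX extended syntax: one or more repetitions of "ab", any case.
    const std::regex re("(ab)+", flags);
    return std::regex_match(text, re);
}
```

Calling matches_extended_icase("ABab") is expected to succeed, while "abc" should be rejected because regex_match requires the whole input to match.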

Note that for convenience all the constants listed here are duplicated within the scope of class template basic_regex, so you can use any of:

boost::regex_constants::constant_name

boost::regex::constant_name

boost::wregex::constant_name

in an interchangeable manner.
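A minimal sketch of this interchangeability, written against the standard library's std::regex (which mirrors the same arrangement of constants) so that it builds without Boost. Replacing std with boost is assumed, not verified here, to give the same result. The function name is hypothetical.

```cpp
#include <regex>

// Hypothetical helper: the same flag spelled three ways. All three names
// refer to one value of the single type syntax_option_type.
inline bool icase_spellings_agree()
{
    const std::regex_constants::syntax_option_type a =
        std::regex_constants::icase;   // namespace-scope constant
    const std::regex_constants::syntax_option_type b =
        std::regex::icase;             // duplicated in basic_regex<char>
    const std::regex_constants::syntax_option_type c =
        std::wregex::icase;            // duplicated in basic_regex<wchar_t>
    return a == b && b == c;
}
```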

Options for Perl Regular Expressions:

One of the following must always be set for perl regular expressions:

Element Standardized Effect when set

ECMAScript
Yes
Specifies that the grammar recognized by the regular expression engine uses its normal semantics: that is the same as that given in the ECMA-262, ECMAScript Language Specification, Chapter 15 part 10, RegExp (Regular Expression) Objects (FWD.1).

Boost.Regex also recognizes all of the Perl-compatible (?...) extensions in this mode.

perl No As above.

normal No As above.

JavaScript No As above.

JScript No As above.

The following options may also be set when using perl-style regular expressions:

Element Standardized Effect when set

icase Yes
Specifies that matching of regular expressions against a character container sequence shall be performed without regard to case.

nosubs Yes
Specifies that when a regular expression is matched against a character container sequence, then no sub-expression matches are to be stored in the supplied match_results structure.

optimize Yes
Specifies that the regular expression engine should pay more attention to the speed with which regular expressions are matched, and less to the speed with which regular expression objects are constructed. Otherwise it has no detectable effect on the program output. This currently has no effect for Boost.Regex.

collate Yes
Specifies that character ranges of the form "[a-b]" should be locale sensitive.

newline_alt No Specifies that the \n character has the same effect as the alternation operator |. Allows newline separated lists to be used as a list of alternatives.

no_mod_m No Normally Boost.Regex behaves as if the Perl m-modifier is on, so the assertions ^ and $ match after and before embedded newlines respectively. Setting this flag is equivalent to prefixing the expression with (?-m).

no_mod_s No Normally whether Boost.Regex will match "." against a newline character is determined by the match flag match_not_dot_newline. Specifying this flag is equivalent to prefixing the expression with (?-s) and therefore causes "." not to match a newline character regardless of whether match_not_dot_newline is set in the match flags.

mod_s No Normally whether Boost.Regex will match "." against a newline character is determined by the match flag match_not_dot_newline. Specifying this flag is equivalent to prefixing the expression with (?s) and therefore causes "." to match a newline character regardless of whether match_not_dot_newline is set in the match flags.

mod_x No Turns on the perl x-modifier: causes unescaped whitespace in the expression to be ignored.
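The two standardized modifiers icase and nosubs from the table above can be sketched as follows. The example uses std::regex so that it compiles without Boost; the Boost.Regex flags of the same name are assumed to behave equivalently. The Boost-specific options (newline_alt, no_mod_m, no_mod_s, mod_s, mod_x) have no standard-library counterpart and are not shown. Both function names are hypothetical.

```cpp
#include <cstddef>
#include <regex>
#include <string>

// icase: letters in the expression match regardless of case.
inline bool matches_ignoring_case(const std::string& text)
{
    const std::regex re("hello", std::regex::ECMAScript | std::regex::icase);
    return std::regex_match(text, re);
}

// nosubs: returns how many entries the match_results object holds after
// matching "12-34" against an expression with two parenthesized groups.
inline std::size_t stored_matches(bool with_nosubs)
{
    std::regex::flag_type flags = std::regex::ECMAScript;
    if (with_nosubs)
        flags |= std::regex::nosubs;   // groups are no longer recorded

    const std::regex re("(\\d+)-(\\d+)", flags);
    std::smatch m;
    const std::string input = "12-34";
    std::regex_match(input, m, re);
    return m.size();                   // whole match plus recorded groups
}
```

Without nosubs the result holds the whole match plus one entry per group; with nosubs only the whole match should remain.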

Options for POSIX Extended Regular Expressions:

Exactly one of the following must always be set for POSIX extended regular expressions:

Element Standardized Effect when set

extended Yes
Specifies that the grammar recognized by the regular expression engine is the same as that used by POSIX extended regular expressions in IEEE Std 1003.1-2001, Portable Operating System Interface (POSIX ), Base Definitions and Headers, Section 9, Regular Expressions (FWD.1).

In addition, some Perl-style escape sequences are supported (the POSIX standard specifies that only "special" characters may be escaped; all other escape sequences result in undefined behavior).

egrep Yes
Specifies that the grammar recognized by the regular expression engine is the same as that used by POSIX utility grep when given the -E option in IEEE Std 1003.1-2001, Portable Operating System Interface (POSIX ), Shells and Utilities, Section 4, Utilities, grep (FWD.1).

That is to say, the same as POSIX extended syntax, but with the newline character acting as an alternation character in addition to "|".

awk Yes
Specifies that the grammar recognized by the regular expression engine is the same as that used by POSIX utility awk in IEEE Std 1003.1-2001, Portable Operating System Interface (POSIX ), Shells and Utilities, Section 4, awk (FWD.1).

That is to say: the same as POSIX extended syntax, but with escape sequences in character classes permitted.

In addition, some Perl-style escape sequences are supported (the awk syntax only requires \a \b \t \v \f \n and \r to be recognised; all other Perl-style escape sequences invoke undefined behavior according to the POSIX standard, but are in fact recognised by Boost.Regex).
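To make the awk escape handling concrete, here is a small sketch that compiles an expression containing \t, one of the escapes the awk grammar is required to recognise. It uses std::regex with the standardized awk flag so that it builds without Boost; Boost.Regex is assumed to accept the same expression. The function name is hypothetical.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: in the awk grammar the two characters backslash and
// 't' inside the expression denote a single tab character.
inline bool awk_tab_escape_matches(const std::string& text)
{
    // The C++ literal "a\\tb" yields the expression text: a \ t b
    const std::regex re("a\\tb", std::regex::awk);
    return std::regex_match(text, re);
}
```

The input "a\tb" (with a real tab) is expected to match, whereas the three letters "atb" should not.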

The following options may also be set when using POSIX extended regular expressions:

Element Standardized Effect when set

icase Yes
Specifies that matching of regular expressions against a character container sequence shall be performed without regard to case.

nosubs Yes
Specifies that when a regular expression is matched against a character container sequence, then no sub-expression matches are to be stored in the supplied match_results structure.

optimize Yes
Specifies that the regular expression engine should pay more attention to the speed with which regular expressions are matched, and less to the speed with which regular expression objects are constructed. Otherwise it has no detectable effect on the program output. This currently has no effect for Boost.Regex.

collate Yes
Specifies that character ranges of the form "[a-b]" should be locale sensitive. This bit is on by default for POSIX-Extended regular expressions, but can be unset to force ranges to be compared by code point only.

newline_alt No Specifies that the \n character has the same effect as the alternation operator |. Allows newline separated lists to be used as a list of alternatives.

no_escape_in_lists No When set, this makes the escape character ordinary inside lists, so that [\b] would match either '\' or 'b'. This bit is on by default for POSIX-Extended regular expressions, but can be unset to force escapes to be recognised inside lists.

no_bk_refs No When set, backreferences are disabled. This bit is on by default for POSIX-Extended regular expressions, but can be unset to turn support for backreferences on.

Options for POSIX Basic Regular Expressions:

Exactly one of the following must always be set for POSIX basic regular expressions:

Element Standardized Effect when set

basic Yes
Specifies that the grammar recognized by the regular expression engine is the same as that used by POSIX basic regular expressions in IEEE Std 1003.1-2001, Portable Operating System Interface (POSIX ), Base Definitions and Headers, Section 9, Regular Expressions (FWD.1).

sed No As above.

grep Yes
Specifies that the grammar recognized by the regular expression engine is the same as that used by POSIX utility grep in IEEE Std 1003.1-2001, Portable Operating System Interface (POSIX ), Shells and Utilities, Section 4, Utilities, grep (FWD.1).

That is to say, the same as POSIX basic syntax, but with the newline character acting as an alternation character; the expression is treated as a newline separated list of alternatives.

emacs No Specifies that the grammar recognised is the superset of the POSIX-Basic syntax used by the emacs program.

The following options may also be set when using POSIX basic regular expressions:

Element Standardized Effect when set

icase Yes
Specifies that matching of regular expressions against a character container sequence shall be performed without regard to case.

nosubs Yes
Specifies that when a regular expression is matched against a character container sequence, then no sub-expression matches are to be stored in the supplied match_results structure.

optimize Yes
Specifies that the regular expression engine should pay more attention to the speed with which regular expressions are matched, and less to the speed with which regular expression objects are constructed. Otherwise it has no detectable effect on the program output. This currently has no effect for Boost.Regex.

collate Yes
Specifies that character ranges of the form "[a-b]" should be locale sensitive. This bit is on by default for POSIX-Basic regular expressions, but can be unset to force ranges to be compared by code point only.

newline_alt No Specifies that the \n character has the same effect as the alternation operator |. Allows newline separated lists to be used as a list of alternatives. This bit is already set if you use the grep option.

no_char_classes No When set, character classes such as [[:alnum:]] are not allowed.

no_escape_in_lists No When set, this makes the escape character ordinary inside lists, so that [\b] would match either '\' or 'b'. This bit is on by default for POSIX-Basic regular expressions, but can be unset to force escapes to be recognised inside lists.

no_intervals No When set, bounded repeats such as a{2,3} are not permitted.

bk_plus_qm No When set, \? acts as a zero-or-one repeat operator, and \+ acts as a one-or-more repeat operator.

bk_vbar No When set, \| acts as the alternation operator.

Options for Literal Strings:

The following must always be set to interpret the expression as a string literal:

Element Standardized Effect when set

literal Yes Treat the string as a literal (no special characters).

The following options may also be combined with the literal flag:

Element Standardized Effect when set

icase Yes
Specifies that matching of regular expressions against a character container sequence shall be performed without regard to case.

optimize Yes
Specifies that the regular expression engine should pay more attention to the speed with which regular expressions are matched, and less to the speed with which regular expression objects are constructed. Otherwise it has no detectable effect on the program output. This currently has no effect for Boost.Regex.

Revised 23 June 2004

Use, modification and distribution are subject to the Boost Software License, Version 1.0. (See accompanying file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)
