A regular expression is a powerful way of specifying a pattern for a complex search.
MySQL uses Henry Spencer's implementation of regular expressions, which is
aimed at conformance with POSIX 1003.2.
See section B Credits.
MySQL uses the extended version to support pattern-matching operations
performed with the REGEXP
operator in SQL statements.
See section 3.3.4.7 Pattern Matching.
This appendix is a summary, with examples, of the special characters and
constructs that can be used in MySQL for REGEXP
operations.
It does not contain all the details that can be found in
Henry Spencer's regex(7)
manual page. That manual page is
included in MySQL source distributions, in the `regex.7' file under the
`regex' directory.
A regular expression describes a set of strings. The simplest regular
expression is one that has no special characters in it. For example, the
regular expression hello
matches hello
and nothing else.
Non-trivial regular expressions use certain special constructs so that
they can match more than one string. For example, the regular expression
hello|word
matches either the string hello
or the string
word
.
As a more complex example, the regular expression B[an]*s
matches any
of the strings Bananas
, Baaaaas
, Bs
, and any other
string starting with a B
, ending with an s
, and containing any
number of a
or n
characters in between.
A regular expression for the REGEXP
operator may use any of the
following special characters and constructs:
^
mysql> SELECT 'fo\nfo' REGEXP '^fo$'; -> 0 mysql> SELECT 'fofo' REGEXP '^fo'; -> 1
$
mysql> SELECT 'fo\no' REGEXP '^fo\no$'; -> 1 mysql> SELECT 'fo\no' REGEXP '^fo$'; -> 0
.
mysql> SELECT 'fofo' REGEXP '^f.*$'; -> 1 mysql> SELECT 'fo\r\nfo' REGEXP '^f.*$'; -> 1
a*
a
characters.
mysql> SELECT 'Ban' REGEXP '^Ba*n'; -> 1 mysql> SELECT 'Baaan' REGEXP '^Ba*n'; -> 1 mysql> SELECT 'Bn' REGEXP '^Ba*n'; -> 1
a+
a
characters.
mysql> SELECT 'Ban' REGEXP '^Ba+n'; -> 1 mysql> SELECT 'Bn' REGEXP '^Ba+n'; -> 0
a?
a
character.
mysql> SELECT 'Bn' REGEXP '^Ba?n'; -> 1 mysql> SELECT 'Ban' REGEXP '^Ba?n'; -> 1 mysql> SELECT 'Baan' REGEXP '^Ba?n'; -> 0
de|abc
de
or abc
.
mysql> SELECT 'pi' REGEXP 'pi|apa'; -> 1 mysql> SELECT 'axe' REGEXP 'pi|apa'; -> 0 mysql> SELECT 'apa' REGEXP 'pi|apa'; -> 1 mysql> SELECT 'apa' REGEXP '^(pi|apa)$'; -> 1 mysql> SELECT 'pi' REGEXP '^(pi|apa)$'; -> 1 mysql> SELECT 'pix' REGEXP '^(pi|apa)$'; -> 0
(abc)*
abc
.
mysql> SELECT 'pi' REGEXP '^(pi)*$'; -> 1 mysql> SELECT 'pip' REGEXP '^(pi)*$'; -> 0 mysql> SELECT 'pipi' REGEXP '^(pi)*$'; -> 1
{1}
{2,3}
{n}
or {m,n}
notation provides
a more general way of writing regular expressions that match many
occurrences of the previous atom (or ``piece'') of the pattern.
m
and n
are integers.
a*
a{0,}
.
a+
a{1,}
.
a?
a{0,1}
.
a{n}
matches exactly n
instances of
a
.
a{n,}
matches n
or more instances of a
.
a{m,n}
matches m
through n
instances of a
,
inclusive.
m
and n
must be in the range from 0
to
RE_DUP_MAX
(default 255), inclusive. If both m
and n
are given, m
must be less than or equal to n
.
mysql> SELECT 'abcde' REGEXP 'a[bcd]{2}e'; -> 0 mysql> SELECT 'abcde' REGEXP 'a[bcd]{3}e'; -> 1 mysql> SELECT 'abcde' REGEXP 'a[bcd]{1,10}e'; -> 1
[a-dX]
[^a-dX]
a
,
b
, c
, d
or X
. A -
character between two other
characters forms a range that matches all characters from the first
character to the second. For example, [0-9]
matches any decimal
digit. To include a literal ]
character, it must immediately follow
the opening bracket [
. To include a literal -
character, it
must be written first or last. Any character that does not have a defined
special meaning inside a []
pair matches only itself.
mysql> SELECT 'aXbc' REGEXP '[a-dXYZ]'; -> 1 mysql> SELECT 'aXbc' REGEXP '^[a-dXYZ]$'; -> 0 mysql> SELECT 'aXbc' REGEXP '^[a-dXYZ]+$'; -> 1 mysql> SELECT 'aXbc' REGEXP '^[^a-dXYZ]+$'; -> 0 mysql> SELECT 'gheis' REGEXP '^[^a-dXYZ]+$'; -> 1 mysql> SELECT 'gheisa' REGEXP '^[^a-dXYZ]+$'; -> 0
[.characters.]
[
and ]
),
matches the sequence of characters of that collating element. characters
is either a single character or a character name like newline
.
You can find the full list of character names in the `regexp/cname.h'
file.
mysql> SELECT '~' REGEXP '[[.~.]]'; -> 1 mysql> SELECT '~' REGEXP '[[.tilde.]]'; -> 1
[=character_class=]
[
and ]
),
[=character_class=]
represents
an equivalence class. It matches all characters with the same collation
value, including itself.
For example, if o
and (+)
are the members of an
equivalence class, then [[=o=]]
, [[=(+)=]]
, and
[o(+)]
are all synonymous.
An equivalence class may not be used as an endpoint of a range.
[:character_class:]
[
and ]
),
[:character_class:]
represents a character class that matches all
characters belonging to that class. The standard class names are:
alnum | Alphanumeric characters |
alpha | Alphabetic characters |
blank | Whitespace characters |
cntrl | Control characters |
digit | Digit characters |
graph | Graphic characters |
lower | Lowercase alphabetic characters |
print | Graphic or space characters |
punct | Punctuation characters |
space | Space, tab, newline, and carriage return |
upper | Uppercase alphabetic characters |
xdigit | Hexadecimal digit characters |
ctype(3)
manual
page. A particular locale may provide other class names.
A character class may not be used as an endpoint of a range.
mysql> SELECT 'justalnums' REGEXP '[[:alnum:]]+'; -> 1 mysql> SELECT '!!' REGEXP '[[:alnum:]]+'; -> 0
[[:<:]]
[[:>:]]
alnum
class or an underscore
(_
).
mysql> SELECT 'a word a' REGEXP '[[:<:]]word[[:>:]]'; -> 1 mysql> SELECT 'a xword a' REGEXP '[[:<:]]word[[:>:]]'; -> 0
To use a literal instance of a special character in a regular expression,
precede it by two backslash (\) characters. The MySQL parser
interprets one of the backslashes, and the regular expression library
interprets the other. For example, to match the string 1+2
that
contains the special +
character, only the last of the following
regular expressions is the correct one:
mysql> SELECT '1+2' REGEXP '1+2'; -> 0 mysql> SELECT '1+2' REGEXP '1\+2'; -> 0 mysql> SELECT '1+2' REGEXP '1\\+2'; -> 1
Go to the first, previous, next, last section, table of contents.