
public final class Pattern

extends Object
implements Serializable

java.lang.Object
↳	java.util.regex.Pattern

Class Overview

Patterns are compiled regular expressions. In many cases, convenience methods such as String.matches, String.replaceAll and String.split will be preferable, but if you need to do a lot of work with the same regular expression, it may be more efficient to compile it once and reuse it. The Pattern class and its companion, Matcher, are also a lot more powerful than the small amount of functionality exposed by String.

 // String convenience methods:
 boolean sawFailures = s.matches("Failures: \\d+");
 String farewell = s.replaceAll("Hello, (\\S+)", "Goodbye, $1");
 String[] fields = s.split(":");

 // Direct use of Pattern:
 Pattern p = Pattern.compile("Hello, (\\S+)");
 Matcher m = p.matcher(inputString);
 while (m.find()) { // Find each match in turn; String can't do this.
     String name = m.group(1); // Access a submatch group; String can't do this.
 }

Regular expression syntax

Java supports a subset of Perl 5 regular expression syntax. An important gotcha is that Java has no regular expression literals, and uses plain old string literals instead. This means that you need an extra level of escaping. For example, the regular expression \s+ has to be represented as the string "\\s+".
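The extra level of escaping can be illustrated with a minimal sketch. The class name EscapingSketch and its helper method are hypothetical, chosen only for this example.

```java
import java.util.regex.Pattern;

class EscapingSketch {
    // The regular expression \s+ is written as the string literal "\\s+".
    static String[] splitOnWhitespace(String input) {
        return Pattern.compile("\\s+").split(input);
    }

    public static void main(String[] args) {
        // Prints each word on its own line.
        for (String word : splitOnWhitespace("one  two\tthree")) {
            System.out.println(word);
        }
    }
}
```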

Escape sequences

\	Quote the following metacharacter (so `\.` matches a literal `.`).
\Q	Quote all following metacharacters until `\E`.
\E	Stop quoting metacharacters (started by `\Q`).
\\	A literal backslash.
\uhhhh	The Unicode character U+hhhh (in hex).
\xhh	The Unicode character U+00hh (in hex).
\cx	The ASCII control character ^x (so `\cH` would be ^H, U+0008).
\a	The ASCII bell character (U+0007).
\e	The ASCII ESC character (U+001b).
\f	The ASCII form feed character (U+000c).
\n	The ASCII newline character (U+000a).
\r	The ASCII carriage return character (U+000d).
\t	The ASCII tab character (U+0009).
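As a rough illustration of the escapes in the table above, the following sketch contrasts an unquoted metacharacter with the quoted form. The class and method names are hypothetical.

```java
import java.util.regex.Pattern;

class EscapeSequenceSketch {
    // Unquoted, + is a quantifier, so this regex matches "11=2" but not "1+1=2".
    static boolean matchesUnquoted(String input) {
        return Pattern.matches("1+1=2", input);
    }

    // Everything between the quote markers is literal, so only "1+1=2" matches.
    static boolean matchesQuoted(String input) {
        return Pattern.matches("\\Q1+1=2\\E", input);
    }

    // Hex escapes for U+0041 ('A') and U+0042 ('B'), then an escaped literal dot.
    static boolean matchesEscapes(String input) {
        return Pattern.matches("\\u0041\\x42\\.", input);
    }

    public static void main(String[] args) {
        System.out.println(matchesUnquoted("11=2"));
        System.out.println(matchesQuoted("1+1=2"));
        System.out.println(matchesEscapes("AB."));
    }
}
```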

Character classes

It's possible to construct arbitrary character classes using set operations:

[abc]	Any one of `a`, `b`, or `c`. (Enumeration.)
[a-c]	Any one of `a`, `b`, or `c`. (Range.)
[^abc]	Any character except `a`, `b`, or `c`. (Negation.)
[[a-f][0-9]]	Any character in either range. (Union.)
[[a-z]&&[jkl]]	Any character in both ranges. (Intersection.)
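The union and intersection rows above could be exercised as follows. This is only a sketch; CharacterClassSketch and its methods are hypothetical names.

```java
import java.util.regex.Pattern;

class CharacterClassSketch {
    // Union: any character in a-f or in 0-9.
    static boolean inUnion(char c) {
        return Pattern.matches("[[a-f][0-9]]", String.valueOf(c));
    }

    // Intersection: any character in a-z that is also one of j, k, l.
    static boolean inIntersection(char c) {
        return Pattern.matches("[[a-z]&&[jkl]]", String.valueOf(c));
    }

    public static void main(String[] args) {
        System.out.println(inUnion('7'));
        System.out.println(inIntersection('k'));
    }
}
```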

Most of the time, the built-in character classes are more useful:

\d	Any digit character.
\D	Any non-digit character.
\s	Any whitespace character.
\S	Any non-whitespace character.
\w	Any word character.
\W	Any non-word character.
\p{NAME}	Any character in the class with the given NAME.
\P{NAME}	Any character not in the named class.

There are a variety of named classes:

Unicode category names, prefixed by Is. For example \p{IsLu} for all uppercase letters.
POSIX class names. These are 'Alnum', 'Alpha', 'ASCII', 'Blank', 'Cntrl', 'Digit', 'Graph', 'Lower', 'Print', 'Punct', 'Upper', 'XDigit'.
Unicode block names, as used by forName(String) prefixed by In. For example \p{InHebrew} for all characters in the Hebrew block.
Character method names. These are all non-deprecated methods from Character whose name starts with is, but with the is replaced by java. For example, \p{javaLowerCase}.
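A short sketch with one example from each family of named classes listed above. The helper names are hypothetical, and the behaviour shown assumes the standard java.util.regex semantics for these class names.

```java
import java.util.regex.Pattern;

class NamedClassSketch {
    // Unicode category name with the Is prefix: uppercase letters.
    static boolean allUppercase(String s) {
        return Pattern.matches("\\p{IsLu}+", s);
    }

    // POSIX class name: hexadecimal digits.
    static boolean allHexDigits(String s) {
        return Pattern.matches("\\p{XDigit}+", s);
    }

    // Unicode block name with the In prefix: the Hebrew block.
    static boolean allHebrew(String s) {
        return Pattern.matches("\\p{InHebrew}+", s);
    }

    // Character method name: Character.isLowerCase becomes javaLowerCase.
    static boolean allJavaLowerCase(String s) {
        return Pattern.matches("\\p{javaLowerCase}+", s);
    }

    public static void main(String[] args) {
        System.out.println(allUppercase("ABC"));
        System.out.println(allHexDigits("cafe42"));
    }
}
```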

Quantifiers

Quantifiers match some number of instances of the preceding regular expression.

*	Zero or more.
?	Zero or one.
+	One or more.
{n}	Exactly n.
{n,}	At least n.
{n,m}	At least n but not more than m.

Quantifiers are "greedy" by default, meaning that they will match the longest possible input sequence. There are also non-greedy quantifiers that match the shortest possible input sequence. They are the same as the greedy ones, but with a trailing ?:

*?	Zero or more (non-greedy).
??	Zero or one (non-greedy).
+?	One or more (non-greedy).
{n}?	Exactly n (non-greedy).
{n,}?	At least n (non-greedy).
{n,m}?	At least n but not more than m (non-greedy).

Quantifiers allow backtracking by default. There are also possessive quantifiers that prevent backtracking. They are the same as the greedy ones, but with a trailing +:

*+	Zero or more (possessive).
?+	Zero or one (possessive).
++	One or more (possessive).
{n}+	Exactly n (possessive).
{n,}+	At least n (possessive).
{n,m}+	At least n but not more than m (possessive).
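The difference between the three quantifier families is easiest to see on one input. In this sketch (hypothetical names, not part of the API), the same pattern shape is tried in its greedy, non-greedy, and possessive forms.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuantifierSketch {
    // Returns the first match of regex in input, or null if there is none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String input = "<a><b>";
        // Greedy: takes as much as possible, then backs off to the last '>'.
        System.out.println(firstMatch("<.+>", input));   // <a><b>
        // Non-greedy: stops at the first '>' that completes a match.
        System.out.println(firstMatch("<.+?>", input));  // <a>
        // Possessive: .++ consumes the final '>' and never gives it back.
        System.out.println(firstMatch("<.++>", input));  // null
    }
}
```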

Zero-width assertions

^	At beginning of line.
$	At end of line.
\A	At beginning of input.
\b	At word boundary.
\B	At non-word boundary.
\G	At end of previous match.
\z	At end of input.
\Z	At end of input, or before newline at end.
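A minimal sketch of two of the assertions above: word boundaries, and the two end-of-input forms. The class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AssertionSketch {
    // Word boundaries keep "cat" from matching inside "concat".
    static int countWholeWord(String input) {
        Matcher m = Pattern.compile("\\bcat\\b").matcher(input);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    // Lower-case z: "end" must sit at the very end of the input.
    static boolean endsStrictly(String input) {
        return Pattern.compile("end\\z").matcher(input).find();
    }

    // Upper-case Z: a single trailing newline after "end" is also allowed.
    static boolean endsLeniently(String input) {
        return Pattern.compile("end\\Z").matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(countWholeWord("cat concat cat."));
        System.out.println(endsStrictly("the end\n"));
        System.out.println(endsLeniently("the end\n"));
    }
}
```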

Look-around assertions

Look-around assertions assert that the subpattern does (positive) or doesn't (negative) match after (look-ahead) or before (look-behind) the current position, without including the matched text in the containing match. Look-behind patterns must have a bounded maximum match length.

(?=a)	Zero-width positive look-ahead.
(?!a)	Zero-width negative look-ahead.
(?<=a)	Zero-width positive look-behind.
(?<!a)	Zero-width negative look-behind.
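To show that look-around text is checked but not consumed, here is a small sketch with one look-behind and one look-ahead. The names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookAroundSketch {
    // Positive look-behind: digits preceded by '$'; the '$' is not in the match.
    static String dollarAmount(String input) {
        Matcher m = Pattern.compile("(?<=\\$)\\d+").matcher(input);
        return m.find() ? m.group() : null;
    }

    // Negative look-ahead: a 'q' that is not followed by 'u'.
    static boolean hasQWithoutU(String input) {
        return Pattern.compile("q(?!u)").matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(dollarAmount("cost: $42, qty 7")); // 42
        System.out.println(hasQWithoutU("Iraq"));             // true
    }
}
```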

Groups

(a)	A capturing group.
(?:a)	A non-capturing group.
(?>a)	An independent non-capturing group. (The first match of the subgroup is the only match tried.)
\n	The text already matched by capturing group n.

See group() for details of how capturing groups are numbered and accessed.
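A capturing group combined with a back-reference can detect repeated text. The sketch below (hypothetical class and method names) looks for a word that is immediately repeated after a single space.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BackreferenceSketch {
    // The group captures a word; the back-reference to group 1 requires the
    // same text again after one space.
    static String firstRepeatedWord(String input) {
        Matcher m = Pattern.compile("\\b(\\w+) \\1\\b").matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(firstRepeatedWord("Paris in the the spring")); // the
    }
}
```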

Operators

ab	Expression a followed by expression b.
a|b	Either expression a or expression b.

Flags

(?dimsux-dimsux:a)	Evaluates the expression a with the given flags enabled/disabled.
(?dimsux-dimsux)	Evaluates the rest of the pattern with the given flags enabled/disabled.

The flags are:

`i`	`CASE_INSENSITIVE`	case insensitive matching
`d`	`UNIX_LINES`	only accept `'\n'` as a line terminator
`m`	`MULTILINE`	allow `^` and `$` to match beginning/end of any line
`s`	`DOTALL`	allow `.` to match `'\n'` ("s" for "single line")
`u`	`UNICODE_CASE`	enable Unicode case folding
`x`	`COMMENTS`	allow whitespace and comments

Either set of flags may be empty. For example, (?i-m) would turn on case-insensitivity and turn off multiline mode, (?i) would just turn on case-insensitivity, and (?-m) would just turn off multiline mode.

Note that on Android, UNICODE_CASE is always on: case-insensitive matching will always be Unicode-aware.

There are two other flags not settable via this mechanism: CANON_EQ and LITERAL. Attempts to use CANON_EQ on Android will throw an exception.
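The two inline forms and the equivalent flag constant might be compared like this. It is a sketch with hypothetical names, using only the case-insensitivity flag.

```java
import java.util.regex.Pattern;

class InlineFlagSketch {
    // (?i) applies to the rest of the pattern.
    static boolean matchesAnyCase(String input) {
        return Pattern.matches("(?i)hello", input);
    }

    // (?i:h) applies only inside the group; "ello" stays case-sensitive.
    static boolean matchesFirstLetterAnyCase(String input) {
        return Pattern.matches("(?i:h)ello", input);
    }

    // The CASE_INSENSITIVE constant is the non-inline alternative.
    static boolean matchesWithConstant(String input) {
        return Pattern.compile("hello", Pattern.CASE_INSENSITIVE)
                .matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(matchesAnyCase("HeLLo"));
        System.out.println(matchesFirstLetterAnyCase("HELLO"));
    }
}
```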

Implementation notes

The regular expression implementation used in Android is provided by ICU. The notation for the regular expressions is mostly a superset of those used in other Java language implementations. This means that existing applications will normally work as expected, but in rare cases Android may accept a regular expression that is not accepted by other implementations.

In some cases, Android will recognize that a regular expression is a simple special case that can be handled more efficiently. This is true of both the convenience methods in String and the methods in Pattern.

Summary

Constants
int	CANON_EQ	This constant specifies that a character in a `Pattern` and a character in the input string only match if they are canonically equivalent.
int	CASE_INSENSITIVE	This constant specifies that a `Pattern` is matched case-insensitively.
int	COMMENTS	This constant specifies that a `Pattern` may contain whitespace or comments.
int	DOTALL	This constant specifies that the '.' meta character matches arbitrary characters, including line endings, which is normally not the case.
int	LITERAL	This constant specifies that the whole `Pattern` is to be taken literally, that is, all meta characters lose their meanings.
int	MULTILINE	This constant specifies that the meta characters '^' and '$' match at the beginning and end of each input line, respectively.
int	UNICODE_CASE	This constant specifies that a `Pattern` that uses case-insensitive matching will use Unicode case folding.
int	UNIX_LINES	This constant specifies that only the Unix line ending ('\n') is recognized as a line terminator by the '.', '^', and '$' meta characters.

Public Methods
static Pattern	compile(String regularExpression, int flags) Returns a compiled form of the given `regularExpression`, as modified by the given `flags`.
static Pattern	compile(String pattern) Equivalent to `Pattern.compile(pattern, 0)`.
int	flags() Returns the flags supplied to `compile`.
Matcher	matcher(CharSequence input) Returns a `Matcher` for this pattern applied to the given `input`.
static boolean	matches(String regularExpression, CharSequence input) Tests whether the given `regularExpression` matches the given `input`.
String	pattern() Returns the regular expression supplied to `compile`.
static String	quote(String string) Quotes the given `string` using "\Q" and "\E", so that all meta-characters lose their special meaning.
String[]	split(CharSequence input) Equivalent to `split(input, 0)`.
String[]	split(CharSequence input, int limit) Splits the given `input` at occurrences of this pattern.
String	toString() Returns a string containing a concise, human-readable description of this object.

Protected Methods
void	finalize() Called before the object's memory is reclaimed by the VM.


Inherited Methods

From class java.lang.Object

Object	clone() Creates and returns a copy of this `Object`.
boolean	equals(Object o) Compares this instance with the specified object and indicates if they are equal.
void	finalize() Called before the object's memory is reclaimed by the VM.
final Class<? extends Object>	getClass() Returns the unique instance of `Class` that represents this object's class.
int	hashCode() Returns an integer hash code for this object.
final void	notify() Causes a thread which is waiting on this object's monitor (by means of calling one of the `wait()` methods) to be woken up.
final void	notifyAll() Causes all threads which are waiting on this object's monitor (by means of calling one of the `wait()` methods) to be woken up.
String	toString() Returns a string containing a concise, human-readable description of this object.
final void	wait() Causes the calling thread to wait until another thread calls the `notify()` or `notifyAll()` method of this object.
final void	wait(long millis, int nanos) Causes the calling thread to wait until another thread calls the `notify()` or `notifyAll()` method of this object or until the specified timeout expires.
final void	wait(long millis) Causes the calling thread to wait until another thread calls the `notify()` or `notifyAll()` method of this object or until the specified timeout expires.

Constants

public static final int CANON_EQ

Since: API Level 1

This constant specifies that a character in a Pattern and a character in the input string only match if they are canonically equivalent. It is (currently) not supported in Android.

Constant Value: 128 (0x00000080)

public static final int CASE_INSENSITIVE

Since: API Level 1

This constant specifies that a Pattern is matched case-insensitively. That is, the patterns "a+" and "A+" would both match the string "aAaAaA". See UNICODE_CASE. Corresponds to (?i).

Constant Value: 2 (0x00000002)

public static final int COMMENTS

Since: API Level 1

This constant specifies that a Pattern may contain whitespace or comments. Otherwise comments and whitespace are taken as literal characters. Corresponds to (?x).

Constant Value: 4 (0x00000004)
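With COMMENTS set, a pattern can be laid out over several lines and annotated. The sketch below is illustrative only; the phone-number format and the names are assumptions made for the example.

```java
import java.util.regex.Pattern;

class CommentsFlagSketch {
    // Unescaped whitespace is ignored and # starts a comment that runs to the
    // end of the line, so the pattern is equivalent to \d{3}-\d{4}.
    static final Pattern PHONE = Pattern.compile(
            "\\d{3}   # exchange\n"
          + "-        # separator\n"
          + "\\d{4}   # line number",
            Pattern.COMMENTS);

    static boolean isPhone(String input) {
        return PHONE.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(isPhone("555-1234"));
    }
}
```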

public static final int DOTALL

Since: API Level 1

This constant specifies that the '.' meta character matches arbitrary characters, including line endings, which is normally not the case. Corresponds to (?s).

Constant Value: 32 (0x00000020)
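A minimal sketch of the effect of DOTALL on the '.' meta character. The class name and the boolean parameter are hypothetical conveniences for the example.

```java
import java.util.regex.Pattern;

class DotAllSketch {
    // Matches "a", any single character, then "b"; the flag decides whether
    // that single character may be a line ending.
    static boolean matchesAcrossDot(String input, boolean dotAll) {
        int flags = dotAll ? Pattern.DOTALL : 0;
        return Pattern.compile("a.b", flags).matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(matchesAcrossDot("a\nb", false));
        System.out.println(matchesAcrossDot("a\nb", true));
    }
}
```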

public static final int LITERAL

Since: API Level 1

This constant specifies that the whole Pattern is to be taken literally, that is, all meta characters lose their meanings.

Constant Value: 16 (0x00000010)

public static final int MULTILINE

Since: API Level 1

This constant specifies that the meta characters '^' and '$' match at the beginning and end of each input line, respectively. Normally, they match only at the beginning and the end of the complete input. Corresponds to (?m).

Constant Value: 8 (0x00000008)
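The effect of MULTILINE on '^' can be seen by counting matches with and without the flag. This is a sketch; the names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MultilineSketch {
    // Counts words that sit where '^' can match.
    static int countLineStarts(String input, boolean multiline) {
        int flags = multiline ? Pattern.MULTILINE : 0;
        Matcher m = Pattern.compile("^\\w+", flags).matcher(input);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String text = "alpha\nbeta\ngamma";
        System.out.println(countLineStarts(text, false)); // 1
        System.out.println(countLineStarts(text, true));  // 3
    }
}
```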

public static final int UNICODE_CASE

Since: API Level 1

This constant specifies that a Pattern that uses case-insensitive matching will use Unicode case folding. On Android, UNICODE_CASE is always on: case-insensitive matching will always be Unicode-aware. If your code is intended to be portable and uses case-insensitive matching on non-ASCII characters, you should use this flag. Corresponds to (?u).

Constant Value: 64 (0x00000040)

public static final int UNIX_LINES

Since: API Level 1

This constant specifies that only the Unix line ending ('\n') is recognized as a line terminator by the '.', '^', and '$' meta characters. Corresponds to (?d).

Constant Value: 1 (0x00000001)

Public Methods

public static Pattern compile (String regularExpression, int flags)

Since: API Level 1

Returns a compiled form of the given regularExpression, as modified by the given flags. See the flags overview for more on flags.

Throws

PatternSyntaxException	if the regular expression is syntactically incorrect.
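One way to use compile with combined flags, and to guard against a syntactically incorrect expression, is sketched below. The helper names are hypothetical.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class CompileSketch {
    // Returns false instead of propagating PatternSyntaxException.
    static boolean isValidRegex(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    // Flags are combined with bitwise OR.
    static Pattern caseInsensitiveMultiline(String regex) {
        return Pattern.compile(regex, Pattern.CASE_INSENSITIVE | Pattern.MULTILINE);
    }

    public static void main(String[] args) {
        System.out.println(isValidRegex("(unclosed"));
        System.out.println(caseInsensitiveMultiline("^ok$").matcher("first\nOK").find());
    }
}
```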

public static Pattern compile (String pattern)

Since: API Level 1

Equivalent to Pattern.compile(pattern, 0).

public int flags ()

Since: API Level 1

Returns the flags supplied to compile.

public Matcher matcher (CharSequence input)

Since: API Level 1

Returns a Matcher for this pattern applied to the given input. The Matcher can be used to match the Pattern against the whole input, find occurrences of the Pattern in the input, or replace parts of the input.
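The three uses mentioned above (whole-input match, search, and replacement) could look like this with one reused Pattern. The class and method names are hypothetical.

```java
import java.util.regex.Pattern;

class MatcherSketch {
    // Compiled once and reused for every call.
    static final Pattern DIGITS = Pattern.compile("\\d+");

    // Whole-input match.
    static boolean isAllDigits(String input) {
        return DIGITS.matcher(input).matches();
    }

    // Search for an occurrence anywhere in the input.
    static boolean containsDigits(String input) {
        return DIGITS.matcher(input).find();
    }

    // Replace every occurrence.
    static String maskDigits(String input) {
        return DIGITS.matcher(input).replaceAll("#");
    }

    public static void main(String[] args) {
        System.out.println(isAllDigits("abc 123"));    // false
        System.out.println(containsDigits("abc 123")); // true
        System.out.println(maskDigits("a1b22"));       // a#b#
    }
}
```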

public static boolean matches (String regularExpression, CharSequence input)

Since: API Level 1

Tests whether the given regularExpression matches the given input. Equivalent to Pattern.compile(regularExpression).matcher(input).matches(). If the same regular expression is to be used for multiple operations, it may be more efficient to reuse a compiled Pattern.

public String pattern ()

Since: API Level 1

Returns the regular expression supplied to compile.

public static String quote (String string)

Since: API Level 1

Quotes the given string using "\Q" and "\E", so that all meta-characters lose their special meaning. This method correctly escapes embedded instances of "\Q" or "\E". If the entire result is to be passed verbatim to compile(String), it's usually clearer to use the LITERAL flag instead.
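A sketch of the two approaches described above: quote for embedding literal text inside a larger expression, and the LITERAL flag when the whole pattern is literal. The helper names and the version-string format are hypothetical.

```java
import java.util.regex.Pattern;

class QuoteSketch {
    // The prefix is matched literally; only the trailing \d+ is a regex.
    static boolean hasPrefixThenDigits(String prefix, String input) {
        return Pattern.compile(Pattern.quote(prefix) + "\\d+")
                .matcher(input).matches();
    }

    // The entire pattern is literal, so the LITERAL flag is enough.
    static boolean containsLiteral(String literal, String input) {
        return Pattern.compile(literal, Pattern.LITERAL).matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(hasPrefixThenDigits("v1.", "v1.42"));
        System.out.println(containsLiteral("a.b", "xaxby"));
    }
}
```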

public String[] split (CharSequence input)

Since: API Level 1

Equivalent to split(input, 0).

public String[] split (CharSequence input, int limit)

Since: API Level 1

Splits the given input at occurrences of this pattern.

If this pattern does not occur in the input, the result is an array containing the input (converted from a CharSequence to a String).

Otherwise, the limit parameter controls the contents of the returned array as described below.

Parameters

limit

Determines the maximum number of entries in the resulting array, and the treatment of trailing empty strings.

For n > 0, the resulting array contains at most n entries. If this is fewer than the number of matches, the final entry will contain all remaining input.
For n < 0, the length of the resulting array is exactly the number of occurrences of the Pattern plus one for the text after the final separator. All entries are included.
For n == 0, the result is as for n < 0, except trailing empty strings will not be returned. (Note that the case where the input is itself an empty string is special, as described above, and the limit parameter does not apply there.)
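The three cases for limit can be compared on a single input that ends in separators. This is a sketch; SplitLimitSketch and splitOnColon are hypothetical names.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class SplitLimitSketch {
    static final Pattern COLON = Pattern.compile(":");

    static String[] splitOnColon(String input, int limit) {
        return COLON.split(input, limit);
    }

    public static void main(String[] args) {
        String input = "a:b:c::";
        // limit == 0: trailing empty strings are dropped.
        System.out.println(Arrays.toString(splitOnColon(input, 0)));  // [a, b, c]
        // limit < 0: all entries are kept, including trailing empty strings.
        System.out.println(Arrays.toString(splitOnColon(input, -1))); // [a, b, c, , ]
        // limit > 0: at most that many entries; the last holds the remainder.
        System.out.println(Arrays.toString(splitOnColon(input, 2)));  // [a, b:c::]
    }
}
```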

public String toString ()

Since: API Level 1

Returns a string containing a concise, human-readable description of this object. Subclasses are encouraged to override this method and provide an implementation that takes into account the object's type and data. The default implementation is equivalent to the following expression:

   getClass().getName() + '@' + Integer.toHexString(hashCode())

See Writing a useful toString method if you intend to implement your own toString method.

Returns

a printable representation of this object.

Protected Methods

protected void finalize ()

Since: API Level 1

Called before the object's memory is reclaimed by the VM. This can only happen once the garbage collector has detected that the object is no longer reachable by any thread of the running application.

The method can be used to free system resources or perform other cleanup before the object is garbage collected. The default implementation of the method is empty, which is also expected by the VM, but subclasses can override finalize() as required. Uncaught exceptions which are thrown during the execution of this method cause it to terminate immediately but are otherwise ignored.

Note that the VM does guarantee that finalize() is called at most once for any object, but it doesn't guarantee when (if at all) finalize() will be called. For example, object B's finalize() can delay the execution of object A's finalize() method and therefore it can delay the reclamation of A's memory. To be safe, use a ReferenceQueue, because it provides more control over the way the VM deals with references during garbage collection.

Throws

Throwable
