__init__(self, pattern, gaps=False, discard_empty=True, flags=56)
(Constructor)
Construct a new tokenizer that splits strings using the given regular
expression pattern. By default, pattern is used to find the tokens
themselves; if gaps is set to True, pattern is instead used to find
the separators between tokens.
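The two modes can be illustrated with the standard re module alone. This is a minimal sketch of the idea, not this class's implementation: it assumes that token mode behaves like re.findall and gap mode like re.split, and the sample text and patterns are chosen only for illustration.

```python
import re

text = "Good muffins cost $3.88 in New York."

# gaps=False (the default): the pattern describes the tokens themselves,
# so every match of the pattern becomes a token.
tokens_by_match = re.findall(r"\w+|\$[\d.]+|\S+", text)

# gaps=True: the pattern describes the separators, so the tokens are
# whatever lies between consecutive matches.
tokens_by_gap = re.split(r"\s+", text)

print(tokens_by_match)
print(tokens_by_gap)
```

In token mode the final period is split off as its own token, while in gap mode it stays attached to "York." because only whitespace is treated as a separator.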
- Parameters:
pattern (str) - The pattern used to build this tokenizer. This pattern may safely
contain grouping parentheses.
gaps (bool) - True if this tokenizer's pattern should be used to find
separators between tokens; False if this tokenizer's pattern
should be used to find the tokens themselves.
discard_empty (bool) - True if any empty tokens ('') generated by the
tokenizer should be discarded. Empty tokens can only be
generated if gaps is True.
flags (int) - The regexp flags used to compile this tokenizer's pattern. By
default, the following flags are used: re.UNICODE |
re.MULTILINE | re.DOTALL.
- Overrides:
object.__init__
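The default flags value and the effect of discard_empty can be checked with a short sketch using only the standard re module. It assumes that gap mode behaves like re.split followed by a filter on empty strings; the variable names are hypothetical and the code is not taken from this class.

```python
import re

# The default flags value in the signature is the bitwise OR of three re flags.
default_flags = re.UNICODE | re.MULTILINE | re.DOTALL
print(int(default_flags))  # 56

# Two adjacent separators produce an empty string between them.
gap_pattern = re.compile(r",", default_flags)
raw_tokens = gap_pattern.split("a,b,,c")

# Assumed equivalent of discard_empty=True: drop the empty strings.
kept_tokens = [tok for tok in raw_tokens if tok]

print(raw_tokens)
print(kept_tokens)
```

Empty strings only arise in the split-based mode, which is why discard_empty has no effect when the pattern is used to match tokens directly.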