Tokenizers that divide strings into substrings using regular expressions that can match either tokens or separators between tokens.

| Classes | Description |
|---|---|
| RegexpTokenizer | A tokenizer that splits a string into substrings using a regular expression. |
| BlanklineTokenizer | A tokenizer that divides a string into substrings by treating any sequence of blank lines as a separator. |
| WordPunctTokenizer | A tokenizer that divides a text into sequences of alphabetic and non-alphabetic characters. |
| WordTokenizer | A tokenizer that divides a text into sequences of alphabetic characters. |
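As a rough illustration of how these classes relate, the following is a minimal sketch of a RegexpTokenizer-like class using only Python's standard `re` module. It is not NLTK's implementation (the real class accepts further options), and the patterns shown for the WordPunctTokenizer and BlanklineTokenizer behaviors are plausible approximations, not the library's exact expressions.

```python
import re

class RegexpTokenizerSketch:
    """Illustrative sketch only, not NLTK's RegexpTokenizer.

    The pattern either matches the tokens themselves (gaps=False)
    or the separators between tokens (gaps=True).
    """

    def __init__(self, pattern, gaps=False):
        self._regexp = re.compile(pattern)
        self._gaps = gaps

    def tokenize(self, text):
        if self._gaps:
            # Pattern matches separators: split and drop empty strings.
            return [t for t in self._regexp.split(text) if t]
        # Pattern matches tokens directly.
        return self._regexp.findall(text)

# WordPunctTokenizer-style behavior: runs of word characters
# vs. runs of non-space punctuation (approximate pattern).
wordpunct = RegexpTokenizerSketch(r"\w+|[^\w\s]+")
print(wordpunct.tokenize("Can't is a contraction."))

# BlanklineTokenizer-style behavior: any sequence of blank lines
# acts as a separator (approximate pattern).
blankline = RegexpTokenizerSketch(r"\s*\n\s*\n\s*", gaps=True)
print(blankline.tokenize("Para one.\n\nPara two."))
```

In this framing, the more specific tokenizers are simply regexp tokenizers with a fixed pattern and a fixed choice of token-matching versus gap-matching mode.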
| Tokenization Functions |
|---|
| Split the given text string, based on the given regular expression pattern. See the documentation for RegexpTokenizer.tokenize() for descriptions of the arguments. |
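The described convenience function can be sketched as a small standalone helper built on `re`. The original function name did not survive extraction, so `tokenize_with_pattern` below is a hypothetical stand-in, not NLTK's actual API; the `gaps` parameter mirrors the token-versus-separator distinction described for RegexpTokenizer.

```python
import re

def tokenize_with_pattern(text, pattern, gaps=False):
    # Hypothetical stand-in for the module-level function whose name
    # was lost in extraction. Splits `text` using `pattern`, treating
    # the pattern as tokens (gaps=False) or as separators (gaps=True).
    if gaps:
        return [t for t in re.split(pattern, text) if t]
    return re.findall(pattern, text)

# Splitting on whitespace as a separator pattern...
print(tokenize_with_pattern("one two  three", r"\s+", gaps=True))
# ...yields the same tokens as matching non-space runs directly.
print(tokenize_with_pattern("one two  three", r"\S+"))
```

Both calls print `['one', 'two', 'three']`, showing that the same tokenization can often be expressed either way.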
Generated by Epydoc 3.0beta1 on Wed Aug 27 15:08:51 2008 (http://epydoc.sourceforge.net)