Tokenizers that divide strings into substrings using regular expressions that can match either tokens or separators between tokens.
Classes

    RegexpTokenizer
        A tokenizer that splits a string into substrings using a regular
        expression.
    BlanklineTokenizer
        A tokenizer that divides a string into substrings by treating any
        sequence of blank lines as a separator.
    WordPunctTokenizer
        A tokenizer that divides a text into sequences of alphabetic and
        non-alphabetic characters.
    WordTokenizer
        A tokenizer that divides a text into sequences of alphabetic
        characters.
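The classes listed above all follow one pattern: a single regular expression either matches the tokens themselves or matches the separators between them. A minimal sketch using only the standard `re` module (the class name `SimpleRegexpTokenizer` is illustrative, not the documented implementation):

```python
import re

class SimpleRegexpTokenizer:
    """Illustrative sketch: tokenize with one regular expression.

    With gaps=False the pattern matches tokens (e.g. alphabetic runs);
    with gaps=True it matches separators (e.g. blank lines) and the
    text is split around the matches instead.
    """

    def __init__(self, pattern, gaps=False):
        self._regexp = re.compile(pattern)
        self._gaps = gaps

    def tokenize(self, text):
        if self._gaps:
            # Separator mode: split on matches, dropping empty strings.
            return [tok for tok in self._regexp.split(text) if tok]
        # Token mode: return the matches themselves.
        return self._regexp.findall(text)

# Token mode, in the spirit of WordTokenizer (alphabetic sequences):
word_tok = SimpleRegexpTokenizer(r"[A-Za-z]+")
print(word_tok.tokenize("Alice's list: 3 items"))
# ['Alice', 's', 'list', 'items']

# Separator mode, in the spirit of BlanklineTokenizer:
para_tok = SimpleRegexpTokenizer(r"\n\s*\n", gaps=True)
print(para_tok.tokenize("Paragraph one.\n\nParagraph two."))
# ['Paragraph one.', 'Paragraph two.']
```

The design choice this illustrates is that one class can cover all of the listed tokenizers: each specialized class above is essentially a fixed choice of pattern and mode.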
Tokenization Functions

    Split the given text string based on the given regular expression
    pattern. See the documentation for RegexpTokenizer.tokenize() for
    descriptions of the arguments.
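The module-level helper described above can be sketched the same way with only the standard `re` module (the function name `tokenize_with_pattern` is hypothetical, since the documented name is not shown in this excerpt):

```python
import re

def tokenize_with_pattern(text, pattern, gaps=False):
    """Illustrative sketch: split a text string with a regex pattern.

    When gaps is False the pattern describes the tokens and the matches
    are returned; when gaps is True the pattern describes the separators
    and the text is split around them.
    """
    if gaps:
        return [tok for tok in re.split(pattern, text) if tok]
    return re.findall(pattern, text)

# Match word-or-punctuation tokens, in the spirit of WordPunctTokenizer:
print(tokenize_with_pattern("Good muffins cost $3.88.", r"\w+|[^\w\s]+"))
# ['Good', 'muffins', 'cost', '$', '3', '.', '88', '.']
```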
Generated by Epydoc 3.0beta1 on Wed Aug 27 15:08:51 2008 (http://epydoc.sourceforge.net)