Class WordPunctTokenizer
object --+
         |
     api.TokenizerI --+
                      |
          RegexpTokenizer --+
                            |
              WordPunctTokenizer
A tokenizer that divides a text into sequences of alphabetic and non-alphabetic characters. E.g.:
>>> WordPunctTokenizer().tokenize("She said 'hello'.")
['She', 'said', "'", 'hello', "'."]
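The same behavior can be reproduced with the parent RegexpTokenizer directly. A minimal sketch, assuming both classes are importable from nltk.tokenize (as in current NLTK releases) and that the pattern is "runs of word characters, or runs of non-word, non-space characters", consistent with the description above:

    from nltk.tokenize import RegexpTokenizer, WordPunctTokenizer

    text = "She said 'hello'."

    # Assumed pattern: a token is either a run of word characters (\w+)
    # or a run of punctuation, i.e. non-word, non-space characters.
    tokenizer = RegexpTokenizer(r"\w+|[^\w\s]+")

    print(tokenizer.tokenize(text))             # ['She', 'said', "'", 'hello', "'."]
    print(WordPunctTokenizer().tokenize(text))  # same tokens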
Inherited methods and properties come from RegexpTokenizer, api.TokenizerI, and object.
__init__
    Construct a new tokenizer that splits strings using the given regular expression.
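As a sketch of that constructor in use, here is RegexpTokenizer with two custom patterns; the gaps keyword is an assumption based on current NLTK releases, where gaps=True makes the pattern match the separators between tokens rather than the tokens themselves:

    from nltk.tokenize import RegexpTokenizer

    # Pattern matches the tokens themselves: runs of word characters.
    words = RegexpTokenizer(r"\w+")
    print(words.tokenize("She said 'hello'."))
    # ['She', 'said', 'hello']

    # gaps=True (assumed keyword) inverts the pattern's role: it matches
    # the gaps between tokens, here runs of whitespace.
    whitespace = RegexpTokenizer(r"\s+", gaps=True)
    print(whitespace.tokenize("She said 'hello'."))
    # ['She', 'said', "'hello'."]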