object --+
         |
         api.TokenizerI --+
                          |
                          RegexpTokenizer --+
                                            |
                                            WordPunctTokenizer
A tokenizer that divides a text into sequences of alphabetic and
non-alphabetic characters. E.g.:

>>> WordPunctTokenizer().tokenize("She said 'hello'.")
['She', 'said', "'", 'hello', "'."]
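
As a sketch of what this class does, it behaves like a RegexpTokenizer
built from a word/punctuation pattern. The pattern r"\w+|[^\w\s]+" below
is an assumption (it is not stated on this page), but it reproduces the
example output above: runs of word characters, or runs of characters that
are neither word characters nor whitespace.

>>> from nltk.tokenize import RegexpTokenizer
>>> RegexpTokenizer(r"\w+|[^\w\s]+").tokenize("She said 'hello'.")
['She', 'said', "'", 'hello', "'."]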
Construct a new tokenizer that splits strings using the given regular
expression. (The constructor is inherited from RegexpTokenizer.)
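
A minimal sketch of that inherited constructor with a custom pattern; the
pattern r"\w+" here is illustrative only, keeping runs of word characters
and discarding punctuation:

>>> from nltk.tokenize import RegexpTokenizer
>>> RegexpTokenizer(r"\w+").tokenize("She said 'hello'.")
['She', 'said', 'hello']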