Class WordTokenizer

object --+
         |
         api.TokenizerI --+
                          |
                          RegexpTokenizer --+
                                            |
                                            WordTokenizer
A tokenizer that divides a text into sequences of alphabetic characters. Any non-alphabetic characters are discarded. E.g.:
>>> WordTokenizer().tokenize("She said 'hello'.")
['She', 'said', 'hello']
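For readers without NLTK at hand, the behaviour can be reproduced with the
standard re module. This is a minimal sketch: the pattern [a-zA-Z]+ is an
assumption consistent with the documented behaviour (maximal runs of
alphabetic characters, with everything else discarded), not necessarily the
exact pattern the class uses internally.

>>> import re
>>> def word_tokenize(text):
...     # Assumed pattern: maximal runs of ASCII letters; all else discarded.
...     return re.findall(r'[a-zA-Z]+', text)
>>> word_tokenize("She said 'hello'.")
['She', 'said', 'hello']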
Instance Methods

    All instance methods are inherited from RegexpTokenizer,
    api.TokenizerI, and object.

Properties

    Inherited from object.

Method Details
__init__(...)
    Construct a new tokenizer that splits strings using the given
    regular expression.
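As a usage sketch of that inherited constructor: the call below assumes the
era's NLTK API, in which RegexpTokenizer takes the pattern as its first
argument and tokenize() returns the matched substrings. The pattern \w+
(word characters, so digits and underscores are also kept) is chosen purely
for illustration.

>>> from nltk.tokenize import RegexpTokenizer
>>> tokenizer = RegexpTokenizer(r'\w+')
>>> tokenizer.tokenize("She said 'hello' twice.")
['She', 'said', 'hello', 'twice']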
Generated by Epydoc 3.0beta1 on Wed Aug 27 15:08:58 2008 (http://epydoc.sourceforge.net)