Package nltk :: Package tokenize :: Module api :: Class TokenizerI

Class TokenizerI

object --+
         |
        TokenizerI
Known Subclasses:

A processing interface for tokenizing a string, that is, dividing it into a list of substrings.

Subclasses must define:

  - either tokenize() or batch_tokenize() (or both)

Instance Methods

tokenize(self, s)
Divide the given string into a list of substrings.

batch_tokenize(self, strings)
Apply self.tokenize() to each element of strings.
Returns: list of list of str

Inherited from object: __delattr__, __getattribute__, __hash__, __init__, __new__, __reduce__, __reduce_ex__, __repr__, __setattr__, __str__

Properties

Inherited from object: __class__

Method Details

tokenize(self, s)

Divide the given string into a list of substrings.

Returns:
list of str
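
The contract of tokenize() can be illustrated with a minimal sketch. The names SketchTokenizerI and WhitespaceSketchTokenizer are hypothetical and exist only for this example; the stand-in base class mimics the interface described here rather than importing the real one from nltk.tokenize.api, so the snippet runs without NLTK installed. Splitting on whitespace is an arbitrary choice made for the illustration.

```python
class SketchTokenizerI(object):
    """Hypothetical stand-in for the interface (not the NLTK class itself)."""

    def tokenize(self, s):
        # A concrete subclass is expected to override this method.
        raise NotImplementedError()


class WhitespaceSketchTokenizer(SketchTokenizerI):
    """Hypothetical subclass that splits a string on runs of whitespace."""

    def tokenize(self, s):
        # str.split() with no arguments splits on any whitespace
        # and never yields empty strings.
        return s.split()


tokens = WhitespaceSketchTokenizer().tokenize("Good muffins cost $3.88")
print(tokens)
```

The result is a flat list of str, which matches the return type documented for tokenize().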

batch_tokenize(self, strings)

Apply self.tokenize() to each element of strings. That is, the result is equivalent to:

    [self.tokenize(s) for s in strings]

Returns:
list of list of str
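
The relationship between the two methods can be sketched as follows. CommaSketchTokenizer is a hypothetical class written for this example, not part of NLTK; its batch_tokenize() simply spells out the list comprehension given above, and splitting on commas is an assumption chosen to keep the example short.

```python
class CommaSketchTokenizer(object):
    """Hypothetical tokenizer used only to illustrate batch_tokenize()."""

    def tokenize(self, s):
        # Divide one string into a list of substrings at each comma.
        return s.split(",")

    def batch_tokenize(self, strings):
        # Apply tokenize() to every input string, preserving input order.
        return [self.tokenize(s) for s in strings]


batches = CommaSketchTokenizer().batch_tokenize(["a,b", "c,d,e"])
print(batches)
```

Each input string yields its own inner list, so the result has one entry per element of strings, in the same order.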