Package nltk :: Package tokenize :: Module simple :: Class WhitespaceTokenizer

Class WhitespaceTokenizer

    object --+    
             |    
api.TokenizerI --+
                 |
                WhitespaceTokenizer

A tokenizer that divides a string into substrings by treating any sequence of whitespace characters as a separator. Whitespace characters are space (' '), tab ('\t'), and newline ('\n'). If you are performing the tokenization yourself (rather than building a tokenizer to pass to some other piece of code), consider using the string split() method instead:

>>> words = s.split()
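
As a brief illustration of the behaviour described above, the following sketch uses only the built-in str.split(). The sample string s is made up for the example and does not come from the NLTK documentation.

```python
# Sample text mixing spaces, a tab, and newlines (made up for illustration).
s = "Good muffins\tcost  $3.88\nin New York.\n"

# With no arguments, split() treats any run of whitespace as one separator
# and ignores leading/trailing whitespace, so no empty strings appear.
words = s.split()
print(words)

# By contrast, split(' ') splits on each single space character only,
# so tabs and newlines stay inside tokens and a double space yields ''.
by_space = s.split(' ')
print(by_space)
```

The first call yields clean word tokens; the second shows why passing an explicit ' ' separator is usually not what is wanted for whitespace tokenization.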

Instance Methods

tokenize(self, s)
Divide the given string into a list of substrings.

Inherited from api.TokenizerI: batch_tokenize

Inherited from object: __delattr__, __getattribute__, __hash__, __init__, __new__, __reduce__, __reduce_ex__, __repr__, __setattr__, __str__

Properties

Inherited from object: __class__

Method Details

tokenize(self, s)

Divide the given string into a list of substrings.

Returns:
list of str
Overrides: api.TokenizerI.tokenize
(inherited documentation)
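
The case this class is meant for, passing a tokenizer object to other code, can be sketched without NLTK installed. The class SketchWhitespaceTokenizer and the helper count_tokens below are hypothetical stand-ins written for this example, not the NLTK implementation. The sketch assumes that tokenize behaves like s.split() and that batch_tokenize simply applies tokenize to each string in turn.

```python
class SketchWhitespaceTokenizer:
    """Hypothetical stand-in with the same method names as the documented class."""

    def tokenize(self, s):
        # Assumption: split on any run of whitespace, as the class description says.
        return s.split()

    def batch_tokenize(self, strings):
        # Assumption: tokenize each string independently and collect the results.
        return [self.tokenize(s) for s in strings]


def count_tokens(tokenizer, text):
    # Hypothetical consumer: works with any object that provides tokenize(s).
    return len(tokenizer.tokenize(text))


tokenizer = SketchWhitespaceTokenizer()
tokens = tokenizer.tokenize("a b\tc\nd")
batches = tokenizer.batch_tokenize(["one two", "three"])
total = count_tokens(tokenizer, "x  y z")
print(tokens, batches, total)
```

Because count_tokens depends only on the tokenize method, a different tokenizer with the same interface could be swapped in without changing the calling code.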