Module nltk.tokenize.simple

Tokenizers that divide strings into substrings using the string split() method.

These tokenizers follow the standard TokenizerI interface, and so can be used with any code that expects a tokenizer. For example, they can be used to specify the tokenization conventions when building a CorpusReader. But if you are tokenizing a string yourself, consider using the string split() method directly instead.
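
For illustration, the sketch below shows that WhitespaceTokenizer gives the same result as calling split() with no arguments; the commented-out PlaintextCorpusReader call is a hypothetical example of handing such a tokenizer to a corpus reader (the corpus path and file pattern are placeholders):

    from nltk.tokenize import WhitespaceTokenizer

    s = "Good muffins cost $3.88\nin New York.  Please buy me\ntwo of them.\n\nThanks."

    # WhitespaceTokenizer treats any run of spaces, tabs and newlines as a
    # separator, exactly like the string split() method with no arguments.
    assert WhitespaceTokenizer().tokenize(s) == s.split()

    # Hypothetical usage with a corpus reader (path and pattern are placeholders):
    # from nltk.corpus.reader import PlaintextCorpusReader
    # reader = PlaintextCorpusReader('/path/to/corpus', r'.*\.txt',
    #                                word_tokenizer=WhitespaceTokenizer())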

Classes
  WhitespaceTokenizer
A tokenizer that divides a string into substrings by treating any sequence of whitespace characters as a separator.
  SpaceTokenizer
A tokenizer that divides a string into substrings by treating any single space character as a separator.
  TabTokenizer
A tokenizer that divides a string into substrings by treating any single tab character as a separator.
  LineTokenizer
A tokenizer that divides a string into substrings by treating any single newline character as a separator.
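
As a rough sketch of how the remaining classes behave (outputs in the comments assume the default settings), each one mirrors a particular call to the string split() method:

    from nltk.tokenize import SpaceTokenizer, TabTokenizer, LineTokenizer

    s = "alpha beta\tgamma\ndelta  epsilon\n\nzeta"

    # SpaceTokenizer: every single space is a separator, so a run of two
    # spaces yields an empty string, just like s.split(' ').
    print(SpaceTokenizer().tokenize(s))
    # ['alpha', 'beta\tgamma\ndelta', '', 'epsilon\n\nzeta']

    # TabTokenizer: every single tab is a separator, like s.split('\t').
    print(TabTokenizer().tokenize(s))
    # ['alpha beta', 'gamma\ndelta  epsilon\n\nzeta']

    # LineTokenizer: every newline is a separator; with the default
    # blanklines='discard', blank lines are dropped from the result.
    print(LineTokenizer().tokenize(s))
    # ['alpha beta\tgamma', 'delta  epsilon', 'zeta']
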
Functions
    Tokenization Functions
 
line_tokenize(text, blanklines='discard')
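
A minimal sketch of line_tokenize, which splits a text into its lines and, with the default blanklines='discard', drops blank ones:

    from nltk.tokenize import line_tokenize

    text = "first line\n\nsecond line\nthird line\n"

    # Default: blank lines are discarded.
    print(line_tokenize(text))
    # ['first line', 'second line', 'third line']

    # blanklines='keep' preserves blank lines as empty strings.
    print(line_tokenize(text, blanklines='keep'))
    # ['first line', '', 'second line', 'third line']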