Module simple
Tokenizers that divide strings into substrings using the string split() method.
These tokenizers implement the standard TokenizerI interface, so they can be used with any code that expects a tokenizer. For example, they can be used to specify the tokenization conventions when building a CorpusReader. If you are tokenizing a string yourself, however, consider calling the string's split() method directly instead.
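
To illustrate why calling split() directly is usually enough, here is a minimal sketch. It assumes the tokenizer interface amounts to an object with a tokenize() method; SplitTokenizer is a hypothetical stand-in written for this example, not one of this module's classes.

```python
class SplitTokenizer:
    """Hypothetical stand-in for a split()-based tokenizer (not a class from this module)."""

    def __init__(self, separator=None):
        # separator=None makes str.split() treat any run of whitespace as one separator.
        self._separator = separator

    def tokenize(self, text):
        # Delegate directly to the string's own split() method.
        return text.split(self._separator)


text = "Good muffins cost $3.88"

# Code that expects a tokenizer object calls tokenize() on it.
tokens_via_object = SplitTokenizer().tokenize(text)

# When tokenizing a string yourself, split() gives the same result with less ceremony.
tokens_direct = text.split()
```

The wrapper only earns its keep when some other component, such as a corpus reader, needs an object to call; otherwise the direct call is simpler.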
WhitespaceTokenizer
A tokenizer that divides a string into substrings by treating any sequence of whitespace characters as a separator.

SpaceTokenizer
A tokenizer that divides a string into substrings by treating any single space character as a separator.

TabTokenizer
A tokenizer that divides a string into substrings by treating any single tab character as a separator.

LineTokenizer
A tokenizer that divides a string into substrings by treating any single newline character as a separator.
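
The four separator conventions can be compared using plain split() calls on a sample string. This is a sketch that assumes each tokenizer behaves like the corresponding split() call; the actual classes may differ in details such as how blank lines are handled.

```python
# Sample with a double space, a tab, a newline, and a single space.
sample = "a  b\tc\nd e"

# Any sequence of whitespace is one separator (WhitespaceTokenizer convention).
whitespace_tokens = sample.split()

# Every single space is a separator, so the double space yields an empty string
# (SpaceTokenizer convention).
space_tokens = sample.split(" ")

# Every single tab is a separator (TabTokenizer convention).
tab_tokens = sample.split("\t")

# Every single newline is a separator (LineTokenizer convention).
line_tokens = sample.split("\n")

print(whitespace_tokens)  # ['a', 'b', 'c', 'd', 'e']
print(space_tokens)       # ['a', '', 'b\tc\nd', 'e']
print(tab_tokens)         # ['a  b', 'c\nd e']
print(line_tokens)        # ['a  b\tc', 'd e']
```

Note the empty string in the space-separated result: splitting on a single character does not collapse adjacent separators, whereas splitting on whitespace sequences does.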