Class LineTokenizer
Module: nltk.tokenize.simple

Inheritance: object → api.TokenizerI → LineTokenizer

A tokenizer that divides a string into substrings by treating any single newline character as a separator. How blank lines are handled is controlled by the blanklines constructor parameter.
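To make "any single newline character as a separator" concrete, here is a minimal sketch using Python's built-in str.split. The function name split_on_newlines is hypothetical and is not part of NLTK. It shows that two consecutive newlines produce an empty token for the blank line between them, and that a trailing newline produces a final empty token, which appears to match the 'keep' option of the blanklines parameter.

```python
def split_on_newlines(s):
    """Hypothetical helper: split s on every single '\\n' character.

    Blank lines become '' tokens, and a trailing newline yields a final ''
    token (no blank-line filtering is applied).
    """
    return s.split("\n")


text = "first line\n\nthird line\n"
print(split_on_newlines(text))  # ['first line', '', 'third line', '']
```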

Instance Methods

__init__(self, blanklines='discard')
Initialize the tokenizer; blanklines selects how blank lines are handled.
tokenize(self, s)
Divide the given string into a list of substrings.
Inherited from api.TokenizerI: batch_tokenize
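The inherited batch_tokenize method is not documented on this page. The sketch below assumes it simply applies tokenize to each string in a list and returns the per-string results in order. TokenizerSketch and NewlineSplitter are hypothetical names standing in for api.TokenizerI and a subclass; they are not NLTK's actual code.

```python
from abc import ABC, abstractmethod
from typing import List


class TokenizerSketch(ABC):
    """Hypothetical stand-in for api.TokenizerI (not NLTK's source)."""

    @abstractmethod
    def tokenize(self, s: str) -> List[str]:
        """Divide a single string into a list of substrings."""

    def batch_tokenize(self, strings: List[str]) -> List[List[str]]:
        # Assumed behavior: tokenize each string independently, in order.
        return [self.tokenize(s) for s in strings]


class NewlineSplitter(TokenizerSketch):
    """Minimal subclass that only overrides tokenize()."""

    def tokenize(self, s: str) -> List[str]:
        # Treat each single newline character as a separator.
        return s.split("\n")


print(NewlineSplitter().batch_tokenize(["a\nb", "c"]))  # [['a', 'b'], ['c']]
```

Under this design a subclass only has to implement tokenize; batch processing comes for free from the base class.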

Method Details

__init__(self, blanklines='discard')

  • blanklines - Indicates how blank lines should be handled. Valid values are:
    • 'discard': strip blank lines out of the token list before returning it. A line is considered blank if it contains only whitespace characters.
    • 'keep': leave all blank lines in the token list.
    • 'discard-eof': like 'keep', except that if the string ends with a newline, no empty token '' is generated after that final newline.
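The three blanklines options can be sketched as a small class. LineTokenizerSketch and VALID_BLANKLINES are hypothetical names, and this is a re-creation of the documented behavior rather than NLTK's implementation. Rejecting unknown values with ValueError is an assumption of this sketch; the documentation above only lists the valid values.

```python
VALID_BLANKLINES = ("discard", "keep", "discard-eof")


class LineTokenizerSketch:
    """Hypothetical re-creation of the documented behavior, not NLTK's source."""

    def __init__(self, blanklines="discard"):
        # Assumption: anything other than the three documented options is rejected.
        if blanklines not in VALID_BLANKLINES:
            raise ValueError("blanklines must be one of: " + ", ".join(VALID_BLANKLINES))
        self.blanklines = blanklines

    def tokenize(self, s):
        # Each single newline character is a separator.
        lines = s.split("\n")
        if self.blanklines == "discard":
            # A line is blank if it contains only whitespace characters.
            lines = [line for line in lines if line.strip()]
        elif self.blanklines == "discard-eof":
            # Drop only the empty token produced after a trailing newline.
            if s.endswith("\n"):
                lines.pop()
        # "keep": every line, blank or not, is returned unchanged.
        return lines


text = "alpha\n   \nbeta\n"
for mode in VALID_BLANKLINES:
    print(mode, LineTokenizerSketch(blanklines=mode).tokenize(text))
```

For this input, 'discard' returns ['alpha', 'beta'] (the whitespace-only line and the trailing empty token are both dropped), 'keep' returns ['alpha', '   ', 'beta', ''], and 'discard-eof' returns ['alpha', '   ', 'beta'].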
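tokenize(self, s)

Divide the given string s into a list of substrings, one per line, with blank lines handled according to the blanklines value passed to the constructor.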