Package nltk :: Package tokenize :: Module simple :: Class LineTokenizer

Class LineTokenizer

    object --+    
             |    
api.TokenizerI --+
                 |
                LineTokenizer

A tokenizer that divides a string into substrings by treating any single newline character as a separator. Handling of blank lines may be controlled using a constructor parameter.

Instance Methods

__init__(self, blanklines='discard')
Create a line tokenizer. The blanklines parameter controls how blank lines are handled.

tokenize(self, s)
Divide the given string into a list of substrings.

Inherited from api.TokenizerI: batch_tokenize

Inherited from object: __delattr__, __getattribute__, __hash__, __new__, __reduce__, __reduce_ex__, __repr__, __setattr__, __str__

Properties

Inherited from object: __class__

Method Details

__init__(self, blanklines='discard')
(Constructor)

Create a line tokenizer whose treatment of blank lines is set by blanklines.

Parameters:

blanklines - Indicates how blank lines should be handled. Valid values are:
- 'discard': strip blank lines out of the token list before returning it. A line is considered blank if it contains only whitespace characters.
- 'keep': leave all blank lines in the token list.
- 'discard-eof': if the string ends with a newline, then do not generate a corresponding token '' after that newline.

Overrides: object.__init__
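
The three blanklines modes can be illustrated with a minimal standalone sketch. It does not call NLTK: line_tokenize is a hypothetical helper written only for illustration, and it assumes that lines are split on '\n' alone. The library's actual implementation may differ in details, such as how other line terminators are treated.

```python
def line_tokenize(s, blanklines="discard"):
    """Hypothetical stand-in that mimics the documented blanklines modes."""
    valid = ("discard", "keep", "discard-eof")
    if blanklines not in valid:
        raise ValueError("blanklines must be one of: %s" % ", ".join(valid))

    # Assumption: a single newline character is the only separator.
    lines = s.split("\n")

    if blanklines == "discard":
        # Drop every line that is empty or contains only whitespace.
        lines = [line for line in lines if line.strip()]
    elif blanklines == "discard-eof":
        # Drop only the empty token produced by a trailing newline.
        if lines and lines[-1] == "":
            lines.pop()
    # 'keep' returns every line unchanged, blank ones included.
    return lines


text = "first line\n\n  \nlast line\n"

discarded = line_tokenize(text, "discard")
kept = line_tokenize(text, "keep")
no_eof = line_tokenize(text, "discard-eof")

print(discarded)  # ['first line', 'last line']
print(kept)       # ['first line', '', '  ', 'last line', '']
print(no_eof)     # ['first line', '', '  ', 'last line']
```

In this sketch, 'discard-eof' differs from 'keep' only in the final empty token, while 'discard' also removes the interior blank and whitespace-only lines.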

tokenize(self, s)

Divide the given string into a list of substrings.

Returns: list of str
Overrides: api.TokenizerI.tokenize (inherited documentation)
