Package nltk :: Package tokenize :: Module punkt :: Class PunktParameters

Class PunktParameters

object --+
         |
        PunktParameters

Stores the data used to perform sentence boundary detection with Punkt.

Instance Methods
 
__init__(self)
x.__init__(...) initializes x; see x.__class__.__doc__ for signature
 
clear_abbrevs(self)

clear_collocations(self)

clear_sent_starters(self)

clear_ortho_context(self)

add_ortho_context(self, typ, flag)

Inherited from object: __delattr__, __getattribute__, __hash__, __new__, __reduce__, __reduce_ex__, __repr__, __setattr__, __str__

Instance Variables
  abbrev_types
A set of word types for known abbreviations.
  collocations
A set of word type tuples for known common collocations where the first word ends in a period.
  sent_starters
A set of word types for words that often appear at the beginning of sentences.
  ortho_context
A dictionary mapping word types to the set of orthographic contexts that word type appears in.
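
As a rough illustration of how the methods and instance variables above fit together, here is a minimal stand-in written only from the descriptions on this page. It is not NLTK's source: the class name `ParamsSketch` is hypothetical, storing each word type's contexts as an integer of OR-ed flags is an assumption drawn from the phrase "adding orthographic context flags", and the flag values 1 and 2 are arbitrary.

```python
from collections import defaultdict


class ParamsSketch:
    """Hypothetical stand-in mirroring the documented PunktParameters interface."""

    def __init__(self):
        self.abbrev_types = set()    # word types of known abbreviations
        self.collocations = set()    # (first, second) word-type tuples
        self.sent_starters = set()   # word types that often begin sentences
        # Assumption: contexts are stored as one integer of combined flags.
        self.ortho_context = defaultdict(int)

    def clear_abbrevs(self):
        self.abbrev_types = set()

    def clear_collocations(self):
        self.collocations = set()

    def clear_sent_starters(self):
        self.sent_starters = set()

    def clear_ortho_context(self):
        self.ortho_context = defaultdict(int)

    def add_ortho_context(self, typ, flag):
        # Assumption: "adding" a flag means combining it with bitwise OR.
        self.ortho_context[typ] |= flag


params = ParamsSketch()
params.abbrev_types.update({"dr", "mr"})
params.collocations.add(("S.", "Bach"))
params.add_ortho_context("bach", 1)  # arbitrary illustrative flag values
params.add_ortho_context("bach", 2)

params.clear_abbrevs()  # resets only the abbreviation set
```

In this sketch each `clear_*` method resets exactly one of the four stores, so one kind of learned data can be discarded without touching the others.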

Properties

Inherited from object: __class__

Method Details

__init__(self)
(Constructor)

x.__init__(...) initializes x; see x.__class__.__doc__ for signature

Overrides: object.__init__
(inherited documentation)

Instance Variable Details

collocations

A set of word type tuples for known common collocations where the first word ends in a period. E.g., ('S.', 'Bach') is a common collocation in a text that discusses 'Johann S. Bach'. These count as negative evidence for sentence boundaries.
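
To make the idea of negative evidence concrete, the sketch below checks each period-final token and its successor against a collocation set. The helper `is_known_collocation` and the sample tokens are hypothetical, and the sketch looks at collocations in isolation; it does not model how a tokenizer would combine this with the other stored data, such as abbreviations and sentence starters.

```python
collocations = {("S.", "Bach")}


def is_known_collocation(first, second, collocations):
    """Return True if (first, second) is a stored collocation."""
    return (first, second) in collocations


tokens = ["Johann", "S.", "Bach", "wrote", "fugues", "."]

# For every token ending in a period, look at the token that follows it.
for first, second in zip(tokens, tokens[1:]):
    if first.endswith("."):
        if is_known_collocation(first, second, collocations):
            verdict = "evidence against a sentence boundary"
        else:
            verdict = "no collocation evidence"
        print(f"{first!r} -> {second!r}: {verdict}")
```

Here the period after 'S.' is followed by 'Bach', the pair is in the set, and so the period is less likely to mark the end of a sentence.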

ortho_context

A dictionary mapping word types to the set of orthographic contexts that word type appears in. Contexts are represented by adding orthographic context flags: ...
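
Because the flag list is cut off above, the following sketch uses made-up flags purely to show how contexts can be combined by "adding" flags and queried later. The names `SEEN_UPPER` and `SEEN_LOWER`, their values, the helper `has_context`, and the integer storage are all assumptions for illustration, not the flags NLTK defines.

```python
from collections import defaultdict

# Hypothetical flags; each occupies its own bit so they can be combined.
SEEN_UPPER = 1 << 0
SEEN_LOWER = 1 << 1

# Word type -> bitwise OR of every flag recorded for it (assumed storage).
ortho_context = defaultdict(int)


def add_ortho_context(typ, flag):
    """Record one more orthographic context for a word type."""
    ortho_context[typ] |= flag


def has_context(typ, flag):
    """Return True if the flag was ever recorded for the word type."""
    return bool(ortho_context[typ] & flag)


add_ortho_context("bach", SEEN_UPPER)
add_ortho_context("the", SEEN_UPPER)
add_ortho_context("the", SEEN_LOWER)

# "bach" was only seen capitalized; "the" was seen in both forms.
bach_only_upper = has_context("bach", SEEN_UPPER) and not has_context("bach", SEEN_LOWER)
```

Keeping one integer per word type means recording the same context twice has no extra effect, and a single bitwise AND answers whether a given context was ever observed.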