Class PunktParameters (nltk.tokenize.punkt)

object --+
         |
        PunktParameters

Stores data used to perform sentence boundary detection with punkt.

Instance Methods
__init__(self)
    x.__init__(...) initializes x; see x.__class__.__doc__ for signature
clear_abbrevs(self)
clear_collocations(self)
clear_sent_starters(self)
clear_ortho_context(self)
add_ortho_context(self, typ, flag)

Instance Variables
abbrev_types
    A set of word types for known abbreviations.
collocations
    A set of word type tuples for known common collocations where the first word ends in a period.
sent_starters
    A set of word types for words that often appear at the beginning of sentences.
ortho_context
    A dictionary mapping word types to the set of orthographic contexts that word type appears in.
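
The following is a minimal sketch of how the sets above might be filled in and cleared by hand. It assumes NLTK is installed and that the attributes are named abbrev_types, collocations, and sent_starters, as in NLTK's source. The lower-cased, period-free form used for the abbreviations is an assumption about what the tokenizer expects, and the collocation tuple is copied verbatim from this documentation's own example.

```python
from nltk.tokenize.punkt import PunktParameters

params = PunktParameters()

# Known abbreviations (assumed form: lower-case, no trailing period).
params.abbrev_types.update(["dr", "prof"])

# A collocation whose first word ends in a period; this counts as
# negative evidence for a sentence boundary between the two words.
params.collocations.add(("S.", "Bach"))

# A word type that often begins a sentence.
params.sent_starters.add("however")

# Remember the state, then reset only the abbreviation set.
had_dr = "dr" in params.abbrev_types
params.clear_abbrevs()
```

After clear_abbrevs() only abbrev_types is reset; the collocations and sentence starters are left as they were.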

Method Details

__init__(self)

Instance Variable Details

collocations
    A set of word type tuples for known common collocations where the first word ends in a period. E.g., ('S.', 'Bach') is a common collocation in a text that discusses 'Johann S. Bach'. These count as negative evidence for sentence boundaries.

ortho_context
    A dictionary mapping word types to the set of orthographic contexts that word type appears in. Contexts are represented by adding orthographic context flags: ...
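
As a rough illustration of how a "set of contexts" can be stored per word type, the stand-alone sketch below does not import NLTK. It assumes the flags are integer bit masks combined with bitwise OR. The flag names and values are hypothetical, since the actual flags are elided above, and the function is only a stand-in that mirrors the add_ortho_context(self, typ, flag) signature.

```python
from collections import defaultdict

# Hypothetical flags for illustration only; not the library's constants.
FLAG_BEG_UPPER = 1 << 0  # seen capitalized at the start of a sentence
FLAG_MID_LOWER = 1 << 1  # seen lower-case in the middle of a sentence

# Word type -> OR-ed flags; unseen types default to 0 (no contexts).
ortho_context = defaultdict(int)

def add_ortho_context(typ, flag):
    """Record one more observed orthographic context for a word type."""
    ortho_context[typ] |= flag

add_ortho_context("bach", FLAG_BEG_UPPER)
add_ortho_context("bach", FLAG_MID_LOWER)
add_ortho_context("bach", FLAG_BEG_UPPER)  # repeated observation changes nothing

# Membership in the "set" is a bitwise AND against the stored value.
seen_upper_at_start = bool(ortho_context["bach"] & FLAG_BEG_UPPER)
```

Under this representation, a single integer per word type acts as a compact set, and recording the same context twice is harmless.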