Package nltk :: Package tokenize :: Module punkt :: Class PunktParameters

Class PunktParameters

object --+
         |
        PunktParameters

Stores data used to perform sentence boundary detection with punkt.

Instance Methods

__init__(self)
x.__init__(...) initializes x; see x.__class__.__doc__ for signature

clear_abbrevs(self)

clear_collocations(self)

clear_sent_starters(self)

clear_ortho_context(self)

add_ortho_context(self, typ, flag)

Inherited from object: __delattr__, __getattribute__, __hash__, __new__, __reduce__, __reduce_ex__, __repr__, __setattr__, __str__

Instance Variables

abbrev_types
A set of word types for known abbreviations.

collocations
A set of word type tuples for known common collocations where the first word ends in a period.

sent_starters
A set of word types for words that often appear at the beginning of sentences.

ortho_context
A dictionary mapping word types to the set of orthographic contexts that word type appears in.
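
The four containers and the clear_* methods listed above can be pictured with a small stand-in class. This is a sketch, not NLTK's source: the class name ParamsSketch and the sample values are hypothetical, the container types are taken from the descriptions above, and the behaviour of each clear_* method (resetting its container to empty) is an assumption inferred from the method names, since this page does not document it.

```python
class ParamsSketch:
    """Hypothetical stand-in mirroring the documented attributes."""

    def __init__(self):
        self.abbrev_types = set()    # word types for known abbreviations
        self.collocations = set()    # (first, second) word type tuples
        self.sent_starters = set()   # word types that often start sentences
        self.ortho_context = {}      # word type -> orthographic contexts

    # Assumption: each clear_* method empties the matching container.
    def clear_abbrevs(self):
        self.abbrev_types = set()

    def clear_collocations(self):
        self.collocations = set()

    def clear_sent_starters(self):
        self.sent_starters = set()

    def clear_ortho_context(self):
        self.ortho_context = {}


params = ParamsSketch()
params.abbrev_types.add("dr")            # illustrative values only
params.collocations.add(("S.", "Bach"))
params.sent_starters.add("however")

params.clear_abbrevs()  # only the abbreviation set is reset
```

In this sketch, clearing one container leaves the other three untouched, which is why each container gets its own method.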

Properties

Inherited from object: __class__

Method Details

__init__(self)
(Constructor)

x.__init__(...) initializes x; see x.__class__.__doc__ for signature

Overrides: object.__init__ (inherited documentation)

Instance Variable Details

collocations

A set of word type tuples for known common collocations where the first word ends in a period. E.g., ('S.', 'Bach') is a common collocation in a text that discusses 'Johann S. Bach'. These count as negative evidence for sentence boundaries.
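
As a rough illustration of "negative evidence", the sketch below treats a period followed by a known collocation partner as a non-boundary. The function name and the lookup are hypothetical and use the literal strings from the example above; the actual tokenizer weighs this against other evidence and may normalize word types differently.

```python
# Collocation set in the shape described above: (first, second) tuples
# where the first word ends in a period.
collocations = {("S.", "Bach")}


def collocation_blocks_break(first, second, known=collocations):
    """Return True when (first, second) is a known collocation.

    Hypothetical helper: a True result means the period in `first`
    should not be read as a sentence boundary.
    """
    return (first, second) in known


known_pair = collocation_blocks_break("S.", "Bach")     # pair is in the set
unknown_pair = collocation_blocks_break("S.", "Later")  # pair is not in the set
```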

ortho_context

A dictionary mapping word types to the set of orthographic contexts that word type appears in. Contexts are represented by adding orthographic context flags: ...
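
One way to picture "adding" context flags is as integer bit masks combined with bitwise OR, so that each word type accumulates every context it has been seen in. The flag names and values below are hypothetical placeholders (the real list is elided above), and the bit-mask representation itself is an assumption of this sketch.

```python
from collections import defaultdict

# Hypothetical flags: one bit per orthographic context.
SEEN_UPPER_AT_START = 1 << 0   # capitalized at sentence start
SEEN_UPPER_MID = 1 << 1        # capitalized mid-sentence
SEEN_LOWER_MID = 1 << 2        # lowercase mid-sentence

# Word type -> accumulated flags; unseen types default to 0 (no context).
ortho_context = defaultdict(int)


def add_ortho_context(typ, flag):
    """Record that `typ` was seen in the context `flag` (sketch only)."""
    ortho_context[typ] |= flag


add_ortho_context("bach", SEEN_UPPER_MID)
add_ortho_context("bach", SEEN_UPPER_AT_START)
add_ortho_context("bach", SEEN_UPPER_MID)  # repeating a flag changes nothing

# Membership test: has "bach" ever been seen lowercase mid-sentence?
seen_lower = bool(ortho_context["bach"] & SEEN_LOWER_MID)
```

Using OR rather than arithmetic addition keeps the operation idempotent, so recording the same context twice leaves the stored value unchanged.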
