Package nltk :: Module probability :: Class GoodTuringProbDist

Class GoodTuringProbDist

object --+    
         |    
 ProbDistI --+
             |
            GoodTuringProbDist

The Good-Turing estimate of a probability distribution. This method calculates the probability mass to assign to events with zero or low counts based on the number of events with higher counts. It does so by using the smoothed count c*:

c* = (c + 1) N(c + 1) / N(c)

where c is the original count and N(i) is the number of event types observed with count i. These smoothed counts are then normalised to yield a probability distribution.
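As a rough illustration of the formula above, the following sketch computes c* for every observed count using only the standard library. The helper name good_turing_counts and the toy data are assumptions made for this example; this is not the class's own implementation.

```python
from collections import Counter


def good_turing_counts(freqs):
    """Map each observed count c to c* = (c + 1) * N(c + 1) / N(c).

    `freqs` maps event types to raw counts (hypothetical helper, not NLTK API).
    """
    # N(i): the number of event types observed exactly i times.
    n = Counter(freqs.values())
    # A Counter returns 0 for a missing key, so N(c + 1) is 0 when no
    # event type was seen c + 1 times.
    return {c: (c + 1) * n[c + 1] / n[c] for c in sorted(n)}


# Toy data: a=3, b=2, c=2, d=1, e=1, f=1, so N(1)=3, N(2)=2, N(3)=1.
freqs = Counter("a a a b b c c d e f".split())
smoothed = good_turing_counts(freqs)
```

Note that the largest observed count receives c* = 0 in this unrefined form, because N(c + 1) is zero there.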

Instance Methods

__init__(self, freqdist, bins=None)
Creates a Good-Turing probability distribution estimate.

prob(self, sample)
Returns: float: the probability for a given sample.

max(self)
Returns: any: the sample with the greatest probability.

samples(self)
Returns: list: A list of all samples that have nonzero probabilities.

discount(self)
Returns: float: The ratio by which counts are discounted on average: c*/c

freqdist(self)

__repr__(self)
Returns: string: A string representation of this ProbDist.

Inherited from ProbDistI: generate, logprob

Inherited from object: __delattr__, __getattribute__, __hash__, __new__, __reduce__, __reduce_ex__, __setattr__, __str__

Class Variables

Inherited from ProbDistI: SUM_TO_ONE

Properties

Inherited from object: __class__

Method Details

__init__(self, freqdist, bins=None)
(Constructor)

Creates a Good-Turing probability distribution estimate. This method calculates the probability mass to assign to events with zero or low counts based on the number of events with higher counts. It does so by using the smoothed count c*:

c* = (c + 1) N(c + 1) / N(c)

where c is the original count and N(i) is the number of event types observed with count i. These smoothed counts are then normalised to yield a probability distribution.

The bins parameter allows N(0), the number of possible event types that were never observed, to be estimated.

Parameters:

freqdist (FreqDist) - The frequency counts upon which to base the estimation.
bins (int) - The number of possible event types. This must be at least as large as the number of bins in the freqdist. If None, it is taken to be equal to freqdist.B().

Overrides: ProbDistI.__init__
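To make the role of bins more concrete, here is a minimal sketch of how N(0) and the mass reserved for unseen event types could be derived from it. It assumes N(0) is bins minus the number of observed types, and applies the formula at c = 0, which gives c* = N(1) / N(0). The function name unseen_estimate is hypothetical, and the class itself may normalise differently.

```python
from collections import Counter


def unseen_estimate(freqs, bins=None):
    """Return (N(0), total mass for unseen types, mass per unseen type).

    Hypothetical helper: `freqs` maps event types to raw counts.
    """
    observed = len(freqs)        # plays the role of freqdist.B()
    total = sum(freqs.values())  # total number of observations, N
    if bins is None:
        bins = observed          # no room left for unseen types
    if bins < observed:
        raise ValueError("bins must be at least the number of observed types")
    n0 = bins - observed         # N(0): possible but never observed
    n1 = sum(1 for c in freqs.values() if c == 1)  # N(1)
    # At c = 0 the formula gives c* = N(1) / N(0) per unseen type, so the
    # unseen types share a mass of N(1) / N (assumed: divide by N).
    mass = n1 / total if n0 > 0 and total > 0 else 0.0
    per_type = mass / n0 if n0 else 0.0
    return n0, mass, per_type


freqs = Counter("a a a b b c c d e f".split())
n0, mass, per_type = unseen_estimate(freqs, bins=9)
```

With six observed types and bins=9, this reading yields N(0) = 3 and an unseen mass of N(1)/N = 3/10. With bins left as None, N(0) is zero and no mass is reserved.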

prob(self, sample)

Parameters:

sample - The sample whose probability should be returned.

Returns: float: The probability for a given sample. Probabilities are always real numbers in the range [0, 1].

Overrides: ProbDistI.prob

(inherited documentation)
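The sketch below shows one way smoothed counts could be normalised into per-sample probabilities that stay within [0, 1] and sum to one. The function good_turing_probs and its unseen argument are assumptions for illustration only; the actual prob method may handle edge cases differently.

```python
from collections import Counter


def good_turing_probs(freqs, unseen=()):
    """Normalise Good-Turing smoothed counts into probabilities.

    `unseen` lists hypothetical event types that have a count of zero.
    """
    counts = dict(freqs)
    counts.update({s: 0 for s in unseen if s not in counts})
    # N(i), including N(0) for the unseen types.
    n = Counter(counts.values())
    # c* = (c + 1) * N(c + 1) / N(c) for each sample's count c.
    smoothed = {s: (c + 1) * n[c + 1] / n[c] for s, c in counts.items()}
    total = sum(smoothed.values())
    if total == 0:
        return {s: 0.0 for s in smoothed}
    return {s: v / total for s, v in smoothed.items()}


freqs = Counter("a a a b b c c d e f".split())
probs = good_turing_probs(freqs, unseen=["x", "y", "z"])
```

In this toy data the most frequent sample ends up with probability zero, since no event type has a higher count. That is a known weakness of the unrefined estimate rather than a property to rely on.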

max(self)

Returns: any: the sample with the greatest probability. If two or more samples have the same probability, return one of them; which sample is returned is undefined.
Overrides: ProbDistI.max: (inherited documentation)

samples(self)

Returns: list: A list of all samples that have nonzero probabilities. Use prob to find the probability of each sample.
Overrides: ProbDistI.samples: (inherited documentation)

discount(self)

Returns: float: The ratio by which counts are discounted on average: c*/c
Overrides: ProbDistI.discount: (inherited documentation)

__repr__(self)
(Representation operator)

repr(x)

Returns: string: A string representation of this ProbDist.
Overrides: object.__repr__
