Package nltk :: Module probability :: Class WittenBellProbDist

Class WittenBellProbDist

object --+    
         |    
 ProbDistI --+
             |
            WittenBellProbDist

The Witten-Bell estimate of a probability distribution. This distribution allocates uniform probability mass to as yet unseen events by using the number of distinct event types that have been observed. The probability mass reserved for unseen events is equal to:

  • T / (N + T)

where T is the number of observed event types and N is the total number of observed events. This equates to the maximum likelihood estimate of a new type event occurring. The remaining probability mass is discounted such that all probability estimates sum to one, yielding:

  • p = T / (Z (N + T)), if count = 0
  • p = c / (N + T), otherwise

Instance Methods

  • __init__(self, freqdist, bins=None): Creates a distribution of Witten-Bell probability estimates.
  • prob(self, sample): Returns (float) the probability for a given sample.
  • max(self): Returns (any) the sample with the greatest probability.
  • samples(self): Returns (list) a list of all samples that have nonzero probabilities.
  • freqdist(self)
  • discount(self): Returns (float) the ratio by which counts are discounted on average: c*/c
  • __repr__(self): Returns (string) a string representation of this ProbDist.

Inherited from ProbDistI: generate, logprob

Inherited from object: __delattr__, __getattribute__, __hash__, __new__, __reduce__, __reduce_ex__, __setattr__, __str__

Class Variables

Inherited from ProbDistI: SUM_TO_ONE

Properties

Inherited from object: __class__

Method Details

__init__(self, freqdist, bins=None)
(Constructor)

Creates a distribution of Witten-Bell probability estimates. This distribution allocates uniform probability mass to as yet unseen events by using the number of distinct event types that have been observed. The probability mass reserved for unseen events is equal to:

  • T / (N + T)

where T is the number of observed event types and N is the total number of observed events. This equates to the maximum likelihood estimate of a new type event occurring. The remaining probability mass is discounted such that all probability estimates sum to one, yielding:

  • p = T / (Z (N + T)), if count = 0
  • p = c / (N + T), otherwise

T and N are taken from the freqdist parameter (its B() and N() values, respectively). The normalising factor Z is the number of event types that have not been observed, calculated as bins - T.
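
The claim that the estimates sum to one can be checked with a short derivation. This is only a sketch under one assumption that the text above does not spell out: Z is taken to be the number of unseen event types (bins minus T) and is greater than zero. Here c(x) denotes the observed count of event x.

```latex
\begin{align*}
\sum_{x} p(x)
  &= \sum_{x:\,c(x)>0} \frac{c(x)}{N+T}
     \;+\; \sum_{x:\,c(x)=0} \frac{T}{Z\,(N+T)} \\
  &= \frac{N}{N+T} \;+\; Z \cdot \frac{T}{Z\,(N+T)} \\
  &= \frac{N}{N+T} + \frac{T}{N+T} \;=\; 1
\end{align*}
```

The first term is the discounted mass kept by observed events; the second is the reserved mass T / (N + T), split uniformly over the Z unseen types.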

Parameters:
  • freqdist (FreqDist) - The frequency counts upon which to base the estimation.
  • bins (int) - The number of possible event types. This must be at least as large as the number of bins in the freqdist. If None, it is assumed to be equal to the number of bins in the freqdist.
Overrides: ProbDistI.__init__
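
To make the formulas concrete, here is a minimal pure-Python sketch of the same arithmetic. It is not the nltk implementation: `witten_bell_estimates` is a hypothetical helper, a `collections.Counter` stands in for the FreqDist, and Z is assumed to be `bins - T` (the number of unseen event types). Exact fractions are used so the results are easy to verify by hand.

```python
from collections import Counter
from fractions import Fraction


def witten_bell_estimates(counts, bins=None):
    """Return (seen_probs, unseen_prob) following the formulas above.

    Hypothetical helper for illustration only; not part of nltk.
    """
    # Keep only samples that were actually observed.
    observed = {s: c for s, c in counts.items() if c > 0}
    T = len(observed)            # number of observed event types
    N = sum(observed.values())   # total number of observed events
    if N == 0:
        raise ValueError("this sketch needs at least one observed event")
    if bins is None:
        bins = T                 # default: no unseen event types
    if bins < T:
        raise ValueError("bins must be at least the number of observed types")
    Z = bins - T                 # assumption: number of unseen event types

    # p = c / (N + T) for observed samples
    seen = {s: Fraction(c, N + T) for s, c in observed.items()}
    # p = T / (Z (N + T)) for each unseen sample. With Z == 0 there is no
    # unseen type to receive the reserved mass; returning 0 is a choice of
    # this sketch, not documented behaviour.
    unseen = Fraction(T, Z * (N + T)) if Z > 0 else Fraction(0)
    return seen, unseen


# Counts a=3, b=2, c=1 give N = 6, T = 3; bins=5 leaves Z = 2 unseen types.
seen, unseen = witten_bell_estimates(Counter("aaabbc"), bins=5)
print(seen["a"], unseen)  # 1/3 1/6
```

In this example the reserved mass is T / (N + T) = 3/9, shared equally by the two unseen types, and the observed samples keep the remaining 6/9.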

prob(self, sample)

Parameters:
  • sample - The sample whose probability should be returned.
Returns: float
the probability for a given sample. Probabilities are always real numbers in the range [0, 1].
Overrides: ProbDistI.prob
(inherited documentation)

max(self)

Returns: any
the sample with the greatest probability. If two or more samples have the same probability, one of them is returned; which one is undefined.
Overrides: ProbDistI.max
(inherited documentation)

samples(self)

Returns: list
A list of all samples that have nonzero probabilities. Use prob to find the probability of each sample.
Overrides: ProbDistI.samples
(inherited documentation)

discount(self)

Returns: float
The ratio by which counts are discounted on average: c*/c
Overrides: ProbDistI.discount
(inherited documentation)

__repr__(self)
(Representation operator)

repr(x)

Returns: string
A string representation of this ProbDist.
Overrides: object.__repr__