Package nltk :: Module probability :: Class FreqDist

Class FreqDist

object --+    
         |    
      dict --+
             |
            FreqDist

A frequency distribution for the outcomes of an experiment. A frequency distribution records the number of times each outcome of an experiment has occurred. For example, a frequency distribution could be used to record the frequency of each word type in a document. Formally, a frequency distribution can be defined as a function mapping from each sample to the number of times that sample occurred as an outcome.

Frequency distributions are generally constructed by running a number of experiments, and incrementing the count for a sample every time it is an outcome of an experiment. For example, the following code will produce a frequency distribution that encodes how often each word occurs in a text:

>>> fdist = FreqDist()
>>> for word in tokenize.whitespace(sent):
...    fdist.inc(word.lower())

An equivalent way to do this is with the initializer:

>>> fdist = FreqDist(word.lower() for word in tokenize.whitespace(sent))
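
The equivalence of the two construction styles above can be checked with a small stand-in class. This is only a sketch: SketchFreqDist is a hypothetical name, str.split() is used in place of tokenize.whitespace, and the code is not NLTK's actual source.

```python
class SketchFreqDist(dict):
    """Hypothetical stand-in for FreqDist (an assumption, not NLTK's code)."""

    def __init__(self, samples=None):
        dict.__init__(self)
        self._total = 0  # running total of recorded outcomes
        if samples is not None:
            for sample in samples:
                self.inc(sample)

    def inc(self, sample, count=1):
        # Skip zero increments so no zero-count bins are created.
        if count == 0:
            return
        self._total += count
        self[sample] = self.get(sample, 0) + count

    def N(self):
        # Total number of outcomes recorded so far.
        return self._total

    def B(self):
        # Number of distinct samples (bins) with a recorded count.
        return len(self)


sent = "the cat saw the Dog and the dog saw the cat"

# Style 1: start empty and increment once per outcome.
fdist_loop = SketchFreqDist()
for word in sent.split():
    fdist_loop.inc(word.lower())

# Style 2: pass the outcomes to the initializer.
fdist_init = SketchFreqDist(word.lower() for word in sent.split())
```

Both objects should hold identical counts, since the initializer in this sketch simply calls inc for each element.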

Instance Methods
  • __init__(self, samples=None) - Construct a new frequency distribution. Returns: new empty dictionary
  • inc(self, sample, count=1) - Increment this FreqDist's count for the given sample. Returns: None
  • N(self) - The total number of sample outcomes that have been recorded by this FreqDist. Returns: int
  • B(self) - The total number of sample values (or bins) that have counts greater than zero. Returns: int
  • samples(self) - A list of all samples that have been recorded as outcomes by this frequency distribution. Returns: list
  • Nr(self, r, bins=None) - The number of samples with count r. Returns: int
  • _cache_Nr_values(self)
  • count(self, sample) - Return the count of a given sample. Returns: int
  • freq(self, sample) - Return the frequency of a given sample. Returns: float
  • max(self) - Return the sample with the greatest number of outcomes in this frequency distribution. Returns: any or None
  • sorted_samples(self)
  • plot(self, samples=None, *args, **kwargs) - Plot the given samples from the frequency distribution.
  • zipf_plot(self, num=40, *args, **kwargs) - Plot the most frequent samples of the frequency distribution.
  • sorted(self) - Return the samples sorted in decreasing order of frequency. Returns: sequence of any
  • __repr__(self) - A string representation of this FreqDist. Returns: string
  • __str__(self) - A string representation of this FreqDist. Returns: string
  • __getitem__(self, sample) - x[y]

Inherited from dict: __cmp__, __contains__, __delitem__, __eq__, __ge__, __getattribute__, __gt__, __hash__, __iter__, __le__, __len__, __lt__, __ne__, __new__, __setitem__, clear, copy, fromkeys, get, has_key, items, iteritems, iterkeys, itervalues, keys, pop, popitem, setdefault, update, values

Inherited from object: __delattr__, __reduce__, __reduce_ex__, __setattr__

Properties

Inherited from object: __class__

Method Details

__init__(self, samples=None)
(Constructor)

Construct a new frequency distribution. If samples is given, then the frequency distribution will be initialized with the count of each object in samples; otherwise, it will be initialized to be empty.

In particular, FreqDist() returns an empty frequency distribution, while FreqDist(samples) first creates an empty frequency distribution and then calls inc for each element of samples.

Parameters:
  • samples (Sequence) - The samples to initialize the frequency distribution with.
Returns:
new empty dictionary

Overrides: dict.__init__

inc(self, sample, count=1)

Increment this FreqDist's count for the given sample.

Parameters:
  • sample (any) - The sample whose count should be incremented.
  • count (int) - The amount to increment the sample's count by.
Returns: None
Raises:
  • NotImplementedError - If sample is not a supported sample type.
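
As a rough illustration of the count parameter, the sketch below applies the same update rule to a plain dict. The helper name inc_count is hypothetical, and the sample-type check that raises NotImplementedError is not modelled.

```python
def inc_count(counts, sample, count=1):
    """Add `count` to the tally for `sample` in a plain dict (illustrative only)."""
    counts[sample] = counts.get(sample, 0) + count


counts = {}
inc_count(counts, "spam")            # default increment of 1
inc_count(counts, "spam", 3)         # add three more outcomes at once
inc_count(counts, "eggs", count=2)   # keyword form
```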

N(self)

Returns: int
The total number of sample outcomes that have been recorded by this FreqDist. For the number of unique sample values (or bins) with counts greater than zero, use FreqDist.B().

B(self)

Returns: int
The total number of sample values (or bins) that have counts greater than zero. For the total number of sample outcomes recorded, use FreqDist.N().
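
The difference between N() and B() can be seen on a small sample. The sketch below uses collections.Counter as a stand-in for FreqDist, which is an assumption made only so the example runs without NLTK.

```python
from collections import Counter

outcomes = ["a", "b", "a", "c", "a", "b"]
counts = Counter(outcomes)

# Analogue of N(): every recorded outcome is counted, repeats included.
total_outcomes = sum(counts.values())

# Analogue of B(): only distinct samples with a count above zero are counted.
nonzero_bins = sum(1 for c in counts.values() if c > 0)
```

Here six outcomes were recorded, but they fall into only three bins.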

samples(self)

Returns: list
A list of all samples that have been recorded as outcomes by this frequency distribution. Use count() to determine the count for each sample.

Nr(self, r, bins=None)

Parameters:
  • r (int) - A sample count.
  • bins (int) - The number of possible sample outcomes. bins is used to calculate Nr(0). In particular, Nr(0) is bins-self.B(). If bins is not specified, it defaults to self.B() (so Nr(0) will be 0).
Returns: int
The number of samples with count r.
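
The role of bins in computing Nr(0) can be sketched as follows. The function nr is a hypothetical re-implementation of the rule described above (Nr(0) is bins minus B()), written over a collections.Counter. It is not NLTK's code and ignores any caching the real class may do.

```python
from collections import Counter


def nr(counts, r, bins=None):
    """Number of samples whose count equals r (illustrative sketch)."""
    nonzero_bins = sum(1 for c in counts.values() if c > 0)  # analogue of B()
    if bins is None:
        bins = nonzero_bins  # default, so nr(counts, 0) is 0
    if r == 0:
        # Unseen samples: possible outcomes minus the ones actually observed.
        return bins - nonzero_bins
    return sum(1 for c in counts.values() if c == r)


# Letter counts: a=5, b=2, r=2, c=1, d=1
counts = Counter("abracadabra")
```

With 26 possible letters, nr(counts, 0, bins=26) gives the number of letters never observed.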

count(self, sample)

Return the count of a given sample. The count of a sample is defined as the number of times that sample outcome was recorded by this FreqDist. Counts are non-negative integers. This method has been replaced by conventional dictionary indexing; use fd[item] instead of fd.count(item).

Parameters:
  • sample (any) - The sample whose count should be returned.
Returns: int
The count of a given sample.

freq(self, sample)

Return the frequency of a given sample. The frequency of a sample is defined as the count of that sample divided by the total number of sample outcomes that have been recorded by this FreqDist. The count of a sample is defined as the number of times that sample outcome was recorded by this FreqDist. Frequencies are always real numbers in the range [0, 1].

Parameters:
  • sample (any) - The sample whose frequency should be returned.
Returns: float
The frequency of a given sample.
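
The definition of frequency as count divided by total outcomes can be sketched as below. The function name relative_freq is hypothetical, and returning 0.0 for an empty distribution is an assumption, since the text above does not say what happens when no outcomes have been recorded.

```python
from collections import Counter


def relative_freq(counts, sample):
    """Count of `sample` divided by the total number of outcomes (sketch)."""
    total = sum(counts.values())
    if total == 0:
        return 0.0  # assumption: avoid dividing by zero when nothing is recorded
    return counts.get(sample, 0) / total


counts = Counter("abracadabra")  # 11 outcomes in total, 5 of them "a"
freq_a = relative_freq(counts, "a")
freq_z = relative_freq(counts, "z")
freq_sum = sum(relative_freq(counts, s) for s in counts)
```

Each value lies in [0, 1], and the frequencies of all observed samples sum to 1 up to floating-point rounding.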

max(self)

Return the sample with the greatest number of outcomes in this frequency distribution. If two or more samples have the same number of outcomes, return one of them; which sample is returned is undefined. If no outcomes have occurred in this frequency distribution, return None.

Returns: any or None
The sample with the maximum number of outcomes in this frequency distribution.
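
The behaviour described above (None for an empty distribution, an unspecified choice among ties) might be sketched like this. The function most_frequent is a hypothetical helper over a plain dict of counts, not the library's implementation.

```python
def most_frequent(counts):
    """Return a sample with the highest count, or None if nothing was recorded."""
    observed = {s: c for s, c in counts.items() if c > 0}
    if not observed:
        return None
    # On ties, whichever tied sample max() meets first is returned;
    # callers should not rely on which one that is.
    return max(observed, key=observed.get)


clear_winner = most_frequent({"a": 3, "b": 1})
tied = most_frequent({"x": 2, "y": 2})
empty = most_frequent({})
```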

plot(self, samples=None, *args, **kwargs)

Plot the given samples from the frequency distribution. If no samples are specified, use all samples, in lexical sort order. (Requires Matplotlib to be installed.)

Parameters:
  • samples (list) - The samples to plot.

zipf_plot(self, num=40, *args, **kwargs)

Plot the most frequent samples of the frequency distribution. (Requires Matplotlib to be installed.)

Parameters:
  • num (int) - The number of samples to plot.

sorted(self)

Return the samples sorted in decreasing order of frequency. Samples with the same count are ordered arbitrarily, and samples with a count of zero are omitted. This method is O(N^2) in the worst case, where N is the number of samples, but completes faster on average.

Returns: sequence of any
The samples in sorted order.
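
The ordering rule can be illustrated with a short sketch over a plain dict of counts. The helper by_decreasing_count is hypothetical and relies on Python's built-in sort, so its running time differs from the complexity stated above.

```python
def by_decreasing_count(counts):
    """List the samples from most to least frequent, dropping zero counts."""
    nonzero = [(sample, c) for sample, c in counts.items() if c > 0]
    nonzero.sort(key=lambda pair: pair[1], reverse=True)
    return [sample for sample, _ in nonzero]


counts = {"a": 5, "b": 2, "c": 0, "d": 1}
ordered = by_decreasing_count(counts)
```

The sample "c" is left out because its count is zero.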

__repr__(self)
(Representation operator)

repr(x)

Returns: string
A string representation of this FreqDist.
Overrides: dict.__repr__

__str__(self)
(Informal representation operator)

str(x)

Returns: string
A string representation of this FreqDist.
Overrides: object.__str__

__getitem__(self, sample)
(Indexing operator)

x[y]

Overrides: dict.__getitem__
(inherited documentation)