Package nltk :: Package cluster :: Module em :: Class EM

Class EM

api.ClusterI --+    
               |    
util.VectorSpace --+
                   |
                  EM

The Gaussian EM clusterer models the vectors as being produced by a mixture of k Gaussian sources. The parameters of these sources (prior probability, mean and covariance matrix) are then found to maximise the likelihood of the given data. This is done with the expectation maximisation algorithm. It starts with k arbitrarily chosen means, priors and covariance matrices. It then calculates the membership probabilities for each vector in each of the clusters; this is the 'E' step. The cluster parameters are then updated in the 'M' step using the maximum likelihood estimate from the cluster membership probabilities. This process continues until the likelihood of the data does not significantly increase.
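
The E and M steps described above can be sketched in a few lines. The following is a minimal, pure-Python illustration for one-dimensional data, not the code of this class: the names `gaussian_pdf`, `log_likelihood`, `em_1d` and `max_iter` are hypothetical, and the uniform starting priors, unit starting variances and the way `bias` is added to each variance are assumptions made for the sketch.

```python
import math

def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def log_likelihood(data, priors, means, variances):
    """Log-likelihood of the data under the current mixture."""
    return sum(
        math.log(sum(p * gaussian_pdf(x, m, v)
                     for p, m, v in zip(priors, means, variances)))
        for x in data
    )

def em_1d(data, means, conv_threshold=1e-6, bias=0.1, max_iter=200):
    """Fit a 1-D Gaussian mixture, starting from the given means."""
    k = len(means)
    means = list(means)
    priors = [1.0 / k] * k      # assumption: uniform starting priors
    variances = [1.0] * k       # assumption: unit starting variances
    last = log_likelihood(data, priors, means, variances)
    for _ in range(max_iter):
        # E step: membership probability of each point in each cluster.
        memberships = []
        for x in data:
            weights = [p * gaussian_pdf(x, m, v)
                       for p, m, v in zip(priors, means, variances)]
            total = sum(weights)
            memberships.append([w / total for w in weights])
        # M step: maximum likelihood estimates weighted by membership.
        for j in range(k):
            nj = sum(h[j] for h in memberships)
            means[j] = sum(h[j] * x for h, x in zip(memberships, data)) / nj
            variances[j] = sum(h[j] * (x - means[j]) ** 2
                               for h, x in zip(memberships, data)) / nj + bias
            priors[j] = nj / len(data)
        # Stop once the log-likelihood changes by less than the threshold.
        current = log_likelihood(data, priors, means, variances)
        if abs(current - last) < conv_threshold:
            break
        last = current
    return priors, means, variances

priors, means, variances = em_1d([0.9, 1.0, 1.1, 4.9, 5.0, 5.1], means=[0.0, 6.0])
```

With two well-separated groups of points, the fitted means should settle near the centre of each group and the priors near one half each.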

Instance Methods
 
__init__(self, initial_means, priors=None, covariance_matrices=None, conv_threshold=1e-06, bias=0.1, normalise=False, svd_dimensions=None)
Creates an EM clusterer with the given starting parameters, convergence threshold and vector preprocessing parameters (normalisation and SVD dimensionality reduction).
 
num_clusters(self)
Returns the number of clusters.
 
cluster_vectorspace(self, vectors, trace=False)
Finds the clusters using the given set of vectors.
 
classify_vectorspace(self, vector)
Returns the index of the appropriate cluster for the vector.
 
likelihood_vectorspace(self, vector, cluster)
Returns the likelihood of the vector belonging to the cluster.
 
_gaussian(self, mean, cvm, x)
 
_loglikelihood(self, vectors, priors, means, covariances)
 
__repr__(self)

Inherited from util.VectorSpace: classify, cluster, likelihood, vector

Inherited from util.VectorSpace (private): _normalise

Inherited from api.ClusterI: classification_probdist, cluster_name, cluster_names

Method Details

__init__(self, initial_means, priors=None, covariance_matrices=None, conv_threshold=1e-06, bias=0.1, normalise=False, svd_dimensions=None)
(Constructor)

Creates an EM clusterer with the given starting parameters, convergence threshold and vector preprocessing parameters (normalisation and SVD dimensionality reduction).

Parameters:
  • initial_means ([seq of] numpy array or seq of SparseArray) - the initial means of the Gaussian cluster centers
  • priors (numpy array or seq of float) - the prior probability for each cluster
  • covariance_matrices ([seq of] numpy array) - the covariance matrix for each cluster
  • conv_threshold (int or float) - the change in likelihood between iterations below which the clusterer is deemed to have converged
  • bias (float) - variance bias used to ensure non-singular covariance matrices
  • normalise (boolean) - whether vectors should be normalised to length 1
  • svd_dimensions (int) - number of dimensions to use when reducing vector dimensionality with SVD
Overrides: util.VectorSpace.__init__
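
To illustrate why a variance bias helps, the sketch below estimates a 2-D covariance matrix from collinear points, which is singular, and then adds the bias to its diagonal. This is a hedged illustration only: the helper names `covariance_2d`, `add_bias` and `det_2d` are hypothetical, and adding `bias` to the diagonal is an assumption about how the parameter is applied, since the description above states only its purpose.

```python
def covariance_2d(points):
    """Maximum likelihood covariance matrix of a list of 2-D points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / n
    syy = sum((p[1] - my) ** 2 for p in points) / n
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / n
    return [[sxx, sxy], [sxy, syy]]

def add_bias(cov, bias):
    """Add the bias to the diagonal (assumed form of the variance bias)."""
    return [[cov[i][j] + (bias if i == j else 0.0) for j in range(2)]
            for i in range(2)]

def det_2d(m):
    """Determinant of a 2x2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

# Three points on the line y = 2x: their covariance matrix has rank 1.
points = [(0.0, 0.0), (1.0, 2.0), (2.0, 4.0)]
raw = covariance_2d(points)
biased = add_bias(raw, bias=0.1)

raw_det = det_2d(raw)        # zero up to rounding: not invertible
biased_det = det_2d(biased)  # strictly positive: invertible
```

A Gaussian density needs the inverse and determinant of the covariance matrix, so a matrix like `raw` cannot be used directly, while `biased` can.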

num_clusters(self)

Returns the number of clusters.

Overrides: api.ClusterI.num_clusters
(inherited documentation)

cluster_vectorspace(self, vectors, trace=False)

Finds the clusters using the given set of vectors.

Overrides: util.VectorSpace.cluster_vectorspace
(inherited documentation)

classify_vectorspace(self, vector)

Returns the index of the appropriate cluster for the vector.

Overrides: util.VectorSpace.classify_vectorspace
(inherited documentation)

likelihood_vectorspace(self, vector, cluster)

Returns the likelihood of the vector belonging to the cluster.

Overrides: util.VectorSpace.likelihood_vectorspace
(inherited documentation)
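
As a rough picture of how classification and likelihood relate in a Gaussian mixture, the sketch below scores a one-dimensional value against each cluster as prior times Gaussian density and picks the highest-scoring cluster. It is an assumption-laden illustration rather than this class's code: the functions `gaussian_pdf`, `likelihood` and `classify` are hypothetical stand-ins, and the scoring rule is the usual mixture-model choice, which the documentation above does not spell out.

```python
import math

def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def likelihood(x, cluster, priors, means, variances):
    """Assumed score: cluster prior times the cluster's Gaussian density at x."""
    return priors[cluster] * gaussian_pdf(x, means[cluster], variances[cluster])

def classify(x, priors, means, variances):
    """Index of the cluster with the highest score for x."""
    scores = [likelihood(x, j, priors, means, variances)
              for j in range(len(means))]
    return max(range(len(scores)), key=scores.__getitem__)

# Hypothetical fitted parameters for two clusters.
priors = [0.5, 0.5]
means = [1.0, 5.0]
variances = [0.2, 0.2]

label_low = classify(1.4, priors, means, variances)
label_high = classify(4.0, priors, means, variances)
```

Here `label_low` is the index of the cluster centred at 1.0 and `label_high` the index of the cluster centred at 5.0, since each value lies much closer to that mean and the priors and variances are equal.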