
Class HiddenMarkovModelTrainer

object --+
         |
        HiddenMarkovModelTrainer

Algorithms for learning HMM parameters from training data. These include both supervised learning (MLE) and unsupervised learning (Baum-Welch).

Instance Methods

__init__(self, states=None, symbols=None)
    Creates an HMM trainer to induce an HMM with the given states and output symbol alphabet.

train(self, labelled_sequences=None, unlabeled_sequences=None, **kwargs) -> HiddenMarkovModelTagger
    Trains the HMM using both (or either of) supervised and unsupervised techniques.

train_unsupervised(self, unlabeled_sequences, **kwargs) -> HiddenMarkovModelTagger
    Trains the HMM using the Baum-Welch algorithm to maximise the probability of the data sequence.

train_supervised(self, labelled_sequences, **kwargs) -> HiddenMarkovModelTagger
    Supervised training maximising the joint probability of the symbol and state sequences.

Inherited from object: __delattr__, __getattribute__, __hash__, __new__, __reduce__, __reduce_ex__, __repr__, __setattr__, __str__

Properties

Inherited from object: __class__

Method Details

__init__(self, states=None, symbols=None) (Constructor)

Creates an HMM trainer to induce an HMM with the given states and output symbol alphabet. Supervised and unsupervised training methods may be used. If either the states or the symbols are not given, they may be derived from supervised training data.

Parameters:
  • states (sequence of any) - the set of state labels
  • symbols (sequence of any) - the set of observation symbols
Overrides: object.__init__
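
For example, a minimal sketch (the toy tagset and vocabulary below are hypothetical):

    from nltk.tag.hmm import HiddenMarkovModelTrainer

    # Hypothetical toy tagset and vocabulary; in practice these
    # usually come from a corpus.
    states = ['DET', 'NOUN', 'VERB']
    symbols = ['the', 'dog', 'barks']
    trainer = HiddenMarkovModelTrainer(states=states, symbols=symbols)

    # Both arguments may be omitted, in which case the states and
    # symbols are derived from labelled data during supervised training.
    trainer = HiddenMarkovModelTrainer()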

train(self, labelled_sequences=None, unlabeled_sequences=None, **kwargs)

Trains the HMM using both (or either of) supervised and unsupervised techniques.

Parameters:
  • labelled_sequences (list) - the supervised training data, a set of labelled sequences of observations
  • unlabeled_sequences (list) - the unsupervised training data, a set of sequences of observations
  • kwargs - additional arguments to pass to the training methods
Returns: HiddenMarkovModelTagger
the trained model
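
A minimal sketch of both call patterns, using tiny hypothetical data sets; when both kinds of data are supplied, the supervised model is trained first and then refined by Baum-Welch on the unlabeled sequences:

    from nltk.tag.hmm import HiddenMarkovModelTrainer

    # Tiny hypothetical data sets, purely for illustration.
    tagged_sents = [[('the', 'DET'), ('dog', 'NOUN'), ('barks', 'VERB')],
                    [('the', 'DET'), ('cat', 'NOUN'), ('sleeps', 'VERB')]]
    raw_seqs = [[('the', None), ('dog', None), ('sleeps', None)]]

    trainer = HiddenMarkovModelTrainer()

    # Supervised data only: equivalent to calling train_supervised().
    tagger = trainer.train(labelled_sequences=tagged_sents)

    # Both: extra keyword arguments (here max_iterations) are
    # forwarded to the underlying training methods.
    tagger = trainer.train(labelled_sequences=tagged_sents,
                           unlabeled_sequences=raw_seqs,
                           max_iterations=5)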

train_unsupervised(self, unlabeled_sequences, **kwargs)

Trains the HMM using the Baum-Welch algorithm to maximise the probability of the data sequence. This is a variant of the EM algorithm, and is unsupervised in that it doesn't need the state sequences for the symbols. The code is based on 'A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition', Lawrence Rabiner, IEEE, 1989.

Parameters:
  • unlabeled_sequences (list) - the training data, a set of sequences of observations
  • kwargs - may include the following parameters:
       • model - a HiddenMarkovModelTagger instance used to begin the Baum-Welch algorithm
       • max_iterations - the maximum number of EM iterations
       • convergence_logprob - the maximum change in log probability to allow convergence
Returns: HiddenMarkovModelTagger
the trained model
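
A minimal sketch, assuming the unlabeled sequences follow the (symbol, None) pair convention used in NLTK's own demo code:

    from nltk.tag.hmm import HiddenMarkovModelTrainer

    # Hypothetical unlabeled data: each sequence is a list of
    # (symbol, None) pairs.
    sequences = [[('the', None), ('dog', None), ('barks', None)],
                 [('the', None), ('cat', None), ('sleeps', None)]]

    # States and symbols must be given up front, since they cannot
    # be read off unlabeled data.
    states = ['DET', 'NOUN', 'VERB']
    symbols = ['the', 'dog', 'cat', 'barks', 'sleeps']

    trainer = HiddenMarkovModelTrainer(states=states, symbols=symbols)
    tagger = trainer.train_unsupervised(sequences,
                                        max_iterations=10,
                                        convergence_logprob=1e-5)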

train_supervised(self, labelled_sequences, **kwargs)

Supervised training maximising the joint probability of the symbol and state sequences. This is done by collecting frequencies of transitions between states, of symbol observations while within each state, and of which states start a sentence. These frequency distributions are then normalised into probability estimates, which can be smoothed if desired.

Parameters:
  • labelled_sequences (list) - the training data, a set of labelled sequences of observations
  • kwargs - may include an 'estimator' parameter, a function taking a FreqDist and a number of bins and returning a ProbDistI; otherwise an MLE estimate is used
Returns: HiddenMarkovModelTagger
the trained model
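
For example, using the Penn Treebank sample shipped with NLTK, with Lidstone smoothing supplied through the estimator parameter (the gamma value 0.1 is illustrative):

    from nltk.corpus import treebank
    from nltk.probability import LidstoneProbDist
    from nltk.tag.hmm import HiddenMarkovModelTrainer

    # Labelled sequences: a list of sentences, each a list of
    # (word, tag) pairs.
    train_data = treebank.tagged_sents()[:3000]

    trainer = HiddenMarkovModelTrainer()

    # Default: unsmoothed MLE estimates.
    tagger = trainer.train_supervised(train_data)

    # With Lidstone smoothing via a custom estimator.
    est = lambda fd, bins: LidstoneProbDist(fd, 0.1, bins)
    smoothed = trainer.train_supervised(train_data, estimator=est)

    print(smoothed.tag('Today is a good day .'.split()))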