Package nltk :: Package tag :: Module hmm :: Class HiddenMarkovModelTagger

Class HiddenMarkovModelTagger

 object --+    
          |    
api.TaggerI --+
              |
             HiddenMarkovModelTagger

Hidden Markov model class, a generative model for labelling sequence data. These models define the joint probability of a sequence of symbols and their labels (state transitions) as the product of the starting state probability, the probability of each state transition, and the probability of each observation being generated from each state. This is described in more detail in the module documentation.
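The product described above can be written out directly. The following is a minimal sketch in plain Python, not NLTK's implementation: the dictionaries `priors`, `transitions` and `outputs` and all their values are toy numbers invented for illustration, and `joint_log_probability` is a hypothetical helper name.

```python
import math

# Toy parameters, invented for illustration only (not taken from NLTK).
priors = {"N": 0.6, "V": 0.4}
transitions = {"N": {"N": 0.3, "V": 0.7}, "V": {"N": 0.8, "V": 0.2}}
outputs = {
    "N": {"dogs": 0.5, "bark": 0.1, "run": 0.4},
    "V": {"dogs": 0.1, "bark": 0.6, "run": 0.3},
}


def joint_log_probability(symbols, states):
    """Return log P(symbols, states) for one labelled, non-empty sequence."""
    if not symbols or len(symbols) != len(states):
        raise ValueError("symbols and states must be non-empty and equal in length")
    # Starting state probability times the first observation probability.
    logp = math.log(priors[states[0]]) + math.log(outputs[states[0]][symbols[0]])
    # One transition factor and one observation factor per later position.
    for prev, cur, sym in zip(states, states[1:], symbols[1:]):
        logp += math.log(transitions[prev][cur]) + math.log(outputs[cur][sym])
    return logp


joint = math.exp(joint_log_probability(["dogs", "bark"], ["N", "V"]))
```

With these toy numbers the joint probability is 0.6 * 0.5 * 0.7 * 0.6 = 0.126. Working in log space avoids underflow on long sequences.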

This implementation is based on the HMM description in Chapter 8 of Huang, Acero and Hon, Spoken Language Processing, and includes an extension for training shallow HMM parsers or specialized HMMs as in Molina et al., 2002. A specialized HMM modifies training data by applying a specialization function to create a new training set that is more appropriate for sequential tagging with an HMM. A typical use case is chunking.
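To make the idea of a specialization function concrete, here is one plausible sketch for chunking. It is an assumption for illustration, not the function used by NLTK or the exact one from Molina et al.: the name `specialize`, the `lexicalized_words` parameter, and the `/` and `|` separators are all hypothetical. It rewrites (word, part-of-speech, chunk) triples into (symbol, state) pairs so that the states carry more context than the bare chunk tag.

```python
def specialize(sentence, lexicalized_words=frozenset()):
    """Map (word, pos, chunk) triples to (symbol, state) training pairs.

    Hypothetical scheme: the observed symbol is the POS tag, or "word/pos"
    for a chosen set of words; the state is the symbol joined to the chunk tag.
    """
    pairs = []
    for word, pos, chunk in sentence:
        if word.lower() in lexicalized_words:
            symbol = f"{word.lower()}/{pos}"
        else:
            symbol = pos
        state = f"{symbol}|{chunk}"
        pairs.append((symbol, state))
    return pairs


training = [("the", "DT", "B-NP"), ("dog", "NN", "I-NP"), ("of", "IN", "B-PP")]
specialized = specialize(training, lexicalized_words={"of"})
```

The resulting pairs could then serve as a labelled training sequence; after tagging, the chunk tag is recovered by splitting each predicted state on the separator.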

Instance Methods

__init__(self, symbols, states, transitions, outputs, priors, **kwargs)
Creates a hidden Markov model parameterised by the states, transition probabilities, output probabilities and priors.

float

probability(self, sequence)
Returns the probability of the given symbol sequence.

float

log_probability(self, sequence)
Returns the log-probability of the given symbol sequence.

list

tag(self, unlabeled_sequence)
Tags the sequence with the highest-probability state sequence.

_tag(self, unlabeled_sequence)

float

_output_logprob(self, state, symbol)
Returns: the log probability of the symbol being observed in the given state

_create_cache(self)
The cache is a tuple (P, O, X, S) where:

_update_cache(self, symbols)

sequence of any

best_path(self, unlabeled_sequence)
Returns the state sequence of the optimal (most probable) path through the HMM.

_best_path(self, unlabeled_sequence)

sequence of any

best_path_simple(self, unlabeled_sequence)
Returns the state sequence of the optimal (most probable) path through the HMM.

_best_path_simple(self, unlabeled_sequence)

list

random_sample(self, rng, length)
Randomly sample the HMM to generate a sentence of a given length.

_sample_probdist(self, probdist, p, samples)

entropy(self, unlabeled_sequence)
Returns the entropy over labellings of the given sequence.

point_entropy(self, unlabeled_sequence)
Returns the pointwise entropy over the possible states at each position in the chain, given the observation sequence.

_exhaustive_entropy(self, unlabeled_sequence)

_exhaustive_point_entropy(self, unlabeled_sequence)

array

_forward_probability(self, unlabeled_sequence)
Return the forward probability matrix, a T by N array of log-probabilities, where T is the length of the sequence and N is the number of states.

array

_backward_probability(self, unlabeled_sequence)
Return the backward probability matrix, a T by N array of log-probabilities, where T is the length of the sequence and N is the number of states.

test(self, test_sequence, **kwargs)
Tests the HiddenMarkovModelTagger instance.

__repr__(self)
repr(x)

Inherited from api.TaggerI: batch_tag

Inherited from object: __delattr__, __getattribute__, __hash__, __new__, __reduce__, __reduce_ex__, __setattr__, __str__
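
Two of the computations summarised in the method list, the T by N forward matrix of log-probabilities and the most probable state path, can be sketched in plain Python. This is an illustrative sketch under stated assumptions, not NLTK's code: the toy parameters are invented, the function names `forward_matrix`, `sequence_log_probability` and `viterbi` are hypothetical, and input sequences are assumed to be non-empty.

```python
import math

# Toy parameters, invented for illustration only (not taken from NLTK).
states = ["N", "V"]
priors = {"N": 0.6, "V": 0.4}
transitions = {"N": {"N": 0.3, "V": 0.7}, "V": {"N": 0.8, "V": 0.2}}
outputs = {
    "N": {"dogs": 0.5, "bark": 0.1, "run": 0.4},
    "V": {"dogs": 0.1, "bark": 0.6, "run": 0.3},
}


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    if m == float("-inf"):
        return m
    return m + math.log(sum(math.exp(v - m) for v in values))


def forward_matrix(symbols):
    """T x N nested list: alpha[t][i] = log P(symbols[:t+1], state at t is states[i])."""
    alpha = [[math.log(priors[s]) + math.log(outputs[s][symbols[0]]) for s in states]]
    for sym in symbols[1:]:
        prev = alpha[-1]
        row = []
        for s in states:
            # Sum over every predecessor state, then emit the current symbol.
            total = logsumexp(
                [prev[i] + math.log(transitions[r][s]) for i, r in enumerate(states)]
            )
            row.append(total + math.log(outputs[s][sym]))
        alpha.append(row)
    return alpha


def sequence_log_probability(symbols):
    """Log-probability of the symbols, summed over all state sequences."""
    return logsumexp(forward_matrix(symbols)[-1])


def viterbi(symbols):
    """Most probable state sequence, found by dynamic programming in log space."""
    delta = [math.log(priors[s]) + math.log(outputs[s][symbols[0]]) for s in states]
    backpointers = []
    for sym in symbols[1:]:
        new_delta, pointers = [], []
        for s in states:
            # Keep only the best predecessor instead of summing over all of them.
            best_i = max(
                range(len(states)),
                key=lambda i: delta[i] + math.log(transitions[states[i]][s]),
            )
            score = delta[best_i] + math.log(transitions[states[best_i]][s])
            new_delta.append(score + math.log(outputs[s][sym]))
            pointers.append(best_i)
        delta = new_delta
        backpointers.append(pointers)
    # Trace the best final state back to the start.
    path = [max(range(len(states)), key=lambda i: delta[i])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return [states[i] for i in path]


alpha = forward_matrix(["dogs", "bark"])
total = math.exp(sequence_log_probability(["dogs", "bark"]))
best = viterbi(["dogs", "bark"])
```

The two routines differ only in how they combine predecessors: the forward pass sums over them, while the Viterbi pass takes the maximum and records a backpointer.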

Class Methods

_train(cls, labeled_sequence, test_sequence=None, unlabeled_sequence=None, **kwargs)

HiddenMarkovModelTagger

train(cls, labeled_sequence, test_sequence=None, unlabeled_sequence=None, **kwargs)
Train a new HiddenMarkovModelTagger using the given labeled and unlabeled training instances.
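
For the labeled part of training, the usual starting point is relative-frequency estimation from counts. The sketch below is a simplified illustration, not the body of `train`: it assumes `labeled_sequences` is a list of sentences, each a list of (symbol, state) pairs, the helper name `estimate_supervised` is hypothetical, and it applies no smoothing and ignores unlabeled data entirely.

```python
from collections import Counter, defaultdict


def estimate_supervised(labeled_sequences):
    """Relative-frequency estimates of priors, transitions and outputs."""
    start = Counter()
    trans = defaultdict(Counter)
    emit = defaultdict(Counter)
    for sentence in labeled_sequences:
        prev = None
        for symbol, state in sentence:
            if prev is None:
                start[state] += 1          # first state of a sentence
            else:
                trans[prev][state] += 1    # state-to-state transition
            emit[state][symbol] += 1       # symbol observed in this state
            prev = state

    def normalize(counter):
        total = sum(counter.values())
        return {key: count / total for key, count in counter.items()}

    return (
        normalize(start),
        {s: normalize(c) for s, c in trans.items()},
        {s: normalize(c) for s, c in emit.items()},
    )


corpus = [
    [("the", "DT"), ("dog", "NN")],
    [("the", "DT"), ("cat", "NN")],
    [("dogs", "NN")],
]
est_priors, est_transitions, est_outputs = estimate_supervised(corpus)
```

Unsmoothed estimates like these assign zero probability to unseen symbols and transitions, so a practical tagger would normally smooth the counts.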

Properties

Inherited from object: __class__

Method Details

__init__(self, symbols, states, transitions, outputs, priors, **kwargs) (Constructor)

train(cls, labeled_sequence, test_sequence=None, unlabeled_sequence=None, **kwargs) Class Method

probability(self, sequence)

log_probability(self, sequence)

tag(self, unlabeled_sequence)

_output_logprob(self, state, symbol)

_create_cache(self)

best_path(self, unlabeled_sequence)

best_path_simple(self, unlabeled_sequence)

random_sample(self, rng, length)

entropy(self, unlabeled_sequence)

_forward_probability(self, unlabeled_sequence)

_backward_probability(self, unlabeled_sequence)

test(self, test_sequence, **kwargs)

__repr__(self) (Representation operator)
