Package nltk :: Package tag :: Module hmm
[hide private]
[frames] | no frames]

Module hmm

source code

Hidden Markov Models (HMMs) largely used to assign the correct label sequence to sequential data or assess the probability of a given label and data sequence. These models are finite state machines characterised by a number of states, transitions between these states, and output symbols emitted while in each state. The HMM is an extension to the Markov chain, where each state corresponds deterministically to a given event. In the HMM the observation is a probabilistic function of the state. HMMs share the Markov chain's assumption, being that the probability of transition from one state to another only depends on the current state - i.e. the series of states that led to the current state are not used. They are also time invariant.

The HMM is a directed graph, with probability weighted edges (representing the probability of a transition between the source and sink states) where each vertex emits an output symbol when entered. The symbol (or observation) is non-deterministically generated. For this reason, knowing that a sequence of output observations was generated by a given HMM does not mean that the corresponding sequence of states (and what the current state is) is known. This is the 'hidden' in the hidden markov model.

Formally, a HMM can be characterised by:

To ground this discussion, take a common NLP application, part-of-speech (POS) tagging. An HMM is desirable for this task as the highest probability tag sequence can be calculated for a given sequence of word forms. This differs from other tagging techniques which often tag each word individually, seeking to optimise each individual tagging greedily without regard to the optimal combination of tags for a larger unit, such as a sentence. The HMM does this with the Viterbi algorithm, which efficiently computes the optimal path through the graph given the sequence of words forms.

In POS tagging the states usually have a 1:1 correspondence with the tag alphabet - i.e. each state represents a single tag. The output observation alphabet is the set of word forms (the lexicon), and the remaining three parameters are derived by a training regime. With this information the probability of a given sentence can be easily derived, by simply summing the probability of each distinct path through the model. Similarly, the highest probability tagging sequence can be derived with the Viterbi algorithm, yielding a state sequence which can be mapped into a tag sequence.

This discussion assumes that the HMM has been trained. This is probably the most difficult task with the model, and requires either MLE estimates of the parameters or unsupervised learning using the Baum-Welch algorithm, a variant of EM.


Version: 1.0.1

Classes [hide private]
  HiddenMarkovModelTagger
Hidden Markov model class, a generative model for labelling sequence data.
  HiddenMarkovModelTrainer
Algorithms for learning HMM parameters from training data.
  HiddenMarkovModelTaggerTransform
An abstract subclass of HiddenMarkovModelTaggerTransformI.
  LambdaTransform
A subclass of HiddenMarkovModelTaggerTransform that is backed by an arbitrary user-defined function, instance method, or lambda function.
  IdentityTransform
A subclass of HiddenMarkovModelTaggerTransform that implements transform() as the identity function, i.e.
Functions [hide private]
 
_log_add(*values)
Adds the logged values, returning the logarithm of the addition.
source code
 
demo() source code
 
load_pos(num_sents) source code
 
demo_pos() source code
 
_untag(sentences) source code
 
demo_pos_bw() source code
 
demo_bw() source code
Variables [hide private]
  _NINF = -1e+300
  _TEXT = 0
  _TAG = 1
  ALLOW_THREADS = 1
  BUFSIZE = 10000
  CLIP = 0
  ERR_CALL = 3
  ERR_DEFAULT = 0
  ERR_DEFAULT2 = 2084
  ERR_IGNORE = 0
  ERR_LOG = 5
  ERR_PRINT = 4
  ERR_RAISE = 2
  ERR_WARN = 1
  FLOATING_POINT_SUPPORT = 1
  FPE_DIVIDEBYZERO = 1
  FPE_INVALID = 8
  FPE_OVERFLOW = 2
  FPE_UNDERFLOW = 4
  False_ = False
  Inf = inf
  Infinity = inf
  MAXDIMS = 32
  NAN = nan
  NINF = -inf
  NZERO = -0.0
  NaN = nan
  PINF = inf
  PZERO = 0.0
  RAISE = 2
  SHIFT_DIVIDEBYZERO = 0
  SHIFT_INVALID = 9
  SHIFT_OVERFLOW = 3
  SHIFT_UNDERFLOW = 6
  ScalarType = (<type 'int'>, <type 'float'>, <type 'complex'>, ...
  True_ = True
  UFUNC_BUFSIZE_DEFAULT = 10000
  UFUNC_PYVALS_NAME = 'UFUNC_PYVALS'
  WRAP = 1
  absolute = <ufunc 'absolute'>
  add = <ufunc 'add'>
  arccos = <ufunc 'arccos'>
  arccosh = <ufunc 'arccosh'>
  arcsin = <ufunc 'arcsin'>
  arcsinh = <ufunc 'arcsinh'>
  arctan = <ufunc 'arctan'>
  arctan2 = <ufunc 'arctan2'>
  arctanh = <ufunc 'arctanh'>
  bitwise_and = <ufunc 'bitwise_and'>
  bitwise_not = <ufunc 'invert'>
  bitwise_or = <ufunc 'bitwise_or'>
  bitwise_xor = <ufunc 'bitwise_xor'>
  c_ = <numpy.lib.index_tricks.c_class object at 0x11b44f0>
  cast = {<type 'numpy.int64'>: <function <lambda> at 0x123bf30>...
  ceil = <ufunc 'ceil'>
  conj = <ufunc 'conjugate'>
  conjugate = <ufunc 'conjugate'>
  cos = <ufunc 'cos'>
  cosh = <ufunc 'cosh'>
  divide = <ufunc 'divide'>
  e = 2.71828182846
  equal = <ufunc 'equal'>
  exp = <ufunc 'exp'>
  expm1 = <ufunc 'expm1'>
  fabs = <ufunc 'fabs'>
  floor = <ufunc 'floor'>
  floor_divide = <ufunc 'floor_divide'>
  fmod = <ufunc 'fmod'>
  frexp = <ufunc 'frexp'>
  greater = <ufunc 'greater'>
  greater_equal = <ufunc 'greater_equal'>
  hypot = <ufunc 'hypot'>
  index_exp = <numpy.lib.index_tricks._index_expression_class ob...
  inf = inf
  infty = inf
  invert = <ufunc 'invert'>
  isfinite = <ufunc 'isfinite'>
  isinf = <ufunc 'isinf'>
  isnan = <ufunc 'isnan'>
  ldexp = <ufunc 'ldexp'>
  left_shift = <ufunc 'left_shift'>
  less = <ufunc 'less'>
  less_equal = <ufunc 'less_equal'>
  little_endian = True
  log = <ufunc 'log'>
  log10 = <ufunc 'log10'>
  log1p = <ufunc 'log1p'>
  logical_and = <ufunc 'logical_and'>
  logical_not = <ufunc 'logical_not'>
  logical_or = <ufunc 'logical_or'>
  logical_xor = <ufunc 'logical_xor'>
  maximum = <ufunc 'maximum'>
  mgrid = <numpy.lib.index_tricks.nd_grid object at 0x11aa350>
  minimum = <ufunc 'minimum'>
  mod = <ufunc 'remainder'>
  modf = <ufunc 'modf'>
  multiply = <ufunc 'multiply'>
  nan = nan
  nbytes = {<type 'numpy.int64'>: 8, <type 'numpy.int16'>: 2, <t...
  negative = <ufunc 'negative'>
  newaxis = None
  not_equal = <ufunc 'not_equal'>
  ogrid = <numpy.lib.index_tricks.nd_grid object at 0x11aa330>
  ones_like = <ufunc 'ones_like'>
  pi = 3.14159265359
  power = <ufunc 'power'>
  r_ = <numpy.lib.index_tricks.r_class object at 0x11a5c30>
  reciprocal = <ufunc 'reciprocal'>
  remainder = <ufunc 'remainder'>
  right_shift = <ufunc 'right_shift'>
  rint = <ufunc 'rint'>
  s_ = <numpy.lib.index_tricks._index_expression_class object at...
  sctypeDict = {0: <type 'numpy.bool_'>, 1: <type 'numpy.int8'>,...
  sctypeNA = {'?': 'Bool', 'B': 'UInt8', 'Bool': <type 'numpy.bo...
  sctypes = {'complex': [<type 'numpy.complex64'>, <type 'numpy....
  sign = <ufunc 'sign'>
  signbit = <ufunc 'signbit'>
  sin = <ufunc 'sin'>
  sinh = <ufunc 'sinh'>
  sqrt = <ufunc 'sqrt'>
  square = <ufunc 'square'>
  subtract = <ufunc 'subtract'>
  tan = <ufunc 'tan'>
  tanh = <ufunc 'tanh'>
  true_divide = <ufunc 'true_divide'>
  typeDict = {0: <type 'numpy.bool_'>, 1: <type 'numpy.int8'>, 2...
  typeNA = {'?': 'Bool', 'B': 'UInt8', 'Bool': <type 'numpy.bool...
  typecodes = {'All': '?bhilqpBHILQPfdgFDGSUVO', 'AllFloat': 'fd...
Variables Details [hide private]

ScalarType

Value:
(<type 'int'>,
 <type 'float'>,
 <type 'complex'>,
 <type 'long'>,
 <type 'bool'>,
 <type 'str'>,
 <type 'unicode'>,
 <type 'buffer'>,
...

cast

Value:
{<type 'numpy.int64'>: <function <lambda> at 0x123bf30>, <type 'numpy.\
int16'>: <function <lambda> at 0x123bf70>, <type 'numpy.complex128'>: \
<function <lambda> at 0x123bfb0>, <type 'numpy.int32'>: <function <lam\
bda> at 0x761030>, <type 'numpy.uint32'>: <function <lambda> at 0x7610\
70>, <type 'numpy.unicode_'>: <function <lambda> at 0x7610b0>, <type '\
numpy.complex64'>: <function <lambda> at 0x7610f0>, <type 'numpy.int32\
'>: <function <lambda> at 0x761130>, <type 'numpy.uint32'>: <function \
<lambda> at 0x761170>, <type 'numpy.string_'>: <function <lambda> at 0\
...

index_exp

Value:
<numpy.lib.index_tricks._index_expression_class object at 0x11b4590>

nbytes

Value:
{<type 'numpy.int64'>: 8, <type 'numpy.int16'>: 2, <type 'numpy.comple\
x128'>: 16, <type 'numpy.int32'>: 4, <type 'numpy.uint32'>: 4, <type '\
numpy.unicode_'>: 0, <type 'numpy.complex64'>: 8, <type 'numpy.int32'>\
: 4, <type 'numpy.uint32'>: 4, <type 'numpy.string_'>: 0, <type 'numpy\
.float128'>: 16, <type 'numpy.uint16'>: 2, <type 'numpy.object_'>: 4, \
<type 'numpy.float64'>: 8, <type 'numpy.int8'>: 1, <type 'numpy.uint8'\
>: 1, <type 'numpy.bool_'>: 1, <type 'numpy.float32'>: 4, <type 'numpy\
.uint64'>: 8, <type 'numpy.complex256'>: 32, <type 'numpy.void'>: 0}

s_

Value:
<numpy.lib.index_tricks._index_expression_class object at 0x11b45b0>

sctypeDict

Value:
{0: <type 'numpy.bool_'>,
 1: <type 'numpy.int8'>,
 2: <type 'numpy.uint8'>,
 3: <type 'numpy.int16'>,
 4: <type 'numpy.uint16'>,
 5: <type 'numpy.int32'>,
 6: <type 'numpy.uint32'>,
 7: <type 'numpy.int32'>,
...

sctypeNA

Value:
{'?': 'Bool',
 'B': 'UInt8',
 'Bool': <type 'numpy.bool_'>,
 'Complex128': <type 'numpy.complex256'>,
 'Complex32': <type 'numpy.complex64'>,
 'Complex64': <type 'numpy.complex128'>,
 'D': 'Complex64',
 'F': 'Complex32',
...

sctypes

Value:
{'complex': [<type 'numpy.complex64'>,
             <type 'numpy.complex128'>,
             <type 'numpy.complex256'>],
 'float': [<type 'numpy.float32'>,
           <type 'numpy.float64'>,
           <type 'numpy.float128'>],
 'int': [<type 'numpy.int8'>,
         <type 'numpy.int16'>,
...

typeDict

Value:
{0: <type 'numpy.bool_'>,
 1: <type 'numpy.int8'>,
 2: <type 'numpy.uint8'>,
 3: <type 'numpy.int16'>,
 4: <type 'numpy.uint16'>,
 5: <type 'numpy.int32'>,
 6: <type 'numpy.uint32'>,
 7: <type 'numpy.int32'>,
...

typeNA

Value:
{'?': 'Bool',
 'B': 'UInt8',
 'Bool': <type 'numpy.bool_'>,
 'Complex128': <type 'numpy.complex256'>,
 'Complex32': <type 'numpy.complex64'>,
 'Complex64': <type 'numpy.complex128'>,
 'D': 'Complex64',
 'F': 'Complex32',
...

typecodes

Value:
{'All': '?bhilqpBHILQPfdgFDGSUVO',
 'AllFloat': 'fdgFDG',
 'AllInteger': 'bBhHiIlLqQpP',
 'Character': 'S1',
 'Complex': 'FDG',
 'Float': 'fdg',
 'Integer': 'bhilqp',
 'UnsignedInteger': 'BHILQP'}