Package tag

Classes and interfaces for tagging each token of a sentence with supplementary information, such as its part of speech. This task, which is known as tagging, is defined by the TaggerI interface.
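
As a rough illustration of what such an interface amounts to, the sketch below defines a hypothetical `SimpleTaggerInterface` with a single `tag(tokens)` method and one trivial implementation. The class names are invented for this example and are not NLTK's own; the sketch assumes only that a tagger maps a list of tokens to a list of (token, tag) pairs.

```python
class SimpleTaggerInterface(object):
    """Hypothetical stand-in for a tagging interface (not NLTK's TaggerI)."""

    def tag(self, tokens):
        """Return a list of (token, tag) pairs, one per input token."""
        raise NotImplementedError


class ConstantTagger(SimpleTaggerInterface):
    """Assigns the same tag to every token, whatever the token is."""

    def __init__(self, tag):
        self._tag = tag

    def tag(self, tokens):
        # Pair each token with the single tag chosen at construction time.
        return [(token, self._tag) for token in tokens]


tagged = ConstantTagger("NN").tag(["John", "saw", "Mary"])
print(tagged)
```

Any class that honours the same `tag(tokens)` contract can be swapped in without changing the calling code, which is the point of defining the task through an interface.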

Submodules
  • nltk.tag.api: Interface for tagging each token in a sentence with supplementary information, such as its part of speech.
  • nltk.tag.brill: Brill's transformational rule-based tagger.
  • nltk.tag.crf: An interface to Mallet's Linear Chain Conditional Random Field (LC-CRF) implementation.
  • nltk.tag.hmm: Hidden Markov Models (HMMs), largely used to assign the correct label sequence to sequential data or to assess the probability of a given label and data sequence.
  • nltk.tag.sequential: Classes for tagging sentences sequentially, left to right.
  • nltk.tag.simplify
  • nltk.tag.util

Classes
  AffixTagger
A tagger that chooses a token's tag based on a leading or trailing substring of its word string.
  BigramTagger
A tagger that chooses a token's tag based on its word string and on the preceding word's tag.
  BrillTagger
Brill's transformational rule-based tagger.
  BrillTaggerTrainer
A trainer for Brill taggers.
  DefaultTagger
A tagger that assigns the same tag to every token.
  FastBrillTaggerTrainer
A faster trainer for Brill taggers.
  HiddenMarkovModelTagger
Hidden Markov model class, a generative model for labelling sequence data.
  HiddenMarkovModelTrainer
Algorithms for learning HMM parameters from training data.
  NgramTagger
A tagger that chooses a token's tag based on its word string and on the preceding n words' tags.
  RegexpTagger
A tagger that assigns tags to words based on regular expressions over word strings.
  TaggerI
A processing interface for assigning a tag to each token in a list.
  TrigramTagger
A tagger that chooses a token's tag based on its word string and on the preceding two words' tags.
  UnigramTagger
A tagger that chooses a token's tag based on its word string.
    Deprecated
  Affix
Use nltk.AffixTagger instead.
  Bigram
Use nltk.BigramTagger instead.
  Lookup
Use UnigramTagger instead.
  Ngram
Use nltk.NgramTagger instead.
  Regexp
Use RegexpTagger instead.
  SequentialBackoff
Use nltk.SequentialBackoffTagger instead.
  TagI
Use nltk.TaggerI instead.
  Trigram
Use nltk.TrigramTagger instead.
  Unigram
Use nltk.UnigramTagger instead.
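
The unigram and bigram descriptions above can be made concrete with a small pure-Python sketch. It is not NLTK's implementation: the helper names `train_unigram`, `train_bigram` and `tag_with_backoff` are hypothetical, the training data is a toy corpus invented for the example, and falling back from the bigram model to the unigram model to a default tag is an assumption about how such taggers are typically combined.

```python
from collections import Counter, defaultdict


def train_unigram(tagged_sentences):
    """Map each word to the tag it carries most often in the training data."""
    counts = defaultdict(Counter)
    for sentence in tagged_sentences:
        for word, tag in sentence:
            counts[word][tag] += 1
    return {word: tags.most_common(1)[0][0] for word, tags in counts.items()}


def train_bigram(tagged_sentences):
    """Map each (previous tag, word) context to its most frequent tag."""
    counts = defaultdict(Counter)
    for sentence in tagged_sentences:
        prev_tag = None  # None marks the start of a sentence
        for word, tag in sentence:
            counts[(prev_tag, word)][tag] += 1
            prev_tag = tag
    return {ctx: tags.most_common(1)[0][0] for ctx, tags in counts.items()}


def tag_with_backoff(tokens, bigram_model, unigram_model, default_tag):
    """Try the bigram context first, then the word alone, then a default."""
    tagged = []
    prev_tag = None
    for word in tokens:
        tag = bigram_model.get((prev_tag, word))
        if tag is None:
            tag = unigram_model.get(word, default_tag)
        tagged.append((word, tag))
        prev_tag = tag
    return tagged


# Toy training corpus, invented for illustration only.
train = [
    [("the", "DT"), ("dog", "NN"), ("barks", "VBZ")],
    [("the", "DT"), ("cat", "NN"), ("sleeps", "VBZ")],
]
unigram = train_unigram(train)
bigram = train_bigram(train)
result = tag_with_backoff(["the", "cat", "barks", "loudly"], bigram, unigram, "NN")
print(result)
```

The unseen word "loudly" matches neither model and so receives the default tag, which is the role a constant-tag tagger plays at the end of a chain.
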
Functions
untag(tagged_sentence)
Given a tagged sentence, return an untagged version of that sentence.
Variables
  ALLOW_THREADS = 1
  BUFSIZE = 10000
  CLIP = 0
  ERR_CALL = 3
  ERR_DEFAULT = 0
  ERR_DEFAULT2 = 2084
  ERR_IGNORE = 0
  ERR_LOG = 5
  ERR_PRINT = 4
  ERR_RAISE = 2
  ERR_WARN = 1
  FLOATING_POINT_SUPPORT = 1
  FPE_DIVIDEBYZERO = 1
  FPE_INVALID = 8
  FPE_OVERFLOW = 2
  FPE_UNDERFLOW = 4
  False_ = False
  Inf = inf
  Infinity = inf
  MAXDIMS = 32
  NAN = nan
  NINF = -inf
  NZERO = -0.0
  NaN = nan
  PINF = inf
  PZERO = 0.0
  RAISE = 2
  SHIFT_DIVIDEBYZERO = 0
  SHIFT_INVALID = 9
  SHIFT_OVERFLOW = 3
  SHIFT_UNDERFLOW = 6
  ScalarType = (<type 'int'>, <type 'float'>, <type 'complex'>, ...
  True_ = True
  UFUNC_BUFSIZE_DEFAULT = 10000
  UFUNC_PYVALS_NAME = 'UFUNC_PYVALS'
  WRAP = 1
  absolute = <ufunc 'absolute'>
  add = <ufunc 'add'>
  arccos = <ufunc 'arccos'>
  arccosh = <ufunc 'arccosh'>
  arcsin = <ufunc 'arcsin'>
  arcsinh = <ufunc 'arcsinh'>
  arctan = <ufunc 'arctan'>
  arctan2 = <ufunc 'arctan2'>
  arctanh = <ufunc 'arctanh'>
  bitwise_and = <ufunc 'bitwise_and'>
  bitwise_not = <ufunc 'invert'>
  bitwise_or = <ufunc 'bitwise_or'>
  bitwise_xor = <ufunc 'bitwise_xor'>
  c_ = <numpy.lib.index_tricks.c_class object at 0x11b44f0>
  cast = {<type 'numpy.int64'>: <function <lambda> at 0x123bf30>...
  ceil = <ufunc 'ceil'>
  conj = <ufunc 'conjugate'>
  conjugate = <ufunc 'conjugate'>
  cos = <ufunc 'cos'>
  cosh = <ufunc 'cosh'>
  divide = <ufunc 'divide'>
  e = 2.71828182846
  equal = <ufunc 'equal'>
  exp = <ufunc 'exp'>
  expm1 = <ufunc 'expm1'>
  fabs = <ufunc 'fabs'>
  floor = <ufunc 'floor'>
  floor_divide = <ufunc 'floor_divide'>
  fmod = <ufunc 'fmod'>
  frexp = <ufunc 'frexp'>
  greater = <ufunc 'greater'>
  greater_equal = <ufunc 'greater_equal'>
  hypot = <ufunc 'hypot'>
  index_exp = <numpy.lib.index_tricks._index_expression_class ob...
  inf = inf
  infty = inf
  invert = <ufunc 'invert'>
  isfinite = <ufunc 'isfinite'>
  isinf = <ufunc 'isinf'>
  isnan = <ufunc 'isnan'>
  ldexp = <ufunc 'ldexp'>
  left_shift = <ufunc 'left_shift'>
  less = <ufunc 'less'>
  less_equal = <ufunc 'less_equal'>
  little_endian = True
  log = <ufunc 'log'>
  log10 = <ufunc 'log10'>
  log1p = <ufunc 'log1p'>
  logical_and = <ufunc 'logical_and'>
  logical_not = <ufunc 'logical_not'>
  logical_or = <ufunc 'logical_or'>
  logical_xor = <ufunc 'logical_xor'>
  maximum = <ufunc 'maximum'>
  mgrid = <numpy.lib.index_tricks.nd_grid object at 0x11aa350>
  minimum = <ufunc 'minimum'>
  mod = <ufunc 'remainder'>
  modf = <ufunc 'modf'>
  multiply = <ufunc 'multiply'>
  nan = nan
  nbytes = {<type 'numpy.int64'>: 8, <type 'numpy.int16'>: 2, <t...
  negative = <ufunc 'negative'>
  newaxis = None
  not_equal = <ufunc 'not_equal'>
  ogrid = <numpy.lib.index_tricks.nd_grid object at 0x11aa330>
  ones_like = <ufunc 'ones_like'>
  pi = 3.14159265359
  power = <ufunc 'power'>
  r_ = <numpy.lib.index_tricks.r_class object at 0x11a5c30>
  reciprocal = <ufunc 'reciprocal'>
  remainder = <ufunc 'remainder'>
  right_shift = <ufunc 'right_shift'>
  rint = <ufunc 'rint'>
  s_ = <numpy.lib.index_tricks._index_expression_class object at...
  sctypeDict = {0: <type 'numpy.bool_'>, 1: <type 'numpy.int8'>,...
  sctypeNA = {'?': 'Bool', 'B': 'UInt8', 'Bool': <type 'numpy.bo...
  sctypes = {'complex': [<type 'numpy.complex64'>, <type 'numpy....
  sign = <ufunc 'sign'>
  signbit = <ufunc 'signbit'>
  sin = <ufunc 'sin'>
  sinh = <ufunc 'sinh'>
  sqrt = <ufunc 'sqrt'>
  square = <ufunc 'square'>
  subtract = <ufunc 'subtract'>
  tan = <ufunc 'tan'>
  tanh = <ufunc 'tanh'>
  true_divide = <ufunc 'true_divide'>
  typeDict = {0: <type 'numpy.bool_'>, 1: <type 'numpy.int8'>, 2...
  typeNA = {'?': 'Bool', 'B': 'UInt8', 'Bool': <type 'numpy.bool...
  typecodes = {'All': '?bhilqpBHILQPfdgFDGSUVO', 'AllFloat': 'fd...
Function Details

untag(tagged_sentence)

Given a tagged sentence, return an untagged version of that sentence. I.e., return a list containing the first element of each tuple in tagged_sentence.

>>> untag([('John', 'NNP'), ('saw', 'VBD'), ('Mary', 'NNP')])
['John', 'saw', 'Mary']
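
The behaviour described here, keeping only the first element of each tuple, can be sketched in a few lines. The function name `untag_sketch` is hypothetical and chosen to avoid implying that this is the library's own source.

```python
def untag_sketch(tagged_sentence):
    """Return the words of a tagged sentence, dropping the tags."""
    # Each item is a (word, tag) tuple; keep only the word.
    return [word for (word, tag) in tagged_sentence]


words = untag_sketch([("John", "NNP"), ("saw", "VBD"), ("Mary", "NNP")])
print(words)
```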

Variables Details

ScalarType

Value:
(<type 'int'>,
 <type 'float'>,
 <type 'complex'>,
 <type 'long'>,
 <type 'bool'>,
 <type 'str'>,
 <type 'unicode'>,
 <type 'buffer'>,
...

cast

Value:
{<type 'numpy.int64'>: <function <lambda> at 0x123bf30>, <type 'numpy.\
int16'>: <function <lambda> at 0x123bf70>, <type 'numpy.complex128'>: \
<function <lambda> at 0x123bfb0>, <type 'numpy.int32'>: <function <lam\
bda> at 0x761030>, <type 'numpy.uint32'>: <function <lambda> at 0x7610\
70>, <type 'numpy.unicode_'>: <function <lambda> at 0x7610b0>, <type '\
numpy.complex64'>: <function <lambda> at 0x7610f0>, <type 'numpy.int32\
'>: <function <lambda> at 0x761130>, <type 'numpy.uint32'>: <function \
<lambda> at 0x761170>, <type 'numpy.string_'>: <function <lambda> at 0\
...

index_exp

Value:
<numpy.lib.index_tricks._index_expression_class object at 0x11b4590>

nbytes

Value:
{<type 'numpy.int64'>: 8, <type 'numpy.int16'>: 2, <type 'numpy.comple\
x128'>: 16, <type 'numpy.int32'>: 4, <type 'numpy.uint32'>: 4, <type '\
numpy.unicode_'>: 0, <type 'numpy.complex64'>: 8, <type 'numpy.int32'>\
: 4, <type 'numpy.uint32'>: 4, <type 'numpy.string_'>: 0, <type 'numpy\
.float128'>: 16, <type 'numpy.uint16'>: 2, <type 'numpy.object_'>: 4, \
<type 'numpy.float64'>: 8, <type 'numpy.int8'>: 1, <type 'numpy.uint8'\
>: 1, <type 'numpy.bool_'>: 1, <type 'numpy.float32'>: 4, <type 'numpy\
.uint64'>: 8, <type 'numpy.complex256'>: 32, <type 'numpy.void'>: 0}

s_

Value:
<numpy.lib.index_tricks._index_expression_class object at 0x11b45b0>

sctypeDict

Value:
{0: <type 'numpy.bool_'>,
 1: <type 'numpy.int8'>,
 2: <type 'numpy.uint8'>,
 3: <type 'numpy.int16'>,
 4: <type 'numpy.uint16'>,
 5: <type 'numpy.int32'>,
 6: <type 'numpy.uint32'>,
 7: <type 'numpy.int32'>,
...

sctypeNA

Value:
{'?': 'Bool',
 'B': 'UInt8',
 'Bool': <type 'numpy.bool_'>,
 'Complex128': <type 'numpy.complex256'>,
 'Complex32': <type 'numpy.complex64'>,
 'Complex64': <type 'numpy.complex128'>,
 'D': 'Complex64',
 'F': 'Complex32',
...

sctypes

Value:
{'complex': [<type 'numpy.complex64'>,
             <type 'numpy.complex128'>,
             <type 'numpy.complex256'>],
 'float': [<type 'numpy.float32'>,
           <type 'numpy.float64'>,
           <type 'numpy.float128'>],
 'int': [<type 'numpy.int8'>,
         <type 'numpy.int16'>,
...

typeDict

Value:
{0: <type 'numpy.bool_'>,
 1: <type 'numpy.int8'>,
 2: <type 'numpy.uint8'>,
 3: <type 'numpy.int16'>,
 4: <type 'numpy.uint16'>,
 5: <type 'numpy.int32'>,
 6: <type 'numpy.uint32'>,
 7: <type 'numpy.int32'>,
...

typeNA

Value:
{'?': 'Bool',
 'B': 'UInt8',
 'Bool': <type 'numpy.bool_'>,
 'Complex128': <type 'numpy.complex256'>,
 'Complex32': <type 'numpy.complex64'>,
 'Complex64': <type 'numpy.complex128'>,
 'D': 'Complex64',
 'F': 'Complex32',
...

typecodes

Value:
{'All': '?bhilqpBHILQPfdgFDGSUVO',
 'AllFloat': 'fdgFDG',
 'AllInteger': 'bBhHiIlLqQpP',
 'Character': 'S1',
 'Complex': 'FDG',
 'Float': 'fdg',
 'Integer': 'bhilqp',
 'UnsignedInteger': 'BHILQP'}