Package nltk :: Package tag :: Module sequential :: Class ContextTagger

Class ContextTagger


         object --+        
                  |        
        api.TaggerI --+    
                      |    
SequentialBackoffTagger --+
                          |
                         ContextTagger
Known Subclasses:

An abstract base class for sequential backoff taggers that choose a tag for a token based on the value of its "context". Different subclasses are used to define different contexts.

A ContextTagger chooses the tag for a token by calculating the token's context and looking up the corresponding tag in a table. This table can be constructed manually, or it can be constructed automatically from a training corpus using the _train() factory method.
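
The context-then-lookup behavior described above can be illustrated without the library. The following is a minimal sketch in plain Python, not NLTK's implementation: the class names SketchContextTagger and WordTagger, the lowercase-word context, and the hand-built table are all assumptions made for illustration.

```python
class SketchContextTagger:
    """Illustrative stand-in for a context-based tagger (not NLTK code)."""

    def __init__(self, context_to_tag):
        # Table mapping contexts to tags; built by hand in this sketch.
        self._context_to_tag = dict(context_to_tag)

    def context(self, tokens, index, history):
        # Subclasses decide what counts as a token's context.
        raise NotImplementedError

    def choose_tag(self, tokens, index, history):
        # Compute the context, then look it up; None means "no answer".
        ctx = self.context(tokens, index, history)
        if ctx is None:
            return None
        return self._context_to_tag.get(ctx)

    def tag(self, tokens):
        # Tag left to right, passing the tags chosen so far as history.
        history = []
        for index in range(len(tokens)):
            history.append(self.choose_tag(tokens, index, history))
        return list(zip(tokens, history))


class WordTagger(SketchContextTagger):
    """Hypothetical subclass: the context is the lowercased word."""

    def context(self, tokens, index, history):
        return tokens[index].lower()


tagger = WordTagger({"the": "DT", "dog": "NN"})
result = tagger.tag(["The", "dog", "barks"])
print(result)  # [('The', 'DT'), ('dog', 'NN'), ('barks', None)]
```

A token whose context is missing from the table receives None here; in a real backoff chain that is the point at which the next tagger would be consulted.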

Instance Methods

__init__(self, context_to_tag, backoff=None)

context(self, tokens, index, history)
Returns (hashable): the context that should be used to look up the tag for the specified token; or None if the specified token should not be handled by this tagger.

choose_tag(self, tokens, index, history)
Returns (str): Decide which tag should be used for the specified token, and return that tag.

size(self)
Returns: The number of entries in the table used by this tagger to map from contexts to tags.

__repr__(self)
repr(x)

_train(self, tagged_corpus, cutoff=1, verbose=False)
Initialize this ContextTagger's _context_to_tag table based on the given training data.

Inherited from SequentialBackoffTagger: tag, tag_one

Inherited from SequentialBackoffTagger (private): _get_backoff

Inherited from api.TaggerI: batch_tag

Inherited from object: __delattr__, __getattribute__, __hash__, __new__, __reduce__, __reduce_ex__, __setattr__, __str__

    Deprecated

Inherited from SequentialBackoffTagger: tag_sents

Instance Variables
  _context_to_tag
Dictionary mapping contexts to tags.

Inherited from SequentialBackoffTagger (private): _taggers

Properties

Inherited from SequentialBackoffTagger: backoff

Inherited from object: __class__

Method Details

__init__(self, context_to_tag, backoff=None)
(Constructor)

Parameters:
  • context_to_tag - A dictionary mapping contexts to tags.
  • backoff - The backoff tagger that should be used for this tagger.
Overrides: SequentialBackoffTagger.__init__

context(self, tokens, index, history)

Returns: (hashable)
the context that should be used to look up the tag for the specified token; or None if the specified token should not be handled by this tagger.
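
To make the contract of context() concrete, here are two hypothetical context functions written as plain Python functions rather than methods. They are illustrative assumptions, not the contexts used by any particular NLTK subclass: one uses the word itself, the other pairs the previous tag with the current word and declines the first token by returning None.

```python
def word_context(tokens, index, history):
    # Unigram-style context: the word itself (a str is hashable).
    return tokens[index]


def prev_tag_and_word_context(tokens, index, history):
    # Bigram-style context: (previous tag, current word).
    # Tuples of strings are hashable, so they can serve as table keys.
    if index == 0:
        # No previous tag exists, so this sketch declines the token.
        return None
    return (history[index - 1], tokens[index])


tokens = ["the", "dog", "barks"]
history = ["DT", "NN"]  # tags already chosen for tokens[0] and tokens[1]

first = prev_tag_and_word_context(tokens, 0, [])
third = prev_tag_and_word_context(tokens, 2, history)
print(first)  # None
print(third)  # ('NN', 'barks')
```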

choose_tag(self, tokens, index, history)


Decide which tag should be used for the specified token, and return that tag. If this tagger is unable to determine a tag for the specified token, return None -- do not consult the backoff tagger. This method should be overridden by subclasses of SequentialBackoffTagger.

Parameters:
  • tokens - The list of words that are being tagged.
  • index - The index of the word whose tag should be returned.
  • history - A list of the tags for all words before index.
Returns: str
Overrides: SequentialBackoffTagger.choose_tag
(inherited documentation)
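
The requirement that choose_tag return None instead of consulting the backoff tagger implies that some caller walks the chain of taggers. The sketch below shows one way such a caller could work. It is an assumption for illustration only: tag_one here is a standalone function, and lookup_tagger and default_tagger are hypothetical helpers, not part of the documented API.

```python
def tag_one(taggers, tokens, index, history):
    """Ask each tagger in order; return the first non-None tag."""
    for choose_tag in taggers:
        tag = choose_tag(tokens, index, history)
        if tag is not None:
            return tag
    return None


def lookup_tagger(table):
    # Builds a choose_tag-like function backed by a word-to-tag table.
    def choose_tag(tokens, index, history):
        return table.get(tokens[index])  # None when the word is unknown
    return choose_tag


def default_tagger(tokens, index, history):
    # Last resort in the chain: always answers with the same tag.
    return "NN"


chain = [lookup_tagger({"the": "DT"}), default_tagger]
tokens = ["the", "cat"]
history = []
for i in range(len(tokens)):
    history.append(tag_one(chain, tokens, i, history))
tagged = list(zip(tokens, history))
print(tagged)  # [('the', 'DT'), ('cat', 'NN')]
```

Keeping the backoff logic in the caller means each choose_tag only answers for the contexts it knows, which is what makes taggers composable in a chain.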

size(self)

Returns:
The number of entries in the table used by this tagger to map from contexts to tags.

__repr__(self)
(Representation operator)


repr(x)

Overrides: object.__repr__
(inherited documentation)

_train(self, tagged_corpus, cutoff=1, verbose=False)


Initialize this ContextTagger's _context_to_tag table based on the given training data. In particular, for each context c in the training data, set _context_to_tag[c] to the most frequent tag for that context. However, exclude any contexts that are already tagged perfectly by the backoff tagger(s).

The old value of self._context_to_tag (if any) is discarded.

Parameters:
  • tagged_corpus - A tagged corpus. Each item should be a list of (word, tag) tuples.
  • cutoff - If the most likely tag for a context occurs fewer than cutoff times, then exclude it from the context-to-tag table for the new tagger.
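
The training procedure described above can be sketched in plain Python. This is not the library's source: train_table, its context and backoff_guess arguments, and the toy corpus are hypothetical names chosen for illustration. The cutoff comparison follows the wording of the parameter description here (exclude when the best tag occurs fewer than cutoff times); the exact comparison in the library may differ.

```python
from collections import Counter, defaultdict


def train_table(tagged_corpus, context, cutoff=1, backoff_guess=None):
    """Build a context-to-tag table from sentences of (word, tag) pairs."""
    counts = defaultdict(Counter)  # context -> tag frequencies
    useful = set()                 # contexts the backoff gets wrong at least once
    for sentence in tagged_corpus:
        tokens = [word for word, _ in sentence]
        tags = [tag for _, tag in sentence]
        for index, tag in enumerate(tags):
            history = tags[:index]
            ctx = context(tokens, index, history)
            if ctx is None:
                continue
            counts[ctx][tag] += 1
            # With no backoff, every context is worth storing.
            if backoff_guess is None or backoff_guess(tokens, index, history) != tag:
                useful.add(ctx)
    table = {}
    for ctx in useful:
        best_tag, hits = counts[ctx].most_common(1)[0]
        if hits >= cutoff:  # assumption: "fewer than cutoff" is excluded
            table[ctx] = best_tag
    return table


corpus = [
    [("the", "DT"), ("dog", "NN"), ("barks", "VBZ")],
    [("the", "DT"), ("dog", "NN"), ("runs", "VBZ")],
    [("dog", "VB"), ("it", "PRP")],
]


def word(tokens, index, history):
    return tokens[index]


def always_dt(tokens, index, history):
    return "DT"


table = train_table(corpus, word, cutoff=2)
table_with_backoff = train_table(corpus, word, cutoff=2, backoff_guess=always_dt)
print(table)               # {'the': 'DT', 'dog': 'NN'} (key order may vary)
print(table_with_backoff)  # {'dog': 'NN'}
```

In the second call, "the" is dropped because the hypothetical backoff already tags every occurrence of it correctly, so storing it would add nothing.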