Module nltk.tag.util

Functions
 
str2tuple(s, sep='/')
Given the string representation of a tagged token, return the corresponding tuple representation.
 
tuple2str(tagged_token, sep='/')
Given the tuple representation of a tagged token, return the corresponding string representation.
 
untag(tagged_sentence)
Given a tagged sentence, return an untagged version of that sentence.
accuracy(tagger, gold) -> float
Score the accuracy of the tagger against the gold standard.

Function Details

str2tuple(s, sep='/')

Given the string representation of a tagged token, return the corresponding tuple representation. The rightmost occurrence of sep in s is used to divide s into a word string and a tag string. If sep does not occur in s, return (s, None).

Parameters:
  • s (str) - The string representation of a tagged token.
  • sep (str) - The separator string used to separate word strings from tags.
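The splitting rule above can be illustrated with a minimal standalone sketch. It mirrors the documented contract (split at the rightmost sep, return (s, None) when sep is absent) and is not NLTK's actual source; skipping len(sep) characters is an assumption made here so that multi-character separators also work.

```python
def str2tuple(s, sep='/'):
    """Split a 'word<sep>TAG' string at the rightmost sep into (word, tag)."""
    loc = s.rfind(sep)  # index of the rightmost occurrence, or -1 if absent
    if loc >= 0:
        # Skip the whole separator so multi-character separators also work.
        return (s[:loc], s[loc + len(sep):])
    # No separator: treat the string as an untagged word.
    return (s, None)


print(str2tuple('fly/NN'))   # ('fly', 'NN')
print(str2tuple('1/2/CD'))   # ('1/2', 'CD')
print(str2tuple('hello'))    # ('hello', None)
```

Splitting at the rightmost separator is what lets a word that itself contains sep, such as the fraction 1/2, keep its internal slash.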

tuple2str(tagged_token, sep='/')

Given the tuple representation of a tagged token, return the corresponding string representation. This representation is formed by concatenating the token's word string, followed by the separator, followed by the token's tag. (If the tag is None, then just return the bare word string.)

Parameters:
  • tagged_token ((str, str)) - The tuple representation of a tagged token.
  • sep (str) - The separator string used to separate word strings from tags.
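A minimal standalone sketch of the concatenation described above might look like this. It follows the documented behavior only (word, then sep, then tag, or the bare word when the tag is None) and is not NLTK's actual source.

```python
def tuple2str(tagged_token, sep='/'):
    """Join a (word, tag) pair into 'word<sep>tag', or return word if tag is None."""
    word, tag = tagged_token
    if tag is None:
        # Untagged token: return the bare word string.
        return word
    return f'{word}{sep}{tag}'


print(tuple2str(('fly', 'NN')))           # fly/NN
print(tuple2str(('hello', None)))         # hello
print(tuple2str(('fly', 'NN'), sep='_'))  # fly_NN
```

Because str2tuple splits at the rightmost separator, the two conversions round-trip as long as the tag itself does not contain sep.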

untag(tagged_sentence)

Given a tagged sentence, return an untagged version of that sentence. I.e., return a list containing the first element of each tuple in tagged_sentence.

>>> untag([('John', 'NNP'), ('saw', 'VBD'), ('Mary', 'NNP')])
['John', 'saw', 'Mary']
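The behavior above amounts to a single list comprehension. The following standalone sketch reproduces it; it reflects the documented contract, not necessarily NLTK's exact source.

```python
def untag(tagged_sentence):
    """Return the first element (the word) of each tagged token."""
    return [token[0] for token in tagged_sentence]


print(untag([('John', 'NNP'), ('saw', 'VBD'), ('Mary', 'NNP')]))
# ['John', 'saw', 'Mary']
```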

accuracy(tagger, gold)

Score the accuracy of the tagger against the gold standard. Strip the tags from the gold standard text, retag it using the tagger, then compute the accuracy score: the fraction of tokens whose assigned tag matches the gold-standard tag.

Parameters:
  • tagger (TaggerI) - The tagger being evaluated.
  • gold (list of Token) - The list of tagged tokens to score the tagger on.
Returns: float
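The three steps above (strip, retag, compare) can be sketched as follows. This is an illustration under stated assumptions, not NLTK's implementation: gold is taken to be a flat list of (word, tag) tuples as the parameter description says, the tagger is assumed to expose a tag(tokens) method returning (word, tag) pairs, and DictTagger is a hypothetical toy tagger used only for the demonstration.

```python
def untag(tagged_sentence):
    """Return the first element (the word) of each tagged token."""
    return [token[0] for token in tagged_sentence]


def accuracy(tagger, gold):
    """Fraction of gold tokens whose tag the tagger reproduces."""
    if not gold:
        raise ValueError('gold standard is empty')
    # 1. Strip the tags from the gold standard.
    words = untag(gold)
    # 2. Retag the bare words with the tagger being evaluated.
    test = list(tagger.tag(words))
    if len(test) != len(gold):
        raise ValueError('tagger returned a different number of tokens')
    # 3. Count positions where the (word, tag) pairs agree.
    correct = sum(1 for g, t in zip(gold, test) if g == t)
    return correct / len(gold)


class DictTagger:
    """Hypothetical toy tagger: dictionary lookup with a default tag."""

    def __init__(self, table, default='NN'):
        self.table = table
        self.default = default

    def tag(self, tokens):
        return [(w, self.table.get(w, self.default)) for w in tokens]


gold = [('John', 'NNP'), ('saw', 'VBD'), ('Mary', 'NNP'), ('run', 'VB')]
tagger = DictTagger({'John': 'NNP', 'saw': 'VBD', 'Mary': 'NNP'})
print(accuracy(tagger, gold))  # 0.75 ('run' gets the default 'NN', not 'VB')
```

Comparing whole (word, tag) pairs is equivalent to comparing tags here, because the retagged words are exactly the stripped gold words in the same order.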