Module nltk.tag.brill

Class BrillTaggerTrainer


object --+
         |
        BrillTaggerTrainer

A trainer for Brill taggers. It learns transformation rules that correct the tags assigned by an initial tagger.

Instance Methods
 
__init__(self, initial_tagger, templates, trace=0, deterministic=None)
Construct a trainer that corrects the output of initial_tagger using rules generated from templates.
 
train(self, train_sents, max_rules=200, min_score=2)
Trains the Brill tagger on the corpus train_sents, producing at most max_rules transformations, each of which reduces the net number of errors in the corpus by at least min_score.
 
_best_rule(self, test_sents, train_sents)
 
_find_rules(self, test_sents, train_sents)
Find all rules that correct at least one token's tag in test_sents.
_find_rules_at(self, test_sent, train_sent, i)
Returns: Set. The set of all rules (based on the templates) that correct token i's tag in test_sent.
 
_trace_header(self)
 
_trace_rule(self, rule, score, fixscore, numchanges)

Inherited from object: __delattr__, __getattribute__, __hash__, __new__, __reduce__, __reduce_ex__, __repr__, __setattr__, __str__

Properties

Inherited from object: __class__

Method Details

__init__(self, initial_tagger, templates, trace=0, deterministic=None)
(Constructor)

Construct a trainer that corrects the output of initial_tagger using rules generated from templates.

Parameters:
  • deterministic - If true, break ties between rules that have the same score by picking the one whose __repr__ is lexicographically smallest. If false, pick the first rule found with a given score; this depends on the order in which keys are returned from dictionaries, so the choice may differ from one run to the next. If not specified, deterministic is true if and only if trace > 0.
Overrides: object.__init__
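The tie-breaking rule and the default for deterministic can be illustrated with a short sketch. pick_best and resolve_deterministic are hypothetical helpers, not part of NLTK, and rules are modelled here as arbitrary hashable values mapped to their scores.

```python
from typing import Dict, Hashable, Optional


def resolve_deterministic(deterministic: Optional[bool], trace: int) -> bool:
    """Apply the documented default: true iff trace > 0 when unspecified."""
    if deterministic is None:
        return trace > 0
    return bool(deterministic)


def pick_best(scores: Dict[Hashable, int], deterministic: bool) -> Hashable:
    """Return a highest-scoring rule from a {rule: score} mapping (hypothetical helper)."""
    best_score = max(scores.values())
    # Collect every rule that shares the top score, in dict iteration order.
    tied = [rule for rule, score in scores.items() if score == best_score]
    if deterministic:
        # Break ties by choosing the rule whose repr() is lexicographically smallest.
        return min(tied, key=repr)
    # Otherwise take the first tied rule the dictionary happens to yield.
    return tied[0]
```

In Python 3.7 and later, dict iteration follows insertion order, so the non-deterministic branch of this sketch simply returns the first tied rule that was inserted.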

train(self, train_sents, max_rules=200, min_score=2)

Trains the Brill tagger on the corpus train_sents, producing at most max_rules transformations, each of which reduces the net number of errors in the corpus by at least min_score.

Parameters:
  • train_sents (list of list of tuple) - The corpus of tagged tokens: a list of sentences, each a list of (word, tag) tuples.
  • max_rules (int) - The maximum number of transformations to be created
  • min_score (int) - The minimum acceptable net error reduction that each transformation must produce in the corpus.
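The greedy loop implied by max_rules and min_score can be sketched as follows. This is not NLTK's implementation: it assumes a single hypothetical rule form, (prev_tag, from_tag, to_tag), meaning "change from_tag to to_tag when the previous tag is prev_tag", takes a hypothetical initial_tagger callable in place of the trainer's stored tagger, and simply rescores every candidate on each pass. A rule's net score is the number of tokens it fixes minus the number it breaks, and ties go to the smallest repr(), as with deterministic=True.

```python
from typing import Callable, List, Optional, Tuple

TaggedSent = List[Tuple[str, str]]       # [(word, tag), ...]
Rule = Tuple[Optional[str], str, str]    # hypothetical: (prev_tag, from_tag, to_tag)


def apply_rule(rule: Rule, sent: TaggedSent) -> TaggedSent:
    """Retag tokens tagged from_tag whose previous tag is prev_tag.

    Conditions are checked against the tags before the rule is applied.
    """
    prev_tag, from_tag, to_tag = rule
    out = list(sent)
    for i, (word, tag) in enumerate(sent):
        prev = sent[i - 1][1] if i > 0 else None
        if tag == from_tag and prev == prev_tag:
            out[i] = (word, to_tag)
    return out


def net_score(rule: Rule, test_sents: List[TaggedSent], train_sents: List[TaggedSent]) -> int:
    """Tokens the rule corrects minus tokens it changes to an incorrect tag."""
    score = 0
    for test_sent, gold_sent in zip(test_sents, train_sents):
        new_sent = apply_rule(rule, test_sent)
        for (_, old), (_, new), (_, gold) in zip(test_sent, new_sent, gold_sent):
            if old != new:
                score += int(new == gold) - int(old == gold)
    return score


def candidate_rules(test_sents: List[TaggedSent], train_sents: List[TaggedSent]) -> set:
    """Propose one rule per mistagged token."""
    found = set()
    for test_sent, gold_sent in zip(test_sents, train_sents):
        for i, ((_, tag), (_, gold)) in enumerate(zip(test_sent, gold_sent)):
            if tag != gold:
                prev = test_sent[i - 1][1] if i > 0 else None
                found.add((prev, tag, gold))
    return found


def train(train_sents: List[TaggedSent],
          initial_tagger: Callable[[List[str]], List[str]],
          max_rules: int = 200, min_score: int = 2) -> List[Tuple[Rule, int]]:
    """Greedily learn at most max_rules rules, each with net score >= min_score."""
    # Tag the words of each gold sentence with the (hypothetical) initial tagger.
    test_sents = []
    for sent in train_sents:
        words = [word for word, _ in sent]
        test_sents.append(list(zip(words, initial_tagger(words))))
    rules = []
    while len(rules) < max_rules:
        candidates = candidate_rules(test_sents, train_sents)
        if not candidates:
            break  # no errors left to fix
        # Highest net score wins; ties go to the smallest repr().
        best = min(candidates,
                   key=lambda r: (-net_score(r, test_sents, train_sents), repr(r)))
        score = net_score(best, test_sents, train_sents)
        if score < min_score:
            break  # the best remaining rule is not good enough
        rules.append((best, score))
        test_sents = [apply_rule(best, s) for s in test_sents]
    return rules
```

Because each learned rule is applied to the corpus before the next search, later rules are scored against the already-corrected tags, which is why the rules form an ordered sequence.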

_find_rules(self, test_sents, train_sents)

Find all rules that correct at least one token's tag in test_sents.

Returns:
A list of tuples (rule, fixscore), where rule is a Brill rule and fixscore is the number of tokens whose tag the rule corrects. Note that fixscore does not account for tokens whose tags the rule changes to incorrect values.
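A minimal sketch of how (rule, fixscore) pairs could be accumulated, assuming the same hypothetical (prev_tag, from_tag, to_tag) rule form and a single "previous tag" template; rules_at and find_rules are illustrative stand-ins, not NLTK's code. Each mistagged token proposes the rules that would fix it, and a rule's fixscore is how many tokens proposed it.

```python
from collections import Counter
from typing import List, Optional, Set, Tuple

TaggedSent = List[Tuple[str, str]]       # [(word, tag), ...]
Rule = Tuple[Optional[str], str, str]    # hypothetical: (prev_tag, from_tag, to_tag)


def rules_at(test_sent: TaggedSent, train_sent: TaggedSent, i: int) -> Set[Rule]:
    """Hypothetical stand-in for _find_rules_at with a single 'previous tag' template."""
    tag, gold = test_sent[i][1], train_sent[i][1]
    if tag == gold:
        return set()  # token i is already correct
    prev = test_sent[i - 1][1] if i > 0 else None
    return {(prev, tag, gold)}


def find_rules(test_sents: List[TaggedSent],
               train_sents: List[TaggedSent]) -> List[Tuple[Rule, int]]:
    """Return (rule, fixscore) pairs; fixscore counts only tokens the rule corrects."""
    fixscore: Counter = Counter()
    for test_sent, train_sent in zip(test_sents, train_sents):
        for i in range(len(test_sent)):
            for rule in rules_at(test_sent, train_sent, i):
                fixscore[rule] += 1
    return list(fixscore.items())
```

For example, if the corpus also contains a sentence tagged NN NN whose gold tags are NN NN, the rule ('NN', 'NN', 'VB') would break its second token, yet that breakage is not subtracted from the rule's fixscore; the net effect has to be measured separately.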

_find_rules_at(self, test_sent, train_sent, i)

Returns: Set
The set of all rules (based on the templates) that correct token i's tag in test_sent.
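The idea of generating rules from templates at a single position can be sketched as below. The templates here (prev_tag_template, word_template) and the (condition, from_tag, to_tag) rule shape are hypothetical, and templates is passed explicitly rather than stored on a trainer; NLTK's own template classes differ.

```python
from typing import Callable, List, Optional, Set, Tuple

TaggedSent = List[Tuple[str, str]]                          # [(word, tag), ...]
Condition = Tuple[str, str]                                 # hypothetical, e.g. ("word", "runs")
Template = Callable[[TaggedSent, int], Optional[Condition]]
Rule = Tuple[Condition, str, str]                           # (condition, from_tag, to_tag)


def prev_tag_template(sent: TaggedSent, i: int) -> Optional[Condition]:
    """Hypothetical template: condition on the previous token's tag."""
    return ("prev_tag", sent[i - 1][1]) if i > 0 else None


def word_template(sent: TaggedSent, i: int) -> Optional[Condition]:
    """Hypothetical template: condition on the token's own word."""
    return ("word", sent[i][0])


def find_rules_at(test_sent: TaggedSent, train_sent: TaggedSent, i: int,
                  templates: List[Template]) -> Set[Rule]:
    """Every rule the templates can build that would correct token i's tag."""
    old_tag, new_tag = test_sent[i][1], train_sent[i][1]
    if old_tag == new_tag:
        return set()  # token i is already tagged correctly
    rules: Set[Rule] = set()
    for template in templates:
        condition = template(test_sent, i)
        if condition is not None:  # the template may not apply at position i
            rules.add((condition, old_tag, new_tag))
    return rules
```

Each template contributes at most one rule per position, so the returned set grows with the number of templates rather than with the size of the corpus.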