Package nltk :: Package tag :: Module brill :: Class FastBrillTaggerTrainer

Class FastBrillTaggerTrainer

object --+
         |
        FastBrillTaggerTrainer

A faster trainer for Brill taggers.

Instance Methods

__init__(self, initial_tagger, templates, trace=0, deterministic=None)
x.__init__(...) initializes x; see x.__class__.__doc__ for signature

train(self, train_sents, max_rules=200, min_score=2)

_init_mappings(self, test_sents, train_sents)
Initialize the tag position mapping and the rule-related mappings.

_clean(self)

_find_rules(self, sent, wordnum, new_tag)
Use the templates to find rules that apply at index wordnum in the sentence sent and generate the tag new_tag.

_update_rule_applies(self, rule, sentnum, wordnum, train_sents)
Update the rule data tables to reflect the fact that rule applies at the position (sentnum, wordnum).

_update_rule_not_applies(self, rule, sentnum, wordnum)
Update the rule data tables to reflect the fact that rule does not apply at the position (sentnum, wordnum).

_best_rule(self, train_sents, test_sents, min_score)
Find the next best rule.

_apply_rule(self, rule, test_sents)
Update test_sents by applying rule everywhere its conditions are met.

_update_tag_positions(self, rule)
Update _tag_positions to reflect the changes to tags that are made by rule.

_update_rules(self, rule, train_sents, test_sents)
Check if we should add or remove any rules from consideration, given the changes made by rule.

_trace_header(self)

_trace_rule(self, rule)

_trace_apply(self, num_updates)

_trace_update_rules(self, num_obsolete, num_new, num_unseen)

Inherited from object: __delattr__, __getattribute__, __hash__, __new__, __reduce__, __reduce_ex__, __repr__, __setattr__, __str__

Instance Variables

_tag_positions
Mapping from tags to lists of positions that use that tag.

_rules_by_position
Mapping from positions to the set of rules that are known to occur at that position.

_positions_by_rule
Mapping from rule to position to effect, specifying the effect that each rule has on the overall score, at each position.

_rules_by_score
Mapping from scores to the set of rules whose effect on the overall score is upper bounded by that score.

_rule_scores
Mapping from rules to upper bounds on their effects on the overall score.

_first_unknown_position
Mapping from rules to the first position where we're unsure if the rule applies.

Properties

Inherited from object: __class__

Method Details

__init__(self, initial_tagger, templates, trace=0, deterministic=None)
(Constructor)

x.__init__(...) initializes x; see x.__class__.__doc__ for signature

Overrides: object.__init__: (inherited documentation)

_init_mappings(self, test_sents, train_sents)

Initialize the tag position mapping and the rule-related mappings. For each error in test_sents, find new rules that would correct it, and add them to the rule mappings.
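
The initialization step can be sketched as follows. This is a minimal illustration under stated assumptions, not the NLTK implementation: the function name init_mappings, the rule representation (old_tag, new_tag, prev_tag), and the single "previous tag" template are all hypothetical and chosen only to keep the example small.

```python
from collections import defaultdict

def init_mappings(test_sents, train_sents):
    """Build toy versions of the tag position and rule mappings.

    test_sents:  sentences as tagged by the current tagger, lists of (word, tag).
    train_sents: the same sentences with gold-standard tags.
    A rule here is the hypothetical triple (old_tag, new_tag, prev_tag),
    read as "change old_tag to new_tag when the previous tag is prev_tag".
    """
    tag_positions = defaultdict(list)      # tag -> [(sentnum, wordnum), ...]
    positions_by_rule = defaultdict(dict)  # rule -> {(sentnum, wordnum): effect}

    for sentnum, (test, gold) in enumerate(zip(test_sents, train_sents)):
        for wordnum, ((_, tag), (_, correct)) in enumerate(zip(test, gold)):
            tag_positions[tag].append((sentnum, wordnum))
            # Only tagging errors give rise to candidate rules.
            if tag != correct and wordnum > 0:
                prev_tag = test[wordnum - 1][1]
                rule = (tag, correct, prev_tag)
                # The rule would fix this position, so its effect is +1.
                positions_by_rule[rule][(sentnum, wordnum)] = 1

    return tag_positions, positions_by_rule
```

Only rules that fix at least one error are generated, so every entry recorded at this stage has a positive effect.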

_best_rule(self, train_sents, test_sents, min_score)

Find the next best rule. This is done by repeatedly taking a rule with the highest score and stepping through the corpus to see where it applies. When it makes an error (decreasing its score) it's bumped down, and we try a new rule with the highest score. When we find a rule which has the highest score AND which has been tested against the entire corpus, we can conclude that it's the next best rule.
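
The lazy search described above can be sketched roughly as below. This is a simplified model, not the NLTK code: the function name best_rule and its arguments are assumptions. Each rule starts with an optimistic score (the number of positions it is known to fix), and the effects at its not-yet-examined positions are supplied as a plain list instead of being computed by walking a corpus.

```python
def best_rule(upper_bounds, unchecked, min_score):
    """Return the rule with the highest fully verified score, or None.

    upper_bounds: dict mapping rule -> optimistic score (an upper bound).
    unchecked:    dict mapping rule -> list of effects (-1 or 0) at positions
                  that have not been examined yet, in corpus order.
    Both dicts are updated in place as positions are examined.
    """
    while upper_bounds:
        # Take a rule whose upper bound is currently the highest.
        rule = max(upper_bounds, key=upper_bounds.get)
        if upper_bounds[rule] < min_score:
            return None  # no remaining rule can reach min_score

        pending = unchecked.get(rule, [])
        while pending:
            effect = pending.pop(0)
            if effect < 0:
                # The rule makes an error here: lower its bound and
                # go back to pick whichever rule is now on top.
                upper_bounds[rule] += effect
                break
        else:
            # Every position has been checked and the rule is still on top.
            return rule
    return None
```

The point of the design is that low-scoring rules are never checked against the whole corpus: a rule is examined only while its upper bound is the best available.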

Instance Variable Details

_rules_by_position

Mapping from positions to the set of rules that are known to occur at that position. Position is (sentnum, wordnum). Initially, this will only contain positions where each rule applies in a helpful way; but when we examine a rule, we'll extend this list to also include positions where each rule applies in a harmful or neutral way.

_positions_by_rule

Mapping from rule to position to effect, specifying the effect that each rule has on the overall score, at each position. Position is (sentnum, wordnum); and effect is -1, 0, or 1. As with _rules_by_position, this mapping starts out only containing rules with positive effects; but when we examine a rule, we'll extend this mapping to include the positions where the rule is harmful or neutral.
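
As an illustration of how these two mappings relate, the sketch below records one rule application in both. The helper names effect and record are hypothetical, and rules are stood in for by arbitrary hashable values; the sketch assumes that an effect of 1 means the rule corrects the tag, -1 means it breaks a correct tag, and 0 means the tag is wrong before and after.

```python
from collections import defaultdict

# position -> set of rules known to apply there
rules_by_position = defaultdict(set)
# rule -> {position: effect}
positions_by_rule = defaultdict(dict)

def effect(old_tag, new_tag, gold_tag):
    """Return 1 if the change fixes the tag, -1 if it breaks it, else 0."""
    if new_tag == gold_tag:
        return 1
    if old_tag == gold_tag:
        return -1
    return 0  # was wrong, stays wrong

def record(rule, position, old_tag, new_tag, gold_tag):
    """Record in both mappings that rule applies at position (sentnum, wordnum)."""
    rules_by_position[position].add(rule)
    positions_by_rule[rule][position] = effect(old_tag, new_tag, gold_tag)
```

Keeping both directions makes it cheap to ask which rules are affected when a position changes and to ask how a rule scores overall.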

_rules_by_score

Mapping from scores to the set of rules whose effect on the overall score is upper bounded by that score. Invariant: _rules_by_score[s] will contain r iff the sum of the effects in _positions_by_rule[r] is s.

_rule_scores

Mapping from rules to upper bounds on their effects on the overall score. This is the inverse mapping to _rules_by_score. Invariant: _rule_scores[r] = sum(_positions_by_rule[r].values())
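
The two invariants can be made concrete with a small sketch that derives both score mappings from a rule-to-position-to-effect mapping. The function name build_score_maps is hypothetical, and plain dictionaries stand in for the trainer's private attributes.

```python
from collections import defaultdict

def build_score_maps(positions_by_rule):
    """Derive (rule_scores, rules_by_score) from positions_by_rule.

    positions_by_rule: dict mapping rule -> {position: effect}, where each
    effect is -1, 0, or 1.
    """
    rule_scores = {}
    rules_by_score = defaultdict(set)
    for rule, effects in positions_by_rule.items():
        # A rule's score is the sum of its per-position effects.
        score = sum(effects.values())
        rule_scores[rule] = score
        rules_by_score[score].add(rule)
    return rule_scores, rules_by_score
```

Because one mapping is the inverse of the other, any update to a rule's score has to move the rule between score buckets and update its entry in the rule-to-score mapping together.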

_first_unknown_position

Mapping from rules to the first position where we're unsure if the rule applies. This records the next position we need to check to see if the rule messed anything up.
