nltk.tag.brill

Module brill

Brill's transformational rule-based tagger.

Classes

BrillTagger
Brill's transformational rule-based tagger.

BrillRule
An interface for tag transformations on a tagged corpus, as performed by brill taggers.

ProximateTokensRule
An abstract base class for brill rules whose condition checks for the presence of tokens with given properties at given ranges of positions, relative to the token.

ProximateTagsRule
A rule which examines the tags of nearby tokens.

ProximateWordsRule
A rule which examines the base types of nearby tokens.

BrillTemplateI
An interface for generating lists of transformational rules that apply at given sentence positions.

ProximateTokensTemplate
A brill template that generates a list of ProximateTokensRules that apply at a given sentence position.

SymmetricProximateTokensTemplate
Simulates two ProximateTokensTemplates which are symmetric across the location of the token.

BrillTaggerTrainer
A trainer for brill taggers.

FastBrillTaggerTrainer
A faster trainer for brill taggers.
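
As a rough illustration of the kind of transformation a ProximateTagsRule describes, the sketch below retags a token when a token at a nearby position carries a given tag. It is plain Python and does not use the classes in this module; the function name `apply_tag_rule` and all of its parameters are hypothetical, chosen only for this example.

```python
def apply_tag_rule(tagged, original_tag, replacement_tag, offset, context_tag):
    """Hypothetical sketch of one tag-based transformation (not NLTK code).

    Retag every token tagged `original_tag` as `replacement_tag` when the
    token at relative position `offset` carries `context_tag`.
    """
    # Find all applicable positions first, so that one change cannot
    # influence whether the rule applies elsewhere in the same pass.
    positions = [
        i for i, (_, tag) in enumerate(tagged)
        if tag == original_tag
        and 0 <= i + offset < len(tagged)
        and tagged[i + offset][1] == context_tag
    ]
    result = list(tagged)
    for i in positions:
        result[i] = (tagged[i][0], replacement_tag)
    return result


sentence = [("to", "TO"), ("race", "NN"), ("the", "DT"), ("race", "NN")]
# "Change NN to VB if the preceding token is tagged TO."
retagged = apply_tag_rule(sentence, "NN", "VB", offset=-1, context_tag="TO")
```

Only the first "race" is retagged here, because only it is preceded by a token tagged TO.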

Functions

error_list(train_sents, test_sents, radius=2)
Returns a list of human-readable strings indicating the errors in the given tagging of the corpus.

demo(num_sents=100, max_rules=200, min_score=3, error_output='errors.out', rule_output='rules.yaml', randomize=False, train=0.8, trace=3)
Brill Tagger Demonstration

Function Details

error_list(train_sents, test_sents, radius=2)

Returns a list of human-readable strings indicating the errors in the given tagging of the corpus.

Parameters:

train_sents (list of tuple) - The correct tagging of the corpus
test_sents (list of tuple) - The tagged corpus
radius (int) - How many tokens on either side of a wrongly-tagged token to include in the error string. For example, if radius=2, each error string will show the incorrect token plus two tokens on either side.
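
The effect of radius can be sketched in plain Python. The following is a hypothetical re-creation of the idea, not the implementation in this module: the name `sketch_error_list`, the assumption that each corpus is a list of sentences made of (word, tag) tuples, and the output format are all chosen for illustration.

```python
def sketch_error_list(gold_sents, tagged_sents, radius=2):
    """Hypothetical sketch (not NLTK code): describe each tagging error
    together with up to `radius` tokens of context on either side."""
    errors = []
    for gold, tagged in zip(gold_sents, tagged_sents):
        for i, ((word, gold_tag), (_, test_tag)) in enumerate(zip(gold, tagged)):
            if gold_tag == test_tag:
                continue
            # Context is taken from the tagged (possibly wrong) corpus.
            left = " ".join("%s/%s" % wt for wt in tagged[max(0, i - radius):i])
            right = " ".join("%s/%s" % wt for wt in tagged[i + 1:i + 1 + radius])
            line = "%s [%s/%s->%s] %s" % (left, word, test_tag, gold_tag, right)
            errors.append(line.strip())
    return errors


gold = [[("the", "DT"), ("dog", "NN"), ("barks", "VBZ"), ("loudly", "RB")]]
tagged = [[("the", "DT"), ("dog", "NN"), ("barks", "NNS"), ("loudly", "RB")]]
errors = sketch_error_list(gold, tagged, radius=1)
```

With radius=1, the single error string shows one token on each side of the wrongly tagged "barks".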

demo(num_sents=100, max_rules=200, min_score=3, error_output='errors.out', rule_output='rules.yaml', randomize=False, train=0.8, trace=3)

Brill Tagger Demonstration

Parameters:

num_sents (int) - how many sentences of training and testing data to use
max_rules (int) - maximum number of rule instances to create
min_score (int) - the minimum score for a rule in order for it to be considered
error_output (string) - the file where errors will be saved
rule_output (string) - the file where rules will be saved
randomize (boolean) - whether the training data should be a random subset of the corpus
train (float) - the fraction of the corpus to be used for training (1=all)
trace (int) - the level of diagnostic tracing output to produce (0-4)
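
One way to read num_sents, train, and randomize together is sketched below. This is an assumption about how such a split could work, not a copy of the demo's code; the helper `split_corpus` and its `seed` parameter are hypothetical.

```python
import random


def split_corpus(sents, num_sents=100, train=0.8, randomize=False, seed=0):
    """Hypothetical sketch (not NLTK code): split the first `num_sents`
    sentences into a training part and a testing part."""
    sents = list(sents)
    if randomize:
        # Shuffle a copy so the caller's corpus is left untouched.
        random.Random(seed).shuffle(sents)
    cutoff = int(num_sents * train)
    return sents[:cutoff], sents[cutoff:num_sents]


corpus = list(range(10))  # stand-in for ten tagged sentences
train_part, test_part = split_corpus(corpus, num_sents=10, train=0.8)
```

Under this reading, train=0.8 with num_sents=10 gives eight training sentences and two testing sentences, and train=1 leaves the testing part empty.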
