Module nltk.tag.brill

Brill's transformational rule-based tagger.
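In transformation-based (Brill) tagging, each token first receives a tag from an initial tagger, and an ordered list of transformation rules then corrects those tags. The sketch below models one rule shape in the spirit of ProximateTagsRule: "retag `original` as `replacement` if some token within a relative range carries `context_tag`". The names `TagRule` and `brill_tag` are hypothetical illustrations, not NLTK's API.

```python
from dataclasses import dataclass
from typing import List, Tuple

TaggedToken = Tuple[str, str]  # (word, tag)


@dataclass(frozen=True)
class TagRule:
    """Hypothetical rule: retag `original` as `replacement` when any token
    at a relative offset in [start, end] carries `context_tag`."""
    original: str
    replacement: str
    start: int
    end: int
    context_tag: str

    def applies(self, tokens: List[TaggedToken], i: int) -> bool:
        if tokens[i][1] != self.original:
            return False
        # Clip the context window to the sentence boundaries.
        lo = max(0, i + self.start)
        hi = min(len(tokens), i + self.end + 1)
        return any(tokens[j][1] == self.context_tag for j in range(lo, hi))

    def apply(self, tokens: List[TaggedToken]) -> List[TaggedToken]:
        # Collect all matching positions first, then retag them, so the
        # rule's own changes do not affect where it applies.
        hits = [i for i in range(len(tokens)) if self.applies(tokens, i)]
        result = list(tokens)
        for i in hits:
            result[i] = (result[i][0], self.replacement)
        return result


def brill_tag(tokens: List[TaggedToken], rules: List[TagRule]) -> List[TaggedToken]:
    """Apply an ordered list of rules to an initially tagged sentence."""
    for rule in rules:
        tokens = rule.apply(tokens)
    return tokens


# "race" was initially tagged NN; a preceding TO suggests a verb.
sent = [("to", "TO"), ("race", "NN"), ("tomorrow", "NN")]
print(brill_tag(sent, [TagRule("NN", "VB", -1, -1, "TO")]))
# -> [('to', 'TO'), ('race', 'VB'), ('tomorrow', 'NN')]
```

Rules are applied in the order they were learned, so a later rule sees the tags produced by earlier ones.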

Classes
  BrillTagger
Brill's transformational rule-based tagger.
  BrillRule
An interface for tag transformations on a tagged corpus, as performed by Brill taggers.
  ProximateTokensRule
An abstract base class for Brill rules whose condition checks for the presence of tokens with given properties at given ranges of positions, relative to the token.
  ProximateTagsRule
A rule which examines the tags of nearby tokens.
  ProximateWordsRule
A rule which examines the words (base types) of nearby tokens.
  BrillTemplateI
An interface for generating lists of transformational rules that apply at given sentence positions.
  ProximateTokensTemplate
A Brill template that generates a list of ProximateTokensRules that apply at a given sentence position.
  SymmetricProximateTokensTemplate
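Simulates two ProximateTokensTemplates which are symmetric across the location of the token. For example, a symmetric template over the range (1,1) behaves like two templates over the ranges (1,1) and (-1,-1).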
  BrillTaggerTrainer
A trainer for Brill taggers.
  FastBrillTaggerTrainer
A faster trainer for Brill taggers.
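Brill trainers are commonly described as a greedy loop: instantiate templates at each mistagged position, score every candidate rule by errors fixed minus correct tags broken, keep the best, apply it, and repeat. The sketch below assumes a single "previous tag" template and a flat list of tags; `max_rules` and `min_score` mirror the demo parameters of the same names. All functions here are hypothetical and are not NLTK's trainer implementation.

```python
from typing import List, Set, Tuple

# Hypothetical rule: (from_tag, to_tag, prev_tag), read as
# "retag from_tag as to_tag when the preceding token is tagged prev_tag".
Rule = Tuple[str, str, str]


def apply_rule(rule: Rule, tags: List[str]) -> List[str]:
    from_tag, to_tag, prev_tag = rule
    hits = [i for i in range(1, len(tags))
            if tags[i] == from_tag and tags[i - 1] == prev_tag]
    new = list(tags)
    for i in hits:
        new[i] = to_tag
    return new


def candidate_rules(tags: List[str], gold: List[str]) -> Set[Rule]:
    # One "previous tag" template, instantiated only at mistagged positions.
    return {(tags[i], gold[i], tags[i - 1])
            for i in range(1, len(tags)) if tags[i] != gold[i]}


def score(rule: Rule, tags: List[str], gold: List[str]) -> int:
    # Net gain: errors the rule fixes minus correct tags it breaks.
    new = apply_rule(rule, tags)
    fixed = sum(1 for t, n, g in zip(tags, new, gold) if t != g and n == g)
    broken = sum(1 for t, n, g in zip(tags, new, gold) if t == g and n != g)
    return fixed - broken


def train(tags: List[str], gold: List[str], max_rules: int = 200,
          min_score: int = 3) -> Tuple[List[Rule], List[str]]:
    rules: List[Rule] = []
    while len(rules) < max_rules:
        # Sorting makes tie-breaking between equal scores deterministic.
        candidates = sorted(candidate_rules(tags, gold))
        if not candidates:
            break
        best = max(candidates, key=lambda r: score(r, tags, gold))
        if score(best, tags, gold) < min_score:
            break  # no remaining rule is good enough
        rules.append(best)
        tags = apply_rule(best, tags)
    return rules, tags


tags = ["TO", "NN", "DT", "NN", "TO", "NN", "TO", "NN"]
gold = ["TO", "VB", "DT", "NN", "TO", "VB", "TO", "VB"]
print(train(tags, gold))
```

Rescoring every candidate on every iteration is what makes this naive loop slow on real corpora; a faster trainer avoids redundant rescoring, though the exact bookkeeping it uses is not described here.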
Functions
 
error_list(train_sents, test_sents, radius=2)
Returns a list of human-readable strings indicating the errors in the given tagging of the corpus.
demo(num_sents=100, max_rules=200, min_score=3, error_output='errors.out', rule_output='rules.yaml', randomize=False, train=0.8, trace=3)
Brill Tagger Demonstration
Function Details

error_list(train_sents, test_sents, radius=2)


Returns a list of human-readable strings indicating the errors in the given tagging of the corpus.

Parameters:
  • train_sents (list of tuple) - The correct (gold-standard) tagging of the corpus
  • test_sents (list of tuple) - The same corpus as tagged by the tagger being evaluated
  • radius (int) - How many tokens on either side of a wrongly-tagged token to include in the error string. For example, if radius=2, each error string will show the incorrect token plus two tokens on either side.
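A minimal sketch of the behavior described above, assuming each corpus is a list of sentences of (word, tag) pairs and assuming an output layout of `left context [word/test->gold] right context`. The function `format_errors` is a hypothetical stand-in; the exact string format produced by error_list may differ.

```python
from typing import List, Tuple

TaggedSent = List[Tuple[str, str]]  # [(word, tag), ...]


def _context(tokens: TaggedSent) -> str:
    return " ".join(f"{word}/{tag}" for word, tag in tokens)


def format_errors(gold_sents: List[TaggedSent], test_sents: List[TaggedSent],
                  radius: int = 2) -> List[str]:
    """Hypothetical stand-in for error_list: one line per mistagged token."""
    errors = []
    for gold, test in zip(gold_sents, test_sents):
        for i, ((word, gold_tag), (_, test_tag)) in enumerate(zip(gold, test)):
            if gold_tag == test_tag:
                continue
            # Up to `radius` tokens on each side, shown with their correct tags.
            left = _context(gold[max(0, i - radius):i])
            right = _context(gold[i + 1:i + 1 + radius])
            errors.append(f"{left} [{word}/{test_tag}->{gold_tag}] {right}".strip())
    return errors


gold = [[("the", "DT"), ("old", "JJ"), ("man", "NN"), ("the", "DT"), ("boat", "NN")]]
test = [[("the", "DT"), ("old", "NN"), ("man", "VB"), ("the", "DT"), ("boat", "NN")]]
for line in format_errors(gold, test, radius=1):
    print(line)
# the/DT [old/NN->JJ] man/NN
# old/JJ [man/VB->NN] the/DT
```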

demo(num_sents=100, max_rules=200, min_score=3, error_output='errors.out', rule_output='rules.yaml', randomize=False, train=0.8, trace=3)


Brill Tagger Demonstration

Parameters:
  • num_sents (int) - how many sentences of training and testing data to use
  • max_rules (int) - maximum number of rule instances to create
  • min_score (int) - the minimum score for a rule in order for it to be considered
  • error_output (string) - the file where errors will be saved
  • rule_output (string) - the file where rules will be saved
  • randomize (boolean) - whether the training data should be a random subset of the corpus
  • train (float) - the fraction of the corpus to be used for training (1.0 = all)
  • trace (int) - the level of diagnostic tracing output to produce (0-4)
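To make the interaction of num_sents, train and randomize concrete: with the defaults, 100 sentences are used, of which 100 × 0.8 = 80 go to training and the remaining 20 to testing. The helper below is a hypothetical sketch of such a split, not the demo's actual code; the `seed` parameter is an added assumption for reproducibility, and shuffling the whole corpus before truncation is one reading of "a random subset of the corpus".

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_sents(sents: Sequence[T], num_sents: int = 100, train: float = 0.8,
                randomize: bool = False, seed: int = 0) -> Tuple[List[T], List[T]]:
    """Hypothetical helper: pick num_sents sentences, then split train/test."""
    pool = list(sents)
    if randomize:
        # Shuffle the whole corpus so the selected sentences form a random subset.
        random.Random(seed).shuffle(pool)
    pool = pool[:num_sents]
    # round() rather than int() avoids float truncation (e.g. 100 * 0.29 -> 28.999...).
    cutoff = round(len(pool) * train)
    return pool[:cutoff], pool[cutoff:]


train_part, test_part = split_sents(list(range(200)))
print(len(train_part), len(test_part))  # 80 20
```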