A classifier model based on the maximum entropy modeling framework. This framework considers all of the probability distributions that are empirically consistent with the training data, and chooses the distribution with the highest entropy. A probability distribution is empirically consistent with a set of training data if its estimated frequency with which a class and a feature vector value co-occur is equal to the actual frequency in the data.
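In symbols (a standard formulation, given here for reference; the notation is not taken from this module's code): let \tilde{p} denote the empirical distribution of the training data and f_i the joint-features defined below. The classifier chooses

  p^* = \arg\max_{p \in C} \; -\sum_{x,y} \tilde{p}(x)\, p(y \mid x) \log p(y \mid x)

where C is the set of empirically consistent models:

  C = \{\, p : \textstyle\sum_{x,y} \tilde{p}(x)\, p(y \mid x)\, f_i(x,y) = \sum_{x,y} \tilde{p}(x,y)\, f_i(x,y) \ \text{for all } i \,\}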
The term feature is usually used to refer to some property of an unlabeled token. For example, when performing word sense disambiguation, we might define a 'prevword' feature whose value is the word preceding the target word. However, in the context of maxent modeling, the term feature is typically used to refer to a property of a labeled token. In order to prevent confusion, we will introduce two distinct terms to disambiguate these two concepts:

  - An input-feature is a property of an unlabeled token.
  - A joint-feature is a property of a labeled token.
In the rest of the nltk.classify module, the term features is used to refer to what we call input-features in this module.

In the literature that describes and discusses maximum entropy models, input-features are typically called contexts, and joint-features are simply referred to as features.
In maximum entropy models, joint-features are required to have numeric values. Typically, each input-feature input_feat is mapped to a set of joint-features of the form:

  joint_feat(token, label) = { 1 if input_feat(token) == feat_val
                             {      and label == some_label
                             {
                             { 0 otherwise

for all values of feat_val and some_label. This mapping is performed by classes that implement the MaxentFeatureEncodingI interface.
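Concretely, the mapping can be pictured as follows. This is a minimal sketch, not the module's actual encoding code; the prevword_feat input-feature, the dict-shaped tokens, and the labels are made up for illustration:

    # A hand-rolled binary joint-feature of the form shown above. In the
    # real module this mapping is done by MaxentFeatureEncodingI
    # implementations such as BinaryMaxentFeatureEncoding.
    def make_joint_feat(input_feat, feat_val, some_label):
        def joint_feat(token, label):
            return 1 if input_feat(token) == feat_val and label == some_label else 0
        return joint_feat

    # Example input-feature: the word preceding the target word.
    def prevword_feat(token):
        return token.get('prevword')

    jf = make_joint_feat(prevword_feat, 'bank', 'FINANCE')
    print(jf({'prevword': 'bank'}, 'FINANCE'))   # -> 1
    print(jf({'prevword': 'river'}, 'FINANCE'))  # -> 0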
Classifier Model

  - MaxentClassifier: A maximum entropy classifier (also known as a conditional exponential classifier).
  - ConditionalExponentialClassifier: Alias for MaxentClassifier.

Feature Encodings

  - MaxentFeatureEncodingI: A mapping that converts a set of input-feature values to a vector of joint-feature values, given a label.
  - FunctionBackedMaxentFeatureEncoding: A feature encoding that calls a user-supplied function to map a given featureset/label pair to a sparse joint-feature vector.
  - BinaryMaxentFeatureEncoding: A feature encoding that generates vectors containing binary joint-features of the form shown above.
  - GISEncoding: A binary feature encoding which adds one new joint-feature to the joint-features defined by BinaryMaxentFeatureEncoding: a correction feature, whose value is chosen to ensure that the sparse vector always sums to a constant non-negative number.
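Before turning to the trainers, here is a minimal usage sketch, not taken from this page. It assumes NLTK is installed; the two featuresets, the 'prevword' input-feature, and the labels are made up for illustration:

    # Train a MaxentClassifier on two toy featuresets using IIS.
    from nltk.classify.maxent import MaxentClassifier

    train_toks = [
        ({'prevword': 'the'}, 'NOUN'),
        ({'prevword': 'to'}, 'VERB'),
    ]
    classifier = MaxentClassifier.train(train_toks, algorithm='iis',
                                        trace=0, max_iter=10)
    print(classifier.classify({'prevword': 'the'}))  # expected: 'NOUN'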
Classifier Trainer: Generalized Iterative Scaling

Classifier Trainer: Improved Iterative Scaling
  (includes helper functions; one builds a dictionary from int to int, described below)

Classifier Trainer: scipy algorithms (CG, LBFGSB, etc.)

Classifier Trainer: megam

Demo

Each trainer is described below.
Train a new maxent classifier using the Generalized Iterative Scaling (GIS) algorithm. See Also: train_maxent_classifier() for parameter descriptions.
Train a new maxent classifier using the Improved Iterative Scaling (IIS) algorithm. See Also: train_maxent_classifier() for parameter descriptions.
Construct a map that can be used to compress nf, which is typically sparse. Here nf(feature_vector) is the sum of the feature values for feature_vector; it represents the number of features that are active for a given labeled text. This method finds all values of nf(t) that are attested for at least one token in the given list of training tokens, and constructs a dictionary mapping these attested values to a continuous range 0...N. For example, if the only values of nf() that were attested were 3, 5, and 7, then the resulting map would be {3: 0, 5: 1, 7: 2}.
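The construction is simple enough to sketch directly. The function name below is illustrative, not this module's actual API:

    # Map each attested nf value to an index in a continuous range 0...N.
    def build_nfmap(nf_values):
        return {nf: i for i, nf in enumerate(sorted(set(nf_values)))}

    print(build_nfmap([3, 7, 5, 3]))  # -> {3: 0, 5: 1, 7: 2}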
Calculate the update values for the classifier weights for this iteration of IIS. These update weights are the value of delta that solves the equation:

  ffreq_empirical[i]
      = SUM[fs,l] (classifier.prob_classify(fs).prob(l) *
                   feature_vector(fs,l)[i] *
                   exp(delta[i] * nf(feature_vector(fs,l))))

where:

  - (fs, l) is a (featureset, label) pair from the training tokens
  - feature_vector(fs, l) is the joint-feature vector produced by the feature encoding for that pair
  - nf(vector) is the sum of the feature values in vector

This method uses Newton's method to solve this equation for delta[i]. In particular, it starts with a guess of delta[i] = 1 and iteratively updates delta with:

  delta[i] -= (ffreq_empirical[i] - sum1[i]) / (-sum2[i])

until convergence, where sum1 and sum2 are defined as:

  sum1[i](delta) = SUM[fs,l] f[i](fs,l,delta)
  sum2[i](delta) = SUM[fs,l] (f[i](fs,l,delta) * nf(feature_vector(fs,l)))
  f[i](fs,l,delta) = (classifier.prob_classify(fs).prob(l) *
                      feature_vector(fs,l)[i] *
                      exp(delta[i] * nf(feature_vector(fs,l))))

Note that sum1 and sum2 depend on delta, so they must be re-computed on each iteration. The variables nfmap, nfarray, and nftranspose are used to generate a dense encoding for nf(feature_vector); this allows sum1 and sum2 to be computed using matrix operations, which yields a significant performance improvement.
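To make the update concrete, here is a minimal sketch of the Newton iteration for a single joint-feature i. All names are illustrative, not this module's actual API; each entry of points packages the per-(fs, l) quantities from the equation above:

    import math

    # points: list of (p, fv, nf) tuples standing in for
    # (classifier.prob_classify(fs).prob(l), feature_vector(fs,l)[i],
    #  nf(feature_vector(fs,l))) over all (fs, l) in the training data.
    def solve_delta_i(ffreq_empirical_i, points, max_newton=20, tol=1e-8):
        delta_i = 1.0  # initial guess, as described above
        for _ in range(max_newton):
            sum1 = sum(p * fv * math.exp(delta_i * nf) for p, fv, nf in points)
            sum2 = sum(p * fv * nf * math.exp(delta_i * nf) for p, fv, nf in points)
            if sum2 == 0:
                break  # avoid division by zero on degenerate input
            step = (ffreq_empirical_i - sum1) / (-sum2)
            delta_i -= step
            if abs(step) < tol:  # converged
                break
        return delta_i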
Train a new maxent classifier using one of the scipy optimization algorithms. See Also: train_maxent_classifier() for parameter descriptions. Requires: the scipy package must be installed.
Train a new maxent classifier using the external megam optimizer. See Also: train_maxent_classifier() for parameter descriptions.