Class BinaryMaxentFeatureEncoding

            object --+
                     |
MaxentFeatureEncodingI --+
                         |
                        BinaryMaxentFeatureEncoding

A feature encoding that generates vectors containing binary joint-features of the form:

 joint_feat(fs, l) = { 1 if (fs[fname] == fval) and (l == label)
                     { 0 otherwise

Where fname is the name of an input-feature, fval is a value for that input-feature, and label is a label.
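
The definition above can be read as a small predicate over a featureset and a label. The following is a minimal sketch in plain Python, assuming featuresets are dicts; make_joint_feat is a hypothetical helper used only for illustration and is not part of NLTK.

```python
def make_joint_feat(fname, fval, label):
    """Build one binary joint-feature for a (fname, fval, label) triple.

    Hypothetical helper for illustration only; not part of NLTK.
    """
    def joint_feat(fs, l):
        # The feature fires only when the input-feature has the given
        # value AND the candidate label matches.
        matches_value = fname in fs and fs[fname] == fval
        return 1 if (matches_value and l == label) else 0
    return joint_feat


# Joint-feature for: input-feature 'last_letter' is 'a' and label is 'female'.
feat = make_joint_feat("last_letter", "a", "female")
fires = feat({"last_letter": "a"}, "female")        # value and label match
wrong_label = feat({"last_letter": "a"}, "male")    # label differs
wrong_value = feat({"last_letter": "k"}, "female")  # value differs
```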

Typically, these features are constructed based on a training corpus, using the train() method. This method will create one feature for each combination of fname, fval, and label that occurs at least once in the training corpus.
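
As a rough illustration of the "one feature per attested combination" rule, the sketch below assigns an index to each (fname, fval, label) triple seen in a toy corpus. build_mapping and the sample data are assumptions made for this example; the real train() method also handles count_cutoff, an explicit label list, and extra constructor options.

```python
def build_mapping(train_toks):
    """Assign an index to each (fname, fval, label) seen in the corpus.

    Illustrative sketch only; not NLTK's implementation.
    """
    mapping = {}
    for featureset, label in train_toks:
        for fname, fval in featureset.items():
            key = (fname, fval, label)
            if key not in mapping:
                mapping[key] = len(mapping)  # next unused index
    return mapping


train_toks = [
    ({"last_letter": "a"}, "female"),
    ({"last_letter": "k"}, "male"),
    ({"last_letter": "a"}, "female"),  # repeated combination, no new feature
]
mapping = build_mapping(train_toks)
```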

The unseen_features parameter can be used to add unseen-value features, which are used whenever an input feature has a value that was not encountered in the training corpus. These features have the form:

 joint_feat(fs, l) = { 1 if is_unseen(fname, fs[fname])
                     {      and l == label
                     { 0 otherwise

Where is_unseen(fname, fval) is true if the encoding does not contain any joint features that are true when fs[fname]==fval.
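
The unseen-value rule can be sketched the same way. This example follows the formula as written above and assumes the encoding's joint-features are given as a dict keyed by (fname, fval, label); make_unseen_feat is a hypothetical helper, and NLTK's internal bookkeeping may differ.

```python
def is_unseen(mapping, fname, fval):
    """True if no joint-feature in `mapping` fires when fs[fname] == fval."""
    return not any(
        (name, val) == (fname, fval) for (name, val, _label) in mapping
    )


def make_unseen_feat(mapping, fname, label):
    """Build an unseen-value joint-feature (hypothetical helper)."""
    def joint_feat(fs, l):
        unseen = fname in fs and is_unseen(mapping, fname, fs[fname])
        return 1 if (unseen and l == label) else 0
    return joint_feat


mapping = {("last_letter", "a", "female"): 0, ("last_letter", "k", "male"): 1}
feat = make_unseen_feat(mapping, "last_letter", "female")
seen_value = feat({"last_letter": "a"}, "female")        # 'a' was attested
unseen_value = feat({"last_letter": "z"}, "female")      # 'z' was never seen
unseen_other_label = feat({"last_letter": "z"}, "male")  # label differs
```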

The alwayson_features parameter can be used to add always-on features, which have the form:

 joint_feat(fs, l) = { 1 if (l == label)
                     { 0 otherwise

These always-on features allow the maxent model to directly model the prior probabilities of each label.
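
To see why always-on features can capture label priors, consider a model that has only always-on features. The sketch below assumes the usual exponential (log-linear) form of a maxent model and sets the weights by hand to the log of made-up label counts; nothing here is trained, and all names are hypothetical.

```python
import math

labels = ["female", "male"]
label_counts = {"female": 3, "male": 1}  # made-up counts for illustration


def alwayson_feat(label):
    # Fires for every featureset, as long as the candidate label matches.
    return lambda fs, l: 1 if l == label else 0


feats = {label: alwayson_feat(label) for label in labels}
# Hand-picked weights: log of each label's count (not a trained model).
weights = {label: math.log(label_counts[label]) for label in labels}


def prob(fs, l):
    """P(l | fs) under a log-linear model using only always-on features."""
    def score(candidate):
        total = sum(weights[k] * feats[k](fs, candidate) for k in labels)
        return math.exp(total)
    return score(l) / sum(score(c) for c in labels)


# The featureset is ignored: the result is just the label's relative frequency.
p_female = prob({"last_letter": "k"}, "female")
p_male = prob({"last_letter": "k"}, "male")
```

With these weights the probabilities come out as the relative label frequencies, independent of the input featureset.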

Instance Methods
  • __init__(self, labels, mapping, unseen_features=False, alwayson_features=False) - Initializes the encoding from a list of known labels and a joint-feature mapping.
  • encode(self, featureset, label) - Given a (featureset, label) pair, returns the corresponding vector of joint-feature values, as a list of (int, number).
  • describe(self, f_id) - Returns a string describing the value of the joint-feature whose index in the generated feature vectors is f_id.
  • labels(self) - Returns a list of the "known labels" -- i.e., all labels l such that self.encode(fs,l) can be a nonzero joint-feature vector for some value of fs.
  • length(self) - Returns the size of the fixed-length joint-feature vectors that are generated by this encoding.
Inherited from object: __delattr__, __getattribute__, __hash__, __new__, __reduce__, __reduce_ex__, __repr__, __setattr__, __str__

Class Methods
  • train(cls, train_toks, count_cutoff=0, labels=None, **options) - Constructs and returns a new feature encoding, based on a given training corpus train_toks.
Instance Variables
  • A list of attested labels.
  • dict mapping from (fname,fval,label) -> fid
  • The length of generated joint feature vectors.
  • dict mapping from label -> fid
  • dict mapping from fname -> fid
Inherited from object: __class__

__init__(self, labels, mapping, unseen_features=False, alwayson_features=False)

Initializes the encoding from a list of known labels and a mapping from (fname,fval,label) tuples to joint-feature indexes.

  • labels - A list of the "known labels" for this encoding.
  • mapping - A dictionary mapping from (fname,fval,label) tuples to corresponding joint-feature indexes. These indexes must be exactly the set of integers from 0 through len(mapping)-1. If mapping[fname,fval,label]=id, then the joint-feature with index id has value 1 in self.encode({..., fname:fval, ...}, label); otherwise, its value is 0.
  • unseen_features - If true, then include unseen-value features in the generated joint-feature vectors.
  • alwayson_features - If true, then include always-on features in the generated joint-feature vectors.
Overrides: object.__init__
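
The index requirement on mapping can be checked before constructing an encoding. The sketch below reads the requirement as "exactly the integers 0 through len(mapping)-1", which is an assumption about the intended range; check_mapping is a hypothetical helper, not an NLTK function.

```python
def check_mapping(mapping):
    """Raise ValueError unless the indexes are exactly 0..len(mapping)-1."""
    expected = set(range(len(mapping)))
    if set(mapping.values()) != expected:
        raise ValueError(
            "Mapping values must be exactly the integers "
            "0 through len(mapping)-1."
        )


good = {("last_letter", "a", "female"): 0, ("last_letter", "k", "male"): 1}
check_mapping(good)  # passes silently

bad = {("last_letter", "a", "female"): 0, ("last_letter", "k", "male"): 2}
try:
    check_mapping(bad)  # index 1 is missing, so this is rejected
    bad_rejected = False
except ValueError:
    bad_rejected = True
```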

encode(self, featureset, label)

Given a (featureset, label) pair, return the corresponding vector of joint-feature values. This vector is represented as a list of (index, value) tuples, specifying the value of each non-zero joint-feature.

Returns: list of (int, number)
Overrides: MaxentFeatureEncodingI.encode
(inherited documentation)
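
The sparse (index, value) representation can be illustrated with a simplified stand-in for encode(). This sketch handles only the basic binary joint-features (no unseen-value or always-on features) and assumes the mapping is a plain dict; sparse_encode and to_dense are hypothetical helpers.

```python
def sparse_encode(mapping, featureset, label):
    """Return [(index, 1), ...] for each joint-feature that fires."""
    encoding = []
    for fname, fval in featureset.items():
        fid = mapping.get((fname, fval, label))
        if fid is not None:
            encoding.append((fid, 1))
    return encoding


def to_dense(sparse, length):
    """Expand a sparse encoding into a fixed-length list of values."""
    dense = [0] * length
    for index, value in sparse:
        dense[index] = value
    return dense


mapping = {
    ("last_letter", "a", "female"): 0,
    ("last_letter", "k", "male"): 1,
    ("length", 4, "female"): 2,
}
fs = {"last_letter": "a", "length": 4}
sparse_female = sparse_encode(mapping, fs, "female")
sparse_male = sparse_encode(mapping, fs, "male")
dense_female = to_dense(sparse_female, len(mapping))
```

Only non-zero joint-features appear in the sparse list, so a (featureset, label) pair that triggers nothing encodes to an empty list.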

describe(self, f_id)

Returns: str
A string describing the value of the joint-feature whose index in the generated feature vectors is f_id.
Overrides: MaxentFeatureEncodingI.describe
(inherited documentation)


labels(self)

Returns: list
A list of the "known labels" -- i.e., all labels l such that self.encode(fs,l) can be a nonzero joint-feature vector for some value of fs.
Overrides: MaxentFeatureEncodingI.labels
(inherited documentation)


length(self)

Returns: int
The size of the fixed-length joint-feature vectors that are generated by this encoding.
Overrides: MaxentFeatureEncodingI.length
(inherited documentation)

train(cls, train_toks, count_cutoff=0, labels=None, **options)
Class Method

Construct and return a new feature encoding, based on a given training corpus train_toks. See the class description for a description of the joint-features that will be included in this encoding.

  • train_toks (list of tuples of (dict, str)) - Training data, represented as a list of pairs, the first member of which is a feature dictionary, and the second of which is a classification label.
  • count_cutoff (int) - A cutoff value that is used to discard rare joint-features. If a joint-feature has the value 1 fewer than count_cutoff times in the training corpus, it is not included in the generated encoding.
  • labels (list) - A list of labels that should be used by the classifier. If not specified, then the set of labels attested in train_toks will be used.
  • options - Extra parameters for the constructor, such as unseen_features and alwayson_features.
Overrides: MaxentFeatureEncodingI.train
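
The effect of count_cutoff can be sketched as follows. This is a simplified stand-in for train() that only builds the (fname, fval, label) mapping; build_mapping_with_cutoff and the toy corpus are assumptions made for illustration, and the labels and options parameters are left out.

```python
from collections import Counter


def build_mapping_with_cutoff(train_toks, count_cutoff=0):
    """Index each (fname, fval, label) seen at least count_cutoff times."""
    counts = Counter()
    mapping = {}
    for featureset, label in train_toks:
        for fname, fval in featureset.items():
            key = (fname, fval, label)
            counts[key] += 1
            # Add the joint-feature once it has been seen often enough.
            if counts[key] >= count_cutoff and key not in mapping:
                mapping[key] = len(mapping)
    return mapping


train_toks = [
    ({"last_letter": "a"}, "female"),
    ({"last_letter": "k"}, "male"),
    ({"last_letter": "a"}, "female"),
]
all_feats = build_mapping_with_cutoff(train_toks)                    # keep everything
common_feats = build_mapping_with_cutoff(train_toks, count_cutoff=2)  # drop rare ones
```

With a cutoff of 2, the ('last_letter', 'k', 'male') combination occurs only once in this toy corpus and is therefore left out.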