Package nltk :: Package classify :: Module maxent :: Class BinaryMaxentFeatureEncoding

Class BinaryMaxentFeatureEncoding

            object --+    
                     |    
MaxentFeatureEncodingI --+
                         |
                        BinaryMaxentFeatureEncoding
Known Subclasses:

A feature encoding that generates vectors containing binary joint-features of the form:

 joint_feat(fs, l) = { 1 if (fs[fname] == fval) and (l == label)
                     {
                     { 0 otherwise

Where fname is the name of an input-feature, fval is a value for that input-feature, and label is a label.
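
The definition above can be read as a small predicate. The following is a minimal sketch in plain Python; the function name joint_feat and its argument order are illustrative assumptions, not part of the nltk API.

```python
def joint_feat(fs, l, fname, fval, label):
    """Return 1 if fs[fname] == fval and l == label, otherwise 0.

    fs is a featureset (dict), l is the candidate label, and
    (fname, fval, label) identifies one binary joint-feature.
    """
    return 1 if fname in fs and fs[fname] == fval and l == label else 0


# Hypothetical featureset used only for illustration.
fs = {"last_letter": "a", "length": 5}

fires = joint_feat(fs, "female", "last_letter", "a", "female")
wrong_label = joint_feat(fs, "male", "last_letter", "a", "female")
wrong_value = joint_feat(fs, "female", "last_letter", "k", "female")
```

The feature fires only when both the input-feature value and the label match.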

Typically, these features are constructed based on a training corpus, using the train() method. This method will create one feature for each combination of fname, fval, and label that occurs at least once in the training corpus.

The unseen_features parameter can be used to add unseen-value features, which are used whenever an input feature has a value that was not encountered in the training corpus. These features have the form:

 joint_feat(fs, l) = { 1 if is_unseen(fname, fs[fname])
                     {      and l == label
                     {
                     { 0 otherwise

Where is_unseen(fname, fval) is true if the encoding does not contain any joint features that are true when fs[fname]==fval.
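
As a rough illustration of the unseen-value form, the sketch below follows the formula above literally. It assumes the encoding's joint features are held in a dict keyed by (fname, fval, label); the helper names is_unseen and unseen_feat are hypothetical, and the library's own implementation may differ in detail.

```python
def is_unseen(mapping, fname, fval):
    """True if no joint feature in mapping is keyed on (fname, fval)."""
    return not any(n == fname and v == fval for (n, v, _label) in mapping)


def unseen_feat(mapping, fs, l, fname, label):
    """Unseen-value feature for one (fname, label) pair, per the formula above."""
    if fname not in fs:
        return 0
    return 1 if is_unseen(mapping, fname, fs[fname]) and l == label else 0


# Hypothetical mapping from (fname, fval, label) to feature index.
mapping = {("last_letter", "a", "female"): 0, ("last_letter", "k", "male"): 1}

seen = unseen_feat(mapping, {"last_letter": "a"}, "female", "last_letter", "female")
novel = unseen_feat(mapping, {"last_letter": "z"}, "female", "last_letter", "female")
```

The value "z" never appears in the mapping, so only the second call fires.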

The alwayson_features parameter can be used to add always-on features, which have the form:

 joint_feat(fs, l) = { 1 if (l == label)
                     {
                     { 0 otherwise

These always-on features allow the maxent model to directly model the prior probabilities of each label.
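
To see why always-on features capture label priors, consider a maxent model in which only the always-on features are active. Its label distribution is then a softmax over one weight per label, independent of the featureset. A minimal sketch under that assumption (label_distribution and the example weights are hypothetical):

```python
import math


def label_distribution(weights):
    """weights maps each label to the weight of its always-on feature.

    With no other active features, p(label) is proportional to exp(weight).
    """
    z = sum(math.exp(w) for w in weights.values())
    return {label: math.exp(w) / z for label, w in weights.items()}


# Weights chosen so that "ham" is three times as likely as "spam".
weights = {"spam": math.log(1.0), "ham": math.log(3.0)}
prior = label_distribution(weights)
```

With these weights the model assigns roughly 0.25 to "spam" and 0.75 to "ham", whatever the input featureset is.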

Instance Methods
 
__init__(self, labels, mapping, unseen_features=False, alwayson_features=False)
Construct a new encoding from a list of known labels and a joint-feature mapping.

encode(self, featureset, label)
Given a (featureset, label) pair, return the corresponding vector of joint-feature values.
Returns: list of (int, number)

describe(self, f_id)
Returns: str
A string describing the value of the joint-feature whose index in the generated feature vectors is f_id.

labels(self)
Returns: list
A list of the "known labels" -- i.e., all labels l such that self.encode(fs,l) can be a nonzero joint-feature vector for some value of fs.

length(self)
Returns: int
The size of the fixed-length joint-feature vectors that are generated by this encoding.

Inherited from object: __delattr__, __getattribute__, __hash__, __new__, __reduce__, __reduce_ex__, __repr__, __setattr__, __str__

Class Methods
 
train(cls, train_toks, count_cutoff=0, labels=None, **options)
Construct and return a new feature encoding, based on a given training corpus train_toks.

Instance Variables
  _labels
A list of attested labels.
  _mapping
dict mapping from (fname,fval,label) -> fid
  _length
The length of generated joint feature vectors.
  _alwayson
dict mapping from label -> fid
  _unseen
dict mapping from fname -> fid

Properties

Inherited from object: __class__

Method Details

__init__(self, labels, mapping, unseen_features=False, alwayson_features=False)
(Constructor)

Construct a new encoding from the given labels and joint-feature mapping.

Parameters:
  • labels - A list of the "known labels" for this encoding.
  • mapping - A dictionary mapping from (fname,fval,label) tuples to corresponding joint-feature indexes. These indexes must be exactly the integers 0 through len(mapping)-1. If mapping[fname,fval,label]=id, then the joint-feature with index id has value 1 in self.encode({..., fname:fval, ...}, label); otherwise, its value is 0.
  • unseen_features - If true, then include unseen value features in the generated joint-feature vectors.
  • alwayson_features - If true, then include always-on features in the generated joint-feature vectors.
Overrides: object.__init__
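
The index requirement on mapping can be checked before constructing an encoding. The sketch below reads the requirement as "exactly the integers 0 through len(mapping) - 1"; has_valid_indexes is a hypothetical helper, not an nltk function.

```python
def has_valid_indexes(mapping):
    """True if the mapping's values are exactly 0, 1, ..., len(mapping) - 1."""
    return set(mapping.values()) == set(range(len(mapping)))


# Hypothetical mappings from (fname, fval, label) to joint-feature index.
good = {("last_letter", "a", "female"): 0, ("last_letter", "k", "male"): 1}
gap = {("last_letter", "a", "female"): 0, ("last_letter", "k", "male"): 2}
duplicate = {("last_letter", "a", "female"): 0, ("last_letter", "k", "male"): 0}

good_ok = has_valid_indexes(good)
gap_ok = has_valid_indexes(gap)
duplicate_ok = has_valid_indexes(duplicate)
```

A mapping with a skipped or repeated index fails the check, because the generated vectors would then have an unused or ambiguous position.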

encode(self, featureset, label)

Given a (featureset, label) pair, return the corresponding vector of joint-feature values. This vector is represented as a list of (index, value) tuples, specifying the value of each non-zero joint-feature.

Returns: list of (int, number)
Overrides: MaxentFeatureEncodingI.encode
(inherited documentation)
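
Because the result lists only the non-zero entries, it can be expanded into a dense vector when one is needed. A minimal sketch, assuming the vector length is known (for example from length()); to_dense is a hypothetical helper and the sample values are made up for illustration.

```python
def to_dense(sparse, length):
    """Expand a list of (index, value) tuples into a fixed-length list."""
    dense = [0] * length
    for index, value in sparse:
        dense[index] = value
    return dense


# A sparse vector in the (index, value) format described above.
sparse = [(0, 1), (3, 1)]
dense = to_dense(sparse, 5)
```

Every position not mentioned in the sparse list is zero in the dense form.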

describe(self, f_id)

Returns: str
A string describing the value of the joint-feature whose index in the generated feature vectors is f_id.
Overrides: MaxentFeatureEncodingI.describe
(inherited documentation)

labels(self)

Returns: list
A list of the "known labels" -- i.e., all labels l such that self.encode(fs,l) can be a nonzero joint-feature vector for some value of fs.
Overrides: MaxentFeatureEncodingI.labels
(inherited documentation)

length(self)

Returns: int
The size of the fixed-length joint-feature vectors that are generated by this encoding.
Overrides: MaxentFeatureEncodingI.length
(inherited documentation)

train(cls, train_toks, count_cutoff=0, labels=None, **options)
Class Method

Construct and return a new feature encoding, based on a given training corpus train_toks. See the class description for a description of the joint-features that will be included in this encoding.

Parameters:
  • train_toks (list of tuples of (dict, str)) - Training data, represented as a list of pairs, the first member of which is a feature dictionary, and the second of which is a classification label.
  • count_cutoff (int) - A cutoff value that is used to discard rare joint-features. If a joint-feature has the value 1 fewer than count_cutoff times in the training corpus, then that joint-feature is not included in the generated encoding.
  • labels (list) - A list of labels that should be used by the classifier. If not specified, then the set of labels attested in train_toks will be used.
  • options - Extra parameters for the constructor, such as unseen_features and alwayson_features.
Overrides: MaxentFeatureEncodingI.train
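
The way a mapping could be derived from train_toks can be sketched as follows. This is an illustration that follows the parameter descriptions above literally, not the library's own code: build_mapping is a hypothetical helper, and it counts each (fname, fval, label) combination and keeps those seen at least count_cutoff times.

```python
from collections import Counter


def build_mapping(train_toks, count_cutoff=0):
    """Map each sufficiently frequent (fname, fval, label) to an index."""
    counts = Counter()
    for featureset, label in train_toks:
        for fname, fval in featureset.items():
            counts[fname, fval, label] += 1

    mapping = {}
    for key, n in counts.items():
        # Discard joint-features seen fewer than count_cutoff times.
        if n >= count_cutoff:
            mapping[key] = len(mapping)
    return mapping


# Hypothetical training corpus of (featureset, label) pairs.
train_toks = [
    ({"last_letter": "a"}, "female"),
    ({"last_letter": "a"}, "female"),
    ({"last_letter": "k"}, "male"),
]

all_feats = build_mapping(train_toks)
frequent = build_mapping(train_toks, count_cutoff=2)
```

With the default cutoff every attested combination gets an index; with count_cutoff=2 only the combination that occurs twice survives.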