Package nltk :: Package classify :: Module maxent :: Class MaxentFeatureEncodingI

Class MaxentFeatureEncodingI

object --+
         |
        MaxentFeatureEncodingI

A mapping that converts a set of input-feature values to a vector of joint-feature values, given a label. This conversion is necessary to translate featuresets into a format that can be used by maximum entropy models.

The set of joint-features used by a given encoding is fixed, and each index in the generated joint-feature vectors corresponds to a single joint-feature. The length of the generated joint-feature vectors is therefore constant (for a given encoding).

Because the joint-feature vectors generated by MaxentFeatureEncodingI are typically very sparse, they are represented as a list of (index, value) tuples, specifying the value of each non-zero joint-feature.
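The sparse representation described above can be expanded into an ordinary dense vector when one is needed. The following is a minimal sketch, not part of NLTK; the helper name `to_dense` and the sample values are assumptions made for illustration.

```python
def to_dense(sparse_vector, length):
    """Expand a list of (index, value) tuples into a dense list of `length` values."""
    dense = [0] * length
    for index, value in sparse_vector:
        dense[index] = value
    return dense


# Hypothetical sparse vector for an encoding with 6 joint-features.
sparse = [(1, 1), (4, 1)]
dense = to_dense(sparse, 6)
print(dense)  # [0, 1, 0, 0, 1, 0]
```

Every index that does not appear in the sparse list is implicitly zero, which is why only the non-zero joint-features need to be stored.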

Feature encodings are generally created using the train() method, which generates an appropriate encoding based on the input-feature values and labels that are present in a given corpus.

Instance Methods

encode(self, featureset, label)
Given a (featureset, label) pair, return the corresponding vector of joint-feature values.
Returns: list of (int, number)

length(self)
Returns: int - The size of the fixed-length joint-feature vectors that are generated by this encoding.

labels(self)
Returns: list - A list of the "known labels" -- i.e., all labels l such that self.encode(fs,l) can be a nonzero joint-feature vector for some value of fs.

describe(self, fid)
Returns: str - A string describing the value of the joint-feature whose index in the generated feature vectors is fid.

train(cls, train_toks)
Construct and return a new feature encoding, based on a given training corpus train_toks.

Inherited from object: __delattr__, __getattribute__, __hash__, __init__, __new__, __reduce__, __reduce_ex__, __repr__, __setattr__, __str__

Properties

Inherited from object: __class__

Method Details

encode(self, featureset, label)

Given a (featureset, label) pair, return the corresponding vector of joint-feature values. This vector is represented as a list of (index, value) tuples, specifying the value of each non-zero joint-feature.

Parameters:
  • featureset (dict)
Returns: list of (int, number)
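As an illustration of the contract described for encode(), here is a minimal sketch of one possible binary encoding. It is not NLTK's implementation: the `MAPPING` table, the feature names, and the standalone `encode` function are all hypothetical, and a real encoding would normally build its mapping from training data.

```python
# Hypothetical mapping from (feature name, feature value, label) to a
# joint-feature index. This table is an assumption for the example only.
MAPPING = {
    ("outlook", "sunny", "play"): 0,
    ("outlook", "rainy", "stay"): 1,
    ("windy", True, "stay"): 2,
}


def encode(featureset, label):
    """Return the non-zero joint-features as a sorted list of (index, value) tuples."""
    encoding = []
    for fname, fval in featureset.items():
        index = MAPPING.get((fname, fval, label))
        if index is not None:
            # Binary joint-feature: it fires with value 1 when the
            # (name, value, label) combination is known.
            encoding.append((index, 1))
    return sorted(encoding)


fs = {"outlook": "rainy", "windy": True}
print(encode(fs, "stay"))  # [(1, 1), (2, 1)]
print(encode(fs, "play"))  # []
```

The same featureset yields different vectors for different labels, which is what makes the features "joint": each index depends on both the input-feature value and the label.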

length(self)

Returns: int - The size of the fixed-length joint-feature vectors that are generated by this encoding.
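One reason the fixed length matters is that a model can keep exactly one weight per joint-feature index. The sketch below assumes the usual log-linear form of a maximum entropy classifier, where a label's score is the dot product of the weights with the joint-feature vector. The function names, the weights, and the encoded vectors are hypothetical and do not come from NLTK.

```python
import math


def score(sparse_vector, weights):
    """Dot product of a sparse joint-feature vector with a dense weight list."""
    return sum(weights[index] * value for index, value in sparse_vector)


def conditional_probs(encoded_by_label, weights):
    """Normalize exp(score) over labels to get a probability per label."""
    exp_scores = {
        label: math.exp(score(vec, weights))
        for label, vec in encoded_by_label.items()
    }
    total = sum(exp_scores.values())
    return {label: s / total for label, s in exp_scores.items()}


length = 3                 # assumed result of encoding.length()
weights = [0.0] * length   # one weight per joint-feature index
# Hypothetical encode() results for one featureset under each label.
encoded = {"play": [(0, 1)], "stay": [(1, 1), (2, 1)]}
probs = conditional_probs(encoded, weights)
print(probs)  # {'play': 0.5, 'stay': 0.5}
```

With all weights at zero every label scores the same, so the distribution is uniform; training a model amounts to adjusting the weights.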

labels(self)

Returns: list - A list of the "known labels" -- i.e., all labels l such that self.encode(fs,l) can be a nonzero joint-feature vector for some value of fs.
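To make the notion of "known labels" concrete, the sketch below derives them from a hypothetical joint-feature table: a label is known exactly when at least one joint-feature mentions it. The `MAPPING` table and the standalone `labels` function are assumptions for illustration, not NLTK code.

```python
# Hypothetical mapping from (feature name, feature value, label) to a
# joint-feature index.
MAPPING = {
    ("outlook", "sunny", "play"): 0,
    ("outlook", "rainy", "stay"): 1,
    ("windy", True, "stay"): 2,
}


def labels():
    """Return every label that appears in at least one joint-feature."""
    return sorted({label for (_fname, _fval, label) in MAPPING})


print(labels())  # ['play', 'stay']
```

Under this table, any other label (for example "postpone") could only ever produce an all-zero vector, so it is not a known label.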

describe(self, fid)

Returns: str - A string describing the value of the joint-feature whose index in the generated feature vectors is fid.
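A describe() method is mainly useful for inspecting a trained model, for example to see which joint-feature a given weight belongs to. The sketch below performs a reverse lookup in a hypothetical table. The `MAPPING` table and the wording of the returned string are arbitrary choices for this example, not a documented format.

```python
# Hypothetical mapping from (feature name, feature value, label) to a
# joint-feature index.
MAPPING = {
    ("outlook", "sunny", "play"): 0,
    ("outlook", "rainy", "stay"): 1,
    ("windy", True, "stay"): 2,
}


def describe(fid):
    """Return a human-readable description of the joint-feature at index `fid`."""
    for (fname, fval, label), index in MAPPING.items():
        if index == fid:
            return "%s==%r and label is %r" % (fname, fval, label)
    raise ValueError("Bad feature id: %r" % fid)


print(describe(2))  # windy==True and label is 'stay'
```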

train(cls, train_toks)

Construct and return a new feature encoding, based on a given training corpus train_toks.

Parameters:
  • train_toks (list of tuples of (dict, str)) - Training data, represented as a list of pairs, the first member of which is a feature dictionary, and the second of which is a classification label.
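The following sketch shows one way a train() class method could build an encoding from train_toks. It assigns one binary joint-feature to each (feature name, feature value, label) combination seen in the corpus. The class `ToyBinaryEncoding` is hypothetical and only follows the interface described on this page; it is not NLTK's implementation.

```python
class ToyBinaryEncoding:
    """Hypothetical encoding: one binary joint-feature per
    (feature name, feature value, label) combination seen in training."""

    def __init__(self, labels, mapping):
        self._labels = list(labels)
        self._mapping = mapping

    @classmethod
    def train(cls, train_toks):
        """Build the joint-feature index from a list of (featureset, label) pairs."""
        mapping = {}
        labels = []
        for featureset, label in train_toks:
            if label not in labels:
                labels.append(label)
            for fname, fval in featureset.items():
                key = (fname, fval, label)
                if key not in mapping:
                    mapping[key] = len(mapping)  # next free index
        return cls(labels, mapping)

    def length(self):
        return len(self._mapping)

    def labels(self):
        return self._labels

    def encode(self, featureset, label):
        hits = (self._mapping.get((fname, fval, label))
                for fname, fval in featureset.items())
        return sorted((index, 1) for index in hits if index is not None)


train_toks = [
    ({"outlook": "sunny", "windy": False}, "play"),
    ({"outlook": "rainy", "windy": True}, "stay"),
]
encoding = ToyBinaryEncoding.train(train_toks)
print(encoding.length())  # 4
print(encoding.labels())  # ['play', 'stay']
print(encoding.encode({"outlook": "rainy"}, "stay"))  # [(2, 1)]
```

Because the index is fixed once training finishes, feature values that never occurred in train_toks simply contribute nothing to the encoded vector.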