Class MaxentFeatureEncodingI

object --+
         |
        MaxentFeatureEncodingI

Known Subclasses:

A mapping that converts a set of input-feature values to a vector of joint-feature values, given a label. This conversion is necessary to translate featuresets into a format that can be used by maximum entropy models.

The set of joint-features used by a given encoding is fixed, and each index in the generated joint-feature vectors corresponds to a single joint-feature. The length of the generated joint-feature vectors is therefore constant (for a given encoding).

Because the joint-feature vectors generated by MaxentFeatureEncodingI are typically very sparse, they are represented as a list of (index, value) tuples, specifying the value of each non-zero joint-feature.
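The sparse representation described above can be expanded into a dense vector when one is needed. The following is a minimal sketch; the helper name to_dense and the sample values are assumptions made for illustration and are not part of the interface.

```python
def to_dense(sparse_vector, length):
    """Expand a list of (index, value) tuples into a dense list.

    `length` plays the role of the value returned by an encoding's
    length() method; indices absent from `sparse_vector` stay at zero.
    """
    dense = [0] * length
    for index, value in sparse_vector:
        dense[index] = value
    return dense


# Hypothetical output of encode() for an encoding whose length() is 6:
# only joint-features 1 and 4 are non-zero.
sparse = [(1, 1), (4, 1)]
dense = to_dense(sparse, 6)
```

Here dense holds one entry per joint-feature, so its size matches the constant vector length of the encoding.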

Feature encodings are generally created using the train() method, which generates an appropriate encoding based on the input-feature values and labels that are present in a given corpus.

Instance Methods

list of (int, number)

encode(self, featureset, label)
Given a (featureset, label) pair, return the corresponding vector of joint-feature values.

int

length(self)
Returns: The size of the fixed-length joint-feature vectors that are generated by this encoding.

list

labels(self)
Returns: A list of the "known labels" -- i.e., all labels l such that self.encode(fs,l) can be a nonzero joint-feature vector for some value of fs.

str

describe(self, fid)
Returns: A string describing the value of the joint-feature whose index in the generated feature vectors is fid.

train(cls, train_toks)
Construct and return a new feature encoding, based on a given training corpus train_toks.

Inherited from object: __delattr__, __getattribute__, __hash__, __init__, __new__, __reduce__, __reduce_ex__, __repr__, __setattr__, __str__

Properties

Inherited from object: __class__

Method Details

encode(self, featureset, label)

Given a (featureset, label) pair, return the corresponding vector of joint-feature values. This vector is represented as a list of (index, value) tuples, specifying the value of each non-zero joint-feature.

Parameters:

featureset (dict)

Returns: list of (int, number)

length(self)

Returns: int: The size of the fixed-length joint-feature vectors that are generated by this encoding.

labels(self)

Returns: list: A list of the "known labels" -- i.e., all labels l such that self.encode(fs,l) can be a nonzero joint-feature vector for some value of fs.
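One common use of labels() together with encode() is to score every known label for a featureset: a maximum entropy model sums the weights of the active joint-features for each label and normalizes the exponentiated sums. The sketch below illustrates that idea under stated assumptions; TwoLabelEncoding, label_probabilities, and the weight values are hypothetical and are not NLTK's implementation.

```python
import math


class TwoLabelEncoding:
    """Hypothetical stand-in with two known labels and two joint-features."""

    def labels(self):
        return ["x", "y"]

    def length(self):
        return 2

    def encode(self, featureset, label):
        # Joint-feature 0 fires for (f is true, "x"),
        # joint-feature 1 fires for (f is true, "y").
        if featureset.get("f"):
            return [(self.labels().index(label), 1)]
        return []


def label_probabilities(encoding, weights, featureset):
    """Return P(label | featureset) for every known label."""
    scores = {}
    for label in encoding.labels():
        # Only the non-zero joint-features appear in the sparse vector.
        scores[label] = sum(
            weights[index] * value
            for index, value in encoding.encode(featureset, label)
        )
    # Subtract the maximum score before exponentiating, for numerical stability.
    highest = max(scores.values())
    exps = {label: math.exp(score - highest) for label, score in scores.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}


uniform = label_probabilities(TwoLabelEncoding(), [0.0, 0.0], {"f": True})
skewed = label_probabilities(TwoLabelEncoding(), [1.0, 0.0], {"f": True})
```

With all weights at zero the two labels are equally likely; raising the weight of joint-feature 0 shifts probability toward label "x".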

describe(self, fid)

Returns: str: A string describing the value of the joint-feature whose index in the generated feature vectors is fid.

train(cls, train_toks)

Construct and return a new feature encoding, based on a given training corpus train_toks.

Parameters:

train_toks (list of tuples of (dict, str)) - Training data, represented as a list of pairs, the first member of which is a feature dictionary, and the second of which is a classification label.
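To make the interface concrete, the following is a minimal sketch of a class that satisfies it, assuming one binary joint-feature per (feature name, feature value, label) triple observed in train_toks. The class name ToyBinaryEncoding, its internal mapping, and the sample corpus are assumptions for illustration; this is not the implementation of any NLTK subclass.

```python
class ToyBinaryEncoding:
    """Hypothetical encoding with one binary joint-feature per
    (feature name, feature value, label) triple seen during training."""

    def __init__(self, labels, mapping):
        self._labels = list(labels)
        self._mapping = mapping  # (fname, fval, label) -> joint-feature index

    def encode(self, featureset, label):
        # Emit (index, 1) for each input-feature value known for this label.
        encoding = []
        for fname, fval in featureset.items():
            index = self._mapping.get((fname, fval, label))
            if index is not None:
                encoding.append((index, 1))
        return encoding

    def length(self):
        return len(self._mapping)

    def labels(self):
        return self._labels

    def describe(self, fid):
        for (fname, fval, label), index in self._mapping.items():
            if index == fid:
                return "%s==%r and label is %r" % (fname, fval, label)
        raise KeyError(fid)

    @classmethod
    def train(cls, train_toks):
        # Assign indices in order of first appearance in the corpus.
        labels = []
        mapping = {}
        for featureset, label in train_toks:
            if label not in labels:
                labels.append(label)
            for fname, fval in featureset.items():
                key = (fname, fval, label)
                if key not in mapping:
                    mapping[key] = len(mapping)
        return cls(labels, mapping)


# Training data: a list of (feature dictionary, label) pairs.
train_toks = [
    ({"last_letter": "a"}, "female"),
    ({"last_letter": "k"}, "male"),
    ({"last_letter": "a"}, "male"),
]
encoding = ToyBinaryEncoding.train(train_toks)
vector = encoding.encode({"last_letter": "a"}, "male")
```

A featureset whose values were never seen with the given label encodes to an empty list, which stands for the all-zero joint-feature vector.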