Class MaxentFeatureEncodingI
object --+
|
MaxentFeatureEncodingI
A mapping that converts a set of input-feature values to a vector of
joint-feature values, given a label. This conversion is necessary to
translate featuresets into a format that can be used by maximum entropy
models.
The set of joint-features used by a given encoding is fixed, and each
index in the generated joint-feature vectors corresponds to a single
joint-feature. The length of the generated joint-feature vectors is
therefore constant (for a given encoding).
Because the joint-feature vectors generated by
MaxentFeatureEncodingI are typically very sparse, they are
represented as a list of (index, value) tuples, specifying
the value of each non-zero joint-feature.
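As a rough illustration of this sparse representation, the sketch below expands a list of (index, value) tuples into the equivalent dense vector. The helper name to_dense is hypothetical and is not part of the interface; its length argument stands in for the value returned by the encoding's length() method.

```python
def to_dense(sparse_vector, length):
    """Expand a sparse list of (index, value) tuples into a dense list.

    Hypothetical helper for illustration only; not part of
    MaxentFeatureEncodingI.
    """
    dense = [0] * length
    for index, value in sparse_vector:
        dense[index] = value
    return dense


# A joint-feature vector of length 6 with two non-zero entries.
sparse = [(1, 1), (4, 1)]
print(to_dense(sparse, 6))
```

Only the two non-zero entries need to be stored in the sparse form, which is why it is preferred when most joint-features are zero.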
Feature encodings are generally created using the train() method, which generates an appropriate encoding
based on the input-feature values and labels that are present in a given
corpus.
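To make the interface concrete, here is a minimal sketch of a toy encoding that defines one binary joint-feature for each (feature name, feature value, label) triple seen in the training corpus. The class name ToyBinaryEncoding, its constructor arguments, and the sample data are assumptions made for this illustration; it is not the implementation used by any particular subclass.

```python
class ToyBinaryEncoding:
    """Toy encoding: one binary joint-feature per (fname, fval, label) triple.

    Hypothetical illustration of the interface, not a library implementation.
    """

    def __init__(self, labels, mapping):
        self._labels = list(labels)
        # Maps (fname, fval, label) -> index in the joint-feature vector.
        self._mapping = mapping

    def encode(self, featureset, label):
        # Return only the non-zero joint-features as (index, value) tuples.
        encoding = []
        for fname, fval in featureset.items():
            index = self._mapping.get((fname, fval, label))
            if index is not None:
                encoding.append((index, 1))
        return encoding

    def length(self):
        return len(self._mapping)

    def labels(self):
        return self._labels

    def describe(self, fid):
        for (fname, fval, label), index in self._mapping.items():
            if index == fid:
                return "%s==%r and label is %r" % (fname, fval, label)
        raise ValueError("Bad feature id: %r" % fid)

    @classmethod
    def train(cls, train_toks):
        # Assign a fresh index to every new (fname, fval, label) triple.
        labels = []
        mapping = {}
        for featureset, label in train_toks:
            if label not in labels:
                labels.append(label)
            for fname, fval in featureset.items():
                key = (fname, fval, label)
                if key not in mapping:
                    mapping[key] = len(mapping)
        return cls(labels, mapping)


train_toks = [
    ({"last_letter": "a"}, "female"),
    ({"last_letter": "k"}, "male"),
    ({"last_letter": "a"}, "female"),
]
encoding = ToyBinaryEncoding.train(train_toks)
print(encoding.length())
print(encoding.labels())
print(encoding.encode({"last_letter": "a"}, "female"))
print(encoding.encode({"last_letter": "a"}, "male"))
print(encoding.describe(1))
```

In this sketch, a (featureset, label) pair that was never seen together in training encodes to an empty list, which is the sparse form of an all-zero vector.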
encode(self, featureset, label)
Given a (featureset, label) pair, return the corresponding vector of
joint-feature values.
Return type: list of (int, number)

length(self)
Returns: The size of the fixed-length joint-feature vectors that are generated
by this encoding.
Return type: int

labels(self)
Returns: A list of the "known labels" -- i.e., all labels l such that
self.encode(fs,l) can be a nonzero joint-feature vector for some value of fs.
Return type: list

describe(self, fid)
Returns: A string describing the value of the joint-feature whose index in the
generated feature vectors is fid.
Return type: str

train(cls, train_toks)
Construct and return a new feature encoding, based on a given training
corpus train_toks.

Inherited from object:
__delattr__, __getattribute__, __hash__, __init__, __new__, __reduce__,
__reduce_ex__, __repr__, __setattr__, __str__, __class__
encode(self, featureset, label)
Given a (featureset, label) pair, return the corresponding vector of
joint-feature values. This vector is represented as a list of
(index, value) tuples, specifying the value of each non-zero
joint-feature.
- Parameters:
  featureset
  label
- Returns: list of (int, number)

length(self)
- Returns: int - The size of the fixed-length joint-feature vectors that are
  generated by this encoding.

labels(self)
- Returns: list - A list of the "known labels" -- i.e., all labels l such that
  self.encode(fs,l) can be a nonzero joint-feature vector for some value of fs.

describe(self, fid)
- Returns: str - A string describing the value of the joint-feature whose
  index in the generated feature vectors is fid.

train(cls, train_toks)
Construct and return a new feature encoding, based on a given training
corpus train_toks.
- Parameters:
  train_toks (list of tuples of (dict, str)) - Training data, represented as
  a list of pairs, the first member of which is a feature dictionary, and
  the second of which is a classification label.
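For illustration, training data in the shape described for train_toks might look like the following. The feature names, the labels, and the helper check_train_toks are made up for this sketch and are not part of the interface.

```python
def check_train_toks(train_toks):
    """Return True if every item is a (dict, str) pair.

    Hypothetical helper for illustration only.
    """
    return all(
        isinstance(tok, tuple)
        and len(tok) == 2
        and isinstance(tok[0], dict)
        and isinstance(tok[1], str)
        for tok in train_toks
    )


# Each item pairs a feature dictionary with a classification label.
train_toks = [
    ({"contains(free)": True, "length": 12}, "spam"),
    ({"contains(free)": False, "length": 40}, "ham"),
]
print(check_train_toks(train_toks))
```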