Class MaxentFeatureEncodingI
object --+
         |
         MaxentFeatureEncodingI
- Known Subclasses:
-
A mapping that converts a set of input-feature values to a vector of
joint-feature values, given a label. This conversion is necessary to
translate featuresets into a format that can be used by maximum entropy
models.
The set of joint-features used by a given encoding is fixed, and each
index in the generated joint-feature vectors corresponds to a single
joint-feature. The length of the generated joint-feature vectors is
therefore constant (for a given encoding).
Because the joint-feature vectors generated by MaxentFeatureEncodingI are
typically very sparse, they are represented as a list of (index, value)
tuples, specifying the value of each non-zero joint-feature.
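The sparse representation can be illustrated with a short sketch that expands a list of (index, value) tuples back into a full fixed-length vector. The helper name to_dense and the sample values are hypothetical and are not part of this interface; only the (index, value) layout and the fixed length come from the description above.

```python
def to_dense(sparse_vector, length):
    """Expand a list of (index, value) tuples into a list of `length` values.

    Hypothetical helper for illustration; indices that do not appear in
    `sparse_vector` are treated as zero.
    """
    dense = [0] * length
    for index, value in sparse_vector:
        dense[index] = value
    return dense


# Assumed example: an encoding of length 6 in which only joint-features
# 1 and 4 are non-zero for some (featureset, label) pair.
sparse = [(1, 1), (4, 1)]
dense = to_dense(sparse, 6)
print(dense)  # [0, 1, 0, 0, 1, 0]
```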
Feature encodings are generally created using the train() method, which generates an appropriate encoding
based on the input-feature values and labels that are present in a given
corpus.
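As a rough illustration of how the pieces of this interface fit together, the following is a minimal sketch of one possible encoding. It assumes a binary scheme with one joint-feature per (feature name, feature value, label) combination seen in the training corpus. The class name ToyBinaryEncoding and the sample data are hypothetical; this is not the implementation of any NLTK subclass.

```python
class ToyBinaryEncoding:
    """Hypothetical encoding: one binary joint-feature per
    (name, value, label) combination observed in the training corpus."""

    def __init__(self, labels, mapping):
        self._labels = list(labels)
        self._mapping = mapping  # (name, value, label) -> joint-feature index

    @classmethod
    def train(cls, train_toks):
        """Build the fixed set of joint-features from (featureset, label) pairs."""
        labels = []
        mapping = {}
        for featureset, label in train_toks:
            if label not in labels:
                labels.append(label)
            for name, value in featureset.items():
                key = (name, value, label)
                if key not in mapping:
                    mapping[key] = len(mapping)
        return cls(labels, mapping)

    def encode(self, featureset, label):
        """Return the non-zero joint-features as a list of (index, value) tuples."""
        encoding = []
        for name, value in featureset.items():
            index = self._mapping.get((name, value, label))
            if index is not None:
                encoding.append((index, 1))
        return encoding

    def length(self):
        return len(self._mapping)

    def labels(self):
        return self._labels

    def describe(self, fid):
        for (name, value, label), index in self._mapping.items():
            if index == fid:
                return "%s==%r and label is %r" % (name, value, label)
        raise IndexError("no joint-feature with index %d" % fid)


# Assumed toy corpus: a list of (feature dictionary, label) pairs.
train_toks = [
    ({"outlook": "sunny", "windy": False}, "play"),
    ({"outlook": "rainy", "windy": True}, "stay"),
]
encoding = ToyBinaryEncoding.train(train_toks)
print(encoding.length())  # 4
print(encoding.encode({"outlook": "sunny", "windy": True}, "play"))  # [(0, 1)]
print(encoding.describe(0))
```

In this sketch, input-feature values that were never seen with the given label simply contribute no entry, which keeps the vector length fixed after training.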
encode(self, featureset, label) -> list of (int, number)
Given a (featureset, label) pair, return the corresponding vector of
joint-feature values.

length(self) -> int
Returns the size of the fixed-length joint-feature vectors that are
generated by this encoding.

labels(self) -> list
Returns a list of the "known labels", i.e., all labels l such that
self.encode(fs, l) can be a nonzero joint-feature vector for some value
of fs.

describe(self, fid) -> str
Returns a string describing the value of the joint-feature whose index in
the generated feature vectors is fid.

train(cls, train_toks)
Construct and return a new feature encoding, based on a given training
corpus train_toks.

Inherited from object: __delattr__, __getattribute__, __hash__, __init__,
__new__, __reduce__, __reduce_ex__, __repr__, __setattr__, __str__

Inherited from object: __class__

encode(self, featureset, label)
Given a (featureset, label) pair, return the corresponding vector of
joint-feature values. This vector is represented as a list of
(index, value) tuples, specifying the value of each non-zero
joint-feature.
- Returns: list of (int, number)

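One common use of such a sparse vector in a maximum entropy model is to compute a weighted sum over the non-zero joint-features only. The sketch below shows that idea under stated assumptions: the function name sparse_dot, the weights, and the encoded vector are all hypothetical stand-ins, with the encoded list playing the role of a value returned by encode().

```python
def sparse_dot(weights, sparse_vector):
    """Sum weights[index] * value over the non-zero joint-features.

    Hypothetical helper; `weights` is assumed to hold one weight per
    joint-feature index, so its size matches the encoding's length().
    """
    return sum(weights[index] * value for index, value in sparse_vector)


# Assumed values for illustration only.
weights = [0.5, -1.0, 2.0, 0.25]
encoded = [(0, 1), (2, 1)]  # stand-in for encode(featureset, label)
score = sparse_dot(weights, encoded)
print(score)  # 2.5
```

Because only the listed indices are visited, the cost depends on the number of non-zero joint-features and not on the full vector length.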
length(self)
- Returns: int
- The size of the fixed-length joint-feature vectors that are
generated by this encoding.

labels(self)
- Returns: list
- A list of the "known labels", i.e., all labels l such that
self.encode(fs, l) can be a nonzero joint-feature vector for some value
of fs.

describe(self, fid)
- Returns: str
- A string describing the value of the joint-feature whose index in
the generated feature vectors is fid.

train(cls, train_toks)
Construct and return a new feature encoding, based on a given training
corpus train_toks.
- Parameters:
train_toks (list of tuples of (dict, str)) - Training data, represented
as a list of pairs, the first member of which is a feature dictionary,
and the second of which is a classification label.
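The expected shape of train_toks can be made concrete with a small sketch. The sample corpus and the helper labels_in_corpus are hypothetical; the helper only checks that each item is a (dict, str) pair, as described for the parameter above, and collects the distinct labels.

```python
def labels_in_corpus(train_toks):
    """Check the (dict, str) shape of each item and return the sorted labels.

    Hypothetical helper for illustration; it is not part of the interface.
    """
    labels = set()
    for featureset, label in train_toks:
        if not isinstance(featureset, dict) or not isinstance(label, str):
            raise TypeError("each item must be a (dict, str) pair")
        labels.add(label)
    return sorted(labels)


# Assumed toy corpus: feature dictionaries paired with classification labels.
train_toks = [
    ({"outlook": "sunny", "windy": False}, "play"),
    ({"outlook": "rainy", "windy": True}, "stay"),
    ({"outlook": "sunny", "windy": True}, "play"),
]
print(labels_in_corpus(train_toks))  # ['play', 'stay']
```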