Class BinaryMaxentFeatureEncoding
object --+
|
MaxentFeatureEncodingI --+
|
BinaryMaxentFeatureEncoding
A feature encoding that generates vectors containing binary
joint-features of the form:
joint_feat(fs, l) = { 1 if (fs[fname] == fval) and (l == label)
{
{ 0 otherwise
Where fname is the name of an input-feature,
fval is a value for that input-feature, and
label is a label.
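As a minimal sketch, the definition above can be written as a plain Python predicate. The function name, its argument order, and the example featureset are illustrative assumptions; they are not part of this class's API.

```python
def joint_feat(fs, l, fname, fval, label):
    """Binary joint-feature: 1 if fs[fname] == fval and l == label, else 0.

    Illustrative helper only; a missing fname is treated as a non-match.
    """
    return 1 if (fname in fs and fs[fname] == fval and l == label) else 0


fs = {"last_letter": "a", "length": 5}  # hypothetical input featureset

hit = joint_feat(fs, "female", "last_letter", "a", "female")
wrong_label = joint_feat(fs, "male", "last_letter", "a", "female")
wrong_value = joint_feat(fs, "female", "last_letter", "k", "female")
print(hit, wrong_label, wrong_value)
```

The feature fires only when both the input-feature value and the label match.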
Typically, these features are constructed based on a training corpus,
using the train() method. This method will create one feature for
each combination of fname, fval, and
label that occurs at least once in the training corpus.
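The set of features that training creates can be illustrated without the class itself. The sketch below only enumerates the distinct (fname, fval, label) combinations in a made-up toy corpus; it does not assign feature indexes and is not the library's implementation.

```python
train_toks = [  # hypothetical toy corpus of (featureset, label) pairs
    ({"last_letter": "a"}, "female"),
    ({"last_letter": "k"}, "male"),
    ({"last_letter": "a"}, "female"),  # repeats an earlier combination
    ({"last_letter": "a"}, "male"),
]

# One joint-feature per distinct (fname, fval, label) seen at least once.
combos = sorted({
    (fname, fval, label)
    for featureset, label in train_toks
    for fname, fval in featureset.items()
})
for combo in combos:
    print(combo)
```

Four training tokens yield three joint-features here, because one combination occurs twice and ("last_letter", "k", "female") never occurs.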
The unseen_features parameter can be used to add unseen-value
features, which are used whenever an input feature has a value that
was not encountered in the training corpus. These features have the
form:
joint_feat(fs, l) = { 1 if is_unseen(fname, fs[fname])
{ and l == label
{
{ 0 otherwise
Where is_unseen(fname, fval) is true if the encoding does
not contain any joint features that are true when
fs[fname]==fval.
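A rough sketch of the is_unseen test and the unseen-value feature, following the formula above literally. The helper names and the mapping contents are assumptions made for illustration; the real class may organize this differently.

```python
def is_unseen(mapping, fname, fval):
    """True if no joint-feature in mapping is keyed on (fname, fval, any label)."""
    return not any(n == fname and v == fval for (n, v, _label) in mapping)


def unseen_joint_feat(mapping, fs, l, fname, label):
    """Unseen-value joint-feature, following the formula in the text."""
    if fname not in fs:
        return 0
    return 1 if (is_unseen(mapping, fname, fs[fname]) and l == label) else 0


# Hypothetical mapping from (fname, fval, label) to feature index.
mapping = {("last_letter", "a", "female"): 0, ("last_letter", "k", "male"): 1}

seen = unseen_joint_feat(mapping, {"last_letter": "a"}, "male", "last_letter", "male")
unseen = unseen_joint_feat(mapping, {"last_letter": "z"}, "male", "last_letter", "male")
print(seen, unseen)
```

Note that "a" counts as seen even for the label "male", because some joint-feature in the mapping is true when fs["last_letter"] == "a".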
The alwayson_features parameter can be used to add always-on
features, which have the form:
joint_feat(fs, l) = { 1 if (l == label)
{
{ 0 otherwise
These always-on features allow the maxent model to directly model the
prior probabilities of each label.
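To see why always-on features capture label priors, consider a model that has only always-on features. The sketch assumes the usual log-linear form, P(l | fs) proportional to exp(sum of weight * joint_feat(fs, l)), which is standard for maxent models but is not spelled out on this page. The prior values and weights are made up.

```python
import math


def always_on(l, label):
    """Always-on joint-feature for `label`: ignores the featureset entirely."""
    return 1 if l == label else 0


priors = {"female": 0.7, "male": 0.3}  # hypothetical target priors
weights = {label: math.log(p) for label, p in priors.items()}


def prob(l):
    """P(l) under a log-linear model that has only always-on features."""
    def score(candidate):
        return math.exp(sum(w * always_on(candidate, label)
                            for label, w in weights.items()))
    return score(l) / sum(score(c) for c in priors)


for label in priors:
    print(label, round(prob(label), 3))
```

With one weight per label and no dependence on the featureset, the model's distribution over labels is exactly the one encoded by the weights, which is what lets these features absorb the priors.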
__init__(self, labels, mapping, unseen_features=False, alwayson_features=False)
x.__init__(...) initializes x; see x.__class__.__doc__ for signature
encode(self, featureset, label)
Given a (featureset, label) pair, return the corresponding vector of
joint-feature values.
Returns: list of (int, number)
describe(self, f_id)
Returns: str - A string describing the value of the joint-feature whose
index in the generated feature vectors is f_id.
labels(self)
Returns: list - A list of the "known labels" -- i.e., all labels
l such that self.encode(fs,l) can be a
nonzero joint-feature vector for some value of fs.
length(self)
Returns: int - The size of the fixed-length joint-feature vectors that are
generated by this encoding.
Inherited from object: __delattr__, __getattribute__, __hash__, __new__,
__reduce__, __reduce_ex__, __repr__, __setattr__, __str__
train(cls, train_toks, count_cutoff=0, labels=None, **options)
Construct and return a new feature encoding, based on a given training
corpus train_toks.
_labels - A list of attested labels.
_mapping - dict mapping from (fname,fval,label) -> fid
_length - The length of generated joint feature vectors.
_alwayson - dict mapping from label -> fid
_unseen - dict mapping from fname -> fid
Inherited from object: __class__
__init__(self, labels, mapping, unseen_features=False, alwayson_features=False)
(Constructor)
x.__init__(...) initializes x; see x.__class__.__doc__ for signature
- Parameters:
labels - A list of the "known labels" for this encoding.
mapping - A dictionary mapping from (fname,fval,label) tuples
to corresponding joint-feature indexes. These indexes must be
exactly the set of integers from 0 to len(mapping)-1. If
mapping[fname,fval,label]=id, then the vector returned by
self.encode({..., fname:fval, ...}, label) has the value 1 for
joint-feature id; otherwise, that joint-feature is 0.
unseen_features - If true, then include unseen-value features in the generated
joint-feature vectors.
alwayson_features - If true, then include always-on features in the generated
joint-feature vectors.
- Overrides:
object.__init__
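The index requirement on mapping can be checked before constructing an encoding. The sketch below reads the requirement as "exactly the integers 0 through len(mapping)-1", which is an interpretation of the parameter description; the helper name and the example mappings are hypothetical, and this page does not say how the constructor itself reacts to an invalid mapping.

```python
def has_valid_indexes(mapping):
    """True if the mapping's values are exactly the integers 0 .. len(mapping)-1."""
    return set(mapping.values()) == set(range(len(mapping)))


good = {("f", 1, "x"): 0, ("f", 2, "x"): 1, ("f", 1, "y"): 2}
gap = {("f", 1, "x"): 0, ("f", 2, "x"): 2}  # index 1 is missing
dup = {("f", 1, "x"): 0, ("f", 2, "x"): 0}  # index 0 is reused

print(has_valid_indexes(good), has_valid_indexes(gap), has_valid_indexes(dup))
```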
encode(self, featureset, label)
Given a (featureset, label) pair, return the corresponding vector of
joint-feature values. This vector is represented as a list of
(index, value) tuples, specifying the value of each non-zero
joint-feature.
- Returns:
list of (int, number)
- Overrides:
MaxentFeatureEncodingI.encode
- (inherited documentation)
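The sparse (index, value) representation can be sketched with a standalone function. This is an approximation under stated assumptions: it takes the mapping as an explicit argument, ignores unseen-value and always-on features, and uses made-up feature names. The to_dense helper is hypothetical and exists only to show what the sparse list stands for.

```python
def encode(mapping, featureset, label):
    """Return [(index, 1), ...] for joint-features that fire; zeros are omitted."""
    encoding = []
    for fname, fval in featureset.items():
        fid = mapping.get((fname, fval, label))
        if fid is not None:
            encoding.append((fid, 1))
    return encoding


def to_dense(encoding, length):
    """Expand a sparse (index, value) list into a fixed-length list."""
    dense = [0] * length
    for index, value in encoding:
        dense[index] = value
    return dense


mapping = {  # hypothetical (fname, fval, label) -> index mapping
    ("last_letter", "a", "female"): 0,
    ("last_letter", "a", "male"): 1,
    ("length", 5, "female"): 2,
}
sparse = encode(mapping, {"last_letter": "a", "length": 5}, "female")
print(sparse)
print(to_dense(sparse, len(mapping)))
```

Only the joint-features whose label matches appear in the result; index 1 belongs to the label "male" and is therefore left out.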
describe(self, f_id)
- Returns:
str - A string describing the value of the joint-feature whose index in
the generated feature vectors is f_id.
- Overrides:
MaxentFeatureEncodingI.describe
- (inherited documentation)
labels(self)
- Returns:
list - A list of the "known labels" -- i.e., all labels
l such that self.encode(fs,l) can be a
nonzero joint-feature vector for some value of fs.
- Overrides:
MaxentFeatureEncodingI.labels
- (inherited documentation)
length(self)
- Returns:
int - The size of the fixed-length joint-feature vectors that are
generated by this encoding.
- Overrides:
MaxentFeatureEncodingI.length
- (inherited documentation)
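One plausible way to reason about the vector length, given that _alwayson maps each label to an index and _unseen maps each fname to an index: the length is the number of mapped joint-features plus one extra index per label and per feature name when the corresponding option is on. This accounting is an assumption inferred from the instance variables, not something the page states, and the function below is hypothetical.

```python
def vector_length(mapping, labels, unseen_features=False, alwayson_features=False):
    """Estimate the joint-feature vector length (assumed accounting, see above)."""
    length = len(mapping)
    if alwayson_features:
        length += len(labels)  # assumed: one always-on feature per label
    if unseen_features:
        # assumed: one unseen-value feature per distinct feature name
        length += len({fname for (fname, _fval, _label) in mapping})
    return length


mapping = {  # hypothetical mapping with two feature names
    ("last_letter", "a", "female"): 0,
    ("last_letter", "k", "male"): 1,
    ("length", 5, "female"): 2,
}
labels = ["female", "male"]

print(vector_length(mapping, labels))
print(vector_length(mapping, labels, alwayson_features=True))
print(vector_length(mapping, labels, unseen_features=True, alwayson_features=True))
```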
train(cls, train_toks, count_cutoff=0, labels=None, **options)
Class Method
Construct and return a new feature encoding, based on a given training
corpus train_toks. See the class description for a description of the
joint-features that will be included in this encoding.
- Parameters:
train_toks (list of tuples of (dict, str)) - Training data, represented as a
list of pairs, the first member of which is a feature dictionary, and the
second of which is a classification label.
count_cutoff (int) - A cutoff value that is used to discard rare
joint-features. If a joint-feature's value is 1 fewer than count_cutoff
times in the training corpus, then that joint-feature is not included in
the generated encoding.
labels (list) - A list of labels that should be used by the classifier. If
not specified, then the set of labels attested in train_toks will be used.
options - Extra parameters for the constructor, such as unseen_features and
alwayson_features.
- Overrides:
MaxentFeatureEncodingI.train
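The effect of count_cutoff can be sketched as a count-then-filter step. The code reads "fewer than count_cutoff times" as keeping a joint-feature when its count is at least count_cutoff. The function name, the order in which indexes are assigned, and the toy corpus are assumptions; this is not the library's code.

```python
from collections import Counter


def select_joint_features(train_toks, count_cutoff=0):
    """Keep (fname, fval, label) combinations seen at least count_cutoff times."""
    counts = Counter(
        (fname, fval, label)
        for featureset, label in train_toks
        for fname, fval in featureset.items()
    )
    kept = [key for key, n in counts.items() if n >= count_cutoff]
    # Assign consecutive indexes 0 .. len(kept)-1 to the surviving features.
    return {key: fid for fid, key in enumerate(kept)}


train_toks = [  # hypothetical toy corpus
    ({"last_letter": "a"}, "female"),
    ({"last_letter": "a"}, "female"),
    ({"last_letter": "k"}, "male"),
]

everything = select_joint_features(train_toks)
frequent = select_joint_features(train_toks, count_cutoff=2)
print(everything)
print(frequent)
```

With the default cutoff of 0 every attested combination survives; with a cutoff of 2 the combination that occurs only once is dropped.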