Class BinaryMaxentFeatureEncoding
            object --+
                     |
MaxentFeatureEncodingI --+
                         |
                        BinaryMaxentFeatureEncoding
A feature encoding that generates vectors containing binary
joint-features of the form:
  joint_feat(fs, l) = { 1 if (fs[fname] == fval) and (l == label)
                      {
                      { 0 otherwise
Where fname is the name of an input-feature, fval is a value for that
input-feature, and label is a label.
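The definition above can be read as a plain indicator function. The following is a minimal sketch in Python, not NLTK's implementation; the helper name joint_feat_value and the sample featureset are hypothetical and chosen only for illustration.

```python
def joint_feat_value(fs, l, fname, fval, label):
    """Return 1 if fs[fname] == fval and l == label, otherwise 0.

    Hypothetical helper: it evaluates a single joint-feature, identified
    by the triple (fname, fval, label), on the pair (fs, l).
    """
    return 1 if fname in fs and fs[fname] == fval and l == label else 0


# A sample featureset (illustrative values only).
fs = {"last_letter": "a", "length": 4}

# The feature ("last_letter", "a", "female") fires only for the label "female".
print(joint_feat_value(fs, "female", "last_letter", "a", "female"))  # 1
print(joint_feat_value(fs, "male", "last_letter", "a", "female"))    # 0
```

Each joint-feature is tied to exactly one label, so the same input-feature value contributes to different vector positions depending on the candidate label.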
Typically, these features are constructed based on a training corpus,
using the train() method. This method will create one feature for
each combination of fname, fval, and label that occurs at least once
in the training corpus.
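How such a set of features might be collected from a corpus can be sketched as follows. This is an illustration under stated assumptions, not the body of train(): the function name build_mapping and the toy corpus are hypothetical, and indexes are simply assigned in order of first appearance.

```python
def build_mapping(train_toks):
    """Assign one index to each (fname, fval, label) seen in train_toks.

    train_toks is a list of (featureset, label) pairs, where featureset
    is a dict. Hypothetical helper for illustration only.
    """
    mapping = {}
    for featureset, label in train_toks:
        for fname, fval in featureset.items():
            key = (fname, fval, label)
            if key not in mapping:
                # The next free index equals the current number of features.
                mapping[key] = len(mapping)
    return mapping


train_toks = [
    ({"last_letter": "a"}, "female"),
    ({"last_letter": "k"}, "male"),
    ({"last_letter": "a"}, "female"),  # repeated combination, no new feature
]
mapping = build_mapping(train_toks)
print(mapping)
# {('last_letter', 'a', 'female'): 0, ('last_letter', 'k', 'male'): 1}
```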
The unseen_features parameter can be used to add unseen-value
features, which are used whenever an input feature has a value that
was not encountered in the training corpus. These features have the
form:
  joint_feat(fs, l) = { 1 if is_unseen(fname, fs[fname])
                      {      and l == label
                      {
                      { 0 otherwise
Where is_unseen(fname, fval) is true if the encoding does not contain
any joint features that are true when fs[fname]==fval.
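The unseen-value formula can be sketched in the same style. This follows the formula as written in this description and is not NLTK's code; is_unseen is modelled here as a lookup over a hypothetical mapping of (fname, fval, label) keys, and unseen_feat_value is an illustrative name.

```python
def is_unseen(mapping, fname, fval):
    """True if no joint-feature in mapping is keyed on (fname, fval, <any label>)."""
    return not any(f == fname and v == fval for (f, v, _label) in mapping)


def unseen_feat_value(mapping, fs, l, fname, label):
    """Unseen-value feature for (fname, label), per the formula above."""
    if fname in fs and is_unseen(mapping, fname, fs[fname]) and l == label:
        return 1
    return 0


# Hypothetical mapping, as if built from a small training corpus.
mapping = {
    ("last_letter", "a", "female"): 0,
    ("last_letter", "k", "male"): 1,
}

print(is_unseen(mapping, "last_letter", "a"))  # False: seen with "female"
print(is_unseen(mapping, "last_letter", "z"))  # True: never seen in training
print(unseen_feat_value(mapping, {"last_letter": "z"}, "male", "last_letter", "male"))  # 1
```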
The alwayson_features parameter can be used to add always-on
features, which have the form:
  joint_feat(fs, l) = { 1 if (l == label)
                      {
                      { 0 otherwise
These always-on features allow the maxent model to directly model the
prior probabilities of each label.
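A small sketch of always-on features, assuming a hypothetical index layout in which one always-on index per label is placed after the basic joint-features. The names alwayson_feat_value, alwayson_entries and base_length are illustrative and not part of the documented API.

```python
def alwayson_feat_value(l, label):
    """Always-on feature for `label`: 1 whenever the candidate label l matches."""
    return 1 if l == label else 0


labels = ["female", "male"]
base_length = 2  # hypothetical number of basic joint-features

# Assumed layout: one always-on index per label, after the basic features.
alwayson = {label: base_length + i for i, label in enumerate(labels)}


def alwayson_entries(l):
    """Sparse (index, value) entries contributed by the always-on features."""
    return [(alwayson[label], 1) for label in labels
            if alwayson_feat_value(l, label)]


print(alwayson_entries("male"))  # [(3, 1)]
```

Because the feature ignores the featureset entirely, its weight depends only on the label, which is why it can capture a per-label prior.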
__init__(self, labels, mapping, unseen_features=False, alwayson_features=False)
    x.__init__(...) initializes x; see x.__class__.__doc__ for signature
encode(self, featureset, label)
    Given a (featureset, label) pair, return the corresponding vector of
    joint-feature values.
    Returns: list of (int, number)
describe(self, f_id)
    Returns: str
    A string describing the value of the joint-feature whose index in the
    generated feature vectors is f_id.
labels(self)
    Returns: list
    A list of the "known labels" -- i.e., all labels l such that
    self.encode(fs,l) can be a nonzero joint-feature vector for some
    value of fs.
length(self)
    Returns: int
    The size of the fixed-length joint-feature vectors that are generated
    by this encoding.
Inherited from object: __delattr__, __getattribute__, __hash__, __new__,
__reduce__, __reduce_ex__, __repr__, __setattr__, __str__
train(cls, train_toks, count_cutoff=0, labels=None, **options)
    Construct and return a new feature encoding, based on a given training
    corpus train_toks.
_labels
    A list of attested labels.
_mapping
    dict mapping from (fname,fval,label) -> fid
_length
    The length of generated joint feature vectors.
_alwayson
    dict mapping from label -> fid
_unseen
    dict mapping from fname -> fid
Inherited from object: __class__
__init__(self, labels, mapping, unseen_features=False, alwayson_features=False)
(Constructor)
x.__init__(...) initializes x; see x.__class__.__doc__ for signature
- Parameters:
labels - A list of the "known labels" for this encoding.
mapping - A dictionary mapping from (fname,fval,label) tuples
to corresponding joint-feature indexes. These indexes must be
exactly the set of integers from 0 to len(mapping)-1. If
mapping[fname,fval,label]=id, then the joint-feature with index id
has the value 1 in self.encode({..., fname:fval, ...}, label);
otherwise, it has the value 0.
unseen_features - If true, then include unseen value features in the generated
joint-feature vectors.
alwayson_features - If true, then include always-on features in the generated
joint-feature vectors.
- Overrides:
object.__init__
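The index requirement on mapping can be checked before constructing an encoding. The sketch below reads the requirement as "the indexes are exactly 0 through len(mapping)-1", which is an interpretation: a dictionary with len(mapping) entries cannot cover more distinct indexes than that. The helper name check_mapping is hypothetical.

```python
def check_mapping(mapping):
    """Raise ValueError unless the indexes are exactly 0 .. len(mapping)-1."""
    if set(mapping.values()) != set(range(len(mapping))):
        raise ValueError(
            "Mapping values must be exactly the integers 0..len(mapping)-1"
        )


good = {
    ("last_letter", "a", "female"): 0,
    ("last_letter", "k", "male"): 1,
}
check_mapping(good)  # passes silently

bad = {
    ("last_letter", "a", "female"): 0,
    ("last_letter", "k", "male"): 2,  # gap: index 1 is missing
}
try:
    check_mapping(bad)
except ValueError as err:
    print("rejected:", err)
```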
encode(self, featureset, label)
Given a (featureset, label) pair, return the corresponding vector of
joint-feature values. This vector is represented as a list of
(index, value) tuples, specifying the value of each non-zero
joint-feature.
- Returns:
list of (int, number)
- Overrides:
MaxentFeatureEncodingI.encode
- (inherited documentation)
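Since the returned vector is sparse, a caller that needs a dense vector has to expand it. A minimal sketch, assuming the length comes from length(); the helper name to_dense and the sample values are hypothetical.

```python
def to_dense(sparse, length):
    """Expand a list of (index, value) tuples into a dense list of `length` values."""
    dense = [0] * length
    for index, value in sparse:
        dense[index] = value
    return dense


# A sparse vector in the (index, value) form described above.
sparse = [(0, 1), (3, 1)]
print(to_dense(sparse, 5))  # [1, 0, 0, 1, 0]
```

Only the non-zero entries are listed, so a featureset with a handful of features yields a short list no matter how many joint-features the encoding defines.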
describe(self, f_id)
- Returns:
str
- A string describing the value of the joint-feature whose index in
the generated feature vectors is f_id.
- Overrides:
MaxentFeatureEncodingI.describe
- (inherited documentation)
labels(self)
- Returns:
list
- A list of the "known labels" -- i.e., all labels l such that
self.encode(fs,l) can be a nonzero joint-feature vector for some
value of fs.
- Overrides:
MaxentFeatureEncodingI.labels
- (inherited documentation)
length(self)
- Returns:
int
- The size of the fixed-length joint-feature vectors that are
generated by this encoding.
- Overrides:
MaxentFeatureEncodingI.length
- (inherited documentation)
train(cls, train_toks, count_cutoff=0, labels=None, **options)
Class Method
Construct and return a new feature encoding, based on a given training
corpus train_toks. See the class description for a description of the
joint-features that will be included in this encoding.
- Parameters:
train_toks (list of tuples of (dict, str)) - Training data,
represented as a list of pairs, the first member of which is a
feature dictionary, and the second of which is a classification
label.
count_cutoff (int) - A cutoff value that is used to discard rare
joint-features. If a joint-feature has the value 1 fewer than
count_cutoff times in the training corpus, then that joint-feature
is not included in the generated encoding.
labels (list) - A list of labels that should be used by the
classifier. If not specified, then the set of labels attested in
train_toks will be used.
options - Extra parameters for the constructor, such as
unseen_features and alwayson_features.
- Overrides:
MaxentFeatureEncodingI.train
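The effect of count_cutoff as described in the parameter list can be sketched as follows. This illustrates the documented behaviour only and makes no claim about how train() counts internally; build_mapping_with_cutoff and the toy corpus are hypothetical.

```python
from collections import Counter


def build_mapping_with_cutoff(train_toks, count_cutoff=0):
    """Index only the joint-features that fire at least count_cutoff times."""
    # Count how often each (fname, fval, label) combination occurs.
    counts = Counter(
        (fname, fval, label)
        for featureset, label in train_toks
        for fname, fval in featureset.items()
    )
    mapping = {}
    for key, n in counts.items():
        if n >= count_cutoff:
            mapping[key] = len(mapping)
    return mapping


train_toks = [
    ({"last_letter": "a"}, "female"),
    ({"last_letter": "a"}, "female"),
    ({"last_letter": "k"}, "male"),
]

print(build_mapping_with_cutoff(train_toks))
# {('last_letter', 'a', 'female'): 0, ('last_letter', 'k', 'male'): 1}
print(build_mapping_with_cutoff(train_toks, count_cutoff=2))
# {('last_letter', 'a', 'female'): 0}
```

With the default cutoff of 0 every attested combination is kept; raising the cutoff trades coverage for a smaller feature vector.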