nltk.classify.maxent.MaxentClassifier

A maximum entropy classifier (also known as a conditional exponential classifier). This classifier is parameterized by a set of weights, which are used to combine the joint-features that are generated from a featureset by an encoding. In particular, the encoding maps each (featureset, label) pair to a vector. The probability of each label is then computed using the following equation:
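
As a rough sketch of that computation, assuming the usual conditional exponential form: each label is scored by the dot product of the weights with its encoded vector, and the exponentiated scores are normalized over all labels. The names `label_probabilities` and `toy_encode`, the toy weights, and the choice of base e are illustrative assumptions, not part of the library.

```python
import math


def label_probabilities(weights, encode, featureset, labels):
    """Return {label: probability} under a conditional exponential model.

    `encode(featureset, label)` is a hypothetical stand-in for the encoding;
    it returns a sparse vector as a list of (feature_id, value) pairs.
    """
    # Score each label: dot product of the weights with its joint-feature vector.
    scores = {
        label: sum(weights[fid] * val for fid, val in encode(featureset, label))
        for label in labels
    }
    # Exponentiate and normalize; subtracting the max avoids overflow in exp().
    top = max(scores.values())
    exp_scores = {label: math.exp(s - top) for label, s in scores.items()}
    total = sum(exp_scores.values())
    return {label: e / total for label, e in exp_scores.items()}


# Toy encoding (assumption): one joint-feature per (name, value, label) triple.
TOY_IDS = {("ends_with", "a", "female"): 0, ("ends_with", "a", "male"): 1}


def toy_encode(featureset, label):
    vector = []
    for name, value in featureset.items():
        fid = TOY_IDS.get((name, value, label))
        if fid is not None:
            vector.append((fid, 1))
    return vector


probs = label_probabilities([1.2, -0.4], toy_encode,
                            {"ends_with": "a"}, ["female", "male"])
```

Here the label with the larger weighted score ("female") receives the larger share of the probability mass, and the probabilities sum to one.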

train(cls, train_toks, algorithm=None, trace=3, encoding=None, labels=None, sparse=True, gaussian_prior_sigma=0, **cutoffs)
Class Method

Train a new maxent classifier based on the given corpus of training samples. This classifier will have its weights chosen to maximize entropy while remaining empirically consistent with the training corpus.

Parameters:

train_toks (list) - Training data, represented as a list of pairs, the first member of which is a featureset, and the second of which is a classification label.
algorithm (str) - A case-insensitive string, specifying which algorithm should be used to train the classifier. The following algorithms are currently available.
- Iterative Scaling Methods
  - 'GIS': Generalized Iterative Scaling
  - 'IIS': Improved Iterative Scaling
- Optimization Methods (require scipy)
  - 'CG': Conjugate gradient
  - 'BFGS': Broyden-Fletcher-Goldfarb-Shanno algorithm
  - 'Powell': Powell algorithm
  - 'LBFGSB': A limited-memory variant of the BFGS algorithm
  - 'Nelder-Mead': The Nelder-Mead algorithm
- External Libraries
  - 'megam': LM-BFGS algorithm, with training performed by megam. (Requires that megam be installed.)
The default algorithm is 'CG' if scipy is installed, and 'IIS' otherwise.
trace (int) - The level of diagnostic tracing output to produce. Higher values produce more verbose output.
encoding (MaxentFeatureEncodingI) - A feature encoding, used to convert featuresets into feature vectors. If none is specified, then a BinaryMaxentFeatureEncoding will be built based on the features that are attested in the training corpus.
labels (list of str) - The set of possible labels. If none is given, then the set of all labels attested in the training data will be used instead.
sparse - If true, then use sparse matrices instead of dense matrices. Currently, this is only supported by the scipy (optimization method) algorithms. For other algorithms, its value is ignored.
gaussian_prior_sigma - The sigma value for a Gaussian prior on model weights. Currently, this is supported by the scipy (optimization method) algorithms and megam. For other algorithms, its value is ignored.
cutoffs - Arguments specifying various conditions under which the training should be halted. (Some of the cutoff conditions are not supported by some algorithms.)
- max_iter=v: Terminate after v iterations.
- min_ll=v: Terminate after the negative average log-likelihood drops under v.
- min_lldelta=v: Terminate if a single iteration improves log likelihood by less than v.
- tolerance=v: Terminate a scipy optimization method when improvement drops below a tolerance level v. The exact meaning of this tolerance depends on the scipy algorithm used. See scipy documentation for more info. Default values: 1e-3 for CG, 1e-5 for LBFGSB, and 1e-4 for other algorithms. (scipy only)
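
The first three cutoff conditions can be read as a simple stopping test applied after each iteration. The following is a minimal sketch under that reading; `should_stop` and its arguments are hypothetical names for illustration and do not describe how the library itself checks cutoffs.

```python
def should_stop(iteration, neg_avg_ll, prev_neg_avg_ll, cutoffs):
    """Return True once any supplied cutoff condition is met.

    Hypothetical helper. `neg_avg_ll` is the negative average log-likelihood
    after the current iteration; `prev_neg_avg_ll` is the value after the
    previous iteration, or None on the first one.
    """
    # max_iter=v: stop after v iterations.
    if "max_iter" in cutoffs and iteration >= cutoffs["max_iter"]:
        return True
    # min_ll=v: stop once the negative average log-likelihood drops under v.
    if "min_ll" in cutoffs and neg_avg_ll < cutoffs["min_ll"]:
        return True
    # min_lldelta=v: stop if this iteration improved log-likelihood by less than v.
    if "min_lldelta" in cutoffs and prev_neg_avg_ll is not None:
        improvement = prev_neg_avg_ll - neg_avg_ll
        if improvement < cutoffs["min_lldelta"]:
            return True
    return False
```

Cutoffs that are not supplied are simply skipped, which mirrors the way they are passed as optional keyword arguments.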

Returns: MaxentClassifier

The new maxent classifier
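
A short usage sketch of train() follows. The training data is invented for illustration, and `train_gis` is a hypothetical wrapper; it selects 'GIS' on the assumption that neither scipy nor megam is available, and it imports NLTK only when called.

```python
# Each training token is a (featureset, label) pair; a featureset is a dict
# mapping feature names to values. This data is made up for the example.
train_toks = [
    ({"last_letter": "a"}, "female"),
    ({"last_letter": "e"}, "female"),
    ({"last_letter": "k"}, "male"),
    ({"last_letter": "o"}, "male"),
]


def train_gis(toks, max_iter=10):
    """Train a classifier with Generalized Iterative Scaling.

    Hypothetical wrapper: trace=0 silences diagnostic output, and max_iter
    is passed through as one of the cutoff keyword arguments.
    """
    from nltk.classify.maxent import MaxentClassifier  # requires NLTK

    return MaxentClassifier.train(toks, algorithm="GIS", trace=0,
                                  max_iter=max_iter)
```

Calling `train_gis(train_toks)` should return a classifier whose classify() and prob_classify() methods accept a featureset such as `{"last_letter": "a"}`.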

ALGORITHMS

A list of the algorithm names that are accepted for the train() method's algorithm parameter.

Value:

['GIS',
 'IIS',
 'CG',
 'BFGS',
 'Powell',
 'LBFGSB',
 'Nelder-Mead',
 'MEGAM']

_SCIPY_ALGS

Value:

{'bfgs': 'BFGS',
 'cg': 'CG',
 'lbfgsb': 'LBFGSB',
 'nelder-mead': 'Nelder-Mead',
 'powell': 'Powell'}
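
Because the algorithm parameter is case-insensitive and the keys of _SCIPY_ALGS are lower-case, one plausible way to resolve a user-supplied name is to lower-case it before lookup. The sketch below assumes that approach; `resolve_algorithm` and the family labels it returns are hypothetical, and only the two values are copied from the listings above.

```python
ALGORITHMS = ['GIS', 'IIS', 'CG', 'BFGS', 'Powell', 'LBFGSB',
              'Nelder-Mead', 'MEGAM']

_SCIPY_ALGS = {'bfgs': 'BFGS',
               'cg': 'CG',
               'lbfgsb': 'LBFGSB',
               'nelder-mead': 'Nelder-Mead',
               'powell': 'Powell'}


def resolve_algorithm(name):
    """Hypothetical helper: map a case-insensitive name to (family, canonical)."""
    key = name.lower()
    # scipy optimization methods are looked up by their lower-case key.
    if key in _SCIPY_ALGS:
        return ("scipy", _SCIPY_ALGS[key])
    # Remaining names are the iterative scaling methods and megam.
    canonical = {alg.lower(): alg for alg in ALGORITHMS}
    if key in canonical:
        family = "megam" if key == "megam" else "iterative scaling"
        return (family, canonical[key])
    raise ValueError("Unknown algorithm %r" % name)
```

For example, `resolve_algorithm("lbfgsb")` yields the scipy family with the canonical name 'LBFGSB', while an unrecognized name raises ValueError.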

Class MaxentClassifier

__init__(self, encoding, weights, logarithmic=True)
(Constructor)

labels(self)

set_weights(self, new_weights)

weights(self)

classify(self, featureset)

prob_classify(self, featureset)

show_most_informative_features(self, n=10, show='all')

__repr__(self)
(Representation operator)

train(cls, train_toks, algorithm=None, trace=3, encoding=None, labels=None, sparse=True, gaussian_prior_sigma=0, **cutoffs)
Class Method

ALGORITHMS

_SCIPY_ALGS
