Classifiers label tokens with category labels (or class labels). Typically, labels are represented with strings (such as "health" or "sports"). In NLTK, classifiers are defined using classes that implement the ClassifierI interface.
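A minimal sketch of what implementing this interface can look like, assuming the interface class is importable as `nltk.classify.api.ClassifierI`. The `MajorityClassifier` class and its toy data are hypothetical and exist only for illustration. It defines `labels()` and `classify()` and relies on the `classify_many()` method inherited from the interface.

```python
from nltk.classify.api import ClassifierI


class MajorityClassifier(ClassifierI):
    """Hypothetical toy classifier: always predicts the most frequent training label."""

    def __init__(self, labeled_featuresets):
        # Count how often each label occurs in the training data.
        counts = {}
        for _featureset, label in labeled_featuresets:
            counts[label] = counts.get(label, 0) + 1
        self._labels = sorted(counts)
        self._majority = max(self._labels, key=lambda label: counts[label])

    def labels(self):
        # The category labels this classifier can assign.
        return self._labels

    def classify(self, featureset):
        # Ignore the features entirely and return the majority label.
        return self._majority


toy_train = [
    ({"word": "doctor"}, "health"),
    ({"word": "goal"}, "sports"),
    ({"word": "match"}, "sports"),
]
majority = MajorityClassifier(toy_train)
print(majority.labels())
print(majority.classify({"word": "nurse"}))
# classify_many() is inherited and calls classify() once per featureset.
print(majority.classify_many([{"word": "nurse"}, {"word": "team"}]))
```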
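NLTK defines several classifier classes, including Naive Bayes, Decision Tree, and Maximum Entropy classifiers, as well as SklearnClassifier, a wrapper around scikit-learn estimators.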
Classifiers are typically created by training them on a training corpus.
We define a very simple training corpus with three binary features, ['a', 'b', 'c'], and two labels, ['x', 'y']. We use a simple feature set so that the correct answers can be calculated analytically (although we haven't done this yet for all tests).
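As a minimal sketch, a corpus of this shape can be written as a list of (featureset, label) pairs, where each featureset is a dict over the three binary features. The specific rows below are hypothetical, chosen for illustration only, and are not necessarily the corpus used by the tests.

```python
# Hypothetical training corpus: each item is a (featureset, label) pair.
# Features 'a', 'b', 'c' are binary (0 or 1); labels are 'x' or 'y'.
train = [
    ({"a": 1, "b": 1, "c": 1}, "x"),
    ({"a": 1, "b": 1, "c": 0}, "x"),
    ({"a": 1, "b": 0, "c": 1}, "x"),
    ({"a": 1, "b": 0, "c": 0}, "y"),
    ({"a": 0, "b": 1, "c": 1}, "y"),
    ({"a": 0, "b": 1, "c": 0}, "y"),
    ({"a": 0, "b": 0, "c": 1}, "x"),
    ({"a": 0, "b": 0, "c": 0}, "y"),
]

# Unlabeled featuresets to classify later.
test = [
    {"a": 1, "b": 0, "c": 1},
    {"a": 0, "b": 1, "c": 0},
]

# Recover the feature names and labels from the corpus as a sanity check.
feature_names = sorted({name for featureset, _label in train for name in featureset})
labels = sorted({label for _featureset, label in train})
print(feature_names)
print(labels)
```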
Test the Naive Bayes classifier:
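A minimal sketch of such a test, using a hypothetical corpus (not necessarily the one used by the original tests). It trains `NaiveBayesClassifier` and compares one posterior against a hand calculation. The hand calculation assumes the default estimator, expected likelihood estimation, which adds 0.5 to every count; the helper `ele` is an illustrative name, not part of NLTK.

```python
from nltk.classify import NaiveBayesClassifier

# Hypothetical corpus: 3 binary features, 2 labels, 4 examples per label.
train = [
    ({"a": 1, "b": 1, "c": 1}, "x"),
    ({"a": 1, "b": 1, "c": 0}, "x"),
    ({"a": 1, "b": 0, "c": 1}, "x"),
    ({"a": 1, "b": 0, "c": 0}, "y"),
    ({"a": 0, "b": 1, "c": 1}, "y"),
    ({"a": 0, "b": 1, "c": 0}, "y"),
    ({"a": 0, "b": 0, "c": 1}, "x"),
    ({"a": 0, "b": 0, "c": 0}, "y"),
]

classifier = NaiveBayesClassifier.train(train)

query = {"a": 1, "b": 0, "c": 1}
predicted = classifier.classify(query)
p_x = classifier.prob_classify(query).prob("x")


def ele(count, total, bins=2):
    # Expected likelihood estimate: add 0.5 to each of `bins` outcomes.
    return (count + 0.5) / (total + 0.5 * bins)


# Analytic posterior: P(label) * P(a=1|label) * P(b=0|label) * P(c=1|label),
# with counts read off the corpus above, then normalized over both labels.
score_x = ele(4, 8) * ele(3, 4) * ele(2, 4) * ele(3, 4)
score_y = ele(4, 8) * ele(1, 4) * ele(2, 4) * ele(1, 4)
expected_p_x = score_x / (score_x + score_y)

print(predicted, round(p_x, 4), round(expected_p_x, 4))
```

If the smoothing assumption holds, the two printed probabilities should agree up to floating-point error.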
Test the Decision Tree classifier:
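A minimal sketch of such a test, again on a hypothetical corpus. Here the label is 'x' exactly when feature 'a' is 1, so a tree that splits on 'a' should classify every training item correctly. The call uses the default training cutoffs.

```python
from nltk.classify import DecisionTreeClassifier, accuracy

# Hypothetical corpus: the label depends only on feature 'a'.
train = [
    ({"a": 1, "b": 1, "c": 1}, "x"),
    ({"a": 1, "b": 1, "c": 0}, "x"),
    ({"a": 1, "b": 0, "c": 1}, "x"),
    ({"a": 1, "b": 0, "c": 0}, "x"),
    ({"a": 0, "b": 1, "c": 1}, "y"),
    ({"a": 0, "b": 1, "c": 0}, "y"),
    ({"a": 0, "b": 0, "c": 1}, "y"),
    ({"a": 0, "b": 0, "c": 0}, "y"),
]

tree = DecisionTreeClassifier.train(train)

# Show the learned tree, then score it on the data it was trained on.
print(tree.pretty_format())
train_accuracy = accuracy(tree, train)
print(train_accuracy)
print(tree.classify({"a": 1, "b": 0, "c": 0}))
```

Note that decision trees in NLTK return a single label per featureset; this sketch does not ask for a probability distribution.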
Test SklearnClassifier, which requires the scikit-learn package.
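A minimal sketch of such a test, assuming scikit-learn is installed. `BernoulliNB` is just one possible estimator, chosen here for illustration because the features are binary; the corpus is hypothetical.

```python
from nltk.classify import SklearnClassifier
from sklearn.naive_bayes import BernoulliNB

# Hypothetical corpus: the label depends only on feature 'a'.
train = [
    ({"a": 1, "b": 1, "c": 1}, "x"),
    ({"a": 1, "b": 1, "c": 0}, "x"),
    ({"a": 1, "b": 0, "c": 1}, "x"),
    ({"a": 1, "b": 0, "c": 0}, "x"),
    ({"a": 0, "b": 1, "c": 1}, "y"),
    ({"a": 0, "b": 1, "c": 0}, "y"),
    ({"a": 0, "b": 0, "c": 1}, "y"),
    ({"a": 0, "b": 0, "c": 0}, "y"),
]
test = [
    {"a": 1, "b": 0, "c": 1},
    {"a": 0, "b": 1, "c": 0},
]

# Wrap the scikit-learn estimator; train() returns the wrapper itself.
classifier = SklearnClassifier(BernoulliNB()).train(train)

predictions = [str(label) for label in classifier.classify_many(test)]
print(predictions)

# Probability distributions are available because BernoulliNB has predict_proba.
dists = classifier.prob_classify_many(test)
print([round(dist.prob("x"), 3) for dist in dists])
```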
Test the Maximum Entropy classifier training algorithms; they should all generate the same results.
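A minimal sketch of such a comparison, restricted to the two built-in algorithms that need no external binaries ('GIS' and 'IIS') and run on a hypothetical corpus. The iteration limit of 20 is an arbitrary choice for illustration. With a finite number of iterations the probabilities are expected to be close rather than identical.

```python
from nltk.classify import MaxentClassifier

# Hypothetical corpus: 3 binary features, 2 labels.
train = [
    ({"a": 1, "b": 1, "c": 1}, "x"),
    ({"a": 1, "b": 1, "c": 0}, "x"),
    ({"a": 1, "b": 0, "c": 1}, "x"),
    ({"a": 1, "b": 0, "c": 0}, "y"),
    ({"a": 0, "b": 1, "c": 1}, "y"),
    ({"a": 0, "b": 1, "c": 0}, "y"),
    ({"a": 0, "b": 0, "c": 1}, "x"),
    ({"a": 0, "b": 0, "c": 0}, "y"),
]
query = {"a": 1, "b": 0, "c": 1}

# Train once per algorithm and record (predicted label, P(x | query)).
results = {}
for algorithm in ("GIS", "IIS"):
    classifier = MaxentClassifier.train(
        train, algorithm=algorithm, trace=0, max_iter=20
    )
    dist = classifier.prob_classify(query)
    results[algorithm] = (classifier.classify(query), dist.prob("x"))

for algorithm, (label, p_x) in results.items():
    print(algorithm, label, round(p_x, 3))
```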