Stemmers remove morphological affixes from words, leaving only the word stem.
>>> from __future__ import print_function
>>> from nltk.stem import *
>>> from nltk.stem.porter import *
Create a new Porter stemmer.
>>> stemmer = PorterStemmer()
Test the stemmer on a range of inflected and derived word forms (plurals, past tenses, gerunds, and suffixed nouns and adjectives).
>>> plurals = ['caresses', 'flies', 'dies', 'mules', 'denied',
...            'died', 'agreed', 'owned', 'humbled', 'sized',
...            'meeting', 'stating', 'siezing', 'itemization',
...            'sensational', 'traditional', 'reference', 'colonizer',
...            'plotted']
>>> singles = [stemmer.stem(plural) for plural in plurals]
>>> print(' '.join(singles))  # doctest: +NORMALIZE_WHITESPACE
caress fli die mule deni die agre own humbl size meet
state siez item sensat tradit refer colon plot
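Stems are often used as keys for grouping related word forms. The following is a minimal sketch of that idea, not part of the NLTK test suite; the helper name `group_by_stem` is hypothetical, and the stems it relies on are the ones shown in the output above.

```python
from collections import defaultdict

from nltk.stem.porter import PorterStemmer


def group_by_stem(words):
    """Map each Porter stem to the input words that reduce to it.

    Hypothetical helper for illustration; not an NLTK function.
    """
    stemmer = PorterStemmer()
    groups = defaultdict(list)
    for word in words:
        # Words that share a stem land in the same list, in input order.
        groups[stemmer.stem(word)].append(word)
    return dict(groups)


groups = group_by_stem(['dies', 'died', 'meeting', 'plotted'])
for stem, members in groups.items():
    print(stem, members)
```

Here 'dies' and 'died' both reduce to 'die', so they end up under the same key, while 'meeting' and 'plotted' each form their own group.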
>>> from nltk.stem.snowball import SnowballStemmer
See which languages are supported.
>>> print(" ".join(SnowballStemmer.languages))
danish dutch english finnish french german hungarian italian norwegian porter portuguese romanian russian spanish swedish
Create a new instance of a language-specific subclass.
>>> stemmer = SnowballStemmer("english")
Stem a word.
>>> print(stemmer.stem("running"))
run
Decide not to stem stopwords.
>>> stemmer2 = SnowballStemmer("english", ignore_stopwords=True)
>>> print(stemmer.stem("having"))
have
>>> print(stemmer2.stem("having"))
having
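With `ignore_stopwords=True` the stemmer consults NLTK's stopwords corpus, which must be installed. The same effect can be approximated with an explicit word set. The sketch below is a simplified illustration under that assumption: the helper `stem_unless_protected` and the `protected` set are hypothetical names, not part of NLTK, and the helper does no case normalization.

```python
from nltk.stem.snowball import SnowballStemmer


def stem_unless_protected(word, stemmer, protected):
    """Return the word unchanged if it is protected, otherwise its stem.

    Hypothetical helper for illustration; not an NLTK function.
    """
    if word in protected:
        return word
    return stemmer.stem(word)


stemmer = SnowballStemmer("english")
protected = {"having"}  # hand-picked stand-in for a stopword list

kept = stem_unless_protected("having", stemmer, protected)      # left as-is
stemmed = stem_unless_protected("running", stemmer, protected)  # stemmed normally
print(kept, stemmed)
```

An explicit set makes the exclusion list visible and easy to adjust, at the cost of maintaining it by hand.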
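The 'english' stemmer (also known as Porter2) generally produces better stems than the original 'porter' stemmer; for example, it keeps 'generous' where 'porter' truncates to 'gener'.
>>> print(SnowballStemmer("english").stem("generously"))
generous
>>> print(SnowballStemmer("porter").stem("generously"))
gener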
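To compare the two algorithms on more than one word, the stems can be collected side by side. This is a minimal sketch; `compare_stemmers` is a hypothetical helper, not an NLTK function, and it assumes both names are valid entries in `SnowballStemmer.languages`.

```python
from nltk.stem.snowball import SnowballStemmer


def compare_stemmers(words, names=("english", "porter")):
    """Return one (word, stem, stem, ...) tuple per word, one stem per name.

    Hypothetical helper for illustration; not an NLTK function.
    """
    stemmers = [SnowballStemmer(name) for name in names]
    return [(word,) + tuple(s.stem(word) for s in stemmers) for word in words]


rows = compare_stemmers(["generously"])
for word, english, porter in rows:
    print(f"{word}: english={english} porter={porter}")
```

Passing a longer word list makes it easy to spot where the two stemmers disagree.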
Note: Extra stemmer tests can be found in nltk.test.unit.test_stem.