PAICE's evaluation statistics for stemming algorithms

Given a list of words with their real lemmas and stems according to stemming algorithm under evaluation, counts Understemming Index (UI), Overstemming Index (OI), Stemming Weight (SW) and Error-rate relative to truncation (ERRT).

 
>>> from nltk.metrics import Paice

1   Understemming and Overstemming values

 
>>> lemmas = {'consol': ['consol', 'consols'],
...           'console': ['consoled', 'consoles', 'consoling'],
...           'kneel': ['kneel', 'knelt']}
>>> stems = {'consol': ['consol', 'consoled', 'consoles', 'consoling', 'consols'],
...          'kneel': ['kneel'],
...          'knelt': ['knelt']}
>>> p = Paice(lemmas, stems)
>>> p.gumt, p.gdmt, p.gwmt, p.gdnt
(1.0, 5.0, 6.0, 16.0)
 
>>> p.ui, p.oi, p.sw
(0.2, 0.375, 1.875)
 
>>> p.errt
1.0
 
>>> p.coords
[(0.0, 1.0), (0.0, 0.375), (0.2, 0.375)]