nltk.translate package

Submodules

nltk.translate.api module

class nltk.translate.api.AlignedSent(words, mots, alignment=None)[source]

Bases: object

Return an aligned sentence object, which encapsulates two sentences along with an Alignment between them.

>>> from nltk.translate import AlignedSent, Alignment
>>> algnsent = AlignedSent(['klein', 'ist', 'das', 'Haus'],
...     ['the', 'house', 'is', 'small'], Alignment.fromstring('0-2 1-3 2-1 3-0'))
>>> algnsent.words
['klein', 'ist', 'das', 'Haus']
>>> algnsent.mots
['the', 'house', 'is', 'small']
>>> algnsent.alignment
Alignment([(0, 2), (1, 3), (2, 1), (3, 0)])
>>> from nltk.corpus import comtrans
>>> print(comtrans.aligned_sents()[54])
<AlignedSent: 'Weshalb also sollten...' -> 'So why should EU arm...'>
>>> print(comtrans.aligned_sents()[54].alignment)
0-0 0-1 1-0 2-2 3-4 3-5 4-7 5-8 6-3 7-9 8-9 9-10 9-11 10-12 11-6 12-6 13-13
Parameters:
  • words (list(str)) – source language words
  • mots (list(str)) – target language words
  • alignment (Alignment) – the word-level alignments between the source and target language
alignment
invert()[source]

Return the aligned sentence pair, reversing the directionality

Return type:AlignedSent
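
For example, continuing the doctest above, inverting swaps words with mots and reverses each alignment pair (a short sketch):

>>> inverted = algnsent.invert()
>>> inverted.words
['the', 'house', 'is', 'small']
>>> inverted.mots
['klein', 'ist', 'das', 'Haus']
>>> inverted.alignment
Alignment([(0, 3), (1, 2), (2, 0), (3, 1)])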
mots
unicode_repr()

Return a string representation for this AlignedSent.

Return type:str
words
class nltk.translate.api.Alignment[source]

Bases: frozenset

A storage class for representing alignment between two sequences, s1, s2. In general, an alignment is a set of tuples of the form (i, j, ...) representing an alignment between the i-th element of s1 and the j-th element of s2. Tuples are extensible (they might contain additional data, such as a boolean to indicate sure vs possible alignments).

>>> from nltk.translate import Alignment
>>> a = Alignment([(0, 0), (0, 1), (1, 2), (2, 2)])
>>> a.invert()
Alignment([(0, 0), (1, 0), (2, 1), (2, 2)])
>>> print(a.invert())
0-0 1-0 2-1 2-2
>>> a[0]
[(0, 1), (0, 0)]
>>> a.invert()[2]
[(2, 1), (2, 2)]
>>> b = Alignment([(0, 0), (0, 1)])
>>> b.issubset(a)
True
>>> c = Alignment.fromstring('0-0 0-1')
>>> b == c
True
classmethod fromstring(s)[source]

Read a Giza-formatted string and return an Alignment object.

>>> Alignment.fromstring('0-0 2-1 9-2 21-3 10-4 7-5')
Alignment([(0, 0), (2, 1), (7, 5), (9, 2), (10, 4), (21, 3)])
Parameters:s (str) – the positional alignments in giza format
Return type:Alignment
Returns:An Alignment object corresponding to the string representation s.
invert()[source]

Return an Alignment object, being the inverted mapping.

range(positions=None)[source]

Work out the range of the mapping from the given positions. If no positions are specified, compute the range of the entire mapping.
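
A small sketch of the expected behavior, reusing the alignment from the class doctest above (sorted list output assumed):

>>> a = Alignment([(0, 0), (0, 1), (1, 2), (2, 2)])
>>> a.range([0])
[0, 1]
>>> a.range()
[0, 1, 2]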

unicode_repr()

Produce a Giza-formatted string representing the alignment.

class nltk.translate.api.PhraseTable[source]

Bases: object

In-memory store of translations for a given phrase, and the log probability of those translations

add(src_phrase, trg_phrase, log_prob)[source]
Parameters:log_prob (float) – Log probability that, given src_phrase, trg_phrase is its translation
translations_for(src_phrase)[source]

Get the translations for a source language phrase

Parameters:src_phrase (tuple(str)) – Source language phrase of interest
Returns:A list of target language phrases that are translations of src_phrase, ordered in decreasing order of likelihood. Each list element is a tuple of the target phrase and its log probability.
Return type:list(PhraseTableEntry)
class nltk.translate.api.PhraseTableEntry(trg_phrase, log_prob)

Bases: tuple

log_prob

Alias for field number 1

trg_phrase

Alias for field number 0
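
A minimal usage sketch tying PhraseTable and PhraseTableEntry together (the phrases and probabilities below are made up for illustration):

>>> import math
>>> from nltk.translate.api import PhraseTable
>>> phrase_table = PhraseTable()
>>> phrase_table.add(('niemand',), ('nobody',), math.log(0.8))
>>> phrase_table.add(('niemand',), ('no', 'one'), math.log(0.2))
>>> [entry.trg_phrase for entry in phrase_table.translations_for(('niemand',))]
[('nobody',), ('no', 'one')]

Since entries are ordered by decreasing log probability, the more likely translation comes first.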

nltk.translate.bleu_score module

BLEU score implementation.

class nltk.translate.bleu_score.SmoothingFunction(epsilon=0.1, alpha=5, k=5)[source]

Bases: object

This is an implementation of the smoothing techniques for segment-level BLEU scores that were presented in Boxing Chen and Colin Cherry (2014) A Systematic Comparison of Smoothing Techniques for Sentence-Level BLEU. In WMT14. http://acl2014.org/acl2014/W14-33/pdf/W14-3346.pdf

method0(p_n, *args, **kwargs)[source]

No smoothing.

method1(p_n, *args, **kwargs)[source]

Smoothing method 1: Add epsilon counts to precision with 0 counts.

method2(p_n, *args, **kwargs)[source]

Smoothing method 2: Add 1 to both numerator and denominator from Chin-Yew Lin and Franz Josef Och (2004) Automatic evaluation of machine translation quality using longest common subsequence and skip-bigram statistics. In ACL04.

method3(p_n, *args, **kwargs)[source]

Smoothing method 3: NIST geometric sequence smoothing. The smoothing is computed by taking 1 / ( 2^k ), instead of 0, for each precision score whose matching n-gram count is null. k is 1 for the first ‘n’ value for which the n-gram match count is null. For example, if the text contains:

  • one 2-gram match
  • and (consequently) two 1-gram matches
the n-gram count for each individual precision score would be:
  • n=1 => prec_count = 2 (two unigrams)
  • n=2 => prec_count = 1 (one bigram)
  • n=3 => prec_count = 1/2 (no trigram, taking ‘smoothed’ value of 1 / ( 2^k ), with k=1)
  • n=4 => prec_count = 1/4 (no fourgram, taking ‘smoothed’ value of 1 / ( 2^k ), with k=2)
method4(p_n, references, hypothesis, hyp_len, *args, **kwargs)[source]

Smoothing method 4: Shorter translations may have inflated precision values due to having smaller denominators; therefore, we give them proportionally smaller smoothed counts. Instead of scaling to 1/(2^k), Chen and Cherry suggest dividing by 1/ln(len(T)), where T is the length of the translation.

method5(p_n, references, hypothesis, hyp_len, *args, **kwargs)[source]

Smoothing method 5: The matched counts for similar values of n should be similar. To calculate the n-gram matched count, it averages the n−1, n and n+1 gram matched counts.

method6(p_n, references, hypothesis, hyp_len, *args, **kwargs)[source]

Smoothing method 6: Interpolates the maximum likelihood estimate of the precision p_n with a prior estimate pi0. The prior is estimated by assuming that the ratio between pn and pn−1 will be the same as that between pn−1 and pn−2; from Gao and He (2013) Training MRF-Based Phrase Translation Models using Gradient Ascent. In NAACL.

method7(p_n, references, hypothesis, hyp_len, *args, **kwargs)[source]

Smoothing method 7: Interpolates methods 4 and 5, i.e. it applies the length-scaled smoothing of method 4 followed by the n-gram averaging of method 5.
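
A usage sketch: instantiate SmoothingFunction once and pass one of its bound methods to sentence_bleu() (documented below) via the smoothing_function argument. The sentences here are made up for illustration; this hypothesis has no matching 4-grams and would otherwise be penalized heavily.

>>> from nltk.translate.bleu_score import SmoothingFunction, sentence_bleu
>>> reference = 'the cat is on the mat'.split()
>>> hypothesis = 'the cat sat on the mat'.split()
>>> chencherry = SmoothingFunction()
>>> score = sentence_bleu([reference], hypothesis,
...                       smoothing_function=chencherry.method1)
>>> 0.0 < score < 1.0
True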

nltk.translate.bleu_score.brevity_penalty(closest_ref_len, hyp_len)[source]

Calculate brevity penalty.

Because modified n-gram precision alone still favors overly short sentences, a brevity penalty is used to adjust the overall BLEU score according to length.

An example from the paper: there are three references with lengths 12, 15 and 17, and a concise hypothesis of length 12. The brevity penalty is 1.

>>> reference1 = list('aaaaaaaaaaaa')      # i.e. ['a'] * 12
>>> reference2 = list('aaaaaaaaaaaaaaa')   # i.e. ['a'] * 15
>>> reference3 = list('aaaaaaaaaaaaaaaaa') # i.e. ['a'] * 17
>>> hypothesis = list('aaaaaaaaaaaa')      # i.e. ['a'] * 12
>>> references = [reference1, reference2, reference3]
>>> hyp_len = len(hypothesis)
>>> closest_ref_len =  closest_ref_length(references, hyp_len)
>>> brevity_penalty(closest_ref_len, hyp_len)
1.0

In case a hypothesis translation is shorter than the references, penalty is applied.

>>> references = [['a'] * 28, ['a'] * 28]
>>> hypothesis = ['a'] * 12
>>> hyp_len = len(hypothesis)
>>> closest_ref_len =  closest_ref_length(references, hyp_len)
>>> brevity_penalty(closest_ref_len, hyp_len)
0.2635971381157267
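
The value above matches the standard formulation BP = exp(1 - r/c), where c is the hypothesis length and r the closest reference length, whenever c < r (a quick hand check, assuming this closed form):

>>> import math
>>> math.exp(1 - 28 / 12) 
0.2635...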

The length of the closest reference is used to compute the penalty. If the length of a hypothesis is 12, and the reference lengths are 13 and 2, the penalty is applied because the hypothesis length (12) is less than the closest reference length (13).

>>> references = [['a'] * 13, ['a'] * 2]
>>> hypothesis = ['a'] * 12
>>> hyp_len = len(hypothesis)
>>> closest_ref_len =  closest_ref_length(references, hyp_len)
>>> brevity_penalty(closest_ref_len, hyp_len) 
0.9200...

The brevity penalty doesn’t depend on reference order. More importantly, when two reference sentences are at the same distance, the shortest reference sentence length is used.

>>> references = [['a'] * 13, ['a'] * 11]
>>> hypothesis = ['a'] * 12
>>> hyp_len = len(hypothesis)
>>> closest_ref_len =  closest_ref_length(references, hyp_len)
>>> bp1 = brevity_penalty(closest_ref_len, hyp_len)
>>> hyp_len = len(hypothesis)
>>> closest_ref_len =  closest_ref_length(reversed(references), hyp_len)
>>> bp2 = brevity_penalty(closest_ref_len, hyp_len)
>>> bp1 == bp2 == 1
True

A test example from mteval-v13a.pl (starting from line 705):

>>> references = [['a'] * 11, ['a'] * 8]
>>> hypothesis = ['a'] * 7
>>> hyp_len = len(hypothesis)
>>> closest_ref_len =  closest_ref_length(references, hyp_len)
>>> brevity_penalty(closest_ref_len, hyp_len) 
0.8668...
>>> references = [['a'] * 11, ['a'] * 8, ['a'] * 6, ['a'] * 7]
>>> hypothesis = ['a'] * 7
>>> hyp_len = len(hypothesis)
>>> closest_ref_len =  closest_ref_length(references, hyp_len)
>>> brevity_penalty(closest_ref_len, hyp_len)
1.0
Parameters:
  • hyp_len (int) – The length of the hypothesis for a single sentence OR the sum of all the hypotheses’ lengths for a corpus
  • closest_ref_len (int) – The length of the closest reference for a single hypothesis OR the sum of the closest reference lengths for every hypothesis
Returns:

BLEU’s brevity penalty.

Return type:

float

nltk.translate.bleu_score.closest_ref_length(references, hyp_len)[source]

This function finds the reference whose length is closest to that of the hypothesis. This closest reference length is referred to as the variable r in the brevity penalty formula of Papineni et al. (2002).

Parameters:
  • references (list(list(str))) – A list of reference translations.
  • hyp_len (int) – The length of the hypothesis.
Returns:

The length of the reference that’s closest to the hypothesis.

Return type:

int
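
A small check, consistent with the tie-breaking behavior described for brevity_penalty() above (ties go to the shorter reference):

>>> closest_ref_length([['a'] * 13, ['a'] * 11], hyp_len=12)
11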

nltk.translate.bleu_score.corpus_bleu(list_of_references, hypotheses, weights=(0.25, 0.25, 0.25, 0.25), smoothing_function=None, auto_reweigh=False, emulate_multibleu=False)[source]

Calculate a single corpus-level BLEU score (aka. system-level BLEU) for all the hypotheses and their respective references.

Instead of averaging the sentence level BLEU scores (i.e. macro-average precision), the original BLEU metric (Papineni et al. 2002) accounts for the micro-average precision (i.e. summing the numerators and denominators for each hypothesis-reference(s) pair before the division).

>>> hyp1 = ['It', 'is', 'a', 'guide', 'to', 'action', 'which',
...         'ensures', 'that', 'the', 'military', 'always',
...         'obeys', 'the', 'commands', 'of', 'the', 'party']
>>> ref1a = ['It', 'is', 'a', 'guide', 'to', 'action', 'that',
...          'ensures', 'that', 'the', 'military', 'will', 'forever',
...          'heed', 'Party', 'commands']
>>> ref1b = ['It', 'is', 'the', 'guiding', 'principle', 'which',
...          'guarantees', 'the', 'military', 'forces', 'always',
...          'being', 'under', 'the', 'command', 'of', 'the', 'Party']
>>> ref1c = ['It', 'is', 'the', 'practical', 'guide', 'for', 'the',
...          'army', 'always', 'to', 'heed', 'the', 'directions',
...          'of', 'the', 'party']
>>> hyp2 = ['he', 'read', 'the', 'book', 'because', 'he', 'was',
...         'interested', 'in', 'world', 'history']
>>> ref2a = ['he', 'was', 'interested', 'in', 'world', 'history',
...          'because', 'he', 'read', 'the', 'book']
>>> list_of_references = [[ref1a, ref1b, ref1c], [ref2a]]
>>> hypotheses = [hyp1, hyp2]
>>> corpus_bleu(list_of_references, hypotheses) 
0.5920...

The example below shows that corpus_bleu() is different from averaging sentence_bleu() over hypotheses:

>>> score1 = sentence_bleu([ref1a, ref1b, ref1c], hyp1)
>>> score2 = sentence_bleu([ref2a], hyp2)
>>> (score1 + score2) / 2 
0.6223...
Parameters:
  • list_of_references (list(list(list(str)))) – a corpus of lists of reference sentences, w.r.t. hypotheses
  • hypotheses (list(list(str))) – a list of hypothesis sentences
  • weights (list(float)) – weights for unigrams, bigrams, trigrams and so on
  • smoothing_function (SmoothingFunction) –
  • auto_reweigh (bool) –
  • emulate_multibleu (bool) –
Returns:

The corpus-level BLEU score.

Return type:

float

nltk.translate.bleu_score.modified_precision(references, hypothesis, n)[source]

Calculate modified ngram precision.

The normal precision method may lead to some wrong translations receiving high precision scores: e.g., a translation in which a single reference word is repeated several times has very high precision.

This function only returns the Fraction object that contains the numerator and denominator necessary to calculate the corpus-level precision. To calculate the modified precision for a single pair of hypothesis and references, cast the Fraction object into a float.

The famous “the the the ...” example shows how standard precision can be inflated by duplicating high-frequency words.

>>> reference1 = 'the cat is on the mat'.split()
>>> reference2 = 'there is a cat on the mat'.split()
>>> hypothesis1 = 'the the the the the the the'.split()
>>> references = [reference1, reference2]
>>> float(modified_precision(references, hypothesis1, n=1)) 
0.2857...

In the modified n-gram precision, a reference word will be considered exhausted after a matching hypothesis word is identified, e.g.

>>> reference1 = ['It', 'is', 'a', 'guide', 'to', 'action', 'that',
...               'ensures', 'that', 'the', 'military', 'will',
...               'forever', 'heed', 'Party', 'commands']
>>> reference2 = ['It', 'is', 'the', 'guiding', 'principle', 'which',
...               'guarantees', 'the', 'military', 'forces', 'always',
...               'being', 'under', 'the', 'command', 'of', 'the',
...               'Party']
>>> reference3 = ['It', 'is', 'the', 'practical', 'guide', 'for', 'the',
...               'army', 'always', 'to', 'heed', 'the', 'directions',
...               'of', 'the', 'party']
>>> hypothesis = 'of the'.split()
>>> references = [reference1, reference2, reference3]
>>> float(modified_precision(references, hypothesis, n=1))
1.0
>>> float(modified_precision(references, hypothesis, n=2))
1.0

An example of a normal machine translation hypothesis:

>>> hypothesis1 = ['It', 'is', 'a', 'guide', 'to', 'action', 'which',
...               'ensures', 'that', 'the', 'military', 'always',
...               'obeys', 'the', 'commands', 'of', 'the', 'party']
>>> hypothesis2 = ['It', 'is', 'to', 'insure', 'the', 'troops',
...               'forever', 'hearing', 'the', 'activity', 'guidebook',
...               'that', 'party', 'direct']
>>> reference1 = ['It', 'is', 'a', 'guide', 'to', 'action', 'that',
...               'ensures', 'that', 'the', 'military', 'will',
...               'forever', 'heed', 'Party', 'commands']
>>> reference2 = ['It', 'is', 'the', 'guiding', 'principle', 'which',
...               'guarantees', 'the', 'military', 'forces', 'always',
...               'being', 'under', 'the', 'command', 'of', 'the',
...               'Party']
>>> reference3 = ['It', 'is', 'the', 'practical', 'guide', 'for', 'the',
...               'army', 'always', 'to', 'heed', 'the', 'directions',
...               'of', 'the', 'party']
>>> references = [reference1, reference2, reference3]
>>> float(modified_precision(references, hypothesis1, n=1)) 
0.9444...
>>> float(modified_precision(references, hypothesis2, n=1)) 
0.5714...
>>> float(modified_precision(references, hypothesis1, n=2)) 
0.5882352941176471
>>> float(modified_precision(references, hypothesis2, n=2)) 
0.07692...
Parameters:
  • references (list(list(str))) – A list of reference translations.
  • hypothesis (list(str)) – A hypothesis translation.
  • n (int) – The ngram order.
Returns:

BLEU’s modified precision for the nth order ngram.

Return type:

Fraction

nltk.translate.bleu_score.sentence_bleu(references, hypothesis, weights=(0.25, 0.25, 0.25, 0.25), smoothing_function=None, auto_reweigh=False, emulate_multibleu=False)[source]

Calculate BLEU score (Bilingual Evaluation Understudy) from Papineni, Kishore, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2002. “BLEU: a method for automatic evaluation of machine translation.” In Proceedings of ACL. http://www.aclweb.org/anthology/P02-1040.pdf

>>> hypothesis1 = ['It', 'is', 'a', 'guide', 'to', 'action', 'which',
...               'ensures', 'that', 'the', 'military', 'always',
...               'obeys', 'the', 'commands', 'of', 'the', 'party']
>>> hypothesis2 = ['It', 'is', 'to', 'insure', 'the', 'troops',
...               'forever', 'hearing', 'the', 'activity', 'guidebook',
...               'that', 'party', 'direct']
>>> reference1 = ['It', 'is', 'a', 'guide', 'to', 'action', 'that',
...               'ensures', 'that', 'the', 'military', 'will', 'forever',
...               'heed', 'Party', 'commands']
>>> reference2 = ['It', 'is', 'the', 'guiding', 'principle', 'which',
...               'guarantees', 'the', 'military', 'forces', 'always',
...               'being', 'under', 'the', 'command', 'of', 'the',
...               'Party']
>>> reference3 = ['It', 'is', 'the', 'practical', 'guide', 'for', 'the',
...               'army', 'always', 'to', 'heed', 'the', 'directions',
...               'of', 'the', 'party']
>>> sentence_bleu([reference1, reference2, reference3], hypothesis1) 
0.5045...
>>> sentence_bleu([reference1, reference2, reference3], hypothesis2) 
0.3969...

The default BLEU calculates a score for up to 4-grams using uniform weights. To evaluate your translations with higher/lower order ngrams, use customized weights. E.g. when accounting for up to 5-grams with uniform weights:

>>> weights = (0.1666, 0.1666, 0.1666, 0.1666, 0.1666)
>>> sentence_bleu([reference1, reference2, reference3], hypothesis1, weights) 
0.4583...
Parameters:
  • references (list(list(str))) – reference sentences
  • hypothesis (list(str)) – a hypothesis sentence
  • weights (list(float)) – weights for unigrams, bigrams, trigrams and so on
  • smoothing_function (SmoothingFunction) –
  • auto_reweigh (bool) –
  • emulate_multibleu (bool) –
Returns:

The sentence-level BLEU score.

Return type:

float

nltk.translate.chrf_score module

ChrF score implementation

nltk.translate.chrf_score.corpus_chrf(list_of_references, hypotheses, min_len=1, max_len=6, beta=3.0)[source]

Calculates the corpus-level CHRF (Character n-gram F-score); this is the micro-averaged value of the sentence/segment-level CHRF scores.

CHRF only supports a single reference.

>>> ref1 = str('It is a guide to action that ensures that the military '
...            'will forever heed Party commands').split()
>>> ref2 = str('It is the guiding principle which guarantees the military '
...            'forces always being under the command of the Party').split()
>>>
>>> hyp1 = str('It is a guide to action which ensures that the military '
...            'always obeys the commands of the party').split()
>>> hyp2 = str('It is to insure the troops forever hearing the activity '
...            'guidebook that party direct').split()
>>> corpus_chrf([ref1, ref2, ref1, ref2], [hyp1, hyp2, hyp2, hyp1]) 
0.4915...
Parameters:
  • list_of_references (list(list(str)) / list(str)) – a corpus of reference sentences, w.r.t. hypotheses
  • hypotheses (list(list(str)) / list(str)) – a list of hypothesis sentences
  • min_len (int) – The minimum order of n-gram this function should extract.
  • max_len (int) – The maximum order of n-gram this function should extract.
  • beta (float) – the parameter to assign more importance to recall over precision
Returns:

the corpus-level CHRF score.

Return type:

float

nltk.translate.chrf_score.sentence_chrf(reference, hypothesis, min_len=1, max_len=6, beta=3.0)[source]
Calculates the sentence-level CHRF (Character n-gram F-score) described in Maja Popovic. 2015. CHRF: Character n-gram F-score for Automatic MT Evaluation. In Proceedings of WMT15.

Unlike multi-reference BLEU, CHRF only supports a single reference.

An example from the original BLEU paper http://www.aclweb.org/anthology/P02-1040.pdf

>>> ref1 = str('It is a guide to action that ensures that the military '
...            'will forever heed Party commands').split()
>>> hyp1 = str('It is a guide to action which ensures that the military '
...            'always obeys the commands of the party').split()
>>> hyp2 = str('It is to insure the troops forever hearing the activity '
...            'guidebook that party direct').split()
>>> sentence_chrf(ref1, hyp1) 
0.6768...
>>> sentence_chrf(ref1, hyp2) 
0.4201...

The infamous “the the the ... ” example

>>> ref = 'the cat is on the mat'.split()
>>> hyp = 'the the the the the the the'.split()
>>> sentence_chrf(ref, hyp)  
0.2530...

An example showing that this function accepts plain strings as input as well as tokenized sentences, i.e. list(str):

>>> ref1 = str('It is a guide to action that ensures that the military '
...            'will forever heed Party commands')
>>> hyp1 = str('It is a guide to action which ensures that the military '
...            'always obeys the commands of the party')
>>> sentence_chrf(ref1, hyp1) 
0.6768...
>>> type(ref1) == type(hyp1) == str
True
>>> sentence_chrf(ref1.split(), hyp1.split()) 
0.6768...

To skip the unigrams and only use 2- to 3-grams:

>>> sentence_chrf(ref1, hyp1, min_len=2, max_len=3) 
0.7018...
Parameters:
  • reference (list(str) / str) – reference sentence
  • hypothesis (list(str) / str) – a hypothesis sentence
  • min_len (int) – The minimum order of n-gram this function should extract.
  • max_len (int) – The maximum order of n-gram this function should extract.
  • beta (float) – the parameter to assign more importance to recall over precision
Returns:

the sentence level CHRF score.

Return type:

float

nltk.translate.gale_church module

A port of the Gale-Church Aligner.

Gale & Church (1993), A Program for Aligning Sentences in Bilingual Corpora. http://aclweb.org/anthology/J93-1004.pdf

class nltk.translate.gale_church.LanguageIndependent[source]

Bases: object

AVERAGE_CHARACTERS = 1
PRIORS = {(0, 1): 0.0099, (1, 2): 0.089, (2, 2): 0.011, (1, 0): 0.0099, (1, 1): 0.89, (2, 1): 0.089}
VARIANCE_CHARACTERS = 6.8
nltk.translate.gale_church.align_blocks(source_sents_lens, target_sents_lens, params=<class 'nltk.translate.gale_church.LanguageIndependent'>)[source]

Return the sentence alignment of two text blocks (usually paragraphs).

>>> align_blocks([5,5,5], [7,7,7])
[(0, 0), (1, 1), (2, 2)]
>>> align_blocks([10,5,5], [12,20])
[(0, 0), (1, 1), (2, 1)]
>>> align_blocks([12,20], [10,5,5])
[(0, 0), (1, 1), (1, 2)]
>>> align_blocks([10,2,10,10,2,10], [12,3,20,3,12])
[(0, 0), (1, 1), (2, 2), (3, 2), (4, 3), (5, 4)]

Parameters:
  • source_sents_lens – The list of source sentence lengths.
  • target_sents_lens – The list of target sentence lengths.
  • params – the sentence alignment parameters.
Returns:

The sentence alignments, a list of index pairs.

nltk.translate.gale_church.align_log_prob(i, j, source_sents, target_sents, alignment, params)[source]

Returns the log probability of the two sentences source_sents[i], target_sents[j] being aligned with a specific alignment.

Parameters:
  • i – The offset of the source sentence.
  • j – The offset of the target sentence.
  • source_sents – The list of source sentence lengths.
  • target_sents – The list of target sentence lengths.
  • alignment – The alignment type, a tuple of two integers.
  • params – The sentence alignment parameters.
Returns:

The log probability of a specific alignment between the two sentences, given the parameters.

nltk.translate.gale_church.align_texts(source_blocks, target_blocks, params=<class 'nltk.translate.gale_church.LanguageIndependent'>)[source]

Creates the sentence alignment of two texts.

Texts can consist of several blocks. Block boundaries cannot be crossed by sentence alignment links.

Each block consists of a list that contains the lengths (in characters) of the sentences in this block.

Parameters:
  • source_blocks – The list of blocks in the source text.
  • target_blocks – The list of blocks in the target text.
  • params – the sentence alignment parameters.
Returns:

A list of sentence alignment lists.
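
A minimal sketch with a single block per text, reusing the first align_blocks() example above (assuming align_texts() simply aligns corresponding block pairs):

>>> align_texts([[5, 5, 5]], [[7, 7, 7]])
[[(0, 0), (1, 1), (2, 2)]]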

nltk.translate.gale_church.erfcc(x)[source]

Complementary error function.

nltk.translate.gale_church.norm_cdf(x)[source]

Return the area under the normal distribution from -∞ to x.

nltk.translate.gale_church.norm_logsf(x)[source]
nltk.translate.gale_church.parse_token_stream(stream, soft_delimiter, hard_delimiter)[source]

Parses a stream of tokens and splits it into sentences (using soft_delimiter tokens) and blocks (using hard_delimiter tokens) for use with the align_texts() function.

nltk.translate.gale_church.split_at(it, split_value)[source]

Splits an iterator it at values of split_value.

Each instance of split_value is swallowed. The iterator produces subiterators which need to be consumed fully before the next subiterator can be used.

nltk.translate.gale_church.trace(backlinks, source_sents_lens, target_sents_lens)[source]

Traverses the alignment cost from the tracebacks and retrieves the appropriate sentence pairs.

Parameters:
  • backlinks (dict) – A dictionary where the keys are alignment points and the values are the costs (referencing LanguageIndependent.PRIORS)
  • source_sents_lens (list(int)) – A list of source sentences’ lengths
  • target_sents_lens (list(int)) – A list of target sentences’ lengths

nltk.translate.gdfa module

nltk.translate.gdfa.grow_diag_final_and(srclen, trglen, e2f, f2e)[source]

This function symmetrizes the source-to-target and target-to-source word alignment outputs using the grow-diag-final-and (GDFA) algorithm (Koehn, 2005).

Step 1: Find the intersection of the bidirectional alignments.

Step 2: Search for additional neighboring alignment points to add, given these criteria: (i) the neighboring alignment point is not in the intersection and (ii) it is in the union.

Step 3: Add all other alignment points that are not in the intersection and not among the qualifying neighbors, but that appear in the original forward/backward alignment outputs.
>>> forw = ('0-0 2-1 9-2 21-3 10-4 7-5 11-6 9-7 12-8 1-9 3-10 '
...         '4-11 17-12 17-13 25-14 13-15 24-16 11-17 28-18')
>>> back = ('0-0 1-9 2-9 3-10 4-11 5-12 6-6 7-5 8-6 9-7 10-4 '
...         '11-6 12-8 13-12 15-12 17-13 18-13 19-12 20-13 '
...         '21-3 22-12 23-14 24-17 25-15 26-17 27-18 28-18')
>>> srctext = ("この よう な ハロー 白色 わい 星 の L 関数 "
...            "は L と 共 に 不連続 に 増加 する こと が "
...            "期待 さ れる こと を 示し た 。")
>>> trgtext = ("Therefore , we expect that the luminosity function "
...            "of such halo white dwarfs increases discontinuously "
...            "with the luminosity .")
>>> srclen = len(srctext.split())
>>> trglen = len(trgtext.split())
>>>
>>> gdfa = grow_diag_final_and(srclen, trglen, forw, back)
>>> gdfa == set([(28, 18), (6, 6), (24, 17), (2, 1), (15, 12), (13, 12),
...         (2, 9), (3, 10), (26, 17), (25, 15), (8, 6), (9, 7), (20,
...         13), (18, 13), (0, 0), (10, 4), (13, 15), (23, 14), (7, 5),
...         (25, 14), (1, 9), (17, 13), (4, 11), (11, 17), (9, 2), (22,
...         12), (27, 18), (24, 16), (21, 3), (19, 12), (17, 12), (5,
...         12), (11, 6), (12, 8)])
True

References: Koehn, P., A. Axelrod, A. Birch, C. Callison-Burch, M. Osborne, and D. Talbot. 2005. Edinburgh System Description for the 2005 IWSLT Speech Translation Evaluation. In MT Eval Workshop.

Parameters:
  • srclen (int) – the number of tokens in the source language
  • trglen (int) – the number of tokens in the target language
  • e2f (str) – the forward word alignment outputs from source-to-target language (in pharaoh output format)
  • f2e (str) – the backward word alignment outputs from target-to-source language (in pharaoh output format)
Return type:

set(tuple(int))

Returns:

the symmetrized alignment points from the GDFA algorithm

nltk.translate.gleu_score module

GLEU score implementation.

nltk.translate.gleu_score.corpus_gleu(list_of_references, hypotheses, min_len=1, max_len=4)[source]

Calculate a single corpus-level GLEU score (aka. system-level GLEU) for all the hypotheses and their respective references.

Instead of averaging the sentence level GLEU scores (i.e. macro-average precision), Wu et al. (2016) sum up the matching tokens and the max of hypothesis and reference tokens for each sentence, then compute the score from these aggregate values.

From Mike Schuster (via email): “For the corpus, we just add up the two statistics n_match and n_all = max(n_all_output, n_all_target) for all sentences, then calculate gleu_score = n_match / n_all, so it is not just a mean of the sentence gleu scores (in our case, longer sentences count more, which I think makes sense as they are more difficult to translate).”
>>> hyp1 = ['It', 'is', 'a', 'guide', 'to', 'action', 'which',
...         'ensures', 'that', 'the', 'military', 'always',
...         'obeys', 'the', 'commands', 'of', 'the', 'party']
>>> ref1a = ['It', 'is', 'a', 'guide', 'to', 'action', 'that',
...          'ensures', 'that', 'the', 'military', 'will', 'forever',
...          'heed', 'Party', 'commands']
>>> ref1b = ['It', 'is', 'the', 'guiding', 'principle', 'which',
...          'guarantees', 'the', 'military', 'forces', 'always',
...          'being', 'under', 'the', 'command', 'of', 'the', 'Party']
>>> ref1c = ['It', 'is', 'the', 'practical', 'guide', 'for', 'the',
...          'army', 'always', 'to', 'heed', 'the', 'directions',
...          'of', 'the', 'party']
>>> hyp2 = ['he', 'read', 'the', 'book', 'because', 'he', 'was',
...         'interested', 'in', 'world', 'history']
>>> ref2a = ['he', 'was', 'interested', 'in', 'world', 'history',
...          'because', 'he', 'read', 'the', 'book']
>>> list_of_references = [[ref1a, ref1b, ref1c], [ref2a]]
>>> hypotheses = [hyp1, hyp2]
>>> corpus_gleu(list_of_references, hypotheses) 
0.5673...

The example below shows that corpus_gleu() is different from averaging sentence_gleu() over hypotheses:

>>> score1 = sentence_gleu([ref1a], hyp1)
>>> score2 = sentence_gleu([ref2a], hyp2)
>>> (score1 + score2) / 2 
0.6144...
Parameters:
  • list_of_references (list(list(list(str)))) – a list of reference sentences, w.r.t. hypotheses
  • hypotheses (list(list(str))) – a list of hypothesis sentences
  • min_len (int) – The minimum order of n-gram this function should extract.
  • max_len (int) – The maximum order of n-gram this function should extract.
Returns:

The corpus-level GLEU score.

Return type:

float

nltk.translate.gleu_score.sentence_gleu(references, hypothesis, min_len=1, max_len=4)[source]

Calculates the sentence level GLEU (Google-BLEU) score described in

Yonghui Wu, Mike Schuster, Zhifeng Chen, Quoc V. Le, Mohammad Norouzi, Wolfgang Macherey, Maxim Krikun, Yuan Cao, Qin Gao, Klaus Macherey, Jeff Klingner, Apurva Shah, Melvin Johnson, Xiaobing Liu, Lukasz Kaiser, Stephan Gouws, Yoshikiyo Kato, Taku Kudo, Hideto Kazawa, Keith Stevens, George Kurian, Nishant Patil, Wei Wang, Cliff Young, Jason Smith, Jason Riesa, Alex Rudnick, Oriol Vinyals, Greg Corrado, Macduff Hughes, Jeffrey Dean. (2016) Google’s Neural Machine Translation System: Bridging the Gap between Human and Machine Translation. eprint arXiv:1609.08144. https://arxiv.org/pdf/1609.08144v2.pdf Retrieved on 27 Oct 2016.
From Wu et al. (2016): “The BLEU score has some undesirable properties when used for single sentences, as it was designed to be a corpus measure. We therefore use a slightly different score for our RL experiments which we call the ‘GLEU score’. For the GLEU score, we record all sub-sequences of 1, 2, 3 or 4 tokens in output and target sequence (n-grams). We then compute a recall, which is the ratio of the number of matching n-grams to the number of total n-grams in the target (ground truth) sequence, and a precision, which is the ratio of the number of matching n-grams to the number of total n-grams in the generated output sequence. Then GLEU score is simply the minimum of recall and precision. This GLEU score’s range is always between 0 (no matches) and 1 (all match) and it is symmetrical when switching output and target. According to our experiments, GLEU score correlates quite well with the BLEU metric on a corpus level but does not have its drawbacks for our per sentence reward objective.”

Note: The initial implementation only allowed a single reference, but now a list of references is required (which is consistent with bleu_score.sentence_bleu()).

The infamous “the the the ... ” example

>>> ref = 'the cat is on the mat'.split()
>>> hyp = 'the the the the the the the'.split()
>>> sentence_gleu([ref], hyp)  
0.0909...
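
A hand computation of the value above (assuming clipped n-gram matching, as in BLEU): the hypothesis contains 7 + 6 + 5 + 4 = 22 n-grams of orders 1-4 and the reference 6 + 5 + 4 + 3 = 18; the only overlap is the unigram ‘the’, matched min(7, 2) = 2 times. Precision is therefore 2/22 ≈ 0.0909, recall is 2/18 ≈ 0.1111, and GLEU = min(recall, precision) ≈ 0.0909.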

An example to evaluate normal machine translation outputs

>>> ref1 = str('It is a guide to action that ensures that the military '
...            'will forever heed Party commands').split()
>>> hyp1 = str('It is a guide to action which ensures that the military '
...            'always obeys the commands of the party').split()
>>> hyp2 = str('It is to insure the troops forever hearing the activity '
...            'guidebook that party direct').split()
>>> sentence_gleu([ref1], hyp1) 
0.4393...
>>> sentence_gleu([ref1], hyp2) 
0.1206...
Parameters:
  • references (list(list(str))) – a list of reference sentences
  • hypothesis (list(str)) – a hypothesis sentence
  • min_len (int) – The minimum order of n-gram this function should extract.
  • max_len (int) – The maximum order of n-gram this function should extract.
Returns:

the sentence level GLEU score.

Return type:

float

nltk.translate.ibm1 module

Lexical translation model that ignores word order.

In IBM Model 1, word order is ignored for simplicity. Thus, the following two alignments are equally likely.

Source: je mange du jambon
Target: i eat some ham
Alignment: (1,1) (2,2) (3,3) (4,4)

Source: je mange du jambon
Target: some ham eat i
Alignment: (1,4) (2,3) (3,2) (4,1)

The EM algorithm used in Model 1 is:

E step - In the training data, count how many times a source language word is translated into a target language word, weighted by the prior probability of the translation.

M step - Estimate the new probability of translation based on the counts from the Expectation step.
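
The two steps above can be sketched as a few lines of self-contained Python (a toy illustration only: it omits the NULL word and the data structures of NLTK's IBMModel1):

>>> from collections import defaultdict
>>> def toy_ibm1(bitext, iterations=5):
...     src_vocab = {s for src, _ in bitext for s in src}
...     prob = defaultdict(lambda: 1.0 / len(src_vocab))  # uniform init of t(t_word | s)
...     for _ in range(iterations):
...         counts = defaultdict(float)  # E step: expected (s, t_word) translation counts
...         totals = defaultdict(float)  # E step: expected counts of s overall
...         for src, trg in bitext:
...             for t_word in trg:
...                 norm = sum(prob[(s, t_word)] for s in src)
...                 for s in src:
...                     delta = prob[(s, t_word)] / norm  # posterior of aligning t_word to s
...                     counts[(s, t_word)] += delta
...                     totals[s] += delta
...         for (s, t_word) in counts:  # M step: re-estimate t(t_word | s)
...             prob[(s, t_word)] = counts[(s, t_word)] / totals[s]
...     return prob
>>> t = toy_ibm1([(['das', 'haus'], ['the', 'house']), (['das', 'buch'], ['the', 'book'])])
>>> t[('das', 'the')] > t[('haus', 'the')]
True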

Notations:

i: Position in the source sentence. Valid values are 0 (for NULL), 1, 2, ..., length of source sentence.
j: Position in the target sentence. Valid values are 1, 2, ..., length of target sentence.
s: A word in the source language.
t: A word in the target language.

References: Philipp Koehn. 2010. Statistical Machine Translation. Cambridge University Press, New York.

Peter E Brown, Stephen A. Della Pietra, Vincent J. Della Pietra, and Robert L. Mercer. 1993. The Mathematics of Statistical Machine Translation: Parameter Estimation. Computational Linguistics, 19 (2), 263-311.

class nltk.translate.ibm1.IBMModel1(sentence_aligned_corpus, iterations, probability_tables=None)[source]

Bases: nltk.translate.ibm_model.IBMModel

Lexical translation model that ignores word order

>>> bitext = []
>>> bitext.append(AlignedSent(['klein', 'ist', 'das', 'haus'], ['the', 'house', 'is', 'small']))
>>> bitext.append(AlignedSent(['das', 'haus', 'ist', 'ja', 'groß'], ['the', 'house', 'is', 'big']))
>>> bitext.append(AlignedSent(['das', 'buch', 'ist', 'ja', 'klein'], ['the', 'book', 'is', 'small']))
>>> bitext.append(AlignedSent(['das', 'haus'], ['the', 'house']))
>>> bitext.append(AlignedSent(['das', 'buch'], ['the', 'book']))
>>> bitext.append(AlignedSent(['ein', 'buch'], ['a', 'book']))
>>> ibm1 = IBMModel1(bitext, 5)
>>> print(ibm1.translation_table['buch']['book'])
0.889...
>>> print(ibm1.translation_table['das']['book'])
0.061...
>>> print(ibm1.translation_table['buch'][None])
0.113...
>>> print(ibm1.translation_table['ja'][None])
0.072...
>>> test_sentence = bitext[2]
>>> test_sentence.words
['das', 'buch', 'ist', 'ja', 'klein']
>>> test_sentence.mots
['the', 'book', 'is', 'small']
>>> test_sentence.alignment
Alignment([(0, 0), (1, 1), (2, 2), (3, 2), (4, 3)])
prob_alignment_point(s, t)[source]

Probability that word t in the target sentence is aligned to word s in the source sentence

prob_all_alignments(src_sentence, trg_sentence)[source]

Computes the probability of all possible word alignments, expressed as a marginal distribution over target words t

Each entry in the return value represents the contribution to the total alignment probability by the target word t.

To obtain probability(alignment | src_sentence, trg_sentence), simply sum the entries in the return value.

Returns:Probability of t for all s in src_sentence
Return type:dict(str): float
prob_t_a_given_s(alignment_info)[source]

Probability of target sentence and an alignment given the source sentence

set_uniform_probabilities(sentence_aligned_corpus)[source]
train(parallel_corpus)[source]

nltk.translate.ibm2 module

Lexical translation model that considers word order.

IBM Model 2 improves on Model 1 by accounting for word order. An alignment probability is introduced, a(i | j,l,m), which predicts a source word position, given its aligned target word’s position.

The EM algorithm used in Model 2 is:

E step - In the training data, collect counts, weighted by prior probabilities.
  (a) count how many times a source language word is translated into a target language word
  (b) count how many times a particular position in the source sentence is aligned to a particular position in the target sentence

M step - Estimate new probabilities based on the counts from the E step

Notations:

i: Position in the source sentence. Valid values are 0 (for NULL), 1, 2, ..., length of source sentence.
j: Position in the target sentence. Valid values are 1, 2, ..., length of target sentence.
l: Number of words in the source sentence, excluding NULL.
m: Number of words in the target sentence.
s: A word in the source language.
t: A word in the target language.

References: Philipp Koehn. 2010. Statistical Machine Translation. Cambridge University Press, New York.

Peter E Brown, Stephen A. Della Pietra, Vincent J. Della Pietra, and Robert L. Mercer. 1993. The Mathematics of Statistical Machine Translation: Parameter Estimation. Computational Linguistics, 19 (2), 263-311.

class nltk.translate.ibm2.IBMModel2(sentence_aligned_corpus, iterations, probability_tables=None)[source]

Bases: nltk.translate.ibm_model.IBMModel

Lexical translation model that considers word order

>>> bitext = []
>>> bitext.append(AlignedSent(['klein', 'ist', 'das', 'haus'], ['the', 'house', 'is', 'small']))
>>> bitext.append(AlignedSent(['das', 'haus', 'ist', 'ja', 'groß'], ['the', 'house', 'is', 'big']))
>>> bitext.append(AlignedSent(['das', 'buch', 'ist', 'ja', 'klein'], ['the', 'book', 'is', 'small']))
>>> bitext.append(AlignedSent(['das', 'haus'], ['the', 'house']))
>>> bitext.append(AlignedSent(['das', 'buch'], ['the', 'book']))
>>> bitext.append(AlignedSent(['ein', 'buch'], ['a', 'book']))
>>> ibm2 = IBMModel2(bitext, 5)
>>> print(round(ibm2.translation_table['buch']['book'], 3))
1.0
>>> print(round(ibm2.translation_table['das']['book'], 3))
0.0
>>> print(round(ibm2.translation_table['buch'][None], 3))
0.0
>>> print(round(ibm2.translation_table['ja'][None], 3))
0.0
>>> print(ibm2.alignment_table[1][1][2][2])
0.938...
>>> print(round(ibm2.alignment_table[1][2][2][2], 3))
0.0
>>> print(round(ibm2.alignment_table[2][2][4][5], 3))
1.0
>>> test_sentence = bitext[2]
>>> test_sentence.words
['das', 'buch', 'ist', 'ja', 'klein']
>>> test_sentence.mots
['the', 'book', 'is', 'small']
>>> test_sentence.alignment
Alignment([(0, 0), (1, 1), (2, 2), (3, 2), (4, 3)])
maximize_alignment_probabilities(counts)[source]
prob_alignment_point(i, j, src_sentence, trg_sentence)[source]

Probability that position j in trg_sentence is aligned to position i in the src_sentence

prob_all_alignments(src_sentence, trg_sentence)[source]

Computes the probability of all possible word alignments, expressed as a marginal distribution over target words t

Each entry in the return value represents the contribution to the total alignment probability by the target word t.

To obtain probability(alignment | src_sentence, trg_sentence), simply sum the entries in the return value.

Returns:Probability of t for all s in src_sentence
Return type:dict(str): float
prob_t_a_given_s(alignment_info)[source]

Probability of target sentence and an alignment given the source sentence

set_uniform_probabilities(sentence_aligned_corpus)[source]
train(parallel_corpus)[source]
class nltk.translate.ibm2.Model2Counts[source]

Bases: nltk.translate.ibm_model.Counts

Data object to store counts of various parameters during training. Includes counts for alignment.

update_alignment(count, i, j, l, m)[source]
update_lexical_translation(count, s, t)[source]

nltk.translate.ibm3 module

Translation model that considers how a word can be aligned to multiple words in another language.

IBM Model 3 improves on Model 2 by directly modeling the phenomenon where a word in one language may be translated into zero or more words in another. This is expressed by the fertility probability, n(phi | source word).

If a source word translates into more than one word, it is possible to generate sentences that have the same alignment in multiple ways. This is modeled by a distortion step. The distortion probability, d(j|i,l,m), predicts a target word position, given its aligned source word’s position. The distortion probability replaces the alignment probability of Model 2.

The fertility probability is not applicable for NULL. Target words that align to NULL are assumed to be distributed uniformly in the target sentence. The existence of these words is modeled by p1, the probability that a target word produced by a real source word requires another target word that is produced by NULL.

The EM algorithm used in Model 3 is:

E step - In the training data, collect counts, weighted by prior probabilities.
  (a) count how many times a source language word is translated into a target language word
  (b) count how many times a particular position in the target sentence is aligned to a particular position in the source sentence
  (c) count how many times a source word is aligned to phi number of target words
  (d) count how many times NULL is aligned to a target word

M step - Estimate new probabilities based on the counts from the E step

Because there are too many possible alignments, only the most probable ones are considered. First, the best alignment is determined using prior probabilities. Then, a hill climbing approach is used to find other good candidates.

Notations:

i: Position in the source sentence. Valid values are 0 (for NULL), 1, 2, ..., length of source sentence.
j: Position in the target sentence. Valid values are 1, 2, ..., length of target sentence.
l: Number of words in the source sentence, excluding NULL.
m: Number of words in the target sentence.
s: A word in the source language.
t: A word in the target language.
phi: Fertility, the number of target words produced by a source word.
p1: Probability that a target word produced by a source word is accompanied by another target word that is aligned to NULL.
p0: 1 - p1.

References: Philipp Koehn. 2010. Statistical Machine Translation. Cambridge University Press, New York.

Peter E Brown, Stephen A. Della Pietra, Vincent J. Della Pietra, and Robert L. Mercer. 1993. The Mathematics of Statistical Machine Translation: Parameter Estimation. Computational Linguistics, 19 (2), 263-311.

class nltk.translate.ibm3.IBMModel3(sentence_aligned_corpus, iterations, probability_tables=None)[source]

Bases: nltk.translate.ibm_model.IBMModel

Translation model that considers how a word can be aligned to multiple words in another language

>>> bitext = []
>>> bitext.append(AlignedSent(['klein', 'ist', 'das', 'haus'], ['the', 'house', 'is', 'small']))
>>> bitext.append(AlignedSent(['das', 'haus', 'war', 'ja', 'groß'], ['the', 'house', 'was', 'big']))
>>> bitext.append(AlignedSent(['das', 'buch', 'ist', 'ja', 'klein'], ['the', 'book', 'is', 'small']))
>>> bitext.append(AlignedSent(['ein', 'haus', 'ist', 'klein'], ['a', 'house', 'is', 'small']))
>>> bitext.append(AlignedSent(['das', 'haus'], ['the', 'house']))
>>> bitext.append(AlignedSent(['das', 'buch'], ['the', 'book']))
>>> bitext.append(AlignedSent(['ein', 'buch'], ['a', 'book']))
>>> bitext.append(AlignedSent(['ich', 'fasse', 'das', 'buch', 'zusammen'], ['i', 'summarize', 'the', 'book']))
>>> bitext.append(AlignedSent(['fasse', 'zusammen'], ['summarize']))
>>> ibm3 = IBMModel3(bitext, 5)
>>> print(round(ibm3.translation_table['buch']['book'], 3))
1.0
>>> print(round(ibm3.translation_table['das']['book'], 3))
0.0
>>> print(round(ibm3.translation_table['ja'][None], 3))
1.0
>>> print(round(ibm3.distortion_table[1][1][2][2], 3))
1.0
>>> print(round(ibm3.distortion_table[1][2][2][2], 3))
0.0
>>> print(round(ibm3.distortion_table[2][2][4][5], 3))
0.75
>>> print(round(ibm3.fertility_table[2]['summarize'], 3))
1.0
>>> print(round(ibm3.fertility_table[1]['book'], 3))
1.0
>>> print(ibm3.p1)
0.054...
>>> test_sentence = bitext[2]
>>> test_sentence.words
['das', 'buch', 'ist', 'ja', 'klein']
>>> test_sentence.mots
['the', 'book', 'is', 'small']
>>> test_sentence.alignment
Alignment([(0, 0), (1, 1), (2, 2), (3, None), (4, 3)])
maximize_distortion_probabilities(counts)[source]
prob_t_a_given_s(alignment_info)[source]

Probability of target sentence and an alignment given the source sentence

reset_probabilities()[source]
set_uniform_probabilities(sentence_aligned_corpus)[source]
train(parallel_corpus)[source]
class nltk.translate.ibm3.Model3Counts[source]

Bases: nltk.translate.ibm_model.Counts

Data object to store counts of various parameters during training. Includes counts for distortion.

update_distortion(count, alignment_info, j, l, m)[source]

nltk.translate.ibm4 module

Translation model that reorders output words based on their type and distance from other related words in the output sentence.

IBM Model 4 improves the distortion model of Model 3, motivated by the observation that certain words tend to be re-ordered in a predictable way relative to one another. For example, <adjective><noun> in English usually has its order flipped as <noun><adjective> in French.

Model 4 requires words in the source and target vocabularies to be categorized into classes. This can be linguistically driven, like parts of speech (adjectives, nouns, prepositions, etc.). Word classes can also be obtained by statistical methods. The original IBM Model 4 uses an information theoretic approach to group words into 50 classes for each vocabulary.

Terminology:

Cept:
A source word with non-zero fertility, i.e. aligned to one or more target words.
Tablet:
The set of target word(s) aligned to a cept.
Head of cept:
The first word of the tablet of that cept.
Center of cept:
The average position of the words in that cept’s tablet. If the value is not an integer, the ceiling is taken. For example, for a tablet with words in positions 2, 5, 6 in the target sentence, the center of the corresponding cept is ceil((2 + 5 + 6) / 3) = 5
Displacement:
For a head word, defined as (position of head word - position of previous cept’s center). Can be positive or negative. For a non-head word, defined as (position of non-head word - position of previous word in the same tablet). Always positive, because successive words in a tablet are assumed to appear to the right of the previous word.
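
For example, if the previous cept’s center is position 4 and the head word of the current cept is placed at position 6, the head word’s displacement is 6 - 4 = +2; a non-head word at position 9 whose predecessor in the same tablet sits at position 7 likewise has displacement 9 - 7 = +2.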

In contrast to Model 3, which reorders words in a tablet independently of other words, Model 4 distinguishes between three cases:

(1) Words generated by NULL are distributed uniformly.
(2) For a head word t, its position is modeled by the probability d_head(displacement | word_class_s(s), word_class_t(t)), where s is the previous cept, and word_class_s and word_class_t map s and t to a source and target language word class respectively.
(3) For a non-head word t, its position is modeled by the probability d_non_head(displacement | word_class_t(t)).

The EM algorithm used in Model 4 is:

E step - In the training data, collect counts, weighted by prior probabilities.
  (a) count how many times a source language word is translated into a target language word
  (b) for a particular word class, count how many times a head word is located at a particular displacement from the previous cept’s center
  (c) for a particular word class, count how many times a non-head word is located at a particular displacement from the previous target word
  (d) count how many times a source word is aligned to phi number of target words
  (e) count how many times NULL is aligned to a target word

M step - Estimate new probabilities based on the counts from the E step

As in Model 3, there are too many possible alignments to consider. Thus, a hill climbing approach is used to sample good candidates.

Notations:

i: Position in the source sentence. Valid values are 0 (for NULL), 1, 2, ..., length of source sentence.
j: Position in the target sentence. Valid values are 1, 2, ..., length of target sentence.
l: Number of words in the source sentence, excluding NULL.
m: Number of words in the target sentence.
s: A word in the source language.
t: A word in the target language.
phi: Fertility, the number of target words produced by a source word.
p1: Probability that a target word produced by a source word is accompanied by another target word that is aligned to NULL.
p0: 1 - p1.
dj: Displacement, Δj.

References: Philipp Koehn. 2010. Statistical Machine Translation. Cambridge University Press, New York.

Peter E Brown, Stephen A. Della Pietra, Vincent J. Della Pietra, and Robert L. Mercer. 1993. The Mathematics of Statistical Machine Translation: Parameter Estimation. Computational Linguistics, 19 (2), 263-311.

class nltk.translate.ibm4.IBMModel4(sentence_aligned_corpus, iterations, source_word_classes, target_word_classes, probability_tables=None)[source]

Bases: nltk.translate.ibm_model.IBMModel

Translation model that reorders output words based on their type and their distance from other related words in the output sentence

>>> bitext = []
>>> bitext.append(AlignedSent(['klein', 'ist', 'das', 'haus'], ['the', 'house', 'is', 'small']))
>>> bitext.append(AlignedSent(['das', 'haus', 'war', 'ja', 'groß'], ['the', 'house', 'was', 'big']))
>>> bitext.append(AlignedSent(['das', 'buch', 'ist', 'ja', 'klein'], ['the', 'book', 'is', 'small']))
>>> bitext.append(AlignedSent(['ein', 'haus', 'ist', 'klein'], ['a', 'house', 'is', 'small']))
>>> bitext.append(AlignedSent(['das', 'haus'], ['the', 'house']))
>>> bitext.append(AlignedSent(['das', 'buch'], ['the', 'book']))
>>> bitext.append(AlignedSent(['ein', 'buch'], ['a', 'book']))
>>> bitext.append(AlignedSent(['ich', 'fasse', 'das', 'buch', 'zusammen'], ['i', 'summarize', 'the', 'book']))
>>> bitext.append(AlignedSent(['fasse', 'zusammen'], ['summarize']))
>>> src_classes = {'the': 0, 'a': 0, 'small': 1, 'big': 1, 'house': 2, 'book': 2, 'is': 3, 'was': 3, 'i': 4, 'summarize': 5 }
>>> trg_classes = {'das': 0, 'ein': 0, 'haus': 1, 'buch': 1, 'klein': 2, 'groß': 2, 'ist': 3, 'war': 3, 'ja': 4, 'ich': 5, 'fasse': 6, 'zusammen': 6 }
>>> ibm4 = IBMModel4(bitext, 5, src_classes, trg_classes)
>>> print(round(ibm4.translation_table['buch']['book'], 3))
1.0
>>> print(round(ibm4.translation_table['das']['book'], 3))
0.0
>>> print(round(ibm4.translation_table['ja'][None], 3))
1.0
>>> print(round(ibm4.head_distortion_table[1][0][1], 3))
1.0
>>> print(round(ibm4.head_distortion_table[2][0][1], 3))
0.0
>>> print(round(ibm4.non_head_distortion_table[3][6], 3))
0.5
>>> print(round(ibm4.fertility_table[2]['summarize'], 3))
1.0
>>> print(round(ibm4.fertility_table[1]['book'], 3))
1.0
>>> print(ibm4.p1)
0.033...
>>> test_sentence = bitext[2]
>>> test_sentence.words
['das', 'buch', 'ist', 'ja', 'klein']
>>> test_sentence.mots
['the', 'book', 'is', 'small']
>>> test_sentence.alignment
Alignment([(0, 0), (1, 1), (2, 2), (3, None), (4, 3)])
maximize_distortion_probabilities(counts)[source]
static model4_prob_t_a_given_s(alignment_info, ibm_model)[source]
prob_t_a_given_s(alignment_info)[source]

Probability of target sentence and an alignment given the source sentence

reset_probabilities()[source]
set_uniform_probabilities(sentence_aligned_corpus)[source]

Set distortion probabilities uniformly to 1 / cardinality of displacement values

train(parallel_corpus)[source]
class nltk.translate.ibm4.Model4Counts[source]

Bases: nltk.translate.ibm_model.Counts

Data object to store counts of various parameters during training. Includes counts for distortion.

update_distortion(count, alignment_info, j, src_classes, trg_classes)[source]

nltk.translate.ibm5 module

Translation model that keeps track of vacant positions in the target sentence to decide where to place translated words.

Translation can be viewed as a process where each word in the source sentence is stepped through sequentially, generating translated words for each source word. The target sentence can be viewed as being made up of m empty slots initially, which gradually fill up as generated words are placed in them.

Models 3 and 4 use distortion probabilities to decide how to place translated words. For simplicity, these models ignore the history of which slots have already been occupied with translated words. Consider the placement of the last translated word: there is only one empty slot left in the target sentence, so the distortion probability should be 1.0 for that position and 0.0 everywhere else. However, the distortion probabilities for Models 3 and 4 are set up such that all positions are under consideration.

IBM Model 5 fixes this deficiency by accounting for occupied slots during translation. It introduces the vacancy function v(j), the number of vacancies up to, and including, position j in the target sentence.
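
For example, in a five-slot target sentence where positions 2 and 4 are already occupied, the vacant slots are 1, 3 and 5, so v(1) = 1, v(2) = 1, v(3) = 2, v(4) = 2 and v(5) = 3.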

Terminology:

Maximum vacancy:
The number of valid slots that a word can be placed in. This is not necessarily the same as the number of vacant slots. For example, if a tablet contains more than one word, the head word cannot be placed at the last vacant slot because there will be no space for the other words in the tablet. The number of valid slots has to take into account the length of the tablet. Non-head words cannot be placed before the head word, so vacancies to the left of the head word are ignored.
Vacancy difference:
For a head word: (v(j) - v(center of previous cept)) Can be positive or negative. For a non-head word: (v(j) - v(position of previously placed word)) Always positive, because successive words in a tablet are assumed to appear to the right of the previous word.

Positioning of target words falls under three cases:

(1) Words generated by NULL are distributed uniformly.
(2) For a head word t, its position is modeled by the probability v_head(dv | max_v, word_class_t(t)).
(3) For a non-head word t, its position is modeled by the probability v_non_head(dv | max_v, word_class_t(t)).

dv and max_v are defined differently for head and non-head words.

The EM algorithm used in Model 5 is:

E step - In the training data, collect counts, weighted by prior probabilities.
  (a) count how many times a source language word is translated into a target language word
  (b) for a particular word class and maximum vacancy, count how many times a head word and the previous cept’s center have a particular difference in number of vacancies
  (c) for a particular word class and maximum vacancy, count how many times a non-head word and the previous target word have a particular difference in number of vacancies
  (d) count how many times a source word is aligned to phi number of target words
  (e) count how many times NULL is aligned to a target word

M step - Estimate new probabilities based on the counts from the E step

M step - Estimate new probabilities based on the counts from the E step

As in Model 4, there are too many possible alignments to consider. Thus, a hill climbing approach is used to sample good candidates. In addition, pruning is used to weed out unlikely alignments based on Model 4 scores.

Notations:

i: Position in the source sentence. Valid values are 0 (for NULL), 1, 2, ..., length of source sentence.
j: Position in the target sentence. Valid values are 1, 2, ..., length of target sentence.
l: Number of words in the source sentence, excluding NULL.
m: Number of words in the target sentence.
s: A word in the source language.
t: A word in the target language.
phi: Fertility, the number of target words produced by a source word.
p1: Probability that a target word produced by a source word is accompanied by another target word that is aligned to NULL.
p0: 1 - p1.
max_v: Maximum vacancy.
dv: Vacancy difference, Δv.

The definition of v_head here differs from GIZA++, section 4.7 of [Brown et al., 1993], and [Koehn, 2010]. In the latter cases, v_head is v_head(v(j) | v(center of previous cept),max_v,word_class(t)).

Here, we follow appendix B of [Brown et al., 1993] and combine v(j) with v(center of previous cept) to obtain dv: v_head(v(j) - v(center of previous cept) | max_v,word_class(t)).

References: Philipp Koehn. 2010. Statistical Machine Translation. Cambridge University Press, New York.

Peter E Brown, Stephen A. Della Pietra, Vincent J. Della Pietra, and Robert L. Mercer. 1993. The Mathematics of Statistical Machine Translation: Parameter Estimation. Computational Linguistics, 19 (2), 263-311.

class nltk.translate.ibm5.IBMModel5(sentence_aligned_corpus, iterations, source_word_classes, target_word_classes, probability_tables=None)[source]

Bases: nltk.translate.ibm_model.IBMModel

Translation model that keeps track of vacant positions in the target sentence to decide where to place translated words

>>> bitext = []
>>> bitext.append(AlignedSent(['klein', 'ist', 'das', 'haus'], ['the', 'house', 'is', 'small']))
>>> bitext.append(AlignedSent(['das', 'haus', 'war', 'ja', 'groß'], ['the', 'house', 'was', 'big']))
>>> bitext.append(AlignedSent(['das', 'buch', 'ist', 'ja', 'klein'], ['the', 'book', 'is', 'small']))
>>> bitext.append(AlignedSent(['ein', 'haus', 'ist', 'klein'], ['a', 'house', 'is', 'small']))
>>> bitext.append(AlignedSent(['das', 'haus'], ['the', 'house']))
>>> bitext.append(AlignedSent(['das', 'buch'], ['the', 'book']))
>>> bitext.append(AlignedSent(['ein', 'buch'], ['a', 'book']))
>>> bitext.append(AlignedSent(['ich', 'fasse', 'das', 'buch', 'zusammen'], ['i', 'summarize', 'the', 'book']))
>>> bitext.append(AlignedSent(['fasse', 'zusammen'], ['summarize']))
>>> src_classes = {'the': 0, 'a': 0, 'small': 1, 'big': 1, 'house': 2, 'book': 2, 'is': 3, 'was': 3, 'i': 4, 'summarize': 5 }
>>> trg_classes = {'das': 0, 'ein': 0, 'haus': 1, 'buch': 1, 'klein': 2, 'groß': 2, 'ist': 3, 'war': 3, 'ja': 4, 'ich': 5, 'fasse': 6, 'zusammen': 6 }
>>> ibm5 = IBMModel5(bitext, 5, src_classes, trg_classes)
>>> print(round(ibm5.head_vacancy_table[1][1][1], 3))
1.0
>>> print(round(ibm5.head_vacancy_table[2][1][1], 3))
0.0
>>> print(round(ibm5.non_head_vacancy_table[3][3][6], 3))
1.0
>>> print(round(ibm5.fertility_table[2]['summarize'], 3))
1.0
>>> print(round(ibm5.fertility_table[1]['book'], 3))
1.0
>>> print(ibm5.p1)
0.033...
>>> test_sentence = bitext[2]
>>> test_sentence.words
['das', 'buch', 'ist', 'ja', 'klein']
>>> test_sentence.mots
['the', 'book', 'is', 'small']
>>> test_sentence.alignment
Alignment([(0, 0), (1, 1), (2, 2), (3, None), (4, 3)])
MIN_SCORE_FACTOR = 0.2

Alignments with scores below this factor are pruned during sampling

hillclimb(alignment_info, j_pegged=None)[source]

Starting from the alignment in alignment_info, look at neighboring alignments iteratively for the best one, according to Model 4

Note that Model 4 scoring is used instead of Model 5 because the latter is too expensive to compute.

There is no guarantee that the best alignment in the alignment space will be found, because the algorithm might be stuck in a local maximum.

Parameters:j_pegged (int) – If specified, the search will be constrained to alignments where j_pegged remains unchanged
Returns:The best alignment found from hill climbing
Return type:AlignmentInfo
maximize_vacancy_probabilities(counts)[source]
prob_t_a_given_s(alignment_info)[source]

Probability of target sentence and an alignment given the source sentence

prune(alignment_infos)[source]

Removes alignments from alignment_infos that have substantially lower Model 4 scores than the best alignment

Returns:Pruned alignments
Return type:set(AlignmentInfo)
reset_probabilities()[source]
sample(sentence_pair)[source]

Sample the most probable alignments from the entire alignment space according to Model 4

Note that Model 4 scoring is used instead of Model 5 because the latter is too expensive to compute.

First, determine the best alignment according to IBM Model 2. With this initial alignment, use hill climbing to determine the best alignment according to IBM Model 4. Add this alignment and its neighbors to the sample set. Repeat this process with other initial alignments obtained by pegging an alignment point. Finally, prune alignments that have substantially lower Model 4 scores than the best alignment.

Parameters:sentence_pair (AlignedSent) – Source and target language sentence pair to generate a sample of alignments from
Returns:A set of best alignments represented by their AlignmentInfo and the best alignment of the set for convenience
Return type:set(AlignmentInfo), AlignmentInfo
set_uniform_probabilities(sentence_aligned_corpus)[source]

Set vacancy probabilities uniformly to 1 / cardinality of vacancy difference values

train(parallel_corpus)[source]
class nltk.translate.ibm5.Model5Counts[source]

Bases: nltk.translate.ibm_model.Counts

Data object to store counts of various parameters during training. Includes counts for vacancies.

update_vacancy(count, alignment_info, i, trg_classes, slots)[source]
Parameters:
  • count – Value to add to the vacancy counts
  • alignment_info – Alignment under consideration
  • i – Source word position under consideration
  • trg_classes – Target word classes
  • slots – Vacancy states of the slots in the target sentence. Output parameter that will be modified as new words are placed in the target sentence.
class nltk.translate.ibm5.Slots(target_sentence_length)[source]

Bases: object

Represents positions in a target sentence. Used to keep track of which slot (position) is occupied.

occupy(position)[source]
Mark the slot at position as occupied.
vacancies_at(position)[source]
Returns:Number of vacant slots up to, and including, position
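
A brief usage sketch, based on the documented behaviour above (positions are one-indexed and all slots start vacant):

>>> slots = Slots(5)        # a target sentence with 5 positions
>>> slots.vacancies_at(5)   # every position is initially vacant
5
>>> slots.occupy(2)
>>> slots.vacancies_at(4)   # positions 1, 3 and 4 remain vacant
3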

nltk.translate.ibm_model module

Common methods and classes for all IBM models. See IBMModel1, IBMModel2, IBMModel3, IBMModel4, and IBMModel5 for specific implementations.

The IBM models are a series of generative models that learn lexical translation probabilities, p(target language word|source language word), given a sentence-aligned parallel corpus.

The models increase in sophistication from model 1 to 5. Typically, the output of lower models is used to seed the higher models. All models use the Expectation-Maximization (EM) algorithm to learn various probability tables.

Words in a sentence are one-indexed. The first word of a sentence has position 1, not 0. Index 0 is reserved in the source sentence for the NULL token. The concept of position does not apply to NULL, but it is indexed at 0 by convention.

Each target word is aligned to exactly one source word or the NULL token.
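
For illustration, these conventions mean a sentence pair is represented roughly as follows (a sketch, not NLTK internals; see AlignmentInfo below):

    src_sentence = (None, 'ich', 'esse')    # NULL token at index 0
    trg_sentence = ('UNUSED', 'i', 'eat')   # dummy element at index 0; words start at 1
    alignment = (0, 1, 2)    # alignment[j] = source position aligned to target position j
    # alignment[1] == 1: 'i' is aligned to 'ich'; alignment[0] is ignored.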

References: Philipp Koehn. 2010. Statistical Machine Translation. Cambridge University Press, New York.

Peter E Brown, Stephen A. Della Pietra, Vincent J. Della Pietra, and Robert L. Mercer. 1993. The Mathematics of Statistical Machine Translation: Parameter Estimation. Computational Linguistics, 19 (2), 263-311.

class nltk.translate.ibm_model.AlignmentInfo(alignment, src_sentence, trg_sentence, cepts)[source]

Bases: object

Helper data object for training IBM Models 3 and up

Read-only. For a source sentence and its counterpart in the target language, this class holds information about the sentence pair’s alignment, cepts, and fertility.

Warning: Alignments are one-indexed here, in contrast to nltk.translate.Alignment and AlignedSent, which are zero-indexed. This class is not meant to be used outside of IBM models.

alignment = None

tuple(int): Alignment function. alignment[j] is the position in the source sentence that is aligned to the position j in the target sentence.

center_of_cept(i)[source]
Returns:The ceiling of the average positions of the words in the tablet of cept i, or 0 if i is None
cepts = None

list(list(int)): The positions of the target words, in ascending order, aligned to a source word position. For example, cepts[4] = (2, 3, 7) means that words in positions 2, 3 and 7 of the target sentence are aligned to the word in position 4 of the source sentence
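
A small sketch (not NLTK internals) showing how cepts follow from the alignment function described above:

    def cepts_from_alignment(alignment, src_length):
        """cepts[i] lists the target positions aligned to source position i."""
        cepts = [[] for _ in range(src_length + 1)]
        for j in range(1, len(alignment)):   # skip the dummy index 0
            cepts[alignment[j]].append(j)    # j ascends, so each list stays sorted
        return cepts

    # Continuing the sentence pair sketched earlier:
    # cepts_from_alignment((0, 1, 2), 2) returns [[], [1], [2]]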

fertility_of_i(i)[source]

Fertility of word in position i of the source sentence

is_head_word(j)[source]
Returns:Whether the word in position j of the target sentence is a head word
previous_cept(j)[source]
Returns:The previous cept of j, or None if j belongs to the first cept
previous_in_tablet(j)[source]
Returns:The position of the previous word that is in the same tablet as j, or None if j is the first word of the tablet
score = None

float: Optional. Probability of alignment, as defined by the IBM model that assesses this alignment

src_sentence = None

tuple(str): Source sentence referred to by this object. Should include NULL token (None) in index 0.

trg_sentence = None

tuple(str): Target sentence referred to by this object. Should have a dummy element in index 0 so that the first word starts from index 1.

zero_indexed_alignment()[source]
Returns:Zero-indexed alignment, suitable for use in external nltk.translate modules like nltk.translate.Alignment
Return type:list(tuple)
class nltk.translate.ibm_model.Counts[source]

Bases: object

Data object to store counts of various parameters during training

update_fertility(count, alignment_info)[source]
update_lexical_translation(count, alignment_info, j)[source]
update_null_generation(count, alignment_info)[source]
class nltk.translate.ibm_model.IBMModel(sentence_aligned_corpus)[source]

Bases: object

Abstract base class for all IBM models

MIN_PROB = 1e-12
best_model2_alignment(sentence_pair, j_pegged=None, i_pegged=0)[source]

Finds the best alignment according to IBM Model 2

Used as a starting point for hill climbing in Models 3 and above, because it is easier to compute than the best alignments in higher models

Parameters:
  • sentence_pair (AlignedSent) – Source and target language sentence pair to be word-aligned
  • j_pegged (int) – If specified, the alignment point of j_pegged will be fixed to i_pegged
  • i_pegged (int) – Alignment point to j_pegged
hillclimb(alignment_info, j_pegged=None)[source]

Starting from the alignment in alignment_info, look at neighboring alignments iteratively for the best one

There is no guarantee that the best alignment in the alignment space will be found, because the algorithm might be stuck in a local maximum.

Parameters:j_pegged (int) – If specified, the search will be constrained to alignments where j_pegged remains unchanged
Returns:The best alignment found from hill climbing
Return type:AlignmentInfo
init_vocab(sentence_aligned_corpus)[source]
maximize_fertility_probabilities(counts)[source]
maximize_lexical_translation_probabilities(counts)[source]
maximize_null_generation_probabilities(counts)[source]
neighboring(alignment_info, j_pegged=None)[source]

Determine the neighbors of alignment_info, obtained by moving or swapping one alignment point

Parameters:j_pegged (int) – If specified, neighbors that have a different alignment point from j_pegged will not be considered
Returns:A set of neighboring alignments represented by their AlignmentInfo
Return type:set(AlignmentInfo)
prob_of_alignments(alignments)[source]
prob_t_a_given_s(alignment_info)[source]

Probability of target sentence and an alignment given the source sentence

All required information is assumed to be in alignment_info and self.

Derived classes should override this method

reset_probabilities()[source]
sample(sentence_pair)[source]

Sample the most probable alignments from the entire alignment space

First, determine the best alignment according to IBM Model 2. With this initial alignment, use hill climbing to determine the best alignment according to a higher IBM Model. Add this alignment and its neighbors to the sample set. Repeat this process with other initial alignments obtained by pegging an alignment point.

Hill climbing may get stuck in a local maximum, hence the pegging and trying out of different alignments.

Parameters:sentence_pair (AlignedSent) – Source and target language sentence pair to generate a sample of alignments from
Returns:A set of best alignments represented by their AlignmentInfo and the best alignment of the set for convenience
Return type:set(AlignmentInfo), AlignmentInfo
set_uniform_probabilities(sentence_aligned_corpus)[source]

Initialize probability tables to a uniform distribution

Derived classes should implement this accordingly.

nltk.translate.ibm_model.longest_target_sentence_length(sentence_aligned_corpus)[source]
Parameters:sentence_aligned_corpus (list(AlignedSent)) – Parallel corpus under consideration
Returns:Number of words in the longest target language sentence of sentence_aligned_corpus

nltk.translate.metrics module

nltk.translate.metrics.alignment_error_rate(reference, hypothesis, possible=None)[source]

Return the Alignment Error Rate (AER) of an alignment with respect to a “gold standard” reference alignment. Return an error rate between 0.0 (perfect alignment) and 1.0 (no alignment).

>>> from nltk.translate import Alignment
>>> ref = Alignment([(0, 0), (1, 1), (2, 2)])
>>> test = Alignment([(0, 0), (1, 2), (2, 1)])
>>> alignment_error_rate(ref, test) 
0.6666666666666667
Parameters:
  • reference (Alignment) – A gold standard alignment (sure alignments)
  • hypothesis (Alignment) – A hypothesis alignment (aka. candidate alignments)
  • possible (Alignment or None) – A gold standard reference of possible alignments (defaults to reference if None)
Return type:float or None
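
The doctest value can be reproduced with the usual AER formula, AER = 1 - (|A ∩ S| + |A ∩ P|) / (|A| + |S|), where S is the sure reference alignment, P the possible alignments (defaulting to S) and A the hypothesis:

    S = {(0, 0), (1, 1), (2, 2)}    # sure reference links
    P = S                           # possible links default to the reference
    A = {(0, 0), (1, 2), (2, 1)}    # hypothesis links
    aer = 1 - (len(A & S) + len(A & P)) / (len(A) + len(S))   # 1 - 2/6 = 0.666...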

nltk.translate.phrase_based module

nltk.translate.phrase_based.extract(f_start, f_end, e_start, e_end, alignment, f_aligned, srctext, trgtext, srclen, trglen, max_phrase_length)[source]

This function checks for alignment point consistency and extracts phrase pairs from consistent chunks.

A phrase pair (e, f) is consistent with an alignment A if and only if:

  1. No English words in the phrase pair are aligned to words outside it.

     ∀ e_i ∈ e : (e_i, f_j) ∈ A ⇒ f_j ∈ f

  2. No Foreign words in the phrase pair are aligned to words outside it.

     ∀ f_j ∈ f : (e_i, f_j) ∈ A ⇒ e_i ∈ e

  3. The phrase pair contains at least one alignment point.

     ∃ e_i ∈ e, f_j ∈ f such that (e_i, f_j) ∈ A
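
A minimal sketch of this consistency test (illustrative; it does not mirror the extract() signature), treating phrase spans as half-open intervals over token indices:

    def is_consistent(alignment, e_start, e_end, f_start, f_end):
        points_inside = 0
        for e_i, f_j in alignment:
            e_in = e_start <= e_i < e_end
            f_in = f_start <= f_j < f_end
            if e_in != f_in:          # an alignment point crosses the phrase boundary
                return False
            if e_in:
                points_inside += 1
        return points_inside >= 1     # condition 3: at least one alignment point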

Parameters:
  • f_start (int) – Starting index of the possible foreign language phrases
  • f_end (int) – Ending index of the possible foreign language phrases
  • e_start (int) – Starting index of the possible source language phrases
  • e_end (int) – Ending index of the possible source language phrases
  • srctext (list) – The source language tokens, a list of string.
  • trgtext (list) – The target language tokens, a list of string.
  • srclen (int) – The number of tokens in the source language tokens.
  • trglen (int) – The number of tokens in the target language tokens.
nltk.translate.phrase_based.phrase_extraction(srctext, trgtext, alignment, max_phrase_length=0)[source]

The phrase extraction algorithm extracts all consistent phrase pairs from a word-aligned sentence pair.

The idea is to loop over all possible source language (e) phrases and find the minimal foreign phrase (f) that matches each of them. Matching is done by identifying all alignment points for the source phrase and finding the shortest foreign phrase that includes all the foreign counterparts for the source words.

In short, a phrase alignment has to (a) contain all alignment points for all covered words, and (b) contain at least one alignment point.

>>> srctext = "michael assumes that he will stay in the house"
>>> trgtext = "michael geht davon aus , dass er im haus bleibt"
>>> alignment = [(0,0), (1,1), (1,2), (1,3), (2,5), (3,6), (4,9), 
... (5,9), (6,7), (7,7), (8,8)]
>>> phrases = phrase_extraction(srctext, trgtext, alignment)
>>> for i in sorted(phrases):
...    print(i)
...
((0, 1), (0, 1), 'michael', 'michael')
((0, 2), (0, 4), 'michael assumes', 'michael geht davon aus')
((0, 2), (0, 4), 'michael assumes', 'michael geht davon aus ,')
((0, 3), (0, 6), 'michael assumes that', 'michael geht davon aus , dass')
((0, 4), (0, 7), 'michael assumes that he', 'michael geht davon aus , dass er')
((0, 9), (0, 10), 'michael assumes that he will stay in the house', 'michael geht davon aus , dass er im haus bleibt')
((1, 2), (1, 4), 'assumes', 'geht davon aus')
((1, 2), (1, 4), 'assumes', 'geht davon aus ,')
((1, 3), (1, 6), 'assumes that', 'geht davon aus , dass')
((1, 4), (1, 7), 'assumes that he', 'geht davon aus , dass er')
((1, 9), (1, 10), 'assumes that he will stay in the house', 'geht davon aus , dass er im haus bleibt')
((2, 3), (5, 6), 'that', ', dass')
((2, 3), (5, 6), 'that', 'dass')
((2, 4), (5, 7), 'that he', ', dass er')
((2, 4), (5, 7), 'that he', 'dass er')
((2, 9), (5, 10), 'that he will stay in the house', ', dass er im haus bleibt')
((2, 9), (5, 10), 'that he will stay in the house', 'dass er im haus bleibt')
((3, 4), (6, 7), 'he', 'er')
((3, 9), (6, 10), 'he will stay in the house', 'er im haus bleibt')
((4, 6), (9, 10), 'will stay', 'bleibt')
((4, 9), (7, 10), 'will stay in the house', 'im haus bleibt')
((6, 8), (7, 8), 'in the', 'im')
((6, 9), (7, 9), 'in the house', 'im haus')
((8, 9), (8, 9), 'house', 'haus')
Parameters:
  • srctext (str) – The sentence string from the source language.
  • trgtext (str) – The sentence string from the target language.
  • alignment (list(tuple(int, int))) – The word alignments, given as a list of tuples where the first element of each tuple is the index of a source word and the second element is the index of a target word. This is also the output format of nltk.translate.ibm1
  • max_phrase_length (int) – maximal phrase length; if 0 or unspecified, it is set to the length of the longer sentence (srctext or trgtext).
Return type:list(tuple)
Returns:A list of tuples, where each element is a phrase and each phrase is a tuple made up of (i) its source location, (ii) its target location, (iii) the source phrase and (iv) the target phrase. The list of tuples represents all the possible phrases extracted from the word alignments.

nltk.translate.ribes_score module

RIBES score implementation

nltk.translate.ribes_score.corpus_ribes(list_of_references, hypotheses, alpha=0.25, beta=0.1)[source]

This function "calculates RIBES for a system output (hypothesis) with multiple references, and returns the best score among multi-references and individual scores. The scores are corpus-wise, i.e., averaged by the number of sentences." (cf. RIBES version 1.03.1 code).

Unlike BLEU's micro-average precision, RIBES calculates a macro-average precision by averaging the best RIBES score for each pair of a hypothesis and its corresponding references.

>>> hyp1 = ['It', 'is', 'a', 'guide', 'to', 'action', 'which',
...         'ensures', 'that', 'the', 'military', 'always',
...         'obeys', 'the', 'commands', 'of', 'the', 'party']
>>> ref1a = ['It', 'is', 'a', 'guide', 'to', 'action', 'that',
...          'ensures', 'that', 'the', 'military', 'will', 'forever',
...          'heed', 'Party', 'commands']
>>> ref1b = ['It', 'is', 'the', 'guiding', 'principle', 'which',
...          'guarantees', 'the', 'military', 'forces', 'always',
...          'being', 'under', 'the', 'command', 'of', 'the', 'Party']
>>> ref1c = ['It', 'is', 'the', 'practical', 'guide', 'for', 'the',
...          'army', 'always', 'to', 'heed', 'the', 'directions',
...          'of', 'the', 'party']
>>> hyp2 = ['he', 'read', 'the', 'book', 'because', 'he', 'was', 
...         'interested', 'in', 'world', 'history']
>>> ref2a = ['he', 'was', 'interested', 'in', 'world', 'history', 
...          'because', 'he', 'read', 'the', 'book']
>>> list_of_references = [[ref1a, ref1b, ref1c], [ref2a]]
>>> hypotheses = [hyp1, hyp2]
>>> round(corpus_ribes(list_of_references, hypotheses),4)
0.3597
Parameters:
  • list_of_references (list(list(list(str)))) – a corpus of lists of reference sentences, w.r.t. hypotheses
  • hypotheses (list(list(str))) – a list of hypothesis sentences
  • alpha (float) – hyperparameter used as a prior for the unigram precision.
  • beta (float) – hyperparameter used as a prior for the brevity penalty.
Returns:The best RIBES score from one of the references.
Return type:float

nltk.translate.ribes_score.find_increasing_sequences(worder)[source]

Given the worder list, this function groups monotonic +1 sequences.

>>> worder = [7, 8, 9, 10, 6, 0, 1, 2, 3, 4, 5]
>>> list(find_increasing_sequences(worder))
[(7, 8, 9, 10), (0, 1, 2, 3, 4, 5)]
Parameters:worder (list(int)) – The worder list output from word_rank_alignment
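
A sketch of the grouping logic (illustrative, not the NLTK source; runs of length 1 are dropped, as in the doctest above):

    def increasing_runs(worder):
        """Yield maximal runs where each element is the previous one plus 1."""
        run = [worder[0]]
        for x in worder[1:]:
            if x == run[-1] + 1:
                run.append(x)
            else:
                if len(run) > 1:
                    yield tuple(run)
                run = [x]
        if len(run) > 1:
            yield tuple(run)
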
nltk.translate.ribes_score.kendall_tau(worder, normalize=True)[source]

Calculates the Kendall’s Tau correlation coefficient given the worder list of word alignments from word_rank_alignment(), using the formula:

tau = 2 * num_increasing_pairs / num_possible_pairs - 1

Note that the increasing pairs may come from discontinuous runs in the worder list, and each maximal increasing sequence contributes choose(len(seq), 2) increasing pairs, e.g.

>>> worder = [7, 8, 9, 10, 6, 0, 1, 2, 3, 4, 5]
>>> number_possible_pairs = choose(len(worder), 2)
>>> round(kendall_tau(worder, normalize=False),3)
-0.236
>>> round(kendall_tau(worder),3)
0.382
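
Working through the doctest above with the documented formula:

    from math import comb

    # Monotonic +1 runs in the worder list: (7, 8, 9, 10) and (0, 1, 2, 3, 4, 5)
    num_increasing_pairs = comb(4, 2) + comb(6, 2)            # 6 + 15 = 21
    num_possible_pairs = comb(11, 2)                          # 55
    tau = 2 * num_increasing_pairs / num_possible_pairs - 1   # -0.2363... (unnormalized)
    normalized = (tau + 1) / 2                                # 0.3818...
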
Parameters:
  • worder (list(int)) – The worder list output from word_rank_alignment
  • normalize (boolean) – Flag to indicate normalization
Returns:The Kendall's Tau correlation coefficient.
Return type:float

nltk.translate.ribes_score.position_of_ngram(ngram, sentence)[source]

This function returns the position of the first instance of the ngram appearing in a sentence.

Note that one could also use string matching as follows, but the code is a little convoluted with type casting back and forth:

    char_pos = ' '.join(sent)[:' '.join(sent).index(' '.join(ngram))]
    word_pos = char_pos.count(' ')

Another way to conceive this is:

    return next(i for i, ng in enumerate(ngrams(sentence, len(ngram)))
                if ng == ngram)
Parameters:
  • ngram (tuple) – The ngram that needs to be searched
  • sentence (list(str)) – The list of tokens to search from.
nltk.translate.ribes_score.sentence_ribes(references, hypothesis, alpha=0.25, beta=0.1)[source]

The RIBES (Rank-based Intuitive Bilingual Evaluation Score) from Hideki Isozaki, Tsutomu Hirao, Kevin Duh, Katsuhito Sudoh and Hajime Tsukada. 2010. “Automatic Evaluation of Translation Quality for Distant Language Pairs”. In Proceedings of EMNLP. http://www.aclweb.org/anthology/D/D10/D10-1092.pdf

The generic RIBES score used in shared tasks, e.g. the Workshop on Asian Translation (WAT), uses the following RIBES calculation:

RIBES = kendall_tau * (p1**alpha) * (bp**beta)

Please note that this re-implementation differs from the official RIBES implementation; although it emulates the results described in the original paper, there are further optimizations implemented in the official RIBES script.

Users are encouraged to use the official RIBES script instead of this implementation when evaluating their machine translation systems. Refer to http://www.kecl.ntt.co.jp/icl/lirg/ribes/ for the official script.
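
For orientation, the combination is plain arithmetic; the precision and brevity penalty below are hypothetical values, since sentence_ribes derives them internally:

    alpha, beta = 0.25, 0.1
    tau = 0.382       # normalized Kendall's tau of the worder list
    p1 = 10 / 11      # unigram precision (hypothetical value)
    bp = 1.0          # brevity penalty; 1.0 when the hypothesis is not shorter
    ribes = tau * (p1 ** alpha) * (bp ** beta)   # ~ 0.373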

Parameters:
  • references – a list of reference sentences
  • hypothesis (list(str)) – a hypothesis sentence
  • alpha (float) – hyperparameter used as a prior for the unigram precision.
  • beta (float) – hyperparameter used as a prior for the brevity penalty.
Returns:The best RIBES score from one of the references.
Return type:float

nltk.translate.ribes_score.spearman_rho(worder, normalize=True)[source]

Calculates the Spearman’s Rho correlation coefficient given the worder list of word alignment from word_rank_alignment(), using the formula:

rho = 1 - sum(d**2) / choose(len(worder)+1, 3)

where d is the difference between each index in the worder list and the corresponding original word index from the reference sentence.

Using the (H0,R0) and (H5, R5) example from the paper

>>> worder =  [7, 8, 9, 10, 6, 0, 1, 2, 3, 4, 5]
>>> round(spearman_rho(worder, normalize=False), 3)
-0.591
>>> round(spearman_rho(worder), 3)
0.205
Parameters:worder (list(int)) – The worder list output from word_rank_alignment
nltk.translate.ribes_score.word_rank_alignment(reference, hypothesis, character_based=False)[source]

This is the word rank alignment algorithm described in the paper to produce the worder list, i.e. a list of word indices of the hypothesis word orders w.r.t. the list of reference words.

Below is the (H0, R0) example from the Isozaki et al. 2010 paper; note that the examples in the paper are indexed from 1, but the results here are indexed from 0:

>>> ref = str('he was interested in world history because he '
... 'read the book').split()
>>> hyp = str('he read the book because he was interested in world '
... 'history').split()
>>> word_rank_alignment(ref, hyp)
[7, 8, 9, 10, 6, 0, 1, 2, 3, 4, 5]

The (H1, R1) example from the paper, note the 0th index:

>>> ref = 'John hit Bob yesterday'.split()
>>> hyp = 'Bob hit John yesterday'.split()
>>> word_rank_alignment(ref, hyp)
[2, 1, 0, 3]

Here is the (H2, R2) example from the paper, note the 0th index here too:

>>> ref = 'the boy read the book'.split()
>>> hyp = 'the book was read by the boy'.split()
>>> word_rank_alignment(ref, hyp)
[3, 4, 2, 0, 1]
Parameters:
  • reference (list(str)) – a reference sentence
  • hypothesis (list(str)) – a hypothesis sentence

nltk.translate.stack_decoder module

A decoder that uses stacks to implement phrase-based translation.

In phrase-based translation, the source sentence is segmented into phrases of one or more words, and translations for those phrases are used to build the target sentence.

Hypothesis data structures are used to keep track of the source words translated so far and the partial output. A hypothesis can be expanded by selecting an untranslated phrase, looking up its translation in a phrase table, and appending that translation to the partial output. Translation is complete when a hypothesis covers all source words.

The search space is huge because the source sentence can be segmented in different ways, the source phrases can be selected in any order, and there could be multiple translations for the same source phrase in the phrase table. To make decoding tractable, stacks are used to limit the number of candidate hypotheses by doing histogram and/or threshold pruning.

Hypotheses with the same number of words translated are placed in the same stack. In histogram pruning, each stack has a size limit, and the hypothesis with the lowest score is removed when the stack is full. In threshold pruning, hypotheses that score below a certain threshold of the best hypothesis in that stack are removed.
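
A minimal sketch of both pruning strategies for a single stack (illustrative names, not NLTK internals; hypotheses are assumed to carry a log-probability score attribute):

    import math

    class Stack:
        def __init__(self, max_size=100, beam_threshold=0.001):
            self.max_size = max_size              # histogram pruning limit
            self.beam_threshold = beam_threshold  # factor of the best score
            self.items = []

        def push(self, hypothesis):
            self.items.append(hypothesis)
            self.items.sort(key=lambda h: h.score, reverse=True)
            # Histogram pruning: keep only the top max_size hypotheses.
            del self.items[self.max_size:]
            # Threshold pruning: scores are log-probabilities, so a score
            # factor becomes an additive offset in the log domain.
            cutoff = self.items[0].score + math.log(self.beam_threshold)
            self.items = [h for h in self.items if h.score >= cutoff]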

Hypothesis scoring can include various factors such as phrase translation probability, language model probability, length of translation, cost of remaining words to be translated, and so on.

References: Philipp Koehn. 2010. Statistical Machine Translation. Cambridge University Press, New York.

class nltk.translate.stack_decoder.StackDecoder(phrase_table, language_model)[source]

Bases: object

Phrase-based stack decoder for machine translation

>>> from nltk.translate import PhraseTable
>>> phrase_table = PhraseTable()
>>> phrase_table.add(('niemand',), ('nobody',), log(0.8))
>>> phrase_table.add(('niemand',), ('no', 'one'), log(0.2))
>>> phrase_table.add(('erwartet',), ('expects',), log(0.8))
>>> phrase_table.add(('erwartet',), ('expecting',), log(0.2))
>>> phrase_table.add(('niemand', 'erwartet'), ('one', 'does', 'not', 'expect'), log(0.1))
>>> phrase_table.add(('die', 'spanische', 'inquisition'), ('the', 'spanish', 'inquisition'), log(0.8))
>>> phrase_table.add(('!',), ('!',), log(0.8))
>>> #  nltk.model should be used here once it is implemented
>>> from collections import defaultdict
>>> language_prob = defaultdict(lambda: -999.0)
>>> language_prob[('nobody',)] = log(0.5)
>>> language_prob[('expects',)] = log(0.4)
>>> language_prob[('the', 'spanish', 'inquisition')] = log(0.2)
>>> language_prob[('!',)] = log(0.1)
>>> language_model = type('',(object,),{'probability_change': lambda self, context, phrase: language_prob[phrase], 'probability': lambda self, phrase: language_prob[phrase]})()
>>> stack_decoder = StackDecoder(phrase_table, language_model)
>>> stack_decoder.translate(['niemand', 'erwartet', 'die', 'spanische', 'inquisition', '!'])
['nobody', 'expects', 'the', 'spanish', 'inquisition', '!']
beam_threshold = None

float: Hypotheses that score below this factor of the best hypothesis in a stack are dropped from consideration. Value between 0.0 and 1.0.
compute_future_scores(src_sentence)[source]

Determines the approximate scores for translating every subsequence in src_sentence

Future scores can be used as a look-ahead to determine the difficulty of translating the remaining parts of src_sentence.

Returns:Scores of subsequences referenced by their start and end positions. For example, result[2][5] is the score of the subsequence covering positions 2, 3, and 4.
Return type:dict(int: (dict(int): float))

distortion_factor

float: Amount of reordering of source phrases. Lower values favour monotone translation, suitable when word order is similar for both source and target languages. Value between 0.0 and 1.0. Default 0.5.
distortion_score(hypothesis, next_src_phrase_span)[source]
expansion_score(hypothesis, translation_option, src_phrase_span)[source]

Calculate the score of expanding hypothesis with translation_option

Parameters:
  • hypothesis (_Hypothesis) – Hypothesis being expanded
  • translation_option (PhraseTableEntry) – Information about the proposed expansion
  • src_phrase_span (tuple(int, int)) – Word position span of the source phrase
find_all_src_phrases(src_sentence)[source]

Finds all subsequences in src_sentence that have a phrase translation in the translation table

Returns:Subsequences that have a phrase translation, represented as a table of lists of end positions. For example, if result[2] is [5, 6, 9], then there are three phrases starting from position 2 in src_sentence, ending at positions 5, 6, and 9 exclusive. The list of ending positions is in ascending order.
Return type:list(list(int))
future_score(hypothesis, future_score_table, sentence_length)[source]

Determines the approximate score for translating the untranslated words in hypothesis

stack_size = None

int: Maximum number of hypotheses to consider in a stack. Higher values increase the likelihood of a good translation, but increase processing time.
translate(src_sentence)[source]
Parameters:src_sentence (list(str)) – Sentence to be translated
Returns:Translated sentence
Return type:list(str)
static valid_phrases(all_phrases_from, hypothesis)[source]

Extract phrases from all_phrases_from that contain words that have not been translated by hypothesis

Parameters:all_phrases_from (list(list(int))) – Phrases represented by their spans, in the same format as the return value of find_all_src_phrases
Returns:A list of phrases, represented by their spans, that cover untranslated positions.
Return type:list(tuple(int, int))
word_penalty = None

float: Influences the translation length exponentially. If positive, shorter translations are preferred. If negative, longer translations are preferred. If zero, no penalty is applied.

Module contents

Experimental features for machine translation. These interfaces are prone to change.