nltk.translate package¶
Submodules¶
nltk.translate.api module¶
class nltk.translate.api.AlignedSent(words, mots, alignment=None)[source]¶
Bases: object
Return an aligned sentence object, which encapsulates two sentences along with an Alignment between them. Typically used in machine translation to represent a sentence and its translation.
>>> from nltk.translate import AlignedSent, Alignment
>>> algnsent = AlignedSent(['klein', 'ist', 'das', 'Haus'],
...     ['the', 'house', 'is', 'small'], Alignment.fromstring('0-3 1-2 2-0 3-1'))
>>> algnsent.words
['klein', 'ist', 'das', 'Haus']
>>> algnsent.mots
['the', 'house', 'is', 'small']
>>> algnsent.alignment
Alignment([(0, 3), (1, 2), (2, 0), (3, 1)])
>>> from nltk.corpus import comtrans
>>> print(comtrans.aligned_sents()[54])
<AlignedSent: 'Weshalb also sollten...' -> 'So why should EU arm...'>
>>> print(comtrans.aligned_sents()[54].alignment)
0-0 0-1 1-0 2-2 3-4 3-5 4-7 5-8 6-3 7-9 8-9 9-10 9-11 10-12 11-6 12-6 13-13
Parameters:
- words (list(str)) – Words in the target language sentence
- mots (list(str)) – Words in the source language sentence
- alignment (Alignment) – Word-level alignments between words and mots. Each alignment is represented as a 2-tuple (words_index, mots_index).
alignment¶

invert()[source]¶
Return the aligned sentence pair, reversing the directionality.
Return type: AlignedSent
mots¶

unicode_repr()¶
Return a string representation for this AlignedSent.
Return type: str

words¶
class nltk.translate.api.Alignment[source]¶
Bases: frozenset
A storage class for representing alignment between two sequences, s1, s2. In general, an alignment is a set of tuples of the form (i, j, …) representing an alignment between the i-th element of s1 and the j-th element of s2. Tuples are extensible (they might contain additional data, such as a boolean to indicate sure vs possible alignments).
>>> from nltk.translate import Alignment
>>> a = Alignment([(0, 0), (0, 1), (1, 2), (2, 2)])
>>> a.invert()
Alignment([(0, 0), (1, 0), (2, 1), (2, 2)])
>>> print(a.invert())
0-0 1-0 2-1 2-2
>>> a[0]
[(0, 1), (0, 0)]
>>> a.invert()[2]
[(2, 1), (2, 2)]
>>> b = Alignment([(0, 0), (0, 1)])
>>> b.issubset(a)
True
>>> c = Alignment.fromstring('0-0 0-1')
>>> b == c
True
classmethod fromstring(s)[source]¶
Read a giza-formatted string and return an Alignment object.

>>> Alignment.fromstring('0-0 2-1 9-2 21-3 10-4 7-5')
Alignment([(0, 0), (2, 1), (7, 5), (9, 2), (10, 4), (21, 3)])

Parameters: s (str) – the positional alignments in giza format
Return type: Alignment
Returns: An Alignment object corresponding to the string representation s.
range(positions=None)[source]¶
Work out the range of the mapping from the given positions. If no positions are specified, compute the range of the entire mapping.

unicode_repr()¶
Produce a Giza-formatted string representing the alignment.
class nltk.translate.api.PhraseTable[source]¶
Bases: object
In-memory store of translations for a given phrase, and the log probability of those translations.
add(src_phrase, trg_phrase, log_prob)[source]¶
Parameters: log_prob (float) – Log probability that, given src_phrase, trg_phrase is its translation
translations_for(src_phrase)[source]¶
Get the translations for a source language phrase.

Parameters: src_phrase (tuple(str)) – Source language phrase of interest
Returns: A list of target language phrases that are translations of src_phrase, ordered in decreasing order of likelihood. Each list element is a tuple of the target phrase and its log probability.
Return type: list(PhraseTableEntry)
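A minimal usage sketch (assuming, consistent with the description above, that each PhraseTableEntry exposes trg_phrase and log_prob fields):

from nltk.translate.api import PhraseTable

table = PhraseTable()
table.add(('das', 'haus'), ('the', 'house'), -0.4)
table.add(('das', 'haus'), ('the', 'home'), -1.2)

# Entries come back ordered by decreasing log probability.
for entry in table.translations_for(('das', 'haus')):
    print(entry.trg_phrase, entry.log_prob)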
nltk.translate.bleu_score module¶
BLEU score implementation.
class nltk.translate.bleu_score.SmoothingFunction(epsilon=0.1, alpha=5, k=5)[source]¶
Bases: object
This is an implementation of the smoothing techniques for segment-level BLEU scores presented in Boxing Chen and Colin Cherry (2014) A Systematic Comparison of Smoothing Techniques for Sentence-Level BLEU. In WMT14. http://acl2014.org/acl2014/W14-33/pdf/W14-3346.pdf
method1(p_n, *args, **kwargs)[source]¶
Smoothing method 1: Add epsilon counts to precision with 0 counts.
method2(p_n, *args, **kwargs)[source]¶
Smoothing method 2: Add 1 to both numerator and denominator, from Chin-Yew Lin and Franz Josef Och (2004) Automatic evaluation of machine translation quality using longest common subsequence and skip-bigram statistics. In ACL04.
method3(p_n, *args, **kwargs)[source]¶
Smoothing method 3: NIST geometric sequence smoothing. The smoothing is computed by taking 1 / ( 2^k ), instead of 0, for each precision score whose matching n-gram count is null. k is 1 for the first 'n' value for which the n-gram match count is null. For example, if the text contains:
- one 2-gram match
- and (consequently) two 1-gram matches
- the n-gram count for each individual precision score would be:
- n=1 => prec_count = 2 (two unigrams)
- n=2 => prec_count = 1 (one bigram)
- n=3 => prec_count = 1/2 (no trigram, taking ‘smoothed’ value of 1 / ( 2^k ), with k=1)
- n=4 => prec_count = 1/4 (no fourgram, taking ‘smoothed’ value of 1 / ( 2^k ), with k=2)
method4(p_n, references, hypothesis, hyp_len, *args, **kwargs)[source]¶
Smoothing method 4: Shorter translations may have inflated precision values due to having smaller denominators; therefore, we give them proportionally smaller smoothed counts. Instead of scaling to 1/(2^k), Chen and Cherry (2014) suggest dividing by 1/ln(len(T)), where T is the length of the translation.
method5(p_n, references, hypothesis, hyp_len, *args, **kwargs)[source]¶
Smoothing method 5: The matched counts for similar values of n should be similar. To calculate the n-gram matched count, it averages the n−1, n and n+1 gram matched counts.
method6(p_n, references, hypothesis, hyp_len, *args, **kwargs)[source]¶
Smoothing method 6: Interpolates the maximum likelihood estimate of the precision p_n with a prior estimate pi0. The prior is estimated by assuming that the ratio between pn and pn−1 will be the same as that between pn−1 and pn−2; from Gao and He (2013) Training MRF-Based Phrase Translation Models using Gradient Ascent. In NAACL.
method7(p_n, references, hypothesis, hyp_len, *args, **kwargs)[source]¶
Smoothing method 7: Interpolates the maximum likelihood estimate of the precision p_n with a prior estimate pi0. The prior is estimated by assuming that the ratio between pn and pn−1 will be the same as that between pn−1 and pn−2.
nltk.translate.bleu_score.brevity_penalty(closest_ref_len, hyp_len)[source]¶
Calculate brevity penalty.
As modified n-gram precision alone still rewards overly short hypotheses, the brevity penalty is used to adjust the overall BLEU score according to length.
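Concretely, the penalty follows the standard formulation from the BLEU paper: BP = 1 if the hypothesis is longer than the closest reference, and exp(1 - r/c) otherwise, where r is the closest reference length and c is the hypothesis length. A small sketch of the formula (not the library code) reproduces the values in the examples below:

import math

def brevity_penalty_sketch(closest_ref_len, hyp_len):
    # BP = 1 if c > r, else exp(1 - r/c)
    if hyp_len > closest_ref_len:
        return 1.0
    return math.exp(1 - closest_ref_len / hyp_len)

print(brevity_penalty_sketch(28, 12))  # 0.2635..., as in the example below
print(brevity_penalty_sketch(13, 12))  # 0.9200...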
An example from the paper: there are three references with lengths 12, 15 and 17, and a hypothesis of length 12. The brevity penalty is 1.
>>> reference1 = list('aaaaaaaaaaaa')      # i.e. ['a'] * 12
>>> reference2 = list('aaaaaaaaaaaaaaa')   # i.e. ['a'] * 15
>>> reference3 = list('aaaaaaaaaaaaaaaaa') # i.e. ['a'] * 17
>>> hypothesis = list('aaaaaaaaaaaa')      # i.e. ['a'] * 12
>>> references = [reference1, reference2, reference3]
>>> hyp_len = len(hypothesis)
>>> closest_ref_len = closest_ref_length(references, hyp_len)
>>> brevity_penalty(closest_ref_len, hyp_len)
1.0
If a hypothesis translation is shorter than the references, a penalty is applied.
>>> references = [['a'] * 28, ['a'] * 28]
>>> hypothesis = ['a'] * 12
>>> hyp_len = len(hypothesis)
>>> closest_ref_len = closest_ref_length(references, hyp_len)
>>> brevity_penalty(closest_ref_len, hyp_len)
0.2635971381157267
The length of the closest reference is used to compute the penalty. If the length of a hypothesis is 12, and the reference lengths are 13 and 2, the penalty is applied because the hypothesis length (12) is less than the closest reference length (13).
>>> references = [['a'] * 13, ['a'] * 2]
>>> hypothesis = ['a'] * 12
>>> hyp_len = len(hypothesis)
>>> closest_ref_len = closest_ref_length(references, hyp_len)
>>> brevity_penalty(closest_ref_len, hyp_len)
0.9200...
The brevity penalty doesn’t depend on reference order. More importantly, when two reference sentences are at the same distance, the shortest reference sentence length is used.
>>> references = [['a'] * 13, ['a'] * 11]
>>> hypothesis = ['a'] * 12
>>> hyp_len = len(hypothesis)
>>> closest_ref_len = closest_ref_length(references, hyp_len)
>>> bp1 = brevity_penalty(closest_ref_len, hyp_len)
>>> hyp_len = len(hypothesis)
>>> closest_ref_len = closest_ref_length(reversed(references), hyp_len)
>>> bp2 = brevity_penalty(closest_ref_len, hyp_len)
>>> bp1 == bp2 == 1
True
A test example from mteval-v13a.pl (starting at line 705):
>>> references = [['a'] * 11, ['a'] * 8]
>>> hypothesis = ['a'] * 7
>>> hyp_len = len(hypothesis)
>>> closest_ref_len = closest_ref_length(references, hyp_len)
>>> brevity_penalty(closest_ref_len, hyp_len)
0.8668...

>>> references = [['a'] * 11, ['a'] * 8, ['a'] * 6, ['a'] * 7]
>>> hypothesis = ['a'] * 7
>>> hyp_len = len(hypothesis)
>>> closest_ref_len = closest_ref_length(references, hyp_len)
>>> brevity_penalty(closest_ref_len, hyp_len)
1.0
Parameters:
- hyp_len (int) – The length of the hypothesis for a single sentence OR the sum of all the hypotheses' lengths for a corpus
- closest_ref_len (int) – The length of the closest reference for a single hypothesis OR the sum of all the closest references for every hypothesis
Returns: BLEU's brevity penalty.
Return type: float
nltk.translate.bleu_score.closest_ref_length(references, hyp_len)[source]¶
This function finds the reference whose length is closest to the hypothesis. The closest reference length is referred to as the r variable in the brevity penalty formula in Papineni et al. (2002).
Parameters: - references (list(list(str))) – A list of reference translations.
- hyp_len (int) – The length of the hypothesis.
Returns: The length of the reference that’s closest to the hypothesis.
Return type: int
nltk.translate.bleu_score.corpus_bleu(list_of_references, hypotheses, weights=(0.25, 0.25, 0.25, 0.25), smoothing_function=None, auto_reweigh=False)[source]¶
Calculate a single corpus-level BLEU score (aka. system-level BLEU) for all the hypotheses and their respective references.

Instead of averaging the sentence level BLEU scores (i.e. macro-average precision), the original BLEU metric (Papineni et al. 2002) accounts for the micro-average precision (i.e. summing the numerators and denominators of each hypothesis-reference(s) pair before the division).
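To make the distinction concrete, here is an illustrative sketch with hypothetical per-segment counts (not the library code; the real computation also applies the brevity penalty and a geometric mean over n-gram orders):

from fractions import Fraction

# Hypothetical (numerator, denominator) pairs of modified unigram
# precision for two segments.
segments = [(9, 10), (1, 4)]

# Micro-average (what corpus_bleu does): pool the counts, then divide.
micro = Fraction(sum(n for n, _ in segments), sum(d for _, d in segments))

# Macro-average (mean of per-sentence precisions): divide, then average.
macro = sum(Fraction(n, d) for n, d in segments) / len(segments)

print(float(micro))  # 10/14 = 0.714...
print(float(macro))  # (0.9 + 0.25) / 2 = 0.575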
>>> hyp1 = ['It', 'is', 'a', 'guide', 'to', 'action', 'which',
...         'ensures', 'that', 'the', 'military', 'always',
...         'obeys', 'the', 'commands', 'of', 'the', 'party']
>>> ref1a = ['It', 'is', 'a', 'guide', 'to', 'action', 'that',
...          'ensures', 'that', 'the', 'military', 'will', 'forever',
...          'heed', 'Party', 'commands']
>>> ref1b = ['It', 'is', 'the', 'guiding', 'principle', 'which',
...          'guarantees', 'the', 'military', 'forces', 'always',
...          'being', 'under', 'the', 'command', 'of', 'the', 'Party']
>>> ref1c = ['It', 'is', 'the', 'practical', 'guide', 'for', 'the',
...          'army', 'always', 'to', 'heed', 'the', 'directions',
...          'of', 'the', 'party']

>>> hyp2 = ['he', 'read', 'the', 'book', 'because', 'he', 'was',
...         'interested', 'in', 'world', 'history']
>>> ref2a = ['he', 'was', 'interested', 'in', 'world', 'history',
...          'because', 'he', 'read', 'the', 'book']

>>> list_of_references = [[ref1a, ref1b, ref1c], [ref2a]]
>>> hypotheses = [hyp1, hyp2]
>>> corpus_bleu(list_of_references, hypotheses)
0.5920...
The example below shows that corpus_bleu() differs from averaging sentence_bleu() over hypotheses:
>>> score1 = sentence_bleu([ref1a, ref1b, ref1c], hyp1)
>>> score2 = sentence_bleu([ref2a], hyp2)
>>> (score1 + score2) / 2
0.6223...
Parameters: - list_of_references (list(list(list(str)))) – a corpus of lists of reference sentences, w.r.t. hypotheses
- hypotheses (list(list(str))) – a list of hypothesis sentences
- weights (list(float)) – weights for unigrams, bigrams, trigrams and so on
- smoothing_function (SmoothingFunction) –
- auto_reweigh (bool) – Option to re-normalize the weights uniformly.
Returns: The corpus-level BLEU score.
Return type: float
nltk.translate.bleu_score.modified_precision(references, hypothesis, n)[source]¶
Calculate modified ngram precision.
The normal precision method can assign high precision to wrong translations: for example, a translation in which a reference word is repeated several times receives a very high precision score.
This function only returns the Fraction object that contains the numerator and denominator necessary to calculate the corpus-level precision. To calculate the modified precision for a single pair of hypothesis and references, cast the Fraction object into a float.
The famous "the the the ..." example shows that you can inflate BLEU precision by duplicating high-frequency words.
>>> reference1 = 'the cat is on the mat'.split()
>>> reference2 = 'there is a cat on the mat'.split()
>>> hypothesis1 = 'the the the the the the the'.split()
>>> references = [reference1, reference2]
>>> float(modified_precision(references, hypothesis1, n=1))
0.2857...
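The 0.2857... above is 2/7: the hypothesis contains seven unigrams, but the count of 'the' is clipped at 2, its maximum count in any single reference. A minimal sketch of that clipping (illustrative, not the library code):

from collections import Counter

hypothesis = 'the the the the the the the'.split()
references = ['the cat is on the mat'.split(),
              'there is a cat on the mat'.split()]

hyp_counts = Counter(hypothesis)  # {'the': 7}
max_ref_counts = Counter()
for ref in references:
    for gram, cnt in Counter(ref).items():
        max_ref_counts[gram] = max(max_ref_counts[gram], cnt)

# Clip each hypothesis count at the reference maximum, then divide.
clipped = sum(min(cnt, max_ref_counts[gram]) for gram, cnt in hyp_counts.items())
print(clipped / sum(hyp_counts.values()))  # 2/7 = 0.2857...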
In the modified n-gram precision, a reference word will be considered exhausted after a matching hypothesis word is identified, e.g.
>>> reference1 = ['It', 'is', 'a', 'guide', 'to', 'action', 'that',
...               'ensures', 'that', 'the', 'military', 'will',
...               'forever', 'heed', 'Party', 'commands']
>>> reference2 = ['It', 'is', 'the', 'guiding', 'principle', 'which',
...               'guarantees', 'the', 'military', 'forces', 'always',
...               'being', 'under', 'the', 'command', 'of', 'the',
...               'Party']
>>> reference3 = ['It', 'is', 'the', 'practical', 'guide', 'for', 'the',
...               'army', 'always', 'to', 'heed', 'the', 'directions',
...               'of', 'the', 'party']
>>> hypothesis = 'of the'.split()
>>> references = [reference1, reference2, reference3]
>>> float(modified_precision(references, hypothesis, n=1))
1.0
>>> float(modified_precision(references, hypothesis, n=2))
1.0
An example of a normal machine translation hypothesis:
>>> hypothesis1 = ['It', 'is', 'a', 'guide', 'to', 'action', 'which',
...                'ensures', 'that', 'the', 'military', 'always',
...                'obeys', 'the', 'commands', 'of', 'the', 'party']
>>> hypothesis2 = ['It', 'is', 'to', 'insure', 'the', 'troops',
...                'forever', 'hearing', 'the', 'activity', 'guidebook',
...                'that', 'party', 'direct']
>>> reference1 = ['It', 'is', 'a', 'guide', 'to', 'action', 'that',
...               'ensures', 'that', 'the', 'military', 'will',
...               'forever', 'heed', 'Party', 'commands']
>>> reference2 = ['It', 'is', 'the', 'guiding', 'principle', 'which',
...               'guarantees', 'the', 'military', 'forces', 'always',
...               'being', 'under', 'the', 'command', 'of', 'the',
...               'Party']
>>> reference3 = ['It', 'is', 'the', 'practical', 'guide', 'for', 'the',
...               'army', 'always', 'to', 'heed', 'the', 'directions',
...               'of', 'the', 'party']
>>> references = [reference1, reference2, reference3]
>>> float(modified_precision(references, hypothesis1, n=1))
0.9444...
>>> float(modified_precision(references, hypothesis2, n=1))
0.5714...
>>> float(modified_precision(references, hypothesis1, n=2))
0.5882352941176471
>>> float(modified_precision(references, hypothesis2, n=2))
0.07692...
Parameters: - references (list(list(str))) – A list of reference translations.
- hypothesis (list(str)) – A hypothesis translation.
- n (int) – The ngram order.
Returns: BLEU’s modified precision for the nth order ngram.
Return type: Fraction
nltk.translate.bleu_score.sentence_bleu(references, hypothesis, weights=(0.25, 0.25, 0.25, 0.25), smoothing_function=None, auto_reweigh=False)[source]¶
Calculate BLEU score (Bilingual Evaluation Understudy) from Papineni, Kishore, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2002. "BLEU: a method for automatic evaluation of machine translation." In Proceedings of ACL. http://www.aclweb.org/anthology/P02-1040.pdf
>>> hypothesis1 = ['It', 'is', 'a', 'guide', 'to', 'action', 'which',
...                'ensures', 'that', 'the', 'military', 'always',
...                'obeys', 'the', 'commands', 'of', 'the', 'party']
>>> hypothesis2 = ['It', 'is', 'to', 'insure', 'the', 'troops',
...                'forever', 'hearing', 'the', 'activity', 'guidebook',
...                'that', 'party', 'direct']
>>> reference1 = ['It', 'is', 'a', 'guide', 'to', 'action', 'that',
...               'ensures', 'that', 'the', 'military', 'will', 'forever',
...               'heed', 'Party', 'commands']
>>> reference2 = ['It', 'is', 'the', 'guiding', 'principle', 'which',
...               'guarantees', 'the', 'military', 'forces', 'always',
...               'being', 'under', 'the', 'command', 'of', 'the',
...               'Party']
>>> reference3 = ['It', 'is', 'the', 'practical', 'guide', 'for', 'the',
...               'army', 'always', 'to', 'heed', 'the', 'directions',
...               'of', 'the', 'party']

>>> sentence_bleu([reference1, reference2, reference3], hypothesis1)
0.5045...
If there is no n-gram overlap for any order of n-grams, BLEU returns the value 0. This is because the precision for the order of n-grams without overlap is 0, and the geometric mean in the final BLEU score computation multiplies the 0 with the precisions of the other n-gram orders, resulting in 0 (independently of the precision of the other n-gram orders). The following example has zero 3-gram and 4-gram overlaps:
>>> round(sentence_bleu([reference1, reference2, reference3], hypothesis2), 4)
0.0
To avoid this harsh behaviour when no n-gram overlaps are found, a smoothing function can be used.
>>> chencherry = SmoothingFunction()
>>> sentence_bleu([reference1, reference2, reference3], hypothesis2,
...     smoothing_function=chencherry.method1)
0.0370...
The default BLEU calculates a score for up to 4-grams using uniform weights (this is called BLEU-4). To evaluate your translations with higher/lower order ngrams, use customized weights. E.g. when accounting for up to 5-grams with uniform weights (this is called BLEU-5) use:
>>> weights = (1./5., 1./5., 1./5., 1./5., 1./5.)
>>> sentence_bleu([reference1, reference2, reference3], hypothesis1, weights)
0.3920...
Parameters: - references (list(list(str))) – reference sentences
- hypothesis (list(str)) – a hypothesis sentence
- weights (list(float)) – weights for unigrams, bigrams, trigrams and so on
- smoothing_function (SmoothingFunction) –
- auto_reweigh (bool) – Option to re-normalize the weights uniformly.
Returns: The sentence-level BLEU score.
Return type: float
nltk.translate.chrf_score module¶
ChrF score implementation
nltk.translate.chrf_score.chrf_precision_recall_fscore_support(reference, hypothesis, n, beta=3.0, epsilon=1e-16)[source]¶
This function computes the precision, recall and fscore from the ngram overlaps. It also returns the support, which is the true positive count.

By underspecifying the input type, the function is agnostic as to how the ngrams were computed and simply takes whichever elements are in the list; they could be either tokens or characters.
Parameters: - reference (list) – The reference sentence.
- hypothesis (list) – The hypothesis sentence.
- n (int) – Extract up to the n-th order ngrams
- beta (float) – The parameter to assign more importance to recall over precision.
- epsilon (float) – The fallback value if the hypothesis or reference is empty.
Returns: Returns the precision, recall and f-score and support (true positive).
Return type: tuple(float)
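For reference, precision and recall are typically combined with the standard F-beta formula; with the default beta=3, recall is weighted beta^2 = 9 times as heavily as precision. A hedged sketch of that combination (not necessarily the exact library code):

def f_beta_sketch(precision, recall, beta=3.0):
    # F_beta = (1 + beta^2) * P * R / (beta^2 * P + R)
    denominator = beta ** 2 * precision + recall
    return (1 + beta ** 2) * precision * recall / denominator if denominator else 0.0

print(f_beta_sketch(0.5, 1.0))  # recall-heavy: ~0.909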
nltk.translate.chrf_score.corpus_chrf(references, hypotheses, min_len=1, max_len=6, beta=3.0, ignore_whitespace=True)[source]¶
Calculates the corpus level CHRF (Character n-gram F-score); this is the macro-averaged value of the sentence/segment level CHRF scores.

This implementation of CHRF only supports a single reference at the moment.
>>> ref1 = str('It is a guide to action that ensures that the military '
...            'will forever heed Party commands').split()
>>> ref2 = str('It is the guiding principle which guarantees the military '
...            'forces always being under the command of the Party').split()
>>> hyp1 = str('It is a guide to action which ensures that the military '
...            'always obeys the commands of the party').split()
>>> hyp2 = str('It is to insure the troops forever hearing the activity '
...            'guidebook that party direct').split()
>>> corpus_chrf([ref1, ref2, ref1, ref2], [hyp1, hyp2, hyp2, hyp1])
0.3910...
Parameters: - references (list(list(str))) – a corpus of list of reference sentences, w.r.t. hypotheses
- hypotheses (list(list(str))) – a list of hypothesis sentences
- min_len (int) – The minimum order of n-gram this function should extract.
- max_len (int) – The maximum order of n-gram this function should extract.
- beta (float) – the parameter to assign more importance to recall over precision
- ignore_whitespace (bool) – ignore whitespace characters in scoring
Returns: the sentence level CHRF score.
Return type: float
nltk.translate.chrf_score.sentence_chrf(reference, hypothesis, min_len=1, max_len=6, beta=3.0, ignore_whitespace=True)[source]¶
Calculates the sentence level CHRF (Character n-gram F-score) described in
- Maja Popovic. 2015. CHRF: Character n-gram F-score for Automatic MT Evaluation. In Proceedings of the 10th Workshop on Machine Translation. http://www.statmt.org/wmt15/pdf/WMT49.pdf
- Maja Popovic. 2016. CHRF Deconstructed: β Parameters and n-gram Weights. In Proceedings of the 1st Conference on Machine Translation. http://www.statmt.org/wmt16/pdf/W16-2341.pdf
This implementation of CHRF only supports a single reference at the moment.
For details not reported in the paper, consult Maja Popovic’s original implementation: https://github.com/m-popovic/chrF
The code should output results equivalent to running CHRF++ with the following options: -nw 0 -b 3
An example from the original BLEU paper http://www.aclweb.org/anthology/P02-1040.pdf
>>> ref1 = str('It is a guide to action that ensures that the military '
...            'will forever heed Party commands').split()
>>> hyp1 = str('It is a guide to action which ensures that the military '
...            'always obeys the commands of the party').split()
>>> hyp2 = str('It is to insure the troops forever hearing the activity '
...            'guidebook that party direct').split()
>>> sentence_chrf(ref1, hyp1)
0.6349...
>>> sentence_chrf(ref1, hyp2)
0.3330...
The infamous “the the the … ” example
>>> ref = 'the cat is on the mat'.split()
>>> hyp = 'the the the the the the the'.split()
>>> sentence_chrf(ref, hyp)
0.1468...
An example to show that this function allows users to use strings instead of tokens as inputs, i.e. str instead of list(str).
>>> ref1 = str('It is a guide to action that ensures that the military '
...            'will forever heed Party commands')
>>> hyp1 = str('It is a guide to action which ensures that the military '
...            'always obeys the commands of the party')
>>> sentence_chrf(ref1, hyp1)
0.6349...
>>> type(ref1) == type(hyp1) == str
True
>>> sentence_chrf(ref1.split(), hyp1.split())
0.6349...
To skip the unigrams and only use 2- to 3-grams:
>>> sentence_chrf(ref1, hyp1, min_len=2, max_len=3)
0.6617...
Parameters: - reference (list(str) / str) – reference sentence
- hypothesis (list(str) / str) – a hypothesis sentence
- min_len (int) – The minimum order of n-gram this function should extract.
- max_len (int) – The maximum order of n-gram this function should extract.
- beta (float) – the parameter to assign more importance to recall over precision
- ignore_whitespace (bool) – ignore whitespace characters in scoring
Returns: the sentence level CHRF score.
Return type: float
nltk.translate.gale_church module¶
A port of the Gale-Church Aligner.
Gale & Church (1993), A Program for Aligning Sentences in Bilingual Corpora. http://aclweb.org/anthology/J93-1004.pdf
class nltk.translate.gale_church.LanguageIndependent[source]¶
Bases: object
AVERAGE_CHARACTERS = 1¶

PRIORS = {(0, 1): 0.0099, (1, 0): 0.0099, (1, 1): 0.89, (1, 2): 0.089, (2, 1): 0.089, (2, 2): 0.011}¶

VARIANCE_CHARACTERS = 6.8¶
nltk.translate.gale_church.align_blocks(source_sents_lens, target_sents_lens, params=<class 'nltk.translate.gale_church.LanguageIndependent'>)[source]¶
Return the sentence alignment of two text blocks (usually paragraphs).
>>> align_blocks([5,5,5], [7,7,7])
[(0, 0), (1, 1), (2, 2)]
>>> align_blocks([10,5,5], [12,20])
[(0, 0), (1, 1), (2, 1)]
>>> align_blocks([12,20], [10,5,5])
[(0, 0), (1, 1), (1, 2)]
>>> align_blocks([10,2,10,10,2,10], [12,3,20,3,12])
[(0, 0), (1, 1), (2, 2), (3, 2), (4, 3), (5, 4)]
@param source_sents_lens: The list of source sentence lengths.
@param target_sents_lens: The list of target sentence lengths.
@param params: the sentence alignment parameters.
@return: The sentence alignments, a list of index pairs.
nltk.translate.gale_church.align_log_prob(i, j, source_sents, target_sents, alignment, params)[source]¶
Returns the log probability of the two sentences C{source_sents[i]}, C{target_sents[j]} being aligned with a specific C{alignment}.
@param i: The offset of the source sentence.
@param j: The offset of the target sentence.
@param source_sents: The list of source sentence lengths.
@param target_sents: The list of target sentence lengths.
@param alignment: The alignment type, a tuple of two integers.
@param params: The sentence alignment parameters.
@returns: The log probability of a specific alignment between the two sentences, given the parameters.
nltk.translate.gale_church.align_texts(source_blocks, target_blocks, params=<class 'nltk.translate.gale_church.LanguageIndependent'>)[source]¶
Creates the sentence alignment of two texts.
Texts can consist of several blocks. Block boundaries cannot be crossed by sentence alignment links.
Each block consists of a list that contains the lengths (in characters) of the sentences in this block.
@param source_blocks: The list of blocks in the source text.
@param target_blocks: The list of blocks in the target text.
@param params: the sentence alignment parameters.
@returns: A list of sentence alignment lists
nltk.translate.gale_church.norm_cdf(x)[source]¶
Return the area under the normal distribution from M{-∞..x}.
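The standard normal CDF can be written in terms of the error function from Python's standard library; a sketch of an equivalent computation (not necessarily the exact implementation used here):

import math

def norm_cdf_sketch(x):
    # Phi(x) = (1 + erf(x / sqrt(2))) / 2
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

print(norm_cdf_sketch(0.0))   # 0.5
print(norm_cdf_sketch(1.96))  # ~0.975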
nltk.translate.gale_church.parse_token_stream(stream, soft_delimiter, hard_delimiter)[source]¶
Parses a stream of tokens and splits it into sentences (using C{soft_delimiter} tokens) and blocks (using C{hard_delimiter} tokens) for use with the L{align_texts} function.
nltk.translate.gale_church.split_at(it, split_value)[source]¶
Splits an iterator C{it} at values of C{split_value}.
Each instance of C{split_value} is swallowed. The iterator produces subiterators which need to be consumed fully before the next subiterator can be used.
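The behaviour can be illustrated with a standalone, list-based sketch (the real function yields lazy subiterators rather than lists, as noted above):

def split_at_sketch(it, split_value):
    # Yield chunks of items, splitting on (and swallowing) split_value.
    chunk = []
    for v in it:
        if v == split_value:
            yield chunk
            chunk = []
        else:
            chunk.append(v)
    yield chunk

# Two sentences separated by a soft delimiter token:
print(list(split_at_sketch(iter(['a', 'b', '.', 'c']), '.')))  # [['a', 'b'], ['c']]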
nltk.translate.gale_church.trace(backlinks, source_sents_lens, target_sents_lens)[source]¶
Traverses the alignment cost from the tracebacks and retrieves appropriate sentence pairs.
Parameters:
- backlinks (dict) – A dictionary where the key is the alignment points and the value is the cost (referencing the LanguageIndependent.PRIORS)
- source_sents_lens (list(int)) – A list of source sentences' lengths
- target_sents_lens (list(int)) – A list of target sentences' lengths
nltk.translate.gdfa module¶
nltk.translate.gdfa.grow_diag_final_and(srclen, trglen, e2f, f2e)[source]¶
This function symmetrizes the source-to-target and target-to-source word alignment outputs using the grow-diag-final-and (GDFA) algorithm (Koehn, 2005).
Step 1: Find the intersection of the bidirectional alignments.
Step 2: Search for additional neighboring alignment points to be added, given these criteria: (i) the neighboring alignment points are not in the intersection and (ii) the neighboring alignments are in the union.
Step 3: Add all other alignment points that are not in the intersection and not in the neighboring alignments that met the criteria, but are in the original forward/backward alignment outputs.
>>> forw = ('0-0 2-1 9-2 21-3 10-4 7-5 11-6 9-7 12-8 1-9 3-10 '
...         '4-11 17-12 17-13 25-14 13-15 24-16 11-17 28-18')
>>> back = ('0-0 1-9 2-9 3-10 4-11 5-12 6-6 7-5 8-6 9-7 10-4 '
...         '11-6 12-8 13-12 15-12 17-13 18-13 19-12 20-13 '
...         '21-3 22-12 23-14 24-17 25-15 26-17 27-18 28-18')
>>> srctext = ("この よう な ハロー 白色 わい 星 の L 関数 "
...            "は L と 共 に 不連続 に 増加 する こと が "
...            "期待 さ れる こと を 示し た 。")
>>> trgtext = ("Therefore , we expect that the luminosity function "
...            "of such halo white dwarfs increases discontinuously "
...            "with the luminosity .")
>>> srclen = len(srctext.split())
>>> trglen = len(trgtext.split())
>>> gdfa = grow_diag_final_and(srclen, trglen, forw, back)
>>> gdfa == sorted(set([(28, 18), (6, 6), (24, 17), (2, 1), (15, 12), (13, 12),
...                     (2, 9), (3, 10), (26, 17), (25, 15), (8, 6), (9, 7),
...                     (20, 13), (18, 13), (0, 0), (10, 4), (13, 15), (23, 14),
...                     (7, 5), (25, 14), (1, 9), (17, 13), (4, 11), (11, 17),
...                     (9, 2), (22, 12), (27, 18), (24, 16), (21, 3), (19, 12),
...                     (17, 12), (5, 12), (11, 6), (12, 8)]))
True
References: Koehn, P., A. Axelrod, A. Birch, C. Callison-Burch, M. Osborne, and D. Talbot. 2005. Edinburgh System Description for the 2005 IWSLT Speech Translation Evaluation. In MT Eval Workshop.
Parameters: - srclen (int) – the number of tokens in the source language
- trglen (int) – the number of tokens in the target language
- e2f (str) – the forward word alignment outputs from source-to-target language (in pharaoh output format)
- f2e (str) – the backward word alignment outputs from target-to-source language (in pharaoh output format)
Return type: set(tuple(int))
Returns: the symmetrized alignment points from the GDFA algorithm
nltk.translate.gleu_score module¶
GLEU score implementation.
nltk.translate.gleu_score.corpus_gleu(list_of_references, hypotheses, min_len=1, max_len=4)[source]¶
Calculate a single corpus-level GLEU score (aka. system-level GLEU) for all the hypotheses and their respective references.
Instead of averaging the sentence level GLEU scores (i.e. macro-average precision), Wu et al. (2016) sum up the matching tokens and the max of hypothesis and reference tokens for each sentence, then compute using the aggregate values.
From Mike Schuster (via email):
"For the corpus, we just add up the two statistics n_match and n_all = max(n_all_output, n_all_target) for all sentences, then calculate gleu_score = n_match / n_all, so it is not just a mean of the sentence gleu scores (in our case, longer sentences count more, which I think makes sense as they are more difficult to translate)."
>>> hyp1 = ['It', 'is', 'a', 'guide', 'to', 'action', 'which',
...         'ensures', 'that', 'the', 'military', 'always',
...         'obeys', 'the', 'commands', 'of', 'the', 'party']
>>> ref1a = ['It', 'is', 'a', 'guide', 'to', 'action', 'that',
...          'ensures', 'that', 'the', 'military', 'will', 'forever',
...          'heed', 'Party', 'commands']
>>> ref1b = ['It', 'is', 'the', 'guiding', 'principle', 'which',
...          'guarantees', 'the', 'military', 'forces', 'always',
...          'being', 'under', 'the', 'command', 'of', 'the', 'Party']
>>> ref1c = ['It', 'is', 'the', 'practical', 'guide', 'for', 'the',
...          'army', 'always', 'to', 'heed', 'the', 'directions',
...          'of', 'the', 'party']

>>> hyp2 = ['he', 'read', 'the', 'book', 'because', 'he', 'was',
...         'interested', 'in', 'world', 'history']
>>> ref2a = ['he', 'was', 'interested', 'in', 'world', 'history',
...          'because', 'he', 'read', 'the', 'book']

>>> list_of_references = [[ref1a, ref1b, ref1c], [ref2a]]
>>> hypotheses = [hyp1, hyp2]
>>> corpus_gleu(list_of_references, hypotheses)
0.5673...
The example below shows that corpus_gleu() differs from averaging sentence_gleu() over hypotheses:
>>> score1 = sentence_gleu([ref1a], hyp1)
>>> score2 = sentence_gleu([ref2a], hyp2)
>>> (score1 + score2) / 2
0.6144...
Parameters: - list_of_references (list(list(list(str)))) – a list of reference sentences, w.r.t. hypotheses
- hypotheses (list(list(str))) – a list of hypothesis sentences
- min_len (int) – The minimum order of n-gram this function should extract.
- max_len (int) – The maximum order of n-gram this function should extract.
Returns: The corpus-level GLEU score.
Return type: float
nltk.translate.gleu_score.sentence_gleu(references, hypothesis, min_len=1, max_len=4)[source]¶
Calculates the sentence level GLEU (Google-BLEU) score described in

Yonghui Wu, Mike Schuster, Zhifeng Chen, Quoc V. Le, Mohammad Norouzi, Wolfgang Macherey, Maxim Krikun, Yuan Cao, Qin Gao, Klaus Macherey, Jeff Klingner, Apurva Shah, Melvin Johnson, Xiaobing Liu, Lukasz Kaiser, Stephan Gouws, Yoshikiyo Kato, Taku Kudo, Hideto Kazawa, Keith Stevens, George Kurian, Nishant Patil, Wei Wang, Cliff Young, Jason Smith, Jason Riesa, Alex Rudnick, Oriol Vinyals, Greg Corrado, Macduff Hughes, Jeffrey Dean. (2016) Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation. eprint arXiv:1609.08144. https://arxiv.org/pdf/1609.08144v2.pdf Retrieved on 27 Oct 2016.

From Wu et al. (2016):
"The BLEU score has some undesirable properties when used for single sentences, as it was designed to be a corpus measure. We therefore use a slightly different score for our RL experiments which we call the 'GLEU score'. For the GLEU score, we record all sub-sequences of 1, 2, 3 or 4 tokens in output and target sequence (n-grams). We then compute a recall, which is the ratio of the number of matching n-grams to the number of total n-grams in the target (ground truth) sequence, and a precision, which is the ratio of the number of matching n-grams to the number of total n-grams in the generated output sequence. Then GLEU score is simply the minimum of recall and precision. This GLEU score's range is always between 0 (no matches) and 1 (all match) and it is symmetrical when switching output and target. According to our experiments, GLEU score correlates quite well with the BLEU metric on a corpus level but does not have its drawbacks for our per sentence reward objective."
Note: The initial implementation only allowed a single reference, but now a list of references is required (which is consistent with bleu_score.sentence_bleu()). A toy sketch of the min(precision, recall) computation quoted above follows.
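A toy sketch of the sentence-level computation for a single reference (illustrative only; the function documented here is the authoritative implementation and supports multiple references):

from collections import Counter

def ngram_counts(tokens, min_len, max_len):
    # Counts of all n-grams of orders min_len..max_len.
    return Counter(
        tuple(tokens[i:i + n])
        for n in range(min_len, max_len + 1)
        for i in range(len(tokens) - n + 1)
    )

def sentence_gleu_sketch(reference, hypothesis, min_len=1, max_len=4):
    ref_counts = ngram_counts(reference, min_len, max_len)
    hyp_counts = ngram_counts(hypothesis, min_len, max_len)
    n_match = sum((ref_counts & hyp_counts).values())  # overlapping n-grams
    precision = n_match / sum(hyp_counts.values())
    recall = n_match / sum(ref_counts.values())
    return min(precision, recall)

ref = 'the cat is on the mat'.split()
hyp = 'the the the the the the the'.split()
print(sentence_gleu_sketch(ref, hyp))  # 0.0909..., matching the example below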
The infamous “the the the … ” example
>>> ref = 'the cat is on the mat'.split()
>>> hyp = 'the the the the the the the'.split()
>>> sentence_gleu([ref], hyp)
0.0909...
An example to evaluate normal machine translation outputs
>>> ref1 = str('It is a guide to action that ensures that the military '
...            'will forever heed Party commands').split()
>>> hyp1 = str('It is a guide to action which ensures that the military '
...            'always obeys the commands of the party').split()
>>> hyp2 = str('It is to insure the troops forever hearing the activity '
...            'guidebook that party direct').split()
>>> sentence_gleu([ref1], hyp1)
0.4393...
>>> sentence_gleu([ref1], hyp2)
0.1206...
Parameters: - references (list(list(str))) – a list of reference sentences
- hypothesis (list(str)) – a hypothesis sentence
- min_len (int) – The minimum order of n-gram this function should extract.
- max_len (int) – The maximum order of n-gram this function should extract.
Returns: the sentence level GLEU score.
Return type: float
nltk.translate.ibm1 module¶
Lexical translation model that ignores word order.
In IBM Model 1, word order is ignored for simplicity. As long as the word alignments are equivalent, it doesn’t matter where the word occurs in the source or target sentence. Thus, the following three alignments are equally likely.
Source: je mange du jambon
Target: i eat some ham
Alignment: (0,0) (1,1) (2,2) (3,3)

Source: je mange du jambon
Target: some ham eat i
Alignment: (0,2) (1,3) (2,1) (3,0)

Source: du jambon je mange
Target: eat i some ham
Alignment: (0,3) (1,2) (2,0) (3,1)
Note that an alignment is represented here as (word_index_in_target, word_index_in_source).
The EM algorithm used in Model 1 is:
E step - In the training data, count how many times a source language word is translated into a target language word, weighted by the prior probability of the translation.
M step - Estimate the new probability of translation based on the counts from the Expectation step.
A toy sketch of this training loop is given below.
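The sketch assumes bitext is a list of (source_tokens, target_tokens) pairs (illustrative only; the IBMModel1 class below is the real implementation):

from collections import defaultdict

def ibm1_em_sketch(bitext, iterations=5):
    # t[(s, t_word)] approximates P(t_word | s); uniform initialisation.
    trg_vocab = {t_word for _, trg in bitext for t_word in trg}
    t = defaultdict(lambda: 1.0 / len(trg_vocab))
    for _ in range(iterations):
        count = defaultdict(float)  # expected translation counts
        total = defaultdict(float)  # expected counts per source word
        for src, trg in bitext:
            src = [None] + list(src)  # None stands in for the NULL word
            for t_word in trg:
                # E step: distribute one count over all candidate source
                # words, weighted by the current translation probabilities.
                norm = sum(t[(s, t_word)] for s in src)
                for s in src:
                    c = t[(s, t_word)] / norm
                    count[(s, t_word)] += c
                    total[s] += c
        # M step: re-estimate translation probabilities from the counts.
        for (s, t_word), c in count.items():
            t[(s, t_word)] = c / total[s]
    return t

bitext = [('das haus'.split(), 'the house'.split()),
          ('das buch'.split(), 'the book'.split()),
          ('ein buch'.split(), 'a book'.split())]
probs = ibm1_em_sketch(bitext)
print(probs[('das', 'the')])  # 'das' aligns strongly with 'the'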
Notations:
i: Position in the source sentence
    Valid values are 0 (for NULL), 1, 2, ..., length of source sentence
j: Position in the target sentence
    Valid values are 1, 2, ..., length of target sentence
s: A word in the source language
t: A word in the target language
References: Philipp Koehn. 2010. Statistical Machine Translation. Cambridge University Press, New York.
Peter E Brown, Stephen A. Della Pietra, Vincent J. Della Pietra, and Robert L. Mercer. 1993. The Mathematics of Statistical Machine Translation: Parameter Estimation. Computational Linguistics, 19 (2), 263-311.
class nltk.translate.ibm1.IBMModel1(sentence_aligned_corpus, iterations, probability_tables=None)[source]¶
Bases: nltk.translate.ibm_model.IBMModel
Lexical translation model that ignores word order
>>> bitext = []
>>> bitext.append(AlignedSent(['klein', 'ist', 'das', 'haus'], ['the', 'house', 'is', 'small']))
>>> bitext.append(AlignedSent(['das', 'haus', 'ist', 'ja', 'groß'], ['the', 'house', 'is', 'big']))
>>> bitext.append(AlignedSent(['das', 'buch', 'ist', 'ja', 'klein'], ['the', 'book', 'is', 'small']))
>>> bitext.append(AlignedSent(['das', 'haus'], ['the', 'house']))
>>> bitext.append(AlignedSent(['das', 'buch'], ['the', 'book']))
>>> bitext.append(AlignedSent(['ein', 'buch'], ['a', 'book']))
>>> ibm1 = IBMModel1(bitext, 5)
>>> print(ibm1.translation_table['buch']['book'])
0.889...
>>> print(ibm1.translation_table['das']['book'])
0.061...
>>> print(ibm1.translation_table['buch'][None])
0.113...
>>> print(ibm1.translation_table['ja'][None])
0.072...
>>> test_sentence = bitext[2]
>>> test_sentence.words
['das', 'buch', 'ist', 'ja', 'klein']
>>> test_sentence.mots
['the', 'book', 'is', 'small']
>>> test_sentence.alignment
Alignment([(0, 0), (1, 1), (2, 2), (3, 2), (4, 3)])
align(sentence_pair)[source]¶
Determines the best word alignment for one sentence pair from the corpus that the model was trained on.

The best alignment will be set in sentence_pair when the method returns. In contrast with the internal implementation of IBM models, the word indices in the Alignment are zero-indexed, not one-indexed.

Parameters: sentence_pair (AlignedSent) – A sentence in the source language and its counterpart sentence in the target language
prob_alignment_point(s, t)[source]¶
Probability that word t in the target sentence is aligned to word s in the source sentence
prob_all_alignments(src_sentence, trg_sentence)[source]¶
Computes the probability of all possible word alignments, expressed as a marginal distribution over target words t
Each entry in the return value represents the contribution to the total alignment probability by the target word t.
To obtain probability(alignment | src_sentence, trg_sentence), simply sum the entries in the return value.
Returns: Probability of t for all s in src_sentence
Return type: dict(str): float
prob_t_a_given_s(alignment_info)[source]¶
Probability of target sentence and an alignment given the source sentence
nltk.translate.ibm2 module¶
Lexical translation model that considers word order.
IBM Model 2 improves on Model 1 by accounting for word order. An alignment probability is introduced, a(i | j,l,m), which predicts a source word position, given its aligned target word’s position.
The EM algorithm used in Model 2 is:
E step - In the training data, collect counts, weighted by prior probabilities.
    (a) count how many times a source language word is translated into a target language word
    (b) count how many times a particular position in the source sentence is aligned to a particular position in the target sentence
M step - Estimate new probabilities based on the counts from the E step
Notations:
i: Position in the source sentence
    Valid values are 0 (for NULL), 1, 2, ..., length of source sentence
j: Position in the target sentence
    Valid values are 1, 2, ..., length of target sentence
l: Number of words in the source sentence, excluding NULL
m: Number of words in the target sentence
s: A word in the source language
t: A word in the target language
References: Philipp Koehn. 2010. Statistical Machine Translation. Cambridge University Press, New York.
Peter E Brown, Stephen A. Della Pietra, Vincent J. Della Pietra, and Robert L. Mercer. 1993. The Mathematics of Statistical Machine Translation: Parameter Estimation. Computational Linguistics, 19 (2), 263-311.
class nltk.translate.ibm2.IBMModel2(sentence_aligned_corpus, iterations, probability_tables=None)[source]¶
Bases: nltk.translate.ibm_model.IBMModel
Lexical translation model that considers word order
>>> bitext = []
>>> bitext.append(AlignedSent(['klein', 'ist', 'das', 'haus'], ['the', 'house', 'is', 'small']))
>>> bitext.append(AlignedSent(['das', 'haus', 'ist', 'ja', 'groß'], ['the', 'house', 'is', 'big']))
>>> bitext.append(AlignedSent(['das', 'buch', 'ist', 'ja', 'klein'], ['the', 'book', 'is', 'small']))
>>> bitext.append(AlignedSent(['das', 'haus'], ['the', 'house']))
>>> bitext.append(AlignedSent(['das', 'buch'], ['the', 'book']))
>>> bitext.append(AlignedSent(['ein', 'buch'], ['a', 'book']))
>>> ibm2 = IBMModel2(bitext, 5)
>>> print(round(ibm2.translation_table['buch']['book'], 3))
1.0
>>> print(round(ibm2.translation_table['das']['book'], 3))
0.0
>>> print(round(ibm2.translation_table['buch'][None], 3))
0.0
>>> print(round(ibm2.translation_table['ja'][None], 3))
0.0

>>> print(ibm2.alignment_table[1][1][2][2])
0.938...
>>> print(round(ibm2.alignment_table[1][2][2][2], 3))
0.0
>>> print(round(ibm2.alignment_table[2][2][4][5], 3))
1.0

>>> test_sentence = bitext[2]
>>> test_sentence.words
['das', 'buch', 'ist', 'ja', 'klein']
>>> test_sentence.mots
['the', 'book', 'is', 'small']
>>> test_sentence.alignment
Alignment([(0, 0), (1, 1), (2, 2), (3, 2), (4, 3)])
align(sentence_pair)[source]¶
Determines the best word alignment for one sentence pair from the corpus that the model was trained on.

The best alignment will be set in sentence_pair when the method returns. In contrast with the internal implementation of IBM models, the word indices in the Alignment are zero-indexed, not one-indexed.

Parameters: sentence_pair (AlignedSent) – A sentence in the source language and its counterpart sentence in the target language
prob_alignment_point(i, j, src_sentence, trg_sentence)[source]¶
Probability that position j in trg_sentence is aligned to position i in src_sentence
prob_all_alignments(src_sentence, trg_sentence)[source]¶
Computes the probability of all possible word alignments, expressed as a marginal distribution over target words t
Each entry in the return value represents the contribution to the total alignment probability by the target word t.
To obtain probability(alignment | src_sentence, trg_sentence), simply sum the entries in the return value.
Returns: Probability of t for all s in src_sentence
Return type: dict(str): float
prob_t_a_given_s(alignment_info)[source]¶
Probability of target sentence and an alignment given the source sentence
nltk.translate.ibm3 module¶
Translation model that considers how a word can be aligned to multiple words in another language.
IBM Model 3 improves on Model 2 by directly modeling the phenomenon where a word in one language may be translated into zero or more words in another. This is expressed by the fertility probability, n(phi | source word).
If a source word translates into more than one word, it is possible to generate sentences that have the same alignment in multiple ways. This is modeled by a distortion step. The distortion probability, d(j|i,l,m), predicts a target word position, given its aligned source word’s position. The distortion probability replaces the alignment probability of Model 2.
The fertility probability is not applicable for NULL. Target words that align to NULL are assumed to be distributed uniformly in the target sentence. The existence of these words is modeled by p1, the probability that a target word produced by a real source word requires another target word that is produced by NULL.
The EM algorithm used in Model 3 is:
E step - In the training data, collect counts, weighted by prior probabilities.
    (a) count how many times a source language word is translated into a target language word
    (b) count how many times a particular position in the target sentence is aligned to a particular position in the source sentence
    (c) count how many times a source word is aligned to phi number of target words
    (d) count how many times NULL is aligned to a target word
M step - Estimate new probabilities based on the counts from the E step
Because there are too many possible alignments, only the most probable ones are considered. First, the best alignment is determined using prior probabilities. Then, a hill climbing approach is used to find other good candidates.
Notations:
i: Position in the source sentence
    Valid values are 0 (for NULL), 1, 2, ..., length of source sentence
j: Position in the target sentence
    Valid values are 1, 2, ..., length of target sentence
l: Number of words in the source sentence, excluding NULL
m: Number of words in the target sentence
s: A word in the source language
t: A word in the target language
phi: Fertility, the number of target words produced by a source word
p1: Probability that a target word produced by a source word is accompanied by another target word that is aligned to NULL
p0: 1 - p1
References: Philipp Koehn. 2010. Statistical Machine Translation. Cambridge University Press, New York.
Peter E Brown, Stephen A. Della Pietra, Vincent J. Della Pietra, and Robert L. Mercer. 1993. The Mathematics of Statistical Machine Translation: Parameter Estimation. Computational Linguistics, 19 (2), 263-311.
class nltk.translate.ibm3.IBMModel3(sentence_aligned_corpus, iterations, probability_tables=None)[source]¶
Bases: nltk.translate.ibm_model.IBMModel
Translation model that considers how a word can be aligned to multiple words in another language
>>> bitext = []
>>> bitext.append(AlignedSent(['klein', 'ist', 'das', 'haus'], ['the', 'house', 'is', 'small']))
>>> bitext.append(AlignedSent(['das', 'haus', 'war', 'ja', 'groß'], ['the', 'house', 'was', 'big']))
>>> bitext.append(AlignedSent(['das', 'buch', 'ist', 'ja', 'klein'], ['the', 'book', 'is', 'small']))
>>> bitext.append(AlignedSent(['ein', 'haus', 'ist', 'klein'], ['a', 'house', 'is', 'small']))
>>> bitext.append(AlignedSent(['das', 'haus'], ['the', 'house']))
>>> bitext.append(AlignedSent(['das', 'buch'], ['the', 'book']))
>>> bitext.append(AlignedSent(['ein', 'buch'], ['a', 'book']))
>>> bitext.append(AlignedSent(['ich', 'fasse', 'das', 'buch', 'zusammen'], ['i', 'summarize', 'the', 'book']))
>>> bitext.append(AlignedSent(['fasse', 'zusammen'], ['summarize']))
>>> ibm3 = IBMModel3(bitext, 5)
>>> print(round(ibm3.translation_table['buch']['book'], 3))
1.0
>>> print(round(ibm3.translation_table['das']['book'], 3))
0.0
>>> print(round(ibm3.translation_table['ja'][None], 3))
1.0

>>> print(round(ibm3.distortion_table[1][1][2][2], 3))
1.0
>>> print(round(ibm3.distortion_table[1][2][2][2], 3))
0.0
>>> print(round(ibm3.distortion_table[2][2][4][5], 3))
0.75

>>> print(round(ibm3.fertility_table[2]['summarize'], 3))
1.0
>>> print(round(ibm3.fertility_table[1]['book'], 3))
1.0

>>> print(ibm3.p1)
0.054...

>>> test_sentence = bitext[2]
>>> test_sentence.words
['das', 'buch', 'ist', 'ja', 'klein']
>>> test_sentence.mots
['the', 'book', 'is', 'small']
>>> test_sentence.alignment
Alignment([(0, 0), (1, 1), (2, 2), (3, None), (4, 3)])
prob_t_a_given_s(alignment_info)[source]¶
Probability of target sentence and an alignment given the source sentence
class nltk.translate.ibm3.Model3Counts[source]¶
Bases: nltk.translate.ibm_model.Counts
Data object to store counts of various parameters during training. Includes counts for distortion.
nltk.translate.ibm4 module¶
Translation model that reorders output words based on their type and distance from other related words in the output sentence.
IBM Model 4 improves the distortion model of Model 3, motivated by the observation that certain words tend to be re-ordered in a predictable way relative to one another. For example, <adjective><noun> in English usually has its order flipped as <noun><adjective> in French.
Model 4 requires words in the source and target vocabularies to be categorized into classes. This can be linguistically driven, like parts of speech (adjective, nouns, prepositions, etc). Word classes can also be obtained by statistical methods. The original IBM Model 4 uses an information theoretic approach to group words into 50 classes for each vocabulary.
Terminology:
Cept:
    A source word with non-zero fertility, i.e. aligned to one or more target words.
Tablet:
    The set of target word(s) aligned to a cept.
Head of cept:
    The first word of the tablet of that cept.
Center of cept:
    The average position of the words in that cept's tablet. If the value is not an integer, the ceiling is taken. For example, for a tablet with words in positions 2, 5, 6 in the target sentence, the center of the corresponding cept is ceil((2 + 5 + 6) / 3) = 5
Displacement:
    For a head word, defined as (position of head word - position of previous cept's center). Can be positive or negative. For a non-head word, defined as (position of non-head word - position of previous word in the same tablet). Always positive, because successive words in a tablet are assumed to appear to the right of the previous word.
In contrast to Model 3, which reorders words in a tablet independently of other words, Model 4 distinguishes between three cases:
(1) Words generated by NULL are distributed uniformly.
(2) For a head word t, its position is modeled by the probability d_head(displacement | word_class_s(s), word_class_t(t)), where s is the previous cept, and word_class_s and word_class_t map s and t to a source and target language word class respectively.
(3) For a non-head word t, its position is modeled by the probability d_non_head(displacement | word_class_t(t))
The EM algorithm used in Model 4 is:
E step - In the training data, collect counts, weighted by prior probabilities.
    (a) count how many times a source language word is translated into a target language word
    (b) for a particular word class, count how many times a head word is located at a particular displacement from the previous cept's center
    (c) for a particular word class, count how many times a non-head word is located at a particular displacement from the previous target word
    (d) count how many times a source word is aligned to phi number of target words
    (e) count how many times NULL is aligned to a target word
M step - Estimate new probabilities based on the counts from the E step
Like Model 3, there are too many possible alignments to consider. Thus, a hill climbing approach is used to sample good candidates.
Notations:
i: Position in the source sentence
    Valid values are 0 (for NULL), 1, 2, ..., length of source sentence
j: Position in the target sentence
    Valid values are 1, 2, ..., length of target sentence
l: Number of words in the source sentence, excluding NULL
m: Number of words in the target sentence
s: A word in the source language
t: A word in the target language
phi: Fertility, the number of target words produced by a source word
p1: Probability that a target word produced by a source word is accompanied by another target word that is aligned to NULL
p0: 1 - p1
dj: Displacement, Δj
References: Philipp Koehn. 2010. Statistical Machine Translation. Cambridge University Press, New York.
Peter E Brown, Stephen A. Della Pietra, Vincent J. Della Pietra, and Robert L. Mercer. 1993. The Mathematics of Statistical Machine Translation: Parameter Estimation. Computational Linguistics, 19 (2), 263-311.
class nltk.translate.ibm4.IBMModel4(sentence_aligned_corpus, iterations, source_word_classes, target_word_classes, probability_tables=None)[source]¶
Bases: nltk.translate.ibm_model.IBMModel
Translation model that reorders output words based on their type and their distance from other related words in the output sentence
>>> bitext = []
>>> bitext.append(AlignedSent(['klein', 'ist', 'das', 'haus'], ['the', 'house', 'is', 'small']))
>>> bitext.append(AlignedSent(['das', 'haus', 'war', 'ja', 'groß'], ['the', 'house', 'was', 'big']))
>>> bitext.append(AlignedSent(['das', 'buch', 'ist', 'ja', 'klein'], ['the', 'book', 'is', 'small']))
>>> bitext.append(AlignedSent(['ein', 'haus', 'ist', 'klein'], ['a', 'house', 'is', 'small']))
>>> bitext.append(AlignedSent(['das', 'haus'], ['the', 'house']))
>>> bitext.append(AlignedSent(['das', 'buch'], ['the', 'book']))
>>> bitext.append(AlignedSent(['ein', 'buch'], ['a', 'book']))
>>> bitext.append(AlignedSent(['ich', 'fasse', 'das', 'buch', 'zusammen'], ['i', 'summarize', 'the', 'book']))
>>> bitext.append(AlignedSent(['fasse', 'zusammen'], ['summarize']))
>>> src_classes = {'the': 0, 'a': 0, 'small': 1, 'big': 1, 'house': 2, 'book': 2, 'is': 3, 'was': 3, 'i': 4, 'summarize': 5 }
>>> trg_classes = {'das': 0, 'ein': 0, 'haus': 1, 'buch': 1, 'klein': 2, 'groß': 2, 'ist': 3, 'war': 3, 'ja': 4, 'ich': 5, 'fasse': 6, 'zusammen': 6 }
>>> ibm4 = IBMModel4(bitext, 5, src_classes, trg_classes)
>>> print(round(ibm4.translation_table['buch']['book'], 3))
1.0
>>> print(round(ibm4.translation_table['das']['book'], 3))
0.0
>>> print(round(ibm4.translation_table['ja'][None], 3))
1.0

>>> print(round(ibm4.head_distortion_table[1][0][1], 3))
1.0
>>> print(round(ibm4.head_distortion_table[2][0][1], 3))
0.0
>>> print(round(ibm4.non_head_distortion_table[3][6], 3))
0.5

>>> print(round(ibm4.fertility_table[2]['summarize'], 3))
1.0
>>> print(round(ibm4.fertility_table[1]['book'], 3))
1.0

>>> print(ibm4.p1)
0.033...

>>> test_sentence = bitext[2]
>>> test_sentence.words
['das', 'buch', 'ist', 'ja', 'klein']
>>> test_sentence.mots
['the', 'book', 'is', 'small']
>>> test_sentence.alignment
Alignment([(0, 0), (1, 1), (2, 2), (3, None), (4, 3)])
prob_t_a_given_s(alignment_info)[source]¶
Probability of target sentence and an alignment given the source sentence
class nltk.translate.ibm4.Model4Counts[source]¶
Bases: nltk.translate.ibm_model.Counts
Data object to store counts of various parameters during training. Includes counts for distortion.
nltk.translate.ibm5 module¶
Translation model that keeps track of vacant positions in the target sentence to decide where to place translated words.
Translation can be viewed as a process where each word in the source sentence is stepped through sequentially, generating translated words for each source word. The target sentence can be viewed as being made up of m empty slots initially, which gradually fill up as generated words are placed in them.
Models 3 and 4 use distortion probabilities to decide how to place translated words. For simplicity, these models ignore the history of which slots have already been occupied with translated words. Consider the placement of the last translated word: there is only one empty slot left in the target sentence, so the distortion probability should be 1.0 for that position and 0.0 everywhere else. However, the distortion probabilities for Models 3 and 4 are set up such that all positions are under consideration.
IBM Model 5 fixes this deficiency by accounting for occupied slots during translation. It introduces the vacancy function v(j), the number of vacancies up to, and including, position j in the target sentence.
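The vacancy function can be illustrated with a short sketch (a hypothetical helper, not part of this module):

def vacancies_up_to(occupied, j):
    # v(j): number of vacant target slots among positions 1..j (1-indexed).
    return sum(1 for pos in range(1, j + 1) if pos not in occupied)

# In a 5-slot target sentence with positions 2 and 4 already filled:
print(vacancies_up_to({2, 4}, 3))  # 2 (slots 1 and 3 are vacant)
print(vacancies_up_to({2, 4}, 5))  # 3 (slots 1, 3 and 5 are vacant)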
Terminology:
Maximum vacancy:
    The number of valid slots that a word can be placed in. This is not necessarily the same as the number of vacant slots. For example, if a tablet contains more than one word, the head word cannot be placed at the last vacant slot because there will be no space for the other words in the tablet. The number of valid slots has to take into account the length of the tablet. Non-head words cannot be placed before the head word, so vacancies to the left of the head word are ignored.
Vacancy difference:
    For a head word: (v(j) - v(center of previous cept)). Can be positive or negative. For a non-head word: (v(j) - v(position of previously placed word)). Always positive, because successive words in a tablet are assumed to appear to the right of the previous word.
Positioning of target words falls under three cases:
(1) Words generated by NULL are distributed uniformly.
(2) For a head word t, its position is modeled by the probability v_head(dv | max_v, word_class_t(t))
(3) For a non-head word t, its position is modeled by the probability v_non_head(dv | max_v, word_class_t(t))
dv and max_v are defined differently for head and non-head words.
The EM algorithm used in Model 5 is:
E step - In the training data, collect counts, weighted by prior probabilities:
(a) count how many times a source language word is translated into a target language word
(b) for a particular word class and maximum vacancy, count how many times a head word and the previous cept's center have a particular difference in number of vacancies
(c) for a particular word class and maximum vacancy, count how many times a non-head word and the previous target word have a particular difference in number of vacancies
(d) count how many times a source word is aligned to phi number of target words
(e) count how many times NULL is aligned to a target word
M step - Estimate new probabilities based on the counts from the E step (a schematic sketch follows below)
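As a rough illustration of how these steps map onto the documented API, here is a schematic sketch of one EM iteration; it uses the sample() and prob_t_a_given_s() methods described below, elides the actual count updates, and should not be read as the real implementation:

from nltk.translate.ibm5 import Model5Counts

def em_iteration_sketch(model, parallel_corpus):
    counts = Model5Counts()
    for aligned_sentence in parallel_corpus:
        # E step: sample promising alignments and weight their counts
        # by normalized alignment probabilities.
        sampled_alignments, best_alignment = model.sample(aligned_sentence)
        total = sum(model.prob_t_a_given_s(a) for a in sampled_alignments)
        for alignment_info in sampled_alignments:
            weight = model.prob_t_a_given_s(alignment_info) / total
            # ... update translation, vacancy, fertility and NULL counts
            # in `counts`, each weighted by `weight`
    # M step: re-estimate the probability tables from `counts`
    return counts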
As in Model 4, there are too many possible alignments to consider exhaustively. Thus, a hill climbing approach is used to sample good candidates. In addition, pruning is used to weed out unlikely alignments based on Model 4 scores.
Notations:
- i: Position in the source sentence. Valid values are 0 (for NULL), 1, 2, …, length of source sentence.
- j: Position in the target sentence. Valid values are 1, 2, …, length of target sentence.
- l: Number of words in the source sentence, excluding NULL.
- m: Number of words in the target sentence.
- s: A word in the source language.
- t: A word in the target language.
- phi: Fertility, the number of target words produced by a source word.
- p1: Probability that a target word produced by a source word is accompanied by another target word that is aligned to NULL.
- p0: 1 - p1.
- max_v: Maximum vacancy.
- dv: Vacancy difference, Δv.
The definition of v_head here differs from GIZA++, section 4.7 of [Brown et al., 1993], and [Koehn, 2010]. In the latter cases, v_head is v_head(v(j) | v(center of previous cept), max_v, word_class(t)).
Here, we follow appendix B of [Brown et al., 1993] and combine v(j) with v(center of previous cept) to obtain dv: v_head(v(j) - v(center of previous cept) | max_v, word_class(t)).
References: Philipp Koehn. 2010. Statistical Machine Translation. Cambridge University Press, New York.
Peter E Brown, Stephen A. Della Pietra, Vincent J. Della Pietra, and Robert L. Mercer. 1993. The Mathematics of Statistical Machine Translation: Parameter Estimation. Computational Linguistics, 19 (2), 263-311.
- class nltk.translate.ibm5.IBMModel5(sentence_aligned_corpus, iterations, source_word_classes, target_word_classes, probability_tables=None)[source]¶ Bases: nltk.translate.ibm_model.IBMModel
Translation model that keeps track of vacant positions in the target sentence to decide where to place translated words
>>> bitext = []
>>> bitext.append(AlignedSent(['klein', 'ist', 'das', 'haus'], ['the', 'house', 'is', 'small']))
>>> bitext.append(AlignedSent(['das', 'haus', 'war', 'ja', 'groß'], ['the', 'house', 'was', 'big']))
>>> bitext.append(AlignedSent(['das', 'buch', 'ist', 'ja', 'klein'], ['the', 'book', 'is', 'small']))
>>> bitext.append(AlignedSent(['ein', 'haus', 'ist', 'klein'], ['a', 'house', 'is', 'small']))
>>> bitext.append(AlignedSent(['das', 'haus'], ['the', 'house']))
>>> bitext.append(AlignedSent(['das', 'buch'], ['the', 'book']))
>>> bitext.append(AlignedSent(['ein', 'buch'], ['a', 'book']))
>>> bitext.append(AlignedSent(['ich', 'fasse', 'das', 'buch', 'zusammen'], ['i', 'summarize', 'the', 'book']))
>>> bitext.append(AlignedSent(['fasse', 'zusammen'], ['summarize']))
>>> src_classes = {'the': 0, 'a': 0, 'small': 1, 'big': 1, 'house': 2, 'book': 2, 'is': 3, 'was': 3, 'i': 4, 'summarize': 5}
>>> trg_classes = {'das': 0, 'ein': 0, 'haus': 1, 'buch': 1, 'klein': 2, 'groß': 2, 'ist': 3, 'war': 3, 'ja': 4, 'ich': 5, 'fasse': 6, 'zusammen': 6}
>>> ibm5 = IBMModel5(bitext, 5, src_classes, trg_classes)
>>> print(round(ibm5.head_vacancy_table[1][1][1], 3))
1.0
>>> print(round(ibm5.head_vacancy_table[2][1][1], 3))
0.0
>>> print(round(ibm5.non_head_vacancy_table[3][3][6], 3))
1.0
>>> print(round(ibm5.fertility_table[2]['summarize'], 3))
1.0
>>> print(round(ibm5.fertility_table[1]['book'], 3))
1.0
>>> print(ibm5.p1)
0.033...
>>> test_sentence = bitext[2]
>>> test_sentence.words
['das', 'buch', 'ist', 'ja', 'klein']
>>> test_sentence.mots
['the', 'book', 'is', 'small']
>>> test_sentence.alignment
Alignment([(0, 0), (1, 1), (2, 2), (3, None), (4, 3)])
- MIN_SCORE_FACTOR = 0.2¶
Alignments with scores below this factor are pruned during sampling
- hillclimb(alignment_info, j_pegged=None)[source]¶
Starting from the alignment in alignment_info, look at neighboring alignments iteratively for the best one, according to Model 4. Note that Model 4 scoring is used instead of Model 5 because the latter is too expensive to compute.
There is no guarantee that the best alignment in the alignment space will be found, because the algorithm might be stuck in a local maximum.
Parameters: j_pegged (int) – If specified, the search will be constrained to alignments where j_pegged remains unchanged
Returns: The best alignment found from hill climbing
Return type: AlignmentInfo
- prob_t_a_given_s(alignment_info)[source]¶
Probability of target sentence and an alignment given the source sentence
- prune(alignment_infos)[source]¶
Removes alignments from alignment_infos that have substantially lower Model 4 scores than the best alignment
Returns: Pruned alignments
Return type: set(AlignmentInfo)
- sample(sentence_pair)[source]¶
Sample the most probable alignments from the entire alignment space according to Model 4
Note that Model 4 scoring is used instead of Model 5 because the latter is too expensive to compute.
First, determine the best alignment according to IBM Model 2. With this initial alignment, use hill climbing to determine the best alignment according to IBM Model 4. Add this alignment and its neighbors to the sample set. Repeat this process with other initial alignments obtained by pegging an alignment point. Finally, prune alignments that have substantially lower Model 4 scores than the best alignment.
Parameters: sentence_pair (AlignedSent) – Source and target language sentence pair to generate a sample of alignments from
Returns: A set of best alignments represented by their AlignmentInfo and the best alignment of the set for convenience
Return type: set(AlignmentInfo), AlignmentInfo
- class nltk.translate.ibm5.Model5Counts[source]¶ Bases: nltk.translate.ibm_model.Counts
Data object to store counts of various parameters during training. Includes counts for vacancies.
- update_vacancy(count, alignment_info, i, trg_classes, slots)[source]¶
Parameters:
- count – Value to add to the vacancy counts
- alignment_info – Alignment under consideration
- i – Source word position under consideration
- trg_classes – Target word classes
- slots – Vacancy states of the slots in the target sentence. Output parameter that will be modified as new words are placed in the target sentence.
nltk.translate.ibm_model module¶
Common methods and classes for all IBM models. See IBMModel1, IBMModel2, IBMModel3, IBMModel4, and IBMModel5 for specific implementations.
The IBM models are a series of generative models that learn lexical translation probabilities, p(target language word|source language word), given a sentence-aligned parallel corpus.
The models increase in sophistication from model 1 to 5. Typically, the output of lower models is used to seed the higher models. All models use the Expectation-Maximization (EM) algorithm to learn various probability tables.
Words in a sentence are one-indexed. The first word of a sentence has position 1, not 0. Index 0 is reserved in the source sentence for the NULL token. The concept of position does not apply to NULL, but it is indexed at 0 by convention.
Each target word is aligned to exactly one source word or the NULL token.
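To illustrate the convention, the hypothetical helper below converts a zero-indexed AlignedSent alignment into the one-indexed alignment function the IBM models use internally, with 0 standing for NULL:

from nltk.translate import AlignedSent, Alignment

def to_alignment_function(aligned_sent):
    # a[j] is the source position aligned to target position j
    # (both one-indexed); 0 denotes NULL, and index 0 of the result
    # is a dummy slot so that target words start at 1.
    a = [0] * (len(aligned_sent.words) + 1)
    for words_idx, mots_idx in aligned_sent.alignment:
        if mots_idx is not None:
            a[words_idx + 1] = mots_idx + 1
    return a

sent = AlignedSent(['das', 'buch', 'ist', 'ja', 'klein'],
                   ['the', 'book', 'is', 'small'],
                   Alignment([(0, 0), (1, 1), (2, 2), (4, 3)]))
print(to_alignment_function(sent))  # [0, 1, 2, 3, 0, 4]: 'ja' maps to NULL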
References: Philipp Koehn. 2010. Statistical Machine Translation. Cambridge University Press, New York.
Peter E Brown, Stephen A. Della Pietra, Vincent J. Della Pietra, and Robert L. Mercer. 1993. The Mathematics of Statistical Machine Translation: Parameter Estimation. Computational Linguistics, 19 (2), 263-311.
- class nltk.translate.ibm_model.AlignmentInfo(alignment, src_sentence, trg_sentence, cepts)[source]¶ Bases: object
Helper data object for training IBM Models 3 and up
Read-only. For a source sentence and its counterpart in the target language, this class holds information about the sentence pair’s alignment, cepts, and fertility.
Warning: Alignments are one-indexed here, in contrast to nltk.translate.Alignment and AlignedSent, which are zero-indexed. This class is not meant to be used outside of IBM models.
- alignment = None¶
tuple(int): Alignment function. alignment[j] is the position in the source sentence that is aligned to the position j in the target sentence.
- center_of_cept(i)[source]¶
Returns: The ceiling of the average positions of the words in the tablet of cept i, or 0 if i is None
- cepts = None¶
list(list(int)): The positions of the target words, in ascending order, aligned to a source word position. For example, cepts[4] = (2, 3, 7) means that words in positions 2, 3 and 7 of the target sentence are aligned to the word in position 4 of the source sentence (see the sketch below).
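As a sketch of the relationship between the two attributes, cepts can be derived from the alignment function; the helper below is illustrative only:

def cepts_from_alignment(alignment, src_len):
    # alignment[j] is the source position for target position j
    # (0 for NULL); index 0 is a dummy. cepts[i] collects the target
    # positions aligned to source position i, in ascending order.
    cepts = [[] for _ in range(src_len + 1)]
    for j in range(1, len(alignment)):
        cepts[alignment[j]].append(j)
    return cepts

print(cepts_from_alignment((0, 1, 2, 3, 0, 4), src_len=4))
# [[4], [1], [2], [3], [5]]: cepts[0] holds the NULL-aligned positions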
- is_head_word(j)[source]¶
Returns: Whether the word in position j of the target sentence is a head word
- previous_in_tablet(j)[source]¶
Returns: The position of the previous word that is in the same tablet as j, or None if j is the first word of the tablet
- score = None¶
float: Optional. Probability of alignment, as defined by the IBM model that assesses this alignment
- src_sentence = None¶
tuple(str): Source sentence referred to by this object. Should include NULL token (None) in index 0.
- trg_sentence = None¶
tuple(str): Target sentence referred to by this object. Should have a dummy element in index 0 so that the first word starts from index 1.
- class nltk.translate.ibm_model.Counts[source]¶ Bases: object
Data object to store counts of various parameters during training
- class nltk.translate.ibm_model.IBMModel(sentence_aligned_corpus)[source]¶ Bases: object
Abstract base class for all IBM models
- MIN_PROB = 1e-12¶
- best_model2_alignment(sentence_pair, j_pegged=None, i_pegged=0)[source]¶
Finds the best alignment according to IBM Model 2
Used as a starting point for hill climbing in Models 3 and above, because it is easier to compute than the best alignments in higher models
Parameters:
- sentence_pair (AlignedSent) – Source and target language sentence pair to be word-aligned
- j_pegged (int) – If specified, the alignment point of j_pegged will be fixed to i_pegged
- i_pegged (int) – Alignment point to j_pegged
- hillclimb(alignment_info, j_pegged=None)[source]¶
Starting from the alignment in alignment_info, look at neighboring alignments iteratively for the best one.
There is no guarantee that the best alignment in the alignment space will be found, because the algorithm might be stuck in a local maximum.
Parameters: j_pegged (int) – If specified, the search will be constrained to alignments where j_pegged remains unchanged
Returns: The best alignment found from hill climbing
Return type: AlignmentInfo
- neighboring(alignment_info, j_pegged=None)[source]¶
Determine the neighbors of alignment_info, obtained by moving or swapping one alignment point
Parameters: j_pegged (int) – If specified, neighbors that have a different alignment point from j_pegged will not be considered
Returns: A set of neighboring alignments represented by their AlignmentInfo
Return type: set(AlignmentInfo)
- prob_t_a_given_s(alignment_info)[source]¶
Probability of target sentence and an alignment given the source sentence
All required information is assumed to be in alignment_info and self.
Derived classes should override this method.
- sample(sentence_pair)[source]¶
Sample the most probable alignments from the entire alignment space
First, determine the best alignment according to IBM Model 2. With this initial alignment, use hill climbing to determine the best alignment according to a higher IBM Model. Add this alignment and its neighbors to the sample set. Repeat this process with other initial alignments obtained by pegging an alignment point.
Hill climbing may get stuck in a local maximum, hence the pegging and trying out of different alignments.
Parameters: sentence_pair (AlignedSent) – Source and target language sentence pair to generate a sample of alignments from
Returns: A set of best alignments represented by their AlignmentInfo and the best alignment of the set for convenience
Return type: set(AlignmentInfo), AlignmentInfo
- nltk.translate.ibm_model.longest_target_sentence_length(sentence_aligned_corpus)[source]¶
Parameters: sentence_aligned_corpus (list(AlignedSent)) – Parallel corpus under consideration
Returns: Number of words in the longest target language sentence of sentence_aligned_corpus
nltk.translate.metrics module¶
- nltk.translate.metrics.alignment_error_rate(reference, hypothesis, possible=None)[source]¶
Return the Alignment Error Rate (AER) of an alignment with respect to a “gold standard” reference alignment. Return an error rate between 0.0 (perfect alignment) and 1.0 (no alignment).
>>> from nltk.translate import Alignment
>>> ref = Alignment([(0, 0), (1, 1), (2, 2)])
>>> test = Alignment([(0, 0), (1, 2), (2, 1)])
>>> alignment_error_rate(ref, test)
0.6666666666666667
Parameters:
- reference (Alignment) – A gold standard reference alignment (sure alignments)
- hypothesis (Alignment) – A hypothesis alignment (candidate alignments)
- possible (Alignment) – A gold standard reference of possible alignments (defaults to reference if None)
Return type: float or None
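The formula behind this doctest is the AER of Och and Ney (2003). A minimal sketch, assuming a sure reference S, a possible reference P (defaulting to S, as in the signature above), and a hypothesis A:

def aer(reference, hypothesis, possible=None):
    # AER = 1 - (|A & S| + |A & P|) / (|A| + |S|)
    S, A = set(reference), set(hypothesis)
    P = set(possible) if possible is not None else S
    return 1.0 - (len(A & S) + len(A & P)) / float(len(A) + len(S))

ref = {(0, 0), (1, 1), (2, 2)}
hyp = {(0, 0), (1, 2), (2, 1)}
print(aer(ref, hyp))  # 0.6666666666666667, matching the doctest above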
nltk.translate.nist_score module¶
NIST score implementation.
- nltk.translate.nist_score.corpus_nist(list_of_references, hypotheses, n=5)[source]¶
Calculate a single corpus-level NIST score (aka. system-level NIST) for all the hypotheses and their respective references.
Parameters:
- list_of_references (list(list(list(str)))) – a corpus of lists of reference sentences, w.r.t. hypotheses
- hypotheses (list(list(str))) – a list of hypothesis sentences
- n (int) – highest n-gram order
- nltk.translate.nist_score.nist_length_penalty(ref_len, hyp_len)[source]¶
Calculates the NIST length penalty, from Eq. 3 in Doddington (2002):
penalty = exp( beta * log( min( len(hyp)/len(ref), 1.0 ))**2 )
where beta is chosen to make the brevity penalty factor 0.5 when the no. of words in the system output (hyp) is 2/3 of the average no. of words in the reference translation (ref).
The NIST penalty differs from BLEU’s in that it minimizes the impact on the score of small variations in translation length. See Fig. 4 in Doddington (2002).
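A minimal sketch of this penalty, assuming the squared-log form of Eq. 3 and deriving beta from the 2/3 condition stated above; this is illustrative, not the exact NLTK implementation:

import math

def nist_bp(ref_len, hyp_len):
    ratio = hyp_len / ref_len
    if ratio <= 0:
        return 0.0
    if ratio >= 1:
        return 1.0
    # beta is fixed by requiring a penalty of 0.5 at a 2/3 length ratio
    beta = math.log(0.5) / math.log(2 / 3) ** 2
    return math.exp(beta * math.log(ratio) ** 2)

print(round(nist_bp(15, 10), 3))  # 0.5 by construction at a 2/3 ratio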
- nltk.translate.nist_score.sentence_nist(references, hypothesis, n=5)[source]¶
Calculate NIST score from George Doddington. 2002. “Automatic evaluation of machine translation quality using n-gram co-occurrence statistics.” Proceedings of HLT. Morgan Kaufmann Publishers Inc. http://dl.acm.org/citation.cfm?id=1289189.1289273
DARPA commissioned NIST to develop an MT evaluation facility based on the BLEU score. The official script used by NIST to compute BLEU and NIST score is mteval-14.pl. The main differences are:
- BLEU uses geometric mean of the ngram overlaps, NIST uses arithmetic mean.
- NIST has a different brevity penalty.
- NIST score from mteval-14.pl has a self-contained tokenizer.
Note: mteval-14.pl includes a smoothing function for BLEU score that is NOT used in the NIST score computation.
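For the first point above, NIST’s arithmetic mean weights each matched n-gram by its information gain over the reference corpus, Info(w_1..w_n) = log2(count(w_1..w_{n-1}) / count(w_1..w_n)) per Doddington (2002). A hedged sketch of those weights; the helper name and interface are illustrative:

import math
from collections import Counter
from nltk.util import ngrams

def information_weights(references, n=5):
    # Count all k-grams (k <= n) over the reference corpus, then weight
    # each n-gram by how surprising its last word is given its prefix.
    counts = Counter()
    for ref in references:
        for k in range(1, n + 1):
            counts.update(ngrams(ref, k))
    total_unigrams = sum(c for ng, c in counts.items() if len(ng) == 1)
    info = {}
    for ng, c in counts.items():
        prefix_count = total_unigrams if len(ng) == 1 else counts[ng[:-1]]
        info[ng] = math.log2(prefix_count / c)
    return info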
>>> hypothesis1 = ['It', 'is', 'a', 'guide', 'to', 'action', 'which',
...     'ensures', 'that', 'the', 'military', 'always',
...     'obeys', 'the', 'commands', 'of', 'the', 'party']
>>> hypothesis2 = ['It', 'is', 'to', 'insure', 'the', 'troops',
...     'forever', 'hearing', 'the', 'activity', 'guidebook',
...     'that', 'party', 'direct']
>>> reference1 = ['It', 'is', 'a', 'guide', 'to', 'action', 'that',
...     'ensures', 'that', 'the', 'military', 'will', 'forever',
...     'heed', 'Party', 'commands']
>>> reference2 = ['It', 'is', 'the', 'guiding', 'principle', 'which',
...     'guarantees', 'the', 'military', 'forces', 'always',
...     'being', 'under', 'the', 'command', 'of', 'the',
...     'Party']
>>> reference3 = ['It', 'is', 'the', 'practical', 'guide', 'for', 'the',
...     'army', 'always', 'to', 'heed', 'the', 'directions',
...     'of', 'the', 'party']
>>> sentence_nist([reference1, reference2, reference3], hypothesis1)
3.3709...
>>> sentence_nist([reference1, reference2, reference3], hypothesis2)
1.4619...
Parameters: - references (list(list(str))) – reference sentences
- hypothesis (list(str)) – a hypothesis sentence
- n (int) – highest n-gram order
nltk.translate.phrase_based module¶
- nltk.translate.phrase_based.extract(f_start, f_end, e_start, e_end, alignment, f_aligned, srctext, trgtext, srclen, trglen, max_phrase_length)[source]¶
This function checks for alignment point consistency and extracts phrases using the chunk of consistent phrases.
A phrase pair (ē, f̄) is consistent with an alignment A if and only if:
1. No English words in the phrase pair are aligned to words outside it: ∀ e_i ∈ ē: (e_i, f_j) ∈ A ⇒ f_j ∈ f̄
2. No Foreign words in the phrase pair are aligned to words outside it: ∀ f_j ∈ f̄: (e_i, f_j) ∈ A ⇒ e_i ∈ ē
3. The phrase pair contains at least one alignment point: ∃ e_i ∈ ē, f_j ∈ f̄ such that (e_i, f_j) ∈ A
A sketch of this check appears after the parameter list below.
Parameters:
- f_start (int) – Starting index of the possible foreign language phrases
- f_end (int) – Ending index of the possible foreign language phrases
- e_start (int) – Starting index of the possible source language phrases
- e_end (int) – Ending index of the possible source language phrases
- srctext (list) – The source language tokens, a list of string.
- trgtext (list) – The target language tokens, a list of string.
- srclen (int) – The number of tokens in the source language tokens.
- trglen (int) – The number of tokens in the target language tokens.
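A minimal sketch of the consistency check itself, using half-open spans over alignment points; the helper name is hypothetical, and extract() performs an equivalent test internally:

def is_consistent(alignment, e_start, e_end, f_start, f_end):
    # alignment: list of (e_idx, f_idx) pairs; the candidate phrase pair
    # covers source positions [e_start, e_end) and target positions
    # [f_start, f_end).
    points_inside = [(e, f) for e, f in alignment
                     if e_start <= e < e_end and f_start <= f < f_end]
    if not points_inside:          # must contain an alignment point
        return False
    for e, f in alignment:         # no point may leak across the border
        if (e_start <= e < e_end) != (f_start <= f < f_end):
            return False
    return True

alignment = [(0, 0), (1, 1), (1, 2), (1, 3), (2, 5)]
print(is_consistent(alignment, 0, 2, 0, 4))  # True
print(is_consistent(alignment, 1, 2, 1, 3))  # False: (1, 3) leaks out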
- nltk.translate.phrase_based.phrase_extraction(srctext, trgtext, alignment, max_phrase_length=0)[source]¶
Phrase extraction algorithm extracts all consistent phrase pairs from a word-aligned sentence pair.
The idea is to loop over all possible source language (e) phrases and find the minimal foreign phrase (f) that matches each of them. Matching is done by identifying all alignment points for the source phrase and finding the shortest foreign phrase that includes all the foreign counterparts for the source words.
In short, a phrase alignment has to (a) contain all alignment points for all covered words and (b) contain at least one alignment point.
>>> srctext = "michael assumes that he will stay in the house" >>> trgtext = "michael geht davon aus , dass er im haus bleibt" >>> alignment = [(0,0), (1,1), (1,2), (1,3), (2,5), (3,6), (4,9), ... (5,9), (6,7), (7,7), (8,8)] >>> phrases = phrase_extraction(srctext, trgtext, alignment) >>> for i in sorted(phrases): ... print(i) ... ((0, 1), (0, 1), 'michael', 'michael') ((0, 2), (0, 4), 'michael assumes', 'michael geht davon aus') ((0, 2), (0, 4), 'michael assumes', 'michael geht davon aus ,') ((0, 3), (0, 6), 'michael assumes that', 'michael geht davon aus , dass') ((0, 4), (0, 7), 'michael assumes that he', 'michael geht davon aus , dass er') ((0, 9), (0, 10), 'michael assumes that he will stay in the house', 'michael geht davon aus , dass er im haus bleibt') ((1, 2), (1, 4), 'assumes', 'geht davon aus') ((1, 2), (1, 4), 'assumes', 'geht davon aus ,') ((1, 3), (1, 6), 'assumes that', 'geht davon aus , dass') ((1, 4), (1, 7), 'assumes that he', 'geht davon aus , dass er') ((1, 9), (1, 10), 'assumes that he will stay in the house', 'geht davon aus , dass er im haus bleibt') ((2, 3), (5, 6), 'that', ', dass') ((2, 3), (5, 6), 'that', 'dass') ((2, 4), (5, 7), 'that he', ', dass er') ((2, 4), (5, 7), 'that he', 'dass er') ((2, 9), (5, 10), 'that he will stay in the house', ', dass er im haus bleibt') ((2, 9), (5, 10), 'that he will stay in the house', 'dass er im haus bleibt') ((3, 4), (6, 7), 'he', 'er') ((3, 9), (6, 10), 'he will stay in the house', 'er im haus bleibt') ((4, 6), (9, 10), 'will stay', 'bleibt') ((4, 9), (7, 10), 'will stay in the house', 'im haus bleibt') ((6, 8), (7, 8), 'in the', 'im') ((6, 9), (7, 9), 'in the house', 'im haus') ((8, 9), (8, 9), 'house', 'haus')
Parameters:
- srctext (str) – The sentence string from the source language.
- trgtext (str) – The sentence string from the target language.
- alignment (list(tuple)) – The word alignment outputs as list of tuples, where the first elements of tuples are the source words’ indices and second elements are the target words’ indices. This is also the output format of nltk.translate.ibm1
- max_phrase_length (int) – maximal phrase length, if 0 or not specified it is set to the length of the longer sentence (srctext or trgtext).
Return type: list(tuple)
Returns: A list of tuples, where each element is a phrase and each phrase is a tuple made up of (i) its source location, (ii) its target location, (iii) the source phrase and (iv) the target phrase. The list represents all the possible phrases extracted from the word alignments.
nltk.translate.ribes_score module¶
RIBES score implementation
- nltk.translate.ribes_score.corpus_ribes(list_of_references, hypotheses, alpha=0.25, beta=0.1)[source]¶
This function “calculates RIBES for a system output (hypothesis) with multiple references, and returns ‘best’ score among multi-references and individual scores. The scores are corpus-wise, i.e., averaged by the number of sentences.” (c.f. RIBES version 1.03.1 code).
Different from BLEU’s micro-average precision, RIBES calculates the macro-average precision by averaging the best RIBES score for each pair of hypothesis and its corresponding references
>>> hyp1 = ['It', 'is', 'a', 'guide', 'to', 'action', 'which',
...     'ensures', 'that', 'the', 'military', 'always',
...     'obeys', 'the', 'commands', 'of', 'the', 'party']
>>> ref1a = ['It', 'is', 'a', 'guide', 'to', 'action', 'that',
...     'ensures', 'that', 'the', 'military', 'will', 'forever',
...     'heed', 'Party', 'commands']
>>> ref1b = ['It', 'is', 'the', 'guiding', 'principle', 'which',
...     'guarantees', 'the', 'military', 'forces', 'always',
...     'being', 'under', 'the', 'command', 'of', 'the', 'Party']
>>> ref1c = ['It', 'is', 'the', 'practical', 'guide', 'for', 'the',
...     'army', 'always', 'to', 'heed', 'the', 'directions',
...     'of', 'the', 'party']
>>> hyp2 = ['he', 'read', 'the', 'book', 'because', 'he', 'was',
...     'interested', 'in', 'world', 'history']
>>> ref2a = ['he', 'was', 'interested', 'in', 'world', 'history',
...     'because', 'he', 'read', 'the', 'book']
>>> list_of_references = [[ref1a, ref1b, ref1c], [ref2a]]
>>> hypotheses = [hyp1, hyp2]
>>> round(corpus_ribes(list_of_references, hypotheses),4)
0.3597
Parameters: - list_of_references (list(list(list(str)))) – a corpus of lists of reference sentences, w.r.t. hypotheses
- hypotheses (list(list(str))) – a list of hypothesis sentences
- alpha (float) – hyperparameter used as a prior for the unigram precision.
- beta (float) – hyperparameter used as a prior for the brevity penalty.
Returns: The best ribes score from one of the references.
Return type: float
- nltk.translate.ribes_score.find_increasing_sequences(worder)[source]¶
Given the worder list, this function groups monotonic +1 sequences.
>>> worder = [7, 8, 9, 10, 6, 0, 1, 2, 3, 4, 5]
>>> list(find_increasing_sequences(worder))
[(7, 8, 9, 10), (0, 1, 2, 3, 4, 5)]
Parameters: worder (list(int)) – The worder list output from word_rank_alignment
- nltk.translate.ribes_score.kendall_tau(worder, normalize=True)[source]¶
Calculates the Kendall’s Tau correlation coefficient given the worder list of word alignments from word_rank_alignment(), using the formula:
tau = 2 * num_increasing_pairs / num_possible_pairs - 1
Note that the increasing pairs can be discontinuous in the worder list, and each increasing sequence can be tabulated as choose(len(seq), 2) increasing pairs, e.g.
>>> worder = [7, 8, 9, 10, 6, 0, 1, 2, 3, 4, 5]
>>> number_possible_pairs = choose(len(worder), 2)
>>> round(kendall_tau(worder, normalize=False), 3)
-0.236
>>> round(kendall_tau(worder), 3)
0.382
Parameters: - worder (list(int)) – The worder list output from word_rank_alignment
- normalize (boolean) – Flag to indicate normalization
Returns: The Kendall’s Tau correlation coefficient.
Return type: float
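A brute-force sketch of the same computation, counting increasing pairs directly; normalization is assumed to map [-1, 1] onto [0, 1], which matches the doctest values above:

from itertools import combinations

def tau_from_worder(worder, normalize=True):
    pairs = list(combinations(worder, 2))          # all choose(n, 2) pairs
    increasing = sum(1 for a, b in pairs if a < b)
    tau = 2.0 * increasing / len(pairs) - 1.0
    return (tau + 1.0) / 2.0 if normalize else tau

worder = [7, 8, 9, 10, 6, 0, 1, 2, 3, 4, 5]
print(round(tau_from_worder(worder, normalize=False), 3))  # -0.236
print(round(tau_from_worder(worder), 3))                   # 0.382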
- nltk.translate.ribes_score.position_of_ngram(ngram, sentence)[source]¶
This function returns the position of the first instance of the ngram appearing in a sentence.
Note that one could also use strings as follows, but the code is a little convoluted with type casting back and forth:
char_pos = ' '.join(sent)[:' '.join(sent).index(' '.join(ngram))]
word_pos = char_pos.count(' ')
Another way to conceive this is:
return next(i for i, ng in enumerate(ngrams(sentence, len(ngram)))
            if ng == ngram)
Parameters:
- ngram (tuple) – The ngram that needs to be searched
- sentence (list(str)) – The list of tokens to search from.
- nltk.translate.ribes_score.sentence_ribes(references, hypothesis, alpha=0.25, beta=0.1)[source]¶
The RIBES (Rank-based Intuitive Bilingual Evaluation Score) from Hideki Isozaki, Tsutomu Hirao, Kevin Duh, Katsuhito Sudoh and Hajime Tsukada. 2010. “Automatic Evaluation of Translation Quality for Distant Language Pairs”. In Proceedings of EMNLP. http://www.aclweb.org/anthology/D/D10/D10-1092.pdf
The generic RIBES score used in shared tasks, e.g. the Workshop on Asian Translation (WAT), uses the following calculation:
RIBES = kendall_tau * (p1**alpha) * (bp**beta)
where p1 is the unigram precision and bp the brevity penalty. Please note that this re-implementation differs from the official RIBES implementation: though it emulates the results as described in the original paper, the official RIBES script implements further optimizations.
Users are encouraged to use the official RIBES script instead of this implementation when evaluating their machine translation systems. Refer to http://www.kecl.ntt.co.jp/icl/lirg/ribes/ for the official script.
Parameters: - references – a list of reference sentences
- hypothesis (list(str)) – a hypothesis sentence
- alpha (float) – hyperparameter used as a prior for the unigram precision.
- beta (float) – hyperparameter used as a prior for the brevity penalty.
Returns: The best ribes score from one of the references.
Return type: float
- nltk.translate.ribes_score.spearman_rho(worder, normalize=True)[source]¶
Calculates the Spearman’s Rho correlation coefficient given the worder list of word alignments from word_rank_alignment(), using the formula:
rho = 1 - sum(d**2) / choose(len(worder)+1, 3)
where d is the difference between each rank in the worder list and the original word index from the reference sentence.
Using the (H0, R0) and (H5, R5) example from the paper:
>>> worder = [7, 8, 9, 10, 6, 0, 1, 2, 3, 4, 5]
>>> round(spearman_rho(worder, normalize=False), 3)
-0.591
>>> round(spearman_rho(worder), 3)
0.205
Parameters: worder (list(int)) – The worder list output from word_rank_alignment
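A direct sketch of the formula above, with d taken as the gap between each rank and its position; normalization is again assumed to map [-1, 1] onto [0, 1], matching the doctest values:

from math import comb

def rho_from_worder(worder, normalize=True):
    d_squared = sum((rank - i) ** 2 for i, rank in enumerate(worder))
    rho = 1.0 - d_squared / comb(len(worder) + 1, 3)
    return (rho + 1.0) / 2.0 if normalize else rho

worder = [7, 8, 9, 10, 6, 0, 1, 2, 3, 4, 5]
print(round(rho_from_worder(worder, normalize=False), 3))  # -0.591
print(round(rho_from_worder(worder), 3))                   # 0.205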
- nltk.translate.ribes_score.word_rank_alignment(reference, hypothesis, character_based=False)[source]¶
This is the word rank alignment algorithm described in the paper to produce the worder list, i.e. a list of word indices of the hypothesis word order w.r.t. the list of reference words.
Below is the (H0, R0) example from the Isozaki et al. 2010 paper; note that the examples are indexed from 1 in the paper but the results here are indexed from 0:
>>> ref = str('he was interested in world history because he '
...           'read the book').split()
>>> hyp = str('he read the book because he was interested in world '
...           'history').split()
>>> word_rank_alignment(ref, hyp)
[7, 8, 9, 10, 6, 0, 1, 2, 3, 4, 5]
The (H1, R1) example from the paper, note the 0th index:
>>> ref = 'John hit Bob yesterday'.split()
>>> hyp = 'Bob hit John yesterday'.split()
>>> word_rank_alignment(ref, hyp)
[2, 1, 0, 3]
Here is the (H2, R2) example from the paper, note the 0th index here too:
>>> ref = 'the boy read the book'.split()
>>> hyp = 'the book was read by the boy'.split()
>>> word_rank_alignment(ref, hyp)
[3, 4, 2, 0, 1]
Parameters: - reference (list(str)) – a reference sentence
- hypothesis (list(str)) – a hypothesis sentence
nltk.translate.stack_decoder module¶
A decoder that uses stacks to implement phrase-based translation.
In phrase-based translation, the source sentence is segmented into phrases of one or more words, and translations for those phrases are used to build the target sentence.
Hypothesis data structures are used to keep track of the source words translated so far and the partial output. A hypothesis can be expanded by selecting an untranslated phrase, looking up its translation in a phrase table, and appending that translation to the partial output. Translation is complete when a hypothesis covers all source words.
The search space is huge because the source sentence can be segmented in different ways, the source phrases can be selected in any order, and there could be multiple translations for the same source phrase in the phrase table. To make decoding tractable, stacks are used to limit the number of candidate hypotheses by doing histogram and/or threshold pruning.
Hypotheses with the same number of words translated are placed in the same stack. In histogram pruning, each stack has a size limit, and the hypothesis with the lowest score is removed when the stack is full. In threshold pruning, hypotheses that score below a certain threshold of the best hypothesis in that stack are removed.
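A minimal sketch of both pruning styles applied to one stack, assuming log-domain scores; the list-of-pairs representation and parameter names are illustrative, not the decoder's internal stack structure:

import math

def prune_stack(stack, stack_size=100, beam_threshold=0.001):
    # stack: list of (log_score, hypothesis) pairs whose hypotheses
    # cover the same number of source words.
    if not stack:
        return stack
    best = max(score for score, _ in stack)
    # Threshold pruning: a multiplicative threshold on probabilities
    # becomes an additive offset on log scores.
    if beam_threshold > 0.0:
        offset = math.log(beam_threshold)
        stack = [(s, h) for s, h in stack if s >= best + offset]
    # Histogram pruning: keep only the stack_size best hypotheses.
    stack.sort(key=lambda pair: pair[0], reverse=True)
    return stack[:stack_size]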
Hypothesis scoring can include various factors such as phrase translation probability, language model probability, length of translation, cost of remaining words to be translated, and so on.
References: Philipp Koehn. 2010. Statistical Machine Translation. Cambridge University Press, New York.
- class nltk.translate.stack_decoder.StackDecoder(phrase_table, language_model)[source]¶ Bases: object
Phrase-based stack decoder for machine translation
>>> from math import log
>>> from nltk.translate import PhraseTable
>>> phrase_table = PhraseTable()
>>> phrase_table.add(('niemand',), ('nobody',), log(0.8))
>>> phrase_table.add(('niemand',), ('no', 'one'), log(0.2))
>>> phrase_table.add(('erwartet',), ('expects',), log(0.8))
>>> phrase_table.add(('erwartet',), ('expecting',), log(0.2))
>>> phrase_table.add(('niemand', 'erwartet'), ('one', 'does', 'not', 'expect'), log(0.1))
>>> phrase_table.add(('die', 'spanische', 'inquisition'), ('the', 'spanish', 'inquisition'), log(0.8))
>>> phrase_table.add(('!',), ('!',), log(0.8))
>>> # nltk.model should be used here once it is implemented
>>> from collections import defaultdict
>>> language_prob = defaultdict(lambda: -999.0)
>>> language_prob[('nobody',)] = log(0.5)
>>> language_prob[('expects',)] = log(0.4)
>>> language_prob[('the', 'spanish', 'inquisition')] = log(0.2)
>>> language_prob[('!',)] = log(0.1)
>>> language_model = type('', (object,), {'probability_change': lambda self, context, phrase: language_prob[phrase], 'probability': lambda self, phrase: language_prob[phrase]})()
>>> stack_decoder = StackDecoder(phrase_table, language_model)
>>> stack_decoder.translate(['niemand', 'erwartet', 'die', 'spanische', 'inquisition', '!'])
['nobody', 'expects', 'the', 'spanish', 'inquisition', '!']
- beam_threshold = None¶
float: Hypotheses that score below this factor of the best hypothesis in a stack are dropped from consideration. Value between 0.0 and 1.0.
- compute_future_scores(src_sentence)[source]¶
Determines the approximate scores for translating every subsequence in src_sentence
Future scores can be used as a look-ahead to determine the difficulty of translating the remaining parts of a src_sentence.
Returns: Scores of subsequences referenced by their start and end positions. For example, result[2][5] is the score of the subsequence covering positions 2, 3, and 4.
Return type: dict(int: (dict(int): float))
- distortion_factor¶
float: Amount of reordering of source phrases. Lower values favour monotone translation, suitable when word order is similar for both source and target languages. Value between 0.0 and 1.0. Default 0.5.
- expansion_score(hypothesis, translation_option, src_phrase_span)[source]¶
Calculate the score of expanding hypothesis with translation_option
Parameters:
- hypothesis (_Hypothesis) – Hypothesis being expanded
- translation_option (PhraseTableEntry) – Information about the proposed expansion
- src_phrase_span (tuple(int, int)) – Word position span of the source phrase
- find_all_src_phrases(src_sentence)[source]¶
Finds all subsequences in src_sentence that have a phrase translation in the translation table
Returns: Subsequences that have a phrase translation, represented as a table of lists of end positions. For example, if result[2] is [5, 6, 9], then there are three phrases starting from position 2 in src_sentence, ending at positions 5, 6, and 9 exclusive. The list of ending positions is in ascending order.
Return type: list(list(int))
- future_score(hypothesis, future_score_table, sentence_length)[source]¶
Determines the approximate score for translating the untranslated words in hypothesis
- stack_size = None¶
int: Maximum number of hypotheses to consider in a stack. Higher values increase the likelihood of a good translation, but increase processing time.
- translate(src_sentence)[source]¶
Parameters: src_sentence (list(str)) – Sentence to be translated
Returns: Translated sentence
Return type: list(str)
- static valid_phrases(all_phrases_from, hypothesis)[source]¶
Extract phrases from all_phrases_from that contain words that have not been translated by hypothesis
Parameters: all_phrases_from (list(list(int))) – Phrases represented by their spans, in the same format as the return value of find_all_src_phrases
Returns: A list of phrases, represented by their spans, that cover untranslated positions.
Return type: list(tuple(int, int))
- word_penalty = None¶
float: Influences the translation length exponentially. If positive, shorter translations are preferred. If negative, longer translations are preferred. If zero, no penalty is applied.
Module contents¶
Experimental features for machine translation. These interfaces are prone to change.