BLEU (Bilingual Evaluation Understudy) is an algorithm for evaluating text similarity, most often used to measure how similar a machine translation is to a human reference translation.
Recently, needing to compare the similarity between sentences, I studied the principle of BLEU a bit at the recommendation of others, and at the same time tried out the BLEU function provided in NLTK.
The following will not go into complicated mathematics; it just introduces BLEU and how to use the NLTK tools. After all, I have always believed that people who avoid reinventing the wheel whenever possible are the ones who develop programs really efficiently, even if writing everything from scratch brings a sense of ease and accomplishment.
N-gram
When it comes to BLEU, one has to mention the so-called N-gram. An N-gram is a language model representation that groups the words of a sentence into consecutive runs of N to express the characteristics of the sentence; N is the number of words in each group.
Assume we have the following sentence:
Today is a nice day.
The so-called 1-gram (uni-gram) representation is the following:
['Today', 'is', 'a', 'nice', 'day']
2-gram (bi-gram) is:
[['Today', 'is'],
['is', 'a'],
['a', 'nice'],
['nice', 'day']]
This is how N-gram is expressed.
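If you want to generate these n-grams yourself rather than write the loops by hand, NLTK already ships a helper for it. Below is a minimal sketch using nltk.util.ngrams with the example sentence above; note that the helper yields tuples rather than lists, which is only a cosmetic difference:

# coding: utf-8
from nltk.util import ngrams

sentence = 'Today is a nice day'.split()

# 1-gram: each word by itself
print(list(ngrams(sentence, 1)))
# [('Today',), ('is',), ('a',), ('nice',), ('day',)]

# 2-gram: every pair of adjacent words
print(list(ngrams(sentence, 2)))
# [('Today', 'is'), ('is', 'a'), ('a', 'nice'), ('nice', 'day')]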
How To Use BLEU
The calculation method of BLEU is roughly as follows: in its simplest (unigram) form, it is the precision P = m / w_t, where m is the number of words from the candidate that are also found in the reference and w_t is the total number of words in the candidate (see the first reference below). In practice, implementations such as NLTK's combine the modified precisions of 1-grams through 4-grams with a geometric mean and multiply by a brevity penalty that punishes candidates shorter than the reference.
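As a quick worked example of the unigram form, take the two sentences used later in this post: with the reference 'Today is such a nice day' and the candidate 'Today is such a good day', five of the candidate's six words appear in the reference, so m = 5, w_t = 6, and P = 5/6 ≈ 0.83.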
If you need to calculate BLEU, it is recommended to use the tools in NLTK. If you do not have the NLTK package yet, first install it using the following command:
pip3 install nltk
Here is the simplest example:
# coding: utf-8
from nltk.translate import bleu

# Both the reference and the candidate are tokenized into word lists
sent_a = 'Today is such a nice day'.split()
sent_b = 'Today is such a good day'.split()

# bleu() takes a list of references and a single candidate
print(bleu([sent_a], sent_b))
Output:
0.537284965911771
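This number is not arbitrary; it can be reproduced by hand. With its default uniform weights, NLTK's bleu takes the geometric mean of the modified 1- to 4-gram precisions, and the brevity penalty is 1 here because both sentences have six words. Below is a minimal sketch to verify this, using the modified_precision helper from nltk.translate.bleu_score:

# coding: utf-8
from math import exp, log
from nltk.translate.bleu_score import modified_precision

sent_a = 'Today is such a nice day'.split()
sent_b = 'Today is such a good day'.split()

# Modified n-gram precisions for n = 1..4;
# for this pair they come out to 5/6, 3/5, 2/4 and 1/3
precisions = [modified_precision([sent_a], sent_b, n) for n in range(1, 5)]

# Geometric mean with the default uniform weights (0.25 each);
# the brevity penalty is omitted since both sentences are equally long
score = exp(sum(0.25 * log(p) for p in precisions))
print(score)  # ≈ 0.53728496591...

In other words, the single substitution of 'good' for 'nice' hurts the longer n-gram matches much more than the unigram ones, which is exactly why the combined score is noticeably lower than the 5/6 unigram precision.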
References
- https://www.researchgate.net/figure/Score-Calculation-Formula-in-BLEU-Here-m-is-number-of-words-from-the-candidate-that-are_fig1_224771080
- https://stackoverflow.com/questions/44324681/variation-in-bleu-score