TL;DR:
```python
>>> import nltk
>>> hypothesis = ['This', 'is', 'cat']
>>> reference = ['This', 'is', 'a', 'cat']
>>> references = [reference]  # list of references for a single hypothesis
```
(Note: To get a stable version of the BLEU score implementation, you have to pull the latest version of NLTK from the develop branch.)
In long:
Actually, if there is only one reference and one hypothesis in your whole corpus, then both corpus_bleu() and sentence_bleu() should return the same value, as in the example above.
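A minimal sketch of that claim, assuming NLTK is installed. A smoothing function is added here (not part of the original example) because on such short sentences the 3-gram and 4-gram counts are zero, and recent NLTK versions warn and return a near-zero score without smoothing:

```python
from nltk.translate.bleu_score import sentence_bleu, corpus_bleu, SmoothingFunction

hypothesis = ['This', 'is', 'cat']
reference = ['This', 'is', 'a', 'cat']
references = [reference]  # all references for this one hypothesis

# method1 smoothing avoids zero higher-order n-gram counts on short sentences
smooth = SmoothingFunction().method1

single = sentence_bleu(references, hypothesis, smoothing_function=smooth)
corpus = corpus_bleu([references], [hypothesis], smoothing_function=smooth)

# For a one-sentence corpus the two functions agree exactly
print(single == corpus)
```

Since sentence_bleu just delegates to corpus_bleu with singleton lists, the two scores are identical floats, not merely close.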
In the code, we see that sentence_bleu is actually a duck type of corpus_bleu:
```python
def sentence_bleu(references, hypothesis, weights=(0.25, 0.25, 0.25, 0.25),
                  smoothing_function=None):
    return corpus_bleu([references], [hypothesis], weights, smoothing_function)
```
And if we look at the parameters of sentence_bleu:
```python
def sentence_bleu(references, hypothesis, weights=(0.25, 0.25, 0.25, 0.25),
                  smoothing_function=None):
    """
    :param references: reference sentences
    :type references: list(list(str))
    :param hypothesis: a hypothesis sentence
    :type hypothesis: list(str)
    :param weights: weights for unigrams, bigrams, trigrams and so on
    :type weights: list(float)
    :return: The sentence-level BLEU score.
    :rtype: float
    """
```
The input for sentence_bleu's references is a list(list(str)).
So, if you have a sentence string, e.g. "This is a cat", you have to tokenize it into a list of strings, ["This", "is", "a", "cat"], and since sentence_bleu allows multiple references, references must be a list of token lists, e.g. if you have a second reference, "This is a feline", your input to sentence_bleu() will be:
```python
references = [["This", "is", "a", "cat"],
              ["This", "is", "a", "feline"]]
hypothesis = ["This", "is", "cat"]
sentence_bleu(references, hypothesis)
```
When it comes to corpus_bleu()'s list_of_references parameter, it is basically a list of whatever sentence_bleu() accepts as references:
```python
def corpus_bleu(list_of_references, hypotheses, weights=(0.25, 0.25, 0.25, 0.25),
                smoothing_function=None):
    """
    :param list_of_references: a corpus of lists of reference sentences, w.r.t. hypotheses
    :type list_of_references: list(list(list(str)))
    :param hypotheses: a list of hypothesis sentences
    :type hypotheses: list(list(str))
    :param weights: weights for unigrams, bigrams, trigrams and so on
    :type weights: list(float)
    :return: The corpus-level BLEU score.
    :rtype: float
    """
```
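To make the triple nesting concrete, here is a hedged sketch of a two-sentence corpus (the second sentence and its reference are made up for illustration; smoothing is added for the short sentences, as above):

```python
from nltk.translate.bleu_score import corpus_bleu, SmoothingFunction

# One inner list of references per hypothesis: list(list(list(str)))
list_of_references = [
    [["This", "is", "a", "cat"], ["This", "is", "a", "feline"]],  # refs for hypothesis 1
    [["It", "is", "raining", "today"]],                           # refs for hypothesis 2
]
# One token list per hypothesis: list(list(str))
hypotheses = [["This", "is", "cat"],
              ["It", "is", "raining"]]

smooth = SmoothingFunction().method1
score = corpus_bleu(list_of_references, hypotheses, smoothing_function=smooth)
print(0.0 < score <= 1.0)
```

Note that corpus_bleu aggregates the n-gram counts over all sentences before computing one score; it is not the average of per-sentence sentence_bleu scores.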
Besides looking at the docstrings in nltk/translate/bleu_score.py, you can also look at the unit tests in nltk/test/unit/translate/test_bleu_score.py to see how to use each of the components in bleu_score.py.
By the way, since sentence_bleu is imported as bleu in nltk/translate/__init__.py (https://github.com/nltk/nltk/blob/develop/nltk/translate/__init__.py#L21), using
```python
from nltk.translate import bleu
```
will be the same as:
```python
from nltk.translate.bleu_score import sentence_bleu
```
and in code:
```python
>>> from nltk.translate import bleu
>>> from nltk.translate.bleu_score import sentence_bleu
>>> from nltk.translate.bleu_score import corpus_bleu
>>> bleu == sentence_bleu
True
>>> bleu == corpus_bleu
False
```