NLTK: corpus-level BLEU vs sentence-level BLEU score

I have imported nltk in Python to calculate the BLEU score on Ubuntu. I understand how sentence-level BLEU works, but I don't understand how corpus-level BLEU works.

Below is my code for the corpus-level BLEU score:

    import nltk

    hypothesis = ['This', 'is', 'cat']
    reference = ['This', 'is', 'a', 'cat']
    BLEUscore = nltk.translate.bleu_score.corpus_bleu([reference], [hypothesis], weights=[1])
    print(BLEUscore)

For some reason, the BLEU score is 0 for the above code. I expected the corpus-level BLEU score to be at least 0.5.

Here is my code for the sentence-level BLEU score:

    import nltk

    hypothesis = ['This', 'is', 'cat']
    reference = ['This', 'is', 'a', 'cat']
    BLEUscore = nltk.translate.bleu_score.sentence_bleu([reference], hypothesis, weights=[1])
    print(BLEUscore)

Here, the sentence-level BLEU score is 0.71, which is what I expect given the brevity penalty and the missing word "a". However, I do not understand how the corpus-level BLEU score works.

Any help would be appreciated.

+7
python machine-learning nlp nltk bleu
2 answers

TL;DR:

    >>> import nltk
    >>> hypothesis = ['This', 'is', 'cat']
    >>> reference = ['This', 'is', 'a', 'cat']
    >>> references = [reference]  # list of references for 1 sentence.
    >>> list_of_references = [references]  # list of references for all sentences in corpus.
    >>> list_of_hypotheses = [hypothesis]  # list of hypotheses that corresponds to list of references.
    >>> nltk.translate.bleu_score.corpus_bleu(list_of_references, list_of_hypotheses)
    0.6025286104785453
    >>> nltk.translate.bleu_score.sentence_bleu(references, hypothesis)
    0.6025286104785453

(Note: You have to pull the latest version of NLTK from the develop branch to get a stable version of the BLEU score implementation.)


In long:

Actually, if there is only one reference and one hypothesis in your entire corpus, then both corpus_bleu() and sentence_bleu() should return the same value, as shown in the example above.

In the code, we see that sentence_bleu is really just a thin wrapper around corpus_bleu:

    def sentence_bleu(references, hypothesis, weights=(0.25, 0.25, 0.25, 0.25),
                      smoothing_function=None):
        return corpus_bleu([references], [hypothesis], weights, smoothing_function)

And if we look at the parameters for sentence_bleu:

    def sentence_bleu(references, hypothesis, weights=(0.25, 0.25, 0.25, 0.25),
                      smoothing_function=None):
        """
        :param references: reference sentences
        :type references: list(list(str))
        :param hypothesis: a hypothesis sentence
        :type hypothesis: list(str)
        :param weights: weights for unigrams, bigrams, trigrams and so on
        :type weights: list(float)
        :return: The sentence-level BLEU score.
        :rtype: float
        """

The references input for sentence_bleu is a list(list(str)).

So if you have a sentence string, e.g. "This is a cat", you have to tokenize it into a list of strings, ["This", "is", "a", "cat"], and since multiple references are allowed, references must be a list of lists of strings, e.g. if you have a second reference, "This is a feline", your input to sentence_bleu() would be:

 references = [ ["This", "is", "a", "cat"], ["This", "is", "a", "feline"] ] hypothesis = ["This", "is", "cat"] sentence_bleu(references, hypothesis) 

When it comes to corpus_bleu(), its list_of_references parameter is basically a list of whatever sentence_bleu() takes as references:

    def corpus_bleu(list_of_references, hypotheses, weights=(0.25, 0.25, 0.25, 0.25),
                    smoothing_function=None):
        """
        :param references: a corpus of lists of reference sentences, wrt hypotheses
        :type references: list(list(list(str)))
        :param hypotheses: a list of hypothesis sentences
        :type hypotheses: list(list(str))
        :param weights: weights for unigrams, bigrams, trigrams and so on
        :type weights: list(float)
        :return: The corpus-level BLEU score.
        :rtype: float
        """
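As a concrete illustration of that nesting, here is a small sketch of scoring a two-sentence corpus (the second hypothesis/reference pair is invented purely for this example):

    from nltk.translate.bleu_score import corpus_bleu

    # One list of references per sentence in the corpus (each sentence may
    # have several references; here each has just one).
    refs_sentence_1 = [['This', 'is', 'a', 'cat']]
    refs_sentence_2 = [['The', 'dog', 'sat', 'on', 'the', 'mat']]

    list_of_references = [refs_sentence_1, refs_sentence_2]  # list(list(list(str)))
    hypotheses = [['This', 'is', 'cat'],                     # list(list(str))
                  ['The', 'dog', 'sat', 'on', 'mat']]

    print(corpus_bleu(list_of_references, hypotheses))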

Besides looking at the docstring in nltk/translate/bleu_score.py, you can also look at the unit tests in nltk/test/unit/translate/test_bleu_score.py to see how to use each of the components in bleu_score.py.

By the way, since sentence_bleu is imported as bleu in nltk.translate.__init__.py (https://github.com/nltk/nltk/blob/develop/nltk/translate/__init__.py#L21), using

 from nltk.translate import bleu 

will be the same as:

 from nltk.translate.bleu_score import sentence_bleu 

and in code:

    >>> from nltk.translate import bleu
    >>> from nltk.translate.bleu_score import sentence_bleu
    >>> from nltk.translate.bleu_score import corpus_bleu
    >>> bleu == sentence_bleu
    True
    >>> bleu == corpus_bleu
    False
+11

Let's take a look:

    >>> help(nltk.translate.bleu_score.corpus_bleu)
    Help on function corpus_bleu in module nltk.translate.bleu_score:

    corpus_bleu(list_of_references, hypotheses, weights=(0.25, 0.25, 0.25, 0.25), smoothing_function=None)
        Calculate a single corpus-level BLEU score (aka. system-level BLEU) for all
        the hypotheses and their respective references.

        Instead of averaging the sentence level BLEU scores (i.e. macro-average
        precision), the original BLEU metric (Papineni et al. 2002) accounts for
        the micro-average precision (i.e. summing the numerators and denominators
        for each hypothesis-reference(s) pairs before the division).
        ...

You are in a better position than I am to understand the description of the algorithm, so I will not try to "explain" it to you. If the docstring doesn't clear things up, take a look at the source itself. Or find it locally:

    >>> nltk.translate.bleu_score.__file__
    '.../lib/python3.4/site-packages/nltk/translate/bleu_score.py'
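To see the micro-average vs. macro-average point concretely, here is a small sketch (the data is invented for illustration, and bigram-only weights are used to keep the toy sentences well-defined) comparing corpus_bleu() with simply averaging per-sentence sentence_bleu() scores; the two numbers will generally differ:

    from nltk.translate.bleu_score import corpus_bleu, sentence_bleu

    list_of_references = [[['This', 'is', 'a', 'cat']],
                          [['The', 'dog', 'sat', 'on', 'the', 'mat']]]
    hypotheses = [['This', 'is', 'cat'],
                  ['The', 'dog', 'sat', 'on', 'mat']]

    weights = (0.5, 0.5)  # unigram and bigram only, for these short sentences

    # corpus_bleu sums n-gram counts over all sentences before dividing
    # (micro-average precision) ...
    corpus_score = corpus_bleu(list_of_references, hypotheses, weights=weights)

    # ... whereas averaging per-sentence BLEU scores is a macro-average.
    average_score = sum(sentence_bleu(refs, hyp, weights=weights)
                        for refs, hyp in zip(list_of_references, hypotheses)) / len(hypotheses)

    print(corpus_score, average_score)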
+3
