Check the similarity between two words with NLTK using Python

I have two lists, and I want to check the similarity between each pair of words across the two lists and find the maximum similarity. Here is my code:

    from nltk.corpus import wordnet

    list1 = ['Compare', 'require']
    list2 = ['choose', 'copy', 'define', 'duplicate', 'find', 'how',
             'identify', 'label', 'list', 'listen', 'locate', 'match',
             'memorise', 'name', 'observe', 'omit', 'quote', 'read',
             'recall', 'recite', 'recognise', 'record', 'relate',
             'remember', 'repeat', 'reproduce', 'retell', 'select',
             'show', 'spell', 'state', 'tell', 'trace', 'write']
    list = []

    for word1 in list1:
        for word2 in list2:
            wordFromList1 = wordnet.synsets(word1)[0]
            wordFromList2 = wordnet.synsets(word2)[0]
            s = wordFromList1.wup_similarity(wordFromList2)
            list.append(s)

    print(max(list))

But this will result in an error:

        wordFromList2 = wordnet.synsets(word2)[0]
    IndexError: list index out of range

Please help me fix this.
Thank you

2 answers

You get the error when the synset list is empty and you try to get the element at the (nonexistent) index zero. But why check only the first element? If you want to check everything, try all pairs of elements in the returned synsets. You can use itertools.product() to save yourself the two nested loops:

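    from itertools import product

    sims = []
    for word1, word2 in product(list1, list2):
        syns1 = wordnet.synsets(word1)
        syns2 = wordnet.synsets(word2)
        # An empty synset list simply contributes no pairs, so nothing is indexed.
        for sense1, sense2 in product(syns1, syns2):
            d = wordnet.wup_similarity(sense1, sense2)
            # Record the two senses that were compared alongside their score.
            sims.append((d, sense1, sense2))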

This is inefficient because the same synsets are looked up over and over, but it is closest to the logic of your code. If you have enough data to make speed a problem, you can speed it up by collecting the synsets for all the words in list1 and list2 once and taking the product of the two synset sets.

    >>> allsyns1 = set(ss for word in list1 for ss in wordnet.synsets(word))
    >>> allsyns2 = set(ss for word in list2 for ss in wordnet.synsets(word))
    >>> best = max((wordnet.wup_similarity(s1, s2) or 0, s1, s2)
    ...            for s1, s2 in product(allsyns1, allsyns2))
    >>> print(best)
    (0.9411764705882353, Synset('command.v.02'), Synset('order.v.01'))
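
If you also want to know which words the best senses came from, the same idea can be wrapped in a small helper. This is only a sketch: best_pair and its lookup and similarity parameters are hypothetical names, not part of NLTK. It takes the lookup and scoring functions as arguments, so the demo at the bottom runs on toy stand-in data without WordNet installed.

```python
from itertools import product


def best_pair(words1, words2, lookup, similarity):
    """Return (score, word1, word2, sense1, sense2) for the most similar pair.

    lookup(word) must return a list of senses (possibly empty);
    similarity(s1, s2) must return a number or None.
    Returns None when no pair of senses could be compared.
    """
    # Look each word up once instead of once per pairing.
    senses1 = {w: lookup(w) for w in words1}
    senses2 = {w: lookup(w) for w in words2}

    best = None
    for w1, w2 in product(words1, words2):
        # product() over an empty sense list yields nothing, so words
        # without senses are skipped and never indexed with [0].
        for s1, s2 in product(senses1[w1], senses2[w2]):
            score = similarity(s1, s2) or 0  # treat None as "no similarity"
            if best is None or score > best[0]:
                best = (score, w1, w2, s1, s2)
    return best


if __name__ == "__main__":
    # Toy stand-ins (made-up data) so the sketch runs without WordNet.
    toy_senses = {"compare": ["compare.v.01"],
                  "match": ["match.v.01", "match.n.01"],
                  "how": []}
    toy_scores = {("compare.v.01", "match.v.01"): 0.8}
    print(best_pair(["compare"], ["how", "match"],
                    lookup=lambda w: toy_senses.get(w, []),
                    similarity=lambda a, b: toy_scores.get((a, b))))
```

With NLTK and the WordNet corpus available, the call would presumably look like best_pair(list1, list2, wordnet.synsets, wordnet.wup_similarity).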

Try checking whether the synset lists are empty before indexing into them:

    from nltk.corpus import wordnet

    list1 = ['Compare', 'require']
    list2 = ['choose', 'copy', 'define', 'duplicate', 'find', 'how',
             'identify', 'label', 'list', 'listen', 'locate', 'match',
             'memorise', 'name', 'observe', 'omit', 'quote', 'read',
             'recall', 'recite', 'recognise', 'record', 'relate',
             'remember', 'repeat', 'reproduce', 'retell', 'select',
             'show', 'spell', 'state', 'tell', 'trace', 'write']
    list = []

    for word1 in list1:
        for word2 in list2:
            wordFromList1 = wordnet.synsets(word1)
            wordFromList2 = wordnet.synsets(word2)
            if wordFromList1 and wordFromList2:  # Thanks to @alexis' note
                s = wordFromList1[0].wup_similarity(wordFromList2[0])
                # wup_similarity() can return None; skip those so max() works.
                if s is not None:
                    list.append(s)

    print(max(list))