Levenshtein distance calculation using word lists

First, I want to say that I am new to Python. I am trying to calculate the Levenshtein distance for many word lists. So far I have managed to write code for a pair of words, but I am having problems doing the same for lists. I use two lists with one word per line, for example:

    Carlos
    Stiv
    Peter

I want to use the Levenshtein distance as a similarity measure. Can someone tell me how to load the lists and then use the function to calculate the distances?
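A minimal sketch of one way to do this, assuming each list is a UTF-8 text file with one word per line and that "similarity" means finding, for each word in the first list, the word in the second list with the smallest distance. The helper names read_words and closest_matches are hypothetical; lev_dist uses the same dynamic-programming recurrence as the question's code.

```python
def lev_dist(source, target):
    """Levenshtein distance via the standard dynamic-programming table."""
    slen, tlen = len(source), len(target)
    dist = [[0] * (tlen + 1) for _ in range(slen + 1)]
    for i in range(slen + 1):
        dist[i][0] = i  # delete all i characters of source
    for j in range(tlen + 1):
        dist[0][j] = j  # insert all j characters of target
    for i in range(slen):
        for j in range(tlen):
            cost = 0 if source[i] == target[j] else 1
            dist[i + 1][j + 1] = min(
                dist[i][j + 1] + 1,  # deletion
                dist[i + 1][j] + 1,  # insertion
                dist[i][j] + cost,   # substitution
            )
    return dist[slen][tlen]


def read_words(lines):
    """Hypothetical helper: strip each line and drop blank ones.

    Works on an open file object or on any list of strings.
    """
    return [line.strip() for line in lines if line.strip()]


def closest_matches(list1, list2):
    """For each word in list1, return (word, closest word in list2, distance)."""
    results = []
    for word in list1:
        # min() keeps the first candidate in list2 when several tie
        best = min(list2, key=lambda candidate: lev_dist(word, candidate))
        results.append((word, best, lev_dist(word, best)))
    return results
```

With two files open as f1 and f2 (for example via open("list1.txt", encoding="utf-8"), where the file names are placeholders), closest_matches(read_words(f1), read_words(f2)) returns one tuple per word of the first list. This computes len(list1) * len(list2) distances, which is fine for small lists.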

I will be grateful!

Here is my code, which works for only two words:

    #!/usr/bin/env python
    # -*- coding=utf-8 -*-

    def lev_dist(source, target):
        if source == target:
            return 0
        #words = open('test_file.txt', 'r').read().split()

        # Prepare matrix
        slen, tlen = len(source), len(target)
        dist = [[0 for i in range(tlen + 1)] for x in range(slen + 1)]
        for i in range(slen + 1):
            dist[i][0] = i
        for j in range(tlen + 1):
            dist[0][j] = j

        # Counting distance
        for i in range(slen):
            for j in range(tlen):
                cost = 0 if source[i] == target[j] else 1
                dist[i + 1][j + 1] = min(
                    dist[i][j + 1] + 1,  # deletion
                    dist[i + 1][j] + 1,  # insertion
                    dist[i][j] + cost    # substitution
                )
        return dist[-1][-1]


    if __name__ == '__main__':
        import sys
        if len(sys.argv) != 3:
            print('Usage: You have to enter a source_word and a target_word')
            sys.exit(-1)
        source, target = sys.argv[1], sys.argv[2]
        print(lev_dist(source, target))
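The lev_dist function above stores the full (slen+1) x (tlen+1) table, but each row depends only on the row above it. A sketch of the same recurrence that keeps just two rows, which may matter when comparing many or long words; the name lev_dist_two_rows is hypothetical.

```python
def lev_dist_two_rows(source, target):
    """Levenshtein distance keeping only the previous and current DP rows."""
    if source == target:
        return 0
    # previous[j] is the distance between source[:i] and target[:j] for the last row i
    previous = list(range(len(target) + 1))
    for i, s_char in enumerate(source, start=1):
        current = [i]  # distance from source[:i] to the empty prefix of target
        for j, t_char in enumerate(target, start=1):
            cost = 0 if s_char == t_char else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution
            ))
        previous = current
    return previous[-1]
```

Memory drops from O(slen * tlen) to O(tlen); the running time is unchanged.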

I finally got the code working with some help from a friend :) The script loads each file into a list and calculates the Levenshtein distance between words from the two lists. The call in the last line, lev_dist(list1[0], list2[i]), compares the first word of list1 with each word in list2; change the indices in that line to compare other pairs.

Thanks!

    #!/usr/bin/env python
    # -*- coding=utf-8 -*-
    import sys


    def lev_dist(source, target):
        if source == target:
            return 0

        # Prepare a matrix
        slen, tlen = len(source), len(target)
        dist = [[0 for i in range(tlen + 1)] for x in range(slen + 1)]
        for i in range(slen + 1):
            dist[i][0] = i
        for j in range(tlen + 1):
            dist[0][j] = j

        # Counting distance, here is my function
        for i in range(slen):
            for j in range(tlen):
                cost = 0 if source[i] == target[j] else 1
                dist[i + 1][j + 1] = min(
                    dist[i][j + 1] + 1,  # deletion
                    dist[i + 1][j] + 1,  # insertion
                    dist[i][j] + cost    # substitution
                )
        return dist[-1][-1]


    # load words from a file into a list
    def loadWords(path):
        words = []  # create an empty list to hold the file contents
        with open(path, "r", encoding="utf-8") as file_contents:  # open the file
            for line in file_contents:  # loop over the lines in the file
                line = line.strip()  # strip the line breaks and any extra spaces
                if line:  # skip blank lines
                    words.append(line)  # append the word to the list
        return words


    if __name__ == '__main__':
        if len(sys.argv) != 3:
            print('Usage: You have to enter a source_file and a target_file')
            sys.exit(-1)
        source, target = sys.argv[1], sys.argv[2]
        # create two lists, one from each file, by calling loadWords() on the file
        list1 = loadWords(source)
        list2 = loadWords(target)
        # each file holds one word per line;
        # compare the first word of list1 with every word of list2
        for i in range(len(list2)):  # loop over indices into list2, not over lines
            print(lev_dist(list1[0], list2[i]))
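If instead the words to compare sit on the same line of both files (line 1 with line 1, line 2 with line 2, and so on), zip() avoids manual indexing. This is a sketch under that assumption; paired_distances is a hypothetical name, and lev_dist here is a compact two-row version of the same recurrence.

```python
def lev_dist(source, target):
    """Levenshtein distance, keeping only two rows of the DP table."""
    previous = list(range(len(target) + 1))
    for i, s_char in enumerate(source, start=1):
        current = [i]
        for j, t_char in enumerate(target, start=1):
            cost = 0 if s_char == t_char else 1
            # deletion, insertion, substitution
            current.append(min(previous[j] + 1, current[j - 1] + 1, previous[j - 1] + cost))
        previous = current
    return previous[-1]


def paired_distances(list1, list2):
    """Compare word i of list1 with word i of list2.

    zip() stops silently at the end of the shorter list.
    """
    return [(a, b, lev_dist(a, b)) for a, b in zip(list1, list2)]
```

Because zip() drops unmatched trailing words without warning, itertools.zip_longest can be used instead if unequal file lengths should be detected.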
