Edit distance algorithm

I have a dictionary of "n" words, and there are answers "m" for the answer. I want to output the number of words in a dictionary that are editing distances of 1 or 2. I want to optimize the result set, given that n and m are approximately 3000.

Edit added from answer below:

I will try to say it differently.

Initially, there are “n” words defined as a set of vocabulary words. The following “m” words are given, which are the query words and for each query word, I need to find if the word exists in the dictionary (Edit Distance '0') or the total number of words in the dictionary that are at an editing distance of 1 or 2 of the dictionary words.

Hopefully now the question is transparent.

Well, this expires if the time complexity is (m * n) n. The naive use of the DP Edit Distance editing algorithm. Even the calculation of the diagonal elements 2k + 1 times, where k is the threshold here k = 3 in the above case.

+5
source share
3 answers

You want to use the Levenshtein distance between two words, but I assume that you know that since the question tags say.

You will need to iterate over your list (guess) and compare each word in the list with the current query that you are doing. You can build a BK tree to limit your search space, but that sounds like an overkill if you have only ~ 3,000 words.

var upperLimit = 2;
var allWords = GetAllWords();
var matchingWords = allWords
        .Where(word => Levenshtein(query, word) <= upperLimit)
        .ToList();

, = 0 Contains-, . , <= 2, , 3000 . , 9 .

, , , ? - ?

Graph showing visited search space using a bk-tree with different amount of input
(: itu.edu.tr)
CLiki: bk-tree

, bk- <= 2 1% , , , . , , , /.

+6

-.

"n" , . "m" , , , (Edit Distance '0') , 1 2 .

, .

, , (m * n) * n. DP . 2 * k + 1 , k - k = 3 .

PS: BK Tree . ++.

+1
public class Solution   {
    public int minDistance(String word1, String word2) {
        int[][] table = new int[word1.length()+1][word2.length()+1];
        for(int i = 0; i < table.length; ++i) {
            for(int j = 0; j < table[i].length; ++j) {
                if(i == 0)
                    table[i][j] = j;
                else if(j == 0)
                    table[i][j] = i;
                else {
                    if(word1.charAt(i-1) == word2.charAt(j-1))
                        table[i][j] = table[i-1][j-1];
                    else
                        table[i][j] = 1 + Math.min(Math.min(table[i-1][j-1], 
                            table[i-1][j]), table[i][j-1]);
                }
            }
        }
        return table[word1.length()][word2.length()];
    }
}
+1
source

All Articles