OCR: Weighted Levenshtein Distance

I am trying to create an optical character recognition system with a dictionary.

In fact, I don't have a dictionary implemented yet =)

I have heard that there are simple metrics based on the Levenshtein distance that assign different costs to different pairs of characters. For instance, "N" and "H" look very similar, so d("THEATER", "TNEATRE") should be smaller than d("THEATER", "TOEATRE"). That is impossible with the basic Levenshtein distance, which charges the same cost for every substitution.
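A minimal Python sketch of such a metric, assuming a hand-made table of substitution costs for visually confusable characters. The pairs and values in `CONFUSION_COST` are hypothetical placeholders (in practice they might be derived from the OCR engine's own confusion statistics); insertion and deletion keep the usual cost of 1.

```python
# Hypothetical substitution costs for visually similar glyphs (illustrative
# values only, not measured confusion rates). Pairs are unordered.
CONFUSION_COST = {
    frozenset("HN"): 0.3,
    frozenset("O0"): 0.2,
    frozenset("IL"): 0.4,
}


def substitution_cost(a: str, b: str) -> float:
    """0 for identical characters, the table value for a confusable pair, else 1."""
    if a == b:
        return 0.0
    return CONFUSION_COST.get(frozenset((a, b)), 1.0)


def weighted_levenshtein(s: str, t: str, ins: float = 1.0, dele: float = 1.0) -> float:
    """Levenshtein distance with a per-pair substitution cost.

    Keeps only two rows of the DP table: prev[j] is the cost of turning
    s[:i-1] into t[:j], cur[j] the cost of turning s[:i] into t[:j].
    """
    prev = [j * ins for j in range(len(t) + 1)]
    for i in range(1, len(s) + 1):
        cur = [i * dele] + [0.0] * len(t)
        for j in range(1, len(t) + 1):
            cur[j] = min(
                prev[j] + dele,      # delete s[i-1]
                cur[j - 1] + ins,    # insert t[j-1]
                prev[j - 1] + substitution_cost(s[i - 1], t[j - 1]),  # match or replace
            )
        prev = cur
    return prev[-1]
```

With these assumed costs, d("THEATER", "TNEATRE") comes out at about 2.3 (0.3 for H to N plus 2 for the swapped E/R), while d("THEATER", "TOEATRE") stays at 3, exactly as with plain Levenshtein.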

Could you please help me find such a metric?


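Here is an example in C#, where the cost of replacing one character with another depends on how far apart the two characters are in the alphabet: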

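    using System;

    public static class StringDistance {
        // Weighted Levenshtein distance: inserting or deleting a character costs 1,
        // replacing one character with another costs the difference of their
        // character codes divided by ('Z' - 'A'), so characters with nearby codes
        // (e.g. neighbouring letters) are cheap to swap.
        public static double WeightedLevenshtein(string b1, string b2) {
            b1 = b1.ToUpper();
            b2 = b2.ToUpper();

            // matrix[i, j] = cost of turning the first i characters of b1
            // into the first j characters of b2
            double[,] matrix = new double[b1.Length + 1, b2.Length + 1];

            for (int i = 1; i <= b1.Length; i++) {
                matrix[i, 0] = i;   // i deletions
            }

            for (int j = 1; j <= b2.Length; j++) {
                matrix[0, j] = j;   // j insertions
            }

            for (int i = 1; i <= b1.Length; i++) {
                for (int j = 1; j <= b2.Length; j++) {
                    double distance_replace = matrix[i - 1, j - 1];
                    if (b1[i - 1] != b2[j - 1]) {
                        // Cost of replace: distance between the character codes
                        distance_replace += Math.Abs(b1[i - 1] - b2[j - 1]) / (double)('Z' - 'A');
                    }

                    // Cost of remove = 1
                    double distance_remove = matrix[i - 1, j] + 1;
                    // Cost of add = 1
                    double distance_add = matrix[i, j - 1] + 1;

                    matrix[i, j] = Math.Min(distance_replace,
                                            Math.Min(distance_add, distance_remove));
                }
            }

            return matrix[b1.Length, b2.Length];
        }
    }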

Runnable example: http://ideone.com/RblFK
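For experimenting outside C#, here is a Python port that keeps the same cost model as the `WeightedLevenshtein` method above: insertions and deletions cost 1, and a substitution costs the absolute difference of the character codes divided by ord("Z") - ord("A") = 25. The function name `alphabet_weighted_levenshtein` is just an illustrative choice.

```python
def alphabet_weighted_levenshtein(b1: str, b2: str) -> float:
    """Port of the C# answer: insert/delete cost 1, substitution cost
    |code point difference| / ('Z' - 'A')."""
    b1, b2 = b1.upper(), b2.upper()
    span = ord("Z") - ord("A")  # 25
    # matrix[i][j] = cost of turning b1[:i] into b2[:j]
    matrix = [[0.0] * (len(b2) + 1) for _ in range(len(b1) + 1)]
    for i in range(1, len(b1) + 1):
        matrix[i][0] = float(i)  # i deletions
    for j in range(1, len(b2) + 1):
        matrix[0][j] = float(j)  # j insertions
    for i in range(1, len(b1) + 1):
        for j in range(1, len(b2) + 1):
            replace = matrix[i - 1][j - 1]
            if b1[i - 1] != b2[j - 1]:
                replace += abs(ord(b1[i - 1]) - ord(b2[j - 1])) / span
            matrix[i][j] = min(replace, matrix[i - 1][j] + 1, matrix[i][j - 1] + 1)
    return matrix[len(b1)][len(b2)]
```

On the strings from the question this gives about 1.28 for "TNEATRE" (H to N costs 6/25, and E/R costs 13/25 in each direction) and about 1.32 for "TOEATRE" (H to O costs 7/25), so the desired ordering holds. Note that the cost measures distance between character codes rather than visual shape: "O" versus the digit "0" costs 31/25 = 1.24, even though the two glyphs look almost identical.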


A few years late, but the following Python package (which I am NOT affiliated with) lets you assign arbitrary weights to every Levenshtein edit operation for each ASCII character, and more.

https://github.com/infoscout/weighted-levenshtein

pip install weighted-levenshtein
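Per its README, weighted-levenshtein takes the costs as NumPy arrays indexed by ASCII code (`insert_costs` and `delete_costs` of length 128, `substitute_costs` of shape 128×128) and also offers variants with transpositions (`osa`, `dam_lev`). The dependency-free sketch below is not the package's code; it only imitates that table layout with plain lists and implements optimal string alignment (Levenshtein plus adjacent swaps), which matters for the question because "ER" to "RE" is a single swap. The function names and the 0.3 cost for H/N are assumptions for illustration, and, like the package, it only handles ASCII input.

```python
ASCII = 128


def make_tables():
    """ASCII-indexed cost tables, every operation defaulting to cost 1.0."""
    insert_costs = [1.0] * ASCII
    delete_costs = [1.0] * ASCII
    substitute_costs = [[1.0] * ASCII for _ in range(ASCII)]
    transpose_costs = [[1.0] * ASCII for _ in range(ASCII)]
    return insert_costs, delete_costs, substitute_costs, transpose_costs


def osa_distance(s, t, insert_costs, delete_costs, substitute_costs, transpose_costs):
    """Optimal string alignment distance with per-character costs looked up by ASCII code."""
    n, m = len(s), len(t)
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = d[i - 1][0] + delete_costs[ord(s[i - 1])]
    for j in range(1, m + 1):
        d[0][j] = d[0][j - 1] + insert_costs[ord(t[j - 1])]
    for i in range(1, n + 1):
        a = ord(s[i - 1])
        for j in range(1, m + 1):
            b = ord(t[j - 1])
            sub = 0.0 if a == b else substitute_costs[a][b]
            d[i][j] = min(
                d[i - 1][j] + delete_costs[a],   # delete s[i-1]
                d[i][j - 1] + insert_costs[b],   # insert t[j-1]
                d[i - 1][j - 1] + sub,           # match or substitute
            )
            # Adjacent swap: s[i-2] s[i-1] == t[j-1] t[j-2]
            if i > 1 and j > 1 and a == ord(t[j - 2]) and ord(s[i - 2]) == b:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + transpose_costs[ord(s[i - 2])][a])
    return d[n][m]


ins, dele, sub, tr = make_tables()
# Assumed: the OCR engine often confuses H and N, so make that substitution cheap.
sub[ord("H")][ord("N")] = sub[ord("N")][ord("H")] = 0.3
```

With these tables, "THEATER" versus "TNEATRE" costs 1.3 (0.3 for H to N plus 1 for the E/R swap) and "THEATER" versus "TOEATRE" costs 2.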

Also this one (also not affiliated):

https://github.com/luozhouyang/python-string-similarity
