I play with Levenshtein distance calculation in Haskell, and I'm a little disappointed with the following performance issue. If you implement it as โnormalโ for Haskell, for example, below (dist), everything works fine:
dist :: (Ord a) => [a] -> [a] -> Int dist s1 s2 = ldist s1 s2 (L.length s1, L.length s2) ldist :: (Ord a) => [a] -> [a] -> (Int, Int) -> Int ldist _ _ (0, 0) = 0 ldist _ _ (i, 0) = i ldist _ _ (0, j) = j ldist s1 s2 (i+1, j+1) = output where output | (s1!!(i)) == (s2!!(j)) = ldist s1 s2 (i, j) | otherwise = 1 + L.minimum [ldist s1 s2 (i, j) , ldist s1 s2 (i+1, j) , ldist s1 s2 (i, j+1)]
But, if you bend your brain a little and implement it as dist ', it performs MUCH faster (about 10x).
dist' :: (Ord a) => [a] -> [a] -> Int dist' o1 o2 = (levenDist o1 o2 [[]])!!0!!0 levenDist :: (Ord a) => [a] -> [a] -> [[Int]] -> [[Int]] levenDist s1 s2 arr@ ([[]]) = levenDist s1 s2 [[0]] levenDist s1 s2 arr@ ([]:xs) = levenDist s1 s2 ([(L.length arr) -1]:xs) levenDist s1 s2 arr@ (x:xs) = let n1 = L.length s1 n2 = L.length s2 n_i = L.length arr n_j = L.length x match | (s2!!(n_j-1) == s1!!(n_i-2)) = True | otherwise = False minCost = if match then (xs!!0)!!(n2 - n_j + 1) else L.minimum [(1 + (xs!!0)!!(n2 - n_j + 1)) , (1 + (xs!!0)!!(n2 - n_j + 0)) , (1 + (x!!0)) ] dist | (n_i > n1) && (n_j > n2) = arr | n_j > n2 = []:arr `seq` levenDist s1 s2 $ []:arr | n_i == 1 = (n_j:x):xs `seq` levenDist s1 s2 $ (n_j:x):xs | otherwise = (minCost:x):xs `seq` levenDist s1 s2 $ (minCost:x):xs in dist
I tried all the usual seq tricks in the first version, but nothing speeds it up. This is a little unsatisfactory for me, because I expected the first version to be faster, because I do not need to evaluate the entire matrix, but only those parts that it needs.
Does anyone know if it is possible to implement these two implementations, or am I just reaping the benefits of tail recursion optimization in the latter, and so you need to live with its unreadable if I want performance?
Thanks, Orion
recursion haskell tail levenshtein distance
jdo
source share