I am confused by the implementation of the nub (select unique values) function in Haskell's Data.List standard library. The GHC implementation is:
    nub l = nub' l []
      where
        nub' [] _ = []
        nub' (x:xs) ls
          | x `elem` ls = nub' xs ls
          | otherwise   = x : nub' xs (x:ls)
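For reference, nub keeps the first occurrence of each element and preserves the input order:

    >>> nub [1,2,1,3,2,4]
    [1,2,3,4]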
As far as I can tell, this has a worst-case time complexity of O(n^2), since for a list of entirely distinct values it has to compare each element against all previously seen elements to make sure it is truly unique.
With a hash table, the complexity could be reduced to O(n) to build the table plus O(1) to check each value against the previously seen values in the table. Of course, this would not produce an ordered result, but O(n log n) is also possible using GHC's own ordered Data.Map, if ordering is needed.
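For illustration, here is a minimal sketch of the kind of O(n log n) order-preserving variant I have in mind, using Data.Set from containers (the name ordNub is my own; note it assumes an Ord instance, whereas the real nub only requires Eq):

    import qualified Data.Set as Set

    -- Sketch only: deduplicate while preserving first-occurrence order,
    -- tracking already-seen elements in a balanced-tree Set.
    -- Each member/insert is O(log n), so the whole pass is O(n log n).
    ordNub :: Ord a => [a] -> [a]
    ordNub = go Set.empty
      where
        go _ [] = []
        go seen (x:xs)
          | x `Set.member` seen = go seen xs
          | otherwise           = x : go (Set.insert x seen) xs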
Why was such an inefficient implementation chosen for an important library function? I understand that raw efficiency is not a major concern in Haskell, but the standard library could at least try to select the (asymptotically) best data structure for the job.
performance algorithm haskell
jforberg