Haskell function ineffective

Question

Haskell function ineffective

I am confused by the implementation of the nub (select unique values) function in the Haskell Data.List standard library. GHC implementation

nub l = nub' l [] where nub' [] _ = [] nub' (x:xs) ls | x `elem` ls = nub' xs ls | otherwise = x : nub' xs (x:ls)

As far as I can tell, this has the worst time complexity of O (n ^ 2), since for a list of unique values, she has to compare them all once to make sure they are truly unique.

If you use a hash table, complexity can be reduced to O (n) to build a table + O (1) to check each value compared to previous values in the hash table. Of course, this will not lead to an ordered list, but it is also possible in O (n log n) using its own ordered Data.Map GHC, if necessary.

Why choose such an inefficient implementation for an important library function? I understand that efficiency is not a major issue in Haskell, but at least the standard library may try to select (asymptotically) the best data structure to work with.

+7

performance algorithm haskell

jforberg Jan 18 '14 at 21:02

source share

3 answers

You are absolutely right - nub is an O (n ^ 2) algorithm. However, there are still reasons why you can use it instead of using hashmap:

for small lists it can still be faster
nub only requires an Eq constraint; for comparison, Data.Map requires an Ord constraint for keys, and Data.HashMap requires a key type with the Hashable and Ord classes
he is lazy - you do not need to run the entire input list to start getting results.

Edit: a small correction on the third point - you do not need to process the entire list to start receiving results; you still have to check every element of the input list (so nub will not work in infinite lists), but you will start returning results as soon as you find a unique element.

+10

Erikr Jan 18 '14 at 21:26

source share

https://groups.google.com/forum/m/#!msg/haskell-cafe/4UJBbwVEacg/ieMzlWHUT_IJ

In my experience, a Haskell "novice" (including Prelude and bad packages) simply ignores performance in many cases in favor of simplicity.

Haskell performance is a difficult task to solve, therefore, if you are not experienced enough to search through the platform or Hackage for alternatives to a simple nub (and especially if your entry is on the list just because you haven’t thought about alternative structures), then Data.List.nub is probably not your only serious performance issue, and you are probably writing code for a toy project where performance is not a big deal.

You just need to believe that when you start creating a large (in code or data) project, you will be more experienced and know how to configure your programs more efficiently.

In other words, don't worry about this and assume that anything in Haskell 98 that comes from Prelude or the database is most likely not the most effective way to solve the problem.

+3

misterbee Jan 19 '14 at 14:17

source share

Nikita Volkov · Accepted Answer · 2014-01-18T21:22:42+0000

Efficiency is a pretty serious problem in Haskell, because the language works on a par with Java and surpasses it in memory, but, of course, this is not C.

The answer to your question is quite simple: only the Eq constraint is required for the Prelude nub , while any implementation based on Map or Set will also require either Ord or Hashable .

Haskell function ineffective

More articles: