Why is "++" for the Haskell List implemented recursively and is O (n) time worth it?

As I understand it, a list in Haskell is like a linked list in C.

So for the expressions below:

    a = [1,2,3]
    b = [4,5,6]
    a ++ b

Haskell implements this in a recursive way as follows:

    (++) []     ys = ys
    (++) (x:xs) ys = x : (xs ++ ys)

The time complexity of this is O(n), where n is the length of the first list.
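To see where the O(n) comes from, here is how the example above unfolds; each step allocates one new (:) cell for an element of the first list:

    [1,2,3] ++ [4,5,6]
    = 1 : ([2,3] ++ [4,5,6])
    = 1 : 2 : ([3] ++ [4,5,6])
    = 1 : 2 : 3 : ([] ++ [4,5,6])
    = 1 : 2 : 3 : [4,5,6]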

However, I was wondering why ++ cannot be implemented more efficiently.

The most efficient way, it seems to me, would be this:

  • make a copy of (fork) a, call it a'; there may be some trick to do this in O(1) time

  • make the last element of a' point to the first element of b; this can easily be done in O(1) time (a sketch of what this would look like with mutable nodes follows below)
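To make the second step concrete, here is a sketch of what it would look like if list cells were mutable. The MList type, appendInPlace, and the use of IORef are hypothetical illustrations, not anything from Haskell's standard lists:

    import Data.IORef

    -- Hypothetical mutable singly linked list, for illustration only.
    data MList a = Nil | Cons a (IORef (MList a))

    -- Destructively point the last cell of xs at ys. The final
    -- pointer update itself is O(1), though reaching the last cell
    -- here still walks the list. Mutation like this is exactly what
    -- Haskell's immutable lists rule out.
    appendInPlace :: MList a -> MList a -> IO (MList a)
    appendInPlace Nil ys = return ys
    appendInPlace xs@(Cons _ ref) ys = go ref >> return xs
      where
        go r = do
          next <- readIORef r
          case next of
            Nil       -> writeIORef r ys
            Cons _ r' -> go r'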

Does anyone have any ideas about this? Thanks!

+7
list pointers functional-programming recursion haskell
3 answers

This is pretty much what the recursive solution does. It makes a copy of a, which takes O(n) (where n is the length of a; the length of b does not affect the complexity).

There is no "trick" to copy a list of n elements at O ​​(1) time.

+17

The copy (fork) part is the problem: the recursive solution does exactly that, and you really do need to do it, because you have to set up new pointers for all the elements of list a.

Let's say a = [a1,a2,a3] and b is some list.

You must create a new copy of a3 (call it a3'), because it will no longer point to the empty list but to the beginning of b.

Then you need to make a copy of the second-to-last element a2 too, because it must point to a3', and finally, for the same reason, you also need to create a new copy of a1 (pointing to a2').
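Hand-unrolled, a ++ b builds exactly these three new cells. Concrete values stand in for a1, a2, a3 here, just for illustration:

    example :: [Int]
    example = a1'
      where
        b   = [4,5,6]
        a3' = 3 : b    -- the copy a3', now pointing at b
        a2' = 2 : a3'  -- the copy a2', pointing at a3'
        a1' = 1 : a2'  -- the copy a1', pointing at a2'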

This is exactly what the recursive definition does. It is not a problem with the algorithm; it is a problem with the data structure (a singly linked list is simply not good at concatenation).

If you don't allow mutability and you want a list structure, there is really nothing you can do.

You see the same issue in other languages that provide immutable data. For example, in .NET strings are immutable, so string concatenation has almost the same problem as here (if you concatenate a lot of strings, your program will perform badly). There is a workaround (StringBuilder) that behaves better with memory, but of course it is no longer an immutable data structure.
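Haskell's usual counterpart to StringBuilder for this problem is the difference list: represent a list as the function that prepends it, so appending becomes function composition. A minimal sketch, with illustrative names (the dlist package provides a polished version):

    newtype DList a = DList ([a] -> [a])

    fromList :: [a] -> DList a
    fromList xs = DList (xs ++)

    -- Appending is function composition: O(1), nothing is copied.
    append :: DList a -> DList a -> DList a
    append (DList f) (DList g) = DList (f . g)

    -- The single O(n) walk is paid once, at the end.
    toList :: DList a -> [a]
    toList (DList f) = f []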

+12

It is not possible to perform this concatenation in constant time, simply because the immutability of the data structure does not allow it.


You might think that you could do something similar to the cons operator (:), which adds an element x0 to the front of the list oldList = [x1,x2,x3] (resulting in newList = x0 : oldList) without needing to traverse the entire list. But that only works because you are not touching the existing oldList at all; you are simply referring to it.

    x0 : ( x1 : ( x2 : ( x3 : [] ) ) )
    ^-- newList
           ^-- oldList

But in your case (a ++ b) we are talking about updating a reference inside the data structure. You want to replace the [] in 1:(2:(3:[])) (the explicit view of [1,2,3]) with the new tail b. Just count the parentheses and you will see that we have to go deep inside to reach the []. That is always expensive, because we have to duplicate the whole outer part to make sure a remains unchanged. In the resulting list, where would the old a point in order to still see an unmodified list?

    1 : ( 2 : ( 3 : b ) )
    ^-- a ++ b
                    ^-- b

That is not possible within the same data structure, so we need a second one:

    1 : ( 2 : ( 3 : [] ) )
    ^-- a

And that means duplicating those (:) nodes, which is why the linear time in the length of the first list is necessary. So the "copy (fork)" you were talking about is, contrary to what you said, not O(1).


make a copy of (fork) a, call it a'; there may be some trick to do this in O(1) time

When you talk about a "trick" to do this in constant time, you are probably imagining not actually making a full copy, but just creating a reference to the original a and recording the change as an "annotation" (for example, a note saying: "tail changed: use b instead of []").

But that is in fact what Haskell, thanks to its laziness, does anyway! It does not immediately execute the O(n) algorithm; it simply "remembers" that you wanted a concatenated list, until you actually access its elements. That does not save you from paying the cost in the end, though. While the reference is cheap at the start (O(1), just as you wanted), once you access the actual elements of the list, each application of the ++ operator adds a little overhead (the cost of "interpreting" the annotation you attached to the reference) to the access of every element in the first part of the concatenation, which adds up to an O(n) cost in the end.
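This deferred cost is also why left-nested uses of (++) are notoriously slow: every earlier (++) layer is re-interpreted for each element access. A small sketch of the contrast (the function names are my own):

    -- Left-nested: ((([] ++ xs1) ++ xs2) ++ xs3) ...
    -- Elements of earlier lists pass through one extra (++) layer
    -- per later append: quadratic overall.
    concatLeft :: [[a]] -> [a]
    concatLeft = foldl (++) []

    -- Right-nested: xs1 ++ (xs2 ++ (xs3 ++ ...)). Each element passes
    -- through at most one pending (++): linear overall.
    concatRight :: [[a]] -> [a]
    concatRight = foldr (++) []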

+1
