How to write a constant space length function in Haskell?

The canonical implementation of length :: [a] -> Int is as follows:

 length [] = 0 length (x:xs) = 1 + length xs 

which is very beautiful but suffers from as it uses linear space.

Recursive version:

 length xs = length' xs 0 where length' [] n = n length' (x:xs) n = length xs (n + 1) 

doesn't suffer from this problem, but I don’t understand how it can work in a constant space in a lazy language.

Isn't this a run-time accumulating a lot of (n + 1) thunks when it moves around the list? Should this Haskell function consume O (n) space and cause a stack overflow?

(if it is important, I use GHC)

+7
haskell lazy-evaluation
source share
3 answers

Yes, you are faced with a common trap with the accumulation of parameters. The usual cure is to force a rigorous assessment of the cumulative parameter; for this i like the strict $! application operator $! . If you do not impose strictness, the GHC optimizer may decide that this function will be strict, but it may not be. Definitely, this is not a thing you can rely on: sometimes you want the accumulating parameter to be evaluated lazily, and O (N) is just fine, thanks.

How to write a constant space length function in Haskell?

As stated above, use a strict application statement to force the cumulative parameter to be evaluated:

 clength xs = length' xs 0 where length' [] n = n length' (x:xs) n = length' xs $! (n + 1) 

Type $! is equal to (a -> b) -> a -> b , and this forces an estimate of a before applying the function.

+15
source share

Launching the second version in GHCi:

 > length [1..1000000] *** Exception: stack overflow 

So, to answer your question: Yes, he suffers from this problem, as expected.

However, the GHC is smarter than the average compiler; if you compile with optimization, it will fix the code for you and make it work in a constant space.

More generally, there are ways to force rigor at specific points in the Haskell code, preventing the creation of deeply nested tricks. Common example: foldl vs. foldl' :

 len1 = foldl (\x _ -> x + 1) 0 len2 = foldl' (\x _ -> x + 1) 0 

Both functions are left folds that do the β€œsame” thing, except foldl lazy and foldl' is strict. The result is that len1 dies with a stack overflow in GHCi, and len2 working correctly.

+12
source share

The tail recursive function does not need to be supported on the stack, since the value returned by the function will simply be the value returned by the tail call. Thus, instead of creating a new stack frame, it is used again, and the locales overwritten by the new values ​​passed to the tail call. Thus, each n+1 written in the same place where the old n , and you have constant use of space.

Change In fact, as you wrote it, you are right, this will push (n+1) and cause an overflow. Just test, just try length [1..1000000] .. You can fix this by forcing it to evaluate it first: length xs $! (n+1) length xs $! (n+1) , which will then work as I said above.

+1
source share

All Articles