Why is the amount slower than foldl 'in haskell?

Question

Why is the amount slower than foldl 'in haskell?

How is the default sum in ghc ~ 10 times slower than its foldl' ( stricter foldl equivalent) equivalent? And if so, why is it not implemented with foldl' ?

 import Data.List > foldl' (+) 0 [1..10^7] 50000005000000 (0.39 secs, 963,528,816 bytes) > sum [1..10^7] 50000005000000 (4.13 secs, 1,695,569,176 bytes)

For completeness, foldl and foldr are also shown here.

 > foldl (+) 0 [1..10^7] 50000005000000 (4.02 secs, 1,695,828,752 bytes) > foldr (+) 0 [1..10^7] 50000005000000 (3.78 secs, 1,698,386,648 bytes)

sum seems to be implemented using foldl , as their runtime is similar. Tested on ghc 7.10.2.

+6

haskell fold ghc ghci

dimid Apr 28 '16 at 10:10

source share

2 answers

In addition to @Joachim Breitner's answer, I found this blog post a very interesting read (taken from the reddit discussion, thanks to @ZhekaKozlov for the link).

When Haskell 1.0 was published on this day 24 years ago, there was no seq function at all, so there was no choice but to define foldl in the "classic" way.
In the end, six years after much discussion, we got the seq function in Haskell 1.3. Although actually in Haskell 1.3, seq was part of the Eval class, so you could not use it anywhere, for example, in foldl. In Haskell 1.3, you would need to define foldl 'with a type:

 foldl' :: Eval b => (b -> a -> b) -> b -> [a] -> b

Haskell 1.4 and Haskell 98 got rid of the Eval class restriction for seq, but foldl was not changed. Hugs and GHC and other implementations added a non-standard warehouse.
I suspect that people then considered this a problem of compatibility and inertia. It’s easy enough to add a non-standard warehouse, but you cannot change the standard so easily.
I suspect that if we had seq from the start, we would define foldl using it.
Miranda, one of the predecessors of Haskells, already managed 5 years before Haskell 1.0.

By the way, I managed to save 20 ms using

 foldl1' (+) [1..10^7]

So, I think foldl1' should be the default for sum and product (with special handling of empty lists).

0

dimid May 01 '16 at 20:36

source share

Joachim breitner · Accepted Answer · 2016-04-28T11:35:41+0000

The sum function is implemented using foldl in the GHC:

 -- | The 'sum' function computes the sum of a finite list of numbers. sum :: (Num a) => [a] -> a {-# INLINE sum #-} sum = foldl (+) 0

as can be seen in the source .

This should be because it is a specification in a Haskell report .

The rationale was likely that for some lazy types of list items, foldl is the right thing. (I personally find that foldl almost always wrong, and you only need to use foldl' .)

With sufficient optimization, the GHC will embed this definition, specializing in this type of element, note that the cycle is strict and forces us to evaluate the battery at each iteration; effectively turning it into foldl' , as @ AndrásKovács noted.

Since GHC-7.10, sum itself is a method of a class of type Foldable , and the default definition passes through foldMap . However, instance Foldable [] overrides this with the above definition of sum .

Why is the amount slower than foldl 'in haskell?

More articles: