Why the method [x | x <- [1..10]] so slow in Haskell?

Question

Why the method [x | x <- [1..10]] so slow in Haskell?

Why is something like this happening in Haskell?

 test = [x|a<-[1..100],b<-[1..100],c<-[1..100],d<-[1..100],let x = a] print $ length test

To start, you only need about 10^8 numbers, this should be done in no time, but it looks like it works forever and almost crashed.

+6

haskell

CYC Oct 12 '15 at 1:55

source share

2 answers

You do not just start the loop 10 ^ 8 times, you create a list with 10 ^ 8 items. Since you are using length , Haskell must actually evaluate the entire list in order to return its length. Each item in the list accepts one word, which may be 32 bits or may be 64 bits. Under the optimistic assumption that these are 32 bits (4 bytes), you just allocated 400 MB (about 381.5 MB) of memory. If it is 64 bits, then it is 800 MB (about 763 megabytes) of memory that you just allocated. Depending on what else is happening on your system, you might just click on the swap / swap section, allocating as much RAM on the piece.

If other subtleties occur, I do not know about them, but using memory is my first suspicion of why it is so slow.

+4

rmunn Oct 12 '15 at 2:41

source share

Reid barton · Accepted Answer · 2015-10-12T13:21:44+0000

Do you run this in ghci or in a compiled program? This is of great importance.

If in ghci, ghci will support the calculated test value if you want to use it later. This is usually a good idea, but not in this case, when test is a huge value that would be cheap to recount anyway. How huge? For starters, this is a list of 10 ^ 8 elements and (on a 64-bit system) the list costs 24 bytes per element, so 2.4G is narrower. Then there is the use of the values of the values of space themselves. You might think that all values are taken from [1..100] , so they should be separated and use a small amount of space as a whole. But the values in the list really have the form x , which can depend on a , b , c and d , and length never checks the values in the list as it goes through. Therefore, each element will be presented as a closure that refers to a , b , c and d , which takes up at least 8 * (4 + 1) = 40 bytes, which brings us to just 6.4G.

This is quite a lot, and the garbage collector should do quite a lot of copying when you allocate 6.4G of data, and all this constantly lives on. This is something that takes so long, and does not compute the list or its length.

If you compile a program

 test = [x|a<-[1..100],b<-[1..100],c<-[1..100],d<-[1..100],let x = a] main = print $ length test

then test does not need to be supported live, because its length is calculated, since it is obvious that it will never be used again. So, now GC practically does not work, and the program runs in a couple of seconds (reasonable for ~ 10 ^ 8 lists of node distributions and calculations on Integer ).

Why the method [x | x <- [1..10]] so slow in Haskell?

More articles: