Repa performance compared to lists

The Numeric Haskell Repa Wiki tutorial has an excerpt that reads (for context):

10.1 Fusion, and why you need it

Repa depends critically on array fusion to achieve fast code. Fusion is a fancy name for the combination of inlining and code transformations performed by GHC when it compiles your program. The fusion process merges the array-filling loops defined in the Repa library with the "worker" functions that you write in your own module. If the fusion process fails, the resulting program will be much slower than it needs to be, often 10x slower than an equivalent program using plain Haskell lists. On the other hand, provided fusion works, the resulting code will run as fast as an equivalent cleanly written C program. Making fusion work is not hard once you understand what is going on.

The part I don't understand is this:

"If the fusion process fails, the resulting program will be much slower than it needs to be, often 10x slower than an equivalent program using plain Haskell lists."

I understand why it would run slower if stream fusion fails, but why would it run that much slower than lists?

Thanks!

+4
2 answers

Edit: this is wrong - see Don Nelson's comment (and his answer - he knows a lot more about the library than I do).

Immutable arrays cannot share components; setting fusion aside, any modification to an immutable array must reallocate the entire array. By contrast, although list operations are non-destructive, they can share parts: f i (h:t) = i:t, for example, replaces the head of a list in constant time by creating a new list whose first cell points to the second cell of the original list. Moreover, because lists can be built incrementally, functions such as generators that build a list through repeated calls to a function can still run in O(n) time, whereas the equivalent function on an immutable array without fusion would have to reallocate the array on every call, taking O(n^2) time.

+3

Typically, because lists are lazy and Repa arrays are strict.

If you fail to fuse a lazy list traversal, for example

 map f . map g 

you pay an O(1) cost per value for leaving the intermediate (lazy) cons cell in place.

If you fail to fuse the same traversal over a strict sequence, you pay at least O(n) per value to allocate a strict intermediate array.
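To make the contrast concrete, here is a minimal sketch of the same two-stage traversal over a lazy list and over a strict unboxed array. It uses `UArray` from the `array` package as a stand-in for a strict array type, not Repa itself, and it assumes the compiler does not fuse either pipeline. The names `listPipeline`, `arrayPipeline`, `arrayFused`, and `fromInts` are hypothetical, chosen only for illustration.

```haskell
import Data.Array.Unboxed (UArray, amap, elems, listArray)

-- Lazy list version of `map f . map g`. If it is not fused, the inner map
-- produces one cons cell at a time, on demand, and the outer map consumes
-- each cell as it appears.
listPipeline :: [Int] -> [Int]
listPipeline = map (+ 1) . map (* 2)

-- Strict unboxed array version. If it is not fused, the inner amap must
-- allocate and fill a complete intermediate array before the outer amap
-- can start.
arrayPipeline :: UArray Int Int -> UArray Int Int
arrayPipeline = amap (+ 1) . amap (* 2)

-- The same computation fused by hand: one pass, no intermediate array.
arrayFused :: UArray Int Int -> UArray Int Int
arrayFused = amap ((+ 1) . (* 2))

-- Helper (hypothetical) to build a zero-indexed unboxed array from a list.
fromInts :: [Int] -> UArray Int Int
fromInts xs = listArray (0, length xs - 1) xs
```

Because the list version is lazy, `take 3 (listPipeline [1 ..])` terminates and evaluates to `[3,5,7]` even though the input is infinite; the array version has no such option and always materialises every element of the intermediate result.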

In addition, since fusion mangles your code into an unrecognizable Stream data type to improve analysis, you can be left with code that has too many constructors and other overheads.
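As a rough illustration of where those extra constructors come from, here is a sketch of a stream representation in the style commonly used for stream fusion. It is an assumption that the Stream type mentioned above resembles this one; the definitions below are illustrative and are not taken from Repa's source.

```haskell
{-# LANGUAGE ExistentialQuantification #-}

-- One step of a stream: finished, advance without output, or yield a value.
data Step s a = Done | Skip s | Yield a s

-- A stream is a step function together with a hidden seed state.
data Stream a = forall s. Stream (s -> Step s a) s

-- Convert a list into a stream.
stream :: [a] -> Stream a
stream xs0 = Stream next xs0
  where
    next []       = Done
    next (x : xs) = Yield x xs

-- Convert a stream back into a list.
unstream :: Stream a -> [a]
unstream (Stream next s0) = go s0
  where
    go s = case next s of
      Done       -> []
      Skip s'    -> go s'
      Yield x s' -> x : go s'

-- Map over a stream by wrapping its step function; no recursion here,
-- which is what lets the optimiser combine adjacent stream operations.
mapS :: (a -> b) -> Stream a -> Stream b
mapS f (Stream next s0) = Stream next' s0
  where
    next' s = case next s of
      Done       -> Done
      Skip s'    -> Skip s'
      Yield x s' -> Yield (f x) s'

-- A list map expressed through the stream representation.
mapViaStream :: (a -> b) -> [a] -> [b]
mapViaStream f = unstream . mapS f . stream
```

When the optimiser succeeds, the `Step` and `Stream` constructors are eliminated and only a tight loop remains. When it does not, every element passes through `Yield`, `Skip`, and `Done` allocations that the original list or array code never contained.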

+9
