Partial Computing Optimization in Haskell

I am wondering how to optimize this code:

fun n = (sum l, f $ f0 l, g $ g0 l) where l = map h [1..n] 

Assume f , f0 , g , g0 and h are all expensive, but creating and storing l is extremely expensive.

As written, l is retained until the returned tuple is fully evaluated or garbage collected. Instead, sum l , f0 l and g0 l should be computed whenever any one of them is computed, but f and g should stay delayed.

It looks like this behavior can be achieved by writing:

 fun n = a `seq` b `seq` c `seq` (a, f b, g c)
   where
     l = map h [1..n]
     a = sum l
     b = inline f0 $ l
     c = inline g0 $ l
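Filled out with concrete stand-in definitions (all made up for illustration; only the shape of fun comes from the question), the seq version type-checks and behaves as described: a , b and c are forced as soon as the tuple is demanded, while f and g are applied only when their components are.

```haskell
import GHC.Exts (inline)

-- Made-up stand-ins for the question's expensive functions:
h :: Int -> Double
h = fromIntegral

f0, g0 :: [Double] -> Double
f0 = minimum
g0 = maximum

f, g :: Double -> Double
f = negate
g = (* 2)

-- The seq version from the question: a, b and c are all forced together,
-- so l can be consumed in one go; f and g remain delayed inside the tuple.
fun :: Int -> (Double, Double, Double)
fun n = a `seq` b `seq` c `seq` (a, f b, g c)
  where
    l = map h [1 .. n]
    a = sum l
    b = inline f0 $ l
    c = inline g0 $ l
```

For example, with these stand-ins `fun 4` evaluates to `(10.0, -1.0, 8.0)`.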

Or very similar:

 fun n = (a, b, c) `deepseq` (a, f b, g c) where ...

We could probably define a bunch of custom internal types to achieve the same effect, but that looks painful. Are there any other options?

In addition, with my inlines I obviously hope that the compiler fuses sum , f0 and g0 into a single loop that constructs and consumes l on the fly. I could make that explicit by inlining everything manually, but that would suck. Are there ways to explicitly prohibit sharing the list l and / or force fusion? Pragmas that produce warnings or errors if unwanted sharing or missed fusion occurs during compilation?
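To make concrete what such manual inlining would look like, here is a sketch of a hand-fused loop: one explicitly recursive pass that never builds l at all. The stand-ins for h and for the f0 / g0 -style aggregates (a product and a count) are invented for illustration.

```haskell
{-# LANGUAGE BangPatterns #-}

-- One hand-written loop replacing sum, an f0-style fold (product here)
-- and a g0-style fold (count here); the list l is never materialised.
fusedFun :: Int -> (Double, Double, Double)
fusedFun n = go 1 0 1 0
  where
    -- i: index; s: running sum; p: running product; c: running count
    go !i !s !p !c
      | i > n     = (s, p, c)
      | otherwise = let x = h i
                    in go (i + 1) (s + x) (p * x) (c + 1)
    h = fromIntegral . (* 2)   -- made-up h
```

With these stand-ins, `fusedFun 3` yields `(12.0, 48.0, 3.0)` in a single pass, which is exactly what one hopes fusion would produce automatically.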

As an aside, I wonder why seq , inline , lazy , etc. are all defined as let x = x in x in the Prelude. Is that just to give them a definition for the compiler to override?

optimization haskell ghc
1 answer

If you want to be sure, the only way is to do it yourself. For any given compiler version you can try a few source formulations and inspect the generated Core / assembly / LLVM / whatever, to check that it does what you want. But that may break with every new compiler version.

If you write

 fun n = a `seq` b `seq` c `seq` (a, f b, g c)
   where
     l = map h [1..n]
     a = sum l
     b = inline f0 $ l
     c = inline g0 $ l

or the deepseq version, the compiler could merge the computations of a , b and c to be performed in parallel (not in the concurrency sense) during a single traversal of l , but I'm fairly certain that GHC currently doesn't, and I'd be surprised if JHC or UHC did. And for that, the structure of the computations b and c would have to be rather simple.

The only way to get the desired result portably, across compilers and compiler versions, is to do it yourself. For the next few years, at least.

Depending on f0 and g0 , it could be as simple as doing a strict left fold with a suitable accumulator type and combining function, like the well-known average:

 import Data.List (foldl')

 data P = P {-# UNPACK #-} !Int {-# UNPACK #-} !Double

 average :: [Double] -> Double
 average = ratio . foldl' count (P 0 0)
   where
     ratio (P n s) = s / fromIntegral n
     count (P n s) x = P (n + 1) (s + x)
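The same strict-accumulator pattern can be applied to the question's fun , assuming (purely for illustration) that f0 is a product and g0 a maximum; both are left folds, so one traversal suffices and l is consumed as it is produced. The applications of f and g are left off for brevity.

```haskell
import Data.List (foldl')

-- Strict accumulator carrying all three running results at once.
data Acc = Acc {-# UNPACK #-} !Double {-# UNPACK #-} !Double {-# UNPACK #-} !Double

-- One strict left fold computes sum, product (f0 stand-in) and
-- maximum (g0 stand-in) in a single traversal of map h [1..n].
fun :: Int -> (Double, Double, Double)
fun n = done (foldl' step (Acc 0 1 (-1 / 0)) (map h [1 .. n]))
  where
    h = fromIntegral . (* 2)                       -- made-up h
    step (Acc s p m) x = Acc (s + x) (p * x) (max m x)
    done (Acc s p m) = (s, p, m)
```

For example, `fun 3` folds the elements 2, 4, 6 into `(12.0, 48.0, 6.0)` without ever retaining the list.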

but if the structures of f0 and / or g0 don't fit, say one is a left fold and the other a right fold, it may be impossible to perform the computation in a single traversal. In such cases the choice is between recreating l and retaining l . Retaining l is easily achieved via explicit sharing ( where l = map h [1..n] ), but recreating it can be difficult if the compiler performs some common subexpression elimination (unfortunately, GHC tends to share lists of this form, even though it does only little CSE). For GHC, the flags -fno-cse and -fno-full-laziness can help avoid unwanted sharing.
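A sketch of how those flags might be used per-module ( -fno-cse and -fno-full-laziness are real GHC flags; h and the two consumers are made up). With CSE and full laziness off, spelling out the producer separately for each consumer makes it more likely that GHC recreates the list per traversal instead of retaining one shared copy, though actual sharing still depends on the optimiser.

```haskell
{-# OPTIONS_GHC -fno-cse -fno-full-laziness #-}

-- Two traversals, each with its own copy of the producer expression;
-- the flags above discourage GHC from sharing a single list between them.
fun :: Int -> (Double, Double)
fun n = (sum (map h [1 .. n]), product (map h [1 .. n]))
  where
    h = fromIntegral . (* 2)   -- made-up h
```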

