Pointfree version degrades performance

Question

Pointfree version degrades performance

Well, it turns out that I defined this function in my program code:

st_zipOp :: (a -> a -> a) -> Stream a -> Stream a -> Stream a st_zipOp f xs ys = St.foldr (\xr -> st_map (fx) r) xs ys

Does what it seems. It fastens (applying the operator several times, yes) two elements of type Stream a , which is a list type, using an internal operator of type a . The definition is pretty simple.

As soon as I defined the function this way, I tried this other version:

 st_zipOp :: (a -> a -> a) -> Stream a -> Stream a -> Stream a st_zipOp = St.foldr . (st_map .)

As far as I know, this is exactly the same definition as above. This is just a meaningless version of the previous definition.

However, I wanted to check if there was any performance change, and I found that, indeed, the pointless version made the program a little worse (both in memory and in time).

Why is this happening? If necessary, I can write a test program that reproduces this behavior.

I compile with -O2 if that matters.

Simple test case

I wrote the following code, trying to reproduce the behavior described above. This time I used lists, and the performance change was less noticeable, but still existed. This is the code:

 opEvery :: (a -> a -> a) -> [a] -> [a] -> [a] opEvery f xs ys = foldr (\xr -> map (fx) r) xs ys opEvery' :: (a -> a -> a) -> [a] -> [a] -> [a] opEvery' = foldr . (map .) main :: IO () main = print $ sum $ opEvery (+) [1..n] [1..n] where n :: Integer n = 5000

Profiling results using opEvery (version of explicit arguments):

 total time = 2.91 secs (2906 ticks @ 1000 us, 1 processor) total alloc = 1,300,813,124 bytes (excludes profiling overheads)

Profiling results using opEvery' (free version):

 total time = 3.24 secs (3242 ticks @ 1000 us, 1 processor) total alloc = 1,300,933,160 bytes (excludes profiling overheads)

However, I expected both versions to be equivalent (in every sense).

+7

performance haskell pointfree

Daniel Díaz Jan 30 '13 at 17:30

source share

2 answers

This is a big guess about what I'm going to say, but I think that the compiler does not have enough information to optimize your program. By not directly answering your question, but adding the Eq a constraint to both functions (as a test), I have an improvement from the zero-point option. See Attached Image (Splitting Explanations)

 Right -> TOP = everyOp initial, BOTTOM = everyOp' initial Left -> TOP = everyOp with Eq a constraint, BOTTOM = everyOp' Eq a constraint

EDIT: I am using GHC 7.4.2

+2

mhitza Jan 31 '13 at 12:00

source share

Daniel Fischer · Accepted Answer · 2013-01-31T12:39:45+0000

For a simple test case, both versions give the same kernel when compiling with optimization, but without profiling.

When compiling with profiling enabled ( -prof -fprof-auto ), the pointfull version becomes embedded, as a result, the main part

 Rec { Main.main_go [Occ=LoopBreaker] :: [GHC.Integer.Type.Integer] -> [GHC.Integer.Type.Integer] [GblId, Arity=1, Str=DmdType S] Main.main_go = \ (ds_asR :: [GHC.Integer.Type.Integer]) -> case ds_asR of _ { [] -> xs_r1L8; : y_asW ys_asX -> let { r_aeN [Dmd=Just S] :: [GHC.Integer.Type.Integer] [LclId, Str=DmdType] r_aeN = Main.main_go ys_asX } in scctick<opEvery.\> GHC.Base.map @ GHC.Integer.Type.Integer @ GHC.Integer.Type.Integer (GHC.Integer.Type.plusInteger y_asW) r_aeN } end Rec }

(you get something better without profiling).

When compiling a version of pointfree with profiling enabled, opEvery' not embedded, and you get

 Main.opEvery' :: forall a_aeW. (a_aeW -> a_aeW -> a_aeW) -> [a_aeW] -> [a_aeW] -> [a_aeW] [GblId, Str=DmdType, Unf=Unf{Src=<vanilla>, TopLvl=True, Arity=0, Value=False, ConLike=False, WorkFree=False, Expandable=False, Guidance=IF_ARGS [] 80 60}] Main.opEvery' = \ (@ a_c) -> tick<opEvery'> \ (x_ass :: a_c -> a_c -> a_c) -> scc<opEvery'> GHC.Base.foldr @ a_c @ [a_c] (\ (x1_XsN :: a_c) -> GHC.Base.map @ a_c @ a_c (x_ass x1_XsN)) Main.main4 :: [GHC.Integer.Type.Integer] [GblId, Str=DmdType, Unf=Unf{Src=<vanilla>, TopLvl=True, Arity=0, Value=False, ConLike=False, WorkFree=False, Expandable=False, Guidance=IF_ARGS [] 40 0}] Main.main4 = scc<main> Main.opEvery' @ GHC.Integer.Type.Integer GHC.Integer.Type.plusInteger Main.main7 Main.main5

When you add the pragma {-# INLINABLE opEvery' #-} , it can be embedded even when compiling for profiling, providing

 Rec { Main.main_go [Occ=LoopBreaker] :: [GHC.Integer.Type.Integer] -> [GHC.Integer.Type.Integer] [GblId, Arity=1, Str=DmdType S] Main.main_go = \ (ds_asz :: [GHC.Integer.Type.Integer]) -> case ds_asz of _ { [] -> lvl_r1KU; : y_asE ys_asF -> GHC.Base.map @ GHC.Integer.Type.Integer @ GHC.Integer.Type.Integer (GHC.Integer.Type.plusInteger y_asE) (Main.main_go ys_asF) } end Rec }

which is even slightly faster than the version without the pragma-dot version, since there is no need to gallop counters.

Probably a similar effect occurred for the case of Stream .

Stakeout:

Profiling prevents optimization. Code equivalent without profiling may not be with profiling support.
Never measure performance with code that has been compiled for profiling or without optimizations.
Profiling will help you find out where the time is spent in your code [but, sometimes, enabling profiling can completely change the behavior of the code; anything that depends heavily on optimizing rewriting and / or nesting rules is a candidate for this], but he cannot tell you how fast your code is.

Pointfree version degrades performance

Simple test case

More articles: