Pointfree version degrades performance

Well, it turns out that I defined this function in my program code:

st_zipOp :: (a -> a -> a) -> Stream a -> Stream a -> Stream a st_zipOp f xs ys = St.foldr (\xr -> st_map (fx) r) xs ys 

Does what it seems. It fastens (applying the operator several times, yes) two elements of type Stream a , which is a list type, using an internal operator of type a . The definition is pretty simple.

As soon as I defined the function this way, I tried this other version:

 st_zipOp :: (a -> a -> a) -> Stream a -> Stream a -> Stream a st_zipOp = St.foldr . (st_map .) 

As far as I know, this is exactly the same definition as above. This is just a meaningless version of the previous definition.

However, I wanted to check if there was any performance change, and I found that, indeed, the pointless version made the program a little worse (both in memory and in time).

Why is this happening? If necessary, I can write a test program that reproduces this behavior.

I compile with -O2 if that matters.

Simple test case

I wrote the following code, trying to reproduce the behavior described above. This time I used lists, and the performance change was less noticeable, but still existed. This is the code:

 opEvery :: (a -> a -> a) -> [a] -> [a] -> [a] opEvery f xs ys = foldr (\xr -> map (fx) r) xs ys opEvery' :: (a -> a -> a) -> [a] -> [a] -> [a] opEvery' = foldr . (map .) main :: IO () main = print $ sum $ opEvery (+) [1..n] [1..n] where n :: Integer n = 5000 

Profiling results using opEvery (version of explicit arguments):

 total time = 2.91 secs (2906 ticks @ 1000 us, 1 processor) total alloc = 1,300,813,124 bytes (excludes profiling overheads) 

Profiling results using opEvery' (free version):

 total time = 3.24 secs (3242 ticks @ 1000 us, 1 processor) total alloc = 1,300,933,160 bytes (excludes profiling overheads) 

However, I expected both versions to be equivalent (in every sense).

+7
source share
2 answers

For a simple test case, both versions give the same kernel when compiling with optimization, but without profiling.

When compiling with profiling enabled ( -prof -fprof-auto ), the pointfull version becomes embedded, as a result, the main part

 Rec { Main.main_go [Occ=LoopBreaker] :: [GHC.Integer.Type.Integer] -> [GHC.Integer.Type.Integer] [GblId, Arity=1, Str=DmdType S] Main.main_go = \ (ds_asR :: [GHC.Integer.Type.Integer]) -> case ds_asR of _ { [] -> xs_r1L8; : y_asW ys_asX -> let { r_aeN [Dmd=Just S] :: [GHC.Integer.Type.Integer] [LclId, Str=DmdType] r_aeN = Main.main_go ys_asX } in scctick<opEvery.\> GHC.Base.map @ GHC.Integer.Type.Integer @ GHC.Integer.Type.Integer (GHC.Integer.Type.plusInteger y_asW) r_aeN } end Rec } 

(you get something better without profiling).

When compiling a version of pointfree with profiling enabled, opEvery' not embedded, and you get

 Main.opEvery' :: forall a_aeW. (a_aeW -> a_aeW -> a_aeW) -> [a_aeW] -> [a_aeW] -> [a_aeW] [GblId, Str=DmdType, Unf=Unf{Src=<vanilla>, TopLvl=True, Arity=0, Value=False, ConLike=False, WorkFree=False, Expandable=False, Guidance=IF_ARGS [] 80 60}] Main.opEvery' = \ (@ a_c) -> tick<opEvery'> \ (x_ass :: a_c -> a_c -> a_c) -> scc<opEvery'> GHC.Base.foldr @ a_c @ [a_c] (\ (x1_XsN :: a_c) -> GHC.Base.map @ a_c @ a_c (x_ass x1_XsN)) Main.main4 :: [GHC.Integer.Type.Integer] [GblId, Str=DmdType, Unf=Unf{Src=<vanilla>, TopLvl=True, Arity=0, Value=False, ConLike=False, WorkFree=False, Expandable=False, Guidance=IF_ARGS [] 40 0}] Main.main4 = scc<main> Main.opEvery' @ GHC.Integer.Type.Integer GHC.Integer.Type.plusInteger Main.main7 Main.main5 

When you add the pragma {-# INLINABLE opEvery' #-} , it can be embedded even when compiling for profiling, providing

 Rec { Main.main_go [Occ=LoopBreaker] :: [GHC.Integer.Type.Integer] -> [GHC.Integer.Type.Integer] [GblId, Arity=1, Str=DmdType S] Main.main_go = \ (ds_asz :: [GHC.Integer.Type.Integer]) -> case ds_asz of _ { [] -> lvl_r1KU; : y_asE ys_asF -> GHC.Base.map @ GHC.Integer.Type.Integer @ GHC.Integer.Type.Integer (GHC.Integer.Type.plusInteger y_asE) (Main.main_go ys_asF) } end Rec } 

which is even slightly faster than the version without the pragma-dot version, since there is no need to gallop counters.

Probably a similar effect occurred for the case of Stream .

Stakeout:

  • Profiling prevents optimization. Code equivalent without profiling may not be with profiling support.
  • Never measure performance with code that has been compiled for profiling or without optimizations.
  • Profiling will help you find out where the time is spent in your code [but, sometimes, enabling profiling can completely change the behavior of the code; anything that depends heavily on optimizing rewriting and / or nesting rules is a candidate for this], but he cannot tell you how fast your code is.
+12
source

This is a big guess about what I'm going to say, but I think that the compiler does not have enough information to optimize your program. By not directly answering your question, but adding the Eq a constraint to both functions (as a test), I have an improvement from the zero-point option. See Attached Image (Splitting Explanations)

 Right -> TOP = everyOp initial, BOTTOM = everyOp' initial Left -> TOP = everyOp with Eq a constraint, BOTTOM = everyOp' Eq a constraint 

enter image description here

EDIT: I am using GHC 7.4.2

+2
source

All Articles