GHC calling convention for function arguments of sum type

Does GHC ever unpack sum types when passing them to functions? For example, suppose we have the following type:

 data Foo
   = Foo1 {-# UNPACK #-} !Int {-# UNPACK #-} !Word
   | Foo2 {-# UNPACK #-} !Int
   | Foo3 {-# UNPACK #-} !Word

Then suppose I define a function that consumes a Foo:

 consumeFoo :: Foo -> Int
 consumeFoo x = case x of
   ...

At runtime, when I call consumeFoo, what can I expect to happen? GHC's calling convention is to pass arguments in registers (or on the stack when there are too many of them). I can see two ways the argument might be passed:

  • A pointer to a Foo on the heap is passed as a single argument.
  • Foo is passed as three arguments: one representing which data constructor was used, and the other two the possible Int and Word values in the data constructor.

I would prefer option (2), but I do not know whether that is what actually happens. I know that UnpackedSumTypes is landing in GHC 8.2, but it is unclear whether it does what I want. If I instead wrote the function as:

 consumeFooAlt :: (# (# Int#, Word# #) | Int# | Word# #) -> Int 
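For reference, here is a sketch of what a full definition might look like with the unboxed-sum syntax (GHC 8.2+; the function body, the tag-to-alternative mapping, and the module name are my own illustration, not from the wiki page):

```haskell
{-# LANGUAGE MagicHash, UnboxedTuples, UnboxedSums #-}
module Alt where

import GHC.Exts (Int#, Word#, Int (I#), word2Int#, (+#))

-- The argument is a raw unboxed sum: a tag plus the payload fields,
-- with no heap-allocated constructor to unpack.
consumeFooAlt :: (# (# Int#, Word# #) | Int# | Word# #) -> Int
consumeFooAlt x = case x of
  (# (# i, w #) | | #) -> I# (i +# word2Int# w)
  (# | i | #)          -> I# i
  (# | | w #)          -> I# (word2Int# w)
```

For example, `consumeFooAlt (# | 5# | #)` picks the second alternative and evaluates to 5.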

Then I would expect option (2) to be what happens. Also, the "Unpacking" section of the unpacked sums wiki page indicates that I could instead do this:

 data Wrap = Wrap {-# UNPACK #-} !Foo

 consumeFooAlt2 :: Wrap -> Int

And that should also accomplish what I want, I think.
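A sketch of what the consumer might look like; the body is my own guess, on the assumption that UNPACK on a sum field is representation-only and does not change how the value is pattern-matched (a GHC version without this feature will simply ignore the pragma with a warning):

```haskell
-- The original sum type, with strict, unpackable fields.
data Foo = Foo1 {-# UNPACK #-} !Int {-# UNPACK #-} !Word
         | Foo2 {-# UNPACK #-} !Int
         | Foo3 {-# UNPACK #-} !Word

-- Wrapper requesting that the Foo payload be unpacked in place.
data Wrap = Wrap {-# UNPACK #-} !Foo

-- Unwrapping and matching look the same at the source level whether
-- or not the Foo payload is actually unpacked into Wrap.
consumeFooAlt2 :: Wrap -> Int
consumeFooAlt2 (Wrap f) = case f of
  Foo1 i w -> i + fromIntegral w
  Foo2 i   -> i
  Foo3 w   -> fromIntegral w
```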

So my question is: without using a wrapper type or a raw unboxed sum, how can I guarantee that a sum is unpacked into registers (or onto the stack) when I pass it as an argument to a function? If it is possible, is it something that GHC 8.0 can already do, or is it only going to be available in GHC 8.2?

2 answers

First of all: guaranteed optimization and GHC do not mix well. Because of the high-level nature of the language, it is very hard to predict what code GHC will generate in any given case. The only way to be sure is to look at the Core. If you are developing a performance-critical application with GHC, you need to become familiar with Core.

I do not know of any optimization in GHC that does exactly what you describe. Here is an example program:

 module Test where

 data Sum = A {-# UNPACK #-} !Int | B {-# UNPACK #-} !Int

 consumeSum :: Sum -> Int
 consumeSum x = case x of
   A y -> y + 1
   B y -> y + 2

 {-# NOINLINE consumeSumNoinline #-}
 consumeSumNoinline = consumeSum

 {-# INLINE produceSumInline #-}
 produceSumInline :: Int -> Sum
 produceSumInline x = if x == 0 then A x else B x

 {-# NOINLINE produceSumNoinline #-}
 produceSumNoinline :: Int -> Sum
 produceSumNoinline x = if x == 0 then A x else B x

 test :: Int -> Int
 --test x = consumeSum (produceSumInline x)
 test x = consumeSumNoinline (produceSumNoinline x)

Let's first see what happens if we inline neither consumeSum nor produceSum. Here is the Core:

 test :: Int -> Int
 test = \ (x :: Int) -> consumeSumNoinline (produceSumNoinline x)

(done using ghc-core test.hs -- -dsuppress-unfoldings -dsuppress-idinfo -dsuppress-module-prefixes -dsuppress-uniques )

Here we can see that GHC (8.0 in this case) does not unpack the sum type passed as a function argument. Nothing changes if we inline only one of consumeSum and produceSum.

If we inline both, the following code is generated:

 test :: Int -> Int
 test = \ (x :: Int) ->
   case x of _ { I# x1 ->
   case x1 of wild1 {
     __DEFAULT -> I# (+# wild1 2#);
     0# -> lvl1
   } }

What happened here is that through inlining, GHC ended up with:

 \x -> case (if x == 0 then A x else B x) of
   A y -> y + 1
   B y -> y + 2

Which, through the case-of-case transformation (if is just a special case of case), turns into:

 \x -> if x == 0
   then case (A x) of ...
   else case (B x) of ...

Now these are cases on a known constructor, so GHC can reduce the case at compile time, ending up with:

 \x -> if x == 0 then x + 1 else x + 2 

So it has completely eliminated the constructor.
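These transformations only change representation, not meaning; here is a quick value-level check of the fully reduced form against the original definitions (the test harness is my own, not part of the answer):

```haskell
module Main where

data Sum = A {-# UNPACK #-} !Int | B {-# UNPACK #-} !Int

consumeSum :: Sum -> Int
consumeSum x = case x of
  A y -> y + 1
  B y -> y + 2

produceSum :: Int -> Sum
produceSum x = if x == 0 then A x else B x

-- The form GHC reduced `consumeSum (produceSum x)` to after inlining
testReduced :: Int -> Int
testReduced x = if x == 0 then x + 1 else x + 2

main :: IO ()
main = print (all (\x -> consumeSum (produceSum x) == testReduced x) [-3 .. 3])
-- prints True
```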


In general, I believe GHC has no notion of an "unboxed sum" before version 8.2, and that includes function arguments. The only way to get "unpacked" sums is to have the constructor eliminated completely by inlining.


If you need this kind of optimization, the most reliable solution is to do it yourself. I think there are actually several ways to achieve it, but one is:

 data Which = Left | Right | Both
 data Foo = Foo Which Int Word

Whether or not you unpack the fields of this type is completely orthogonal to the question of the "unpacked representation" that you are really asking about. Enumerations are already highly optimized: only one value for each constructor is ever created, so adding this field costs almost nothing. The unpacked representation of this type is exactly what you want: one word for the Which tag and one for each field.

If you write your functions appropriately, you will get the right code:

 data Which = Lft | Rgt | Both

 data Foo = Foo Which {-# UNPACK #-} !Int {-# UNPACK #-} !Word

 consumeFoo :: Foo -> Int
 consumeFoo (Foo w l r) = case w of
   Lft  -> l
   Rgt  -> fromIntegral r
   Both -> l + fromIntegral r

The generated Core is exactly what you would expect:

 consumeFoo :: Foo -> Int
 consumeFoo = \ (ds :: Foo) ->
   case ds of _ { Foo w dt dt1 ->
   case w of _ {
     Lft  -> I# dt;
     Rgt  -> I# (word2Int# dt1);
     Both -> I# (+# dt (word2Int# dt1))
   } }
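One design note, as a hedged sketch (the smart constructors and the zero defaults below are my own convention, not part of the answer): in the Lft and Rgt cases the unused field still occupies its slot, so it helps to pin it to a known value at construction time:

```haskell
data Which = Lft | Rgt | Both

data Foo = Foo Which {-# UNPACK #-} !Int {-# UNPACK #-} !Word

-- Hypothetical smart constructors: unused slots are pinned to 0 so
-- every Foo carries the same fully-defined three-word payload.
fooLft :: Int -> Foo
fooLft l = Foo Lft l 0

fooRgt :: Word -> Foo
fooRgt r = Foo Rgt 0 r

fooBoth :: Int -> Word -> Foo
fooBoth = Foo Both

consumeFoo :: Foo -> Int
consumeFoo (Foo w l r) = case w of
  Lft  -> l
  Rgt  -> fromIntegral r
  Both -> l + fromIntegral r
```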

However, for simple programs such as:

 consumeFoos = foldl' (+) 0 . map consumeFoo 

this optimization makes little difference. As pointed out in the other answer, the body of consumeFoo is simply inlined:

 Rec {
 $wgo :: [Foo] -> Int# -> Int#
 $wgo = \ (w :: [Foo]) (ww :: Int#) ->
   case w of _ {
     []     -> ww;
     : y ys -> case y of _ {
       Lft dt      -> $wgo ys (+# ww dt);
       Rgt dt      -> $wgo ys (+# ww (word2Int# dt));
       Both dt dt1 -> $wgo ys (+# ww (+# dt (word2Int# dt1)))
     }
   }
 end Rec }

versus:

 Rec {
 $wgo :: [Foo] -> Int# -> Int#
 $wgo = \ (w :: [Foo]) (ww :: Int#) ->
   case w of _ {
     []     -> ww;
     : y ys -> case y of _ { Foo w1 dt dt1 ->
       case w1 of _ {
         Lft  -> $wgo ys (+# ww dt);
         Rgt  -> $wgo ys (+# ww (word2Int# dt1));
         Both -> $wgo ys (+# ww (+# dt (word2Int# dt1)))
       } }
   }
 end Rec }

And that is the situation you will be in almost every time you work with low-level unpacked data anyway, since most of your functions are small and cheap to inline.
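Putting it together, here is a self-contained sketch of the hand-unpacked pipeline (the type and function definitions repeat the ones above; the sample list is my own):

```haskell
module Main where

import Data.List (foldl')

data Which = Lft | Rgt | Both

data Foo = Foo Which {-# UNPACK #-} !Int {-# UNPACK #-} !Word

consumeFoo :: Foo -> Int
consumeFoo (Foo w l r) = case w of
  Lft  -> l
  Rgt  -> fromIntegral r
  Both -> l + fromIntegral r

-- Strict left fold so the accumulator stays in a register.
consumeFoos :: [Foo] -> Int
consumeFoos = foldl' (+) 0 . map consumeFoo

main :: IO ()
main = print (consumeFoos [Foo Lft 1 0, Foo Rgt 0 2, Foo Both 3 4])
-- prints 10
```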

