Parallel Haskell - GHC GC'ing sparks

I have a program that I'm trying to parallelize (full paste with runnable code here).

I profiled and found that most of the time is spent in findNearest, which is essentially a simple foldr over a large Data.Map.

    findNearest :: RGB -> M.Map k RGB -> (k, Word32)
    findNearest rgb m0 =
        M.foldrWithKey' minDistance (k0, distance rgb r0) m0
      where
        (k0, r0) = M.findMin m0
        minDistance k r x@(_, d1) =
            -- Euclidean distance in RGB-space
            let d0 = distance rgb r
            in if d0 < d1 then (k, d0) else x

parFindNearest is supposed to run findNearest in parallel over the larger subtrees of the Map.

    parFindNearest :: NFData k => RGB -> M.Map k RGB -> (k, Word32)
    parFindNearest rgb =
          minimumBy (comparing snd)
        . parMap rdeepseq (findNearest rgb)
        . M.splitRoot
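For context (not part of the original question): M.splitRoot decomposes a Map into pieces based on the structure of the underlying tree — currently the left subtree, the root binding, and the right subtree — so parMap here creates one spark per piece. A minimal standalone illustration:

    import qualified Data.Map as M

    main :: IO ()
    main = do
        -- splitRoot breaks the map along its root: currently the left
        -- subtree, the root binding, and the right subtree (empty
        -- pieces are possible for very small maps).
        let m = M.fromList [(i, i * i) | i <- [1 .. 7 :: Int]]
        mapM_ (print . M.toList) (M.splitRoot m)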

Unfortunately, the GHC runtime GC's most of my sparks before they are converted into useful parallelism.

Here's the result of compiling with ghc -O2 -threaded and running with +RTS -s -N2:

         839,892,616 bytes allocated in the heap
         123,999,464 bytes copied during GC
           5,320,184 bytes maximum residency (19 sample(s))
           3,214,200 bytes maximum slop
                  16 MB total memory in use (0 MB lost due to fragmentation)

                                      Tot time (elapsed)  Avg pause  Max pause
      Gen  0      1550 colls,  1550 par    0.23s    0.11s     0.0001s    0.0004s
      Gen  1        19 colls,    18 par    0.11s    0.06s     0.0030s    0.0052s

      Parallel GC work balance: 16.48% (serial 0%, perfect 100%)

      TASKS: 6 (1 bound, 5 peak workers (5 total), using -N2)

      SPARKS: 215623 (1318 converted, 0 overflowed, 0 dud, 198111 GC'd, 16194 fizzled)

      INIT    time    0.00s  (  0.00s elapsed)
      MUT     time    3.72s  (  3.66s elapsed)
      GC      time    0.34s  (  0.17s elapsed)
      EXIT    time    0.00s  (  0.00s elapsed)
      Total   time    4.07s  (  3.84s elapsed)

      Alloc rate    225,726,318 bytes per MUT second

      Productivity  91.6% of total user, 97.1% of total elapsed

    gc_alloc_block_sync: 9862
    whitehole_spin: 0
    gen[0].sync: 0
    gen[1].sync: 2103

As you can see, most sparks are GC'd or fizzle before being converted. I tried experimenting with different strictness, having findNearest return a custom strict pair data type instead of a tuple, and using rdeepseq from Control.Parallel.Strategies, but my sparks are still GC'd.
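For reference, the strict-pair experiment might look something like this (a sketch; P is a hypothetical name, since the actual type used is not shown in the question):

    import Data.Word (Word32)

    -- Strict pair: both fields are forced whenever the constructor is,
    -- so no lazy tuple thunks are left behind.
    data P k = P !k !Word32

    -- findNearest's accumulator would then become a P instead of a tuple:
    --   minDistance k r x@(P _ d1) =
    --       let d0 = distance rgb r
    --       in if d0 < d1 then P k d0 else x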

I'd like to know:

  • Why are my sparks GC'd before being converted?
  • How can I change my program to take advantage of parallelism?
1 answer

I'm no expert in parallel strategies, so I may be completely wrong. But:

If you effectively disable the GC by setting a large enough allocation area (e.g. using the -A20M runtime option), you'll see that most of the sparks are fizzled, not GC'd. That means they are evaluated by the ordinary program flow before the corresponding spark finishes.
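Concretely, that means running the compiled program along these lines (program is a placeholder name; -rtsopts lets the binary accept the extra RTS flag):

    $ ghc -O2 -threaded -rtsopts program.hs
    $ ./program +RTS -s -N2 -A20M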

minimumBy forces the parMap results immediately and starts evaluating them itself. Meanwhile, the sparks are scheduled and executed, but it is too late: by the time a spark runs, the value has already been evaluated by the main thread, so it fizzles. Without -A20M the sparks are GC'd instead, because the value is evaluated and garbage-collected before the spark is even scheduled.

Here is a simplified example:

    import Control.Parallel.Strategies

    f :: Integer -> Integer
    f 0 = 1
    f n = n * f (n - 1)

    main :: IO ()
    main = do
        let l = [n .. n + 10]
            n = 1
            res = parMap rdeepseq f l
        print res

In this case, all the sparks are fizzled:

  SPARKS: 11 (0 converted, 0 overflowed, 0 dud, 0 GC'd, 11 fizzled) 

(Sometimes they are GC'd instead.)

But if I yield the main thread before printing the results,

    import Control.Parallel.Strategies
    import Control.Concurrent

    f :: Integer -> Integer
    f 0 = 1
    f n = n * f (n - 1)

    main :: IO ()
    main = do
        let l = [n .. n + 10]
            n = 1
            res = parMap rdeepseq f l
        res `seq` threadDelay 1
        print res

Then all sparks are converted:

 SPARKS: 11 (11 converted, 0 overflowed, 0 dud, 0 GC'd, 0 fizzled) 

So it looks like you don't have enough sparks (try setting l = [n..n+1000] in my example) and they are not heavy enough (try setting n = 1000 in my example).
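Applied to the original program, one way to get more and heavier sparks (my own sketch, not from the question; splitToDepth and parFindNearest' are made-up names, and RGB and findNearest are assumed in scope from the question) is to split the Map recursively a few levels before sparking, instead of only once at the root:

    import qualified Data.Map as M
    import Control.DeepSeq (NFData)
    import Control.Parallel.Strategies (parMap, rdeepseq)
    import Data.List (minimumBy)
    import Data.Ord (comparing)
    import Data.Word (Word32)

    -- Split a Map into at most 3^d pieces by applying splitRoot recursively.
    splitToDepth :: Int -> M.Map k v -> [M.Map k v]
    splitToDepth 0 m = [m]
    splitToDepth d m = concatMap (splitToDepth (d - 1)) (M.splitRoot m)

    -- Like parFindNearest, but sparking many medium-sized subtrees.
    parFindNearest' :: NFData k => RGB -> M.Map k RGB -> (k, Word32)
    parFindNearest' rgb =
          minimumBy (comparing snd)
        . parMap rdeepseq (findNearest rgb)
        . filter (not . M.null)  -- findNearest calls findMin, so drop empty pieces
        . splitToDepth 3

With depth 3 that yields up to 27 sparks, each still folding over a substantial subtree, which gives the scheduler a realistic chance to convert them before the main thread evaluates the values itself.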

