Speed up Haskell concurrency

Question

Speed up Haskell concurrency

MVar, TVar, IORef, ... I cannot speed up the thunk problem (I think).

(My original problem is multi-threaded code, I do forkIO n times by calling addMany, but I think my problem is with the shW function)

Let the following code:

{-# LANGUAGE BangPatterns #-} import Control.Concurrent import Control.Monad import System.Environment(getArgs) import Data.Int import Data.IORef -- "i" times, add "n" for each IORef (in "a") addMany :: [IORef Int64] -> Int64 -> Int64 -> IO () addMany !a !n !i = forM_ [1..i] (\_ -> forM_ a (shW n)) -- MVar, TVar, IORef, ... read/write (x' = x + k) shR = readIORef shW !k !r = atomicModifyIORef r (\ !x' -> (x' + k, ())) main = do (n':i':_) <- getArgs let (n, i) = (read n', read i') v <- forM [1..n] (\_ -> newIORef 0) addMany v 1 i mapM shR v >>= (putStrLn.show.sum)

then the result of the profile result:

 MUT time 3.12s ( 3.12s elapsed) GC time 6.96s ( 6.96s elapsed) ... COST CENTRE MODULE no. entries %time %alloc %time %alloc MAIN MAIN 47 0 0.0 0.0 100.0 100.0 main Main 95 0 0.0 0.0 100.0 100.0 main.i Main 100 1 0.0 0.0 0.0 0.0 addMany Main 99 1 0.4 0.5 100.0 100.0 addMany.\ Main 101 15000 6.6 0.0 99.6 99.5 shW Main 102 2250000 92.7 99.5 93.0 99.5 shW.\ Main 104 2250000 0.3 0.0 0.3 0.0

I cannot remove thunk on shW calls (and memory usage is huge). What's wrong?

Similar C # code works much faster:

 class Node { private object m; private int n; public Node() {n = 0; m = new object();} public void Inc() {lock(m) n++;} public int Read() {return n;} } class MainClass { public static void Main(string[] args) { var nitems = 1000; var nthreads = 6; var niters = 100000; var g = Enumerable.Range(0, nitems).Select(i => new Node()).ToArray(); Task.WaitAll(Enumerable.Range(0, nthreads).Select(q => Task.Factory.StartNew(() => { var z = niters; while(z-- > 0) foreach(var i in g) i.Inc(); })).ToArray()); Console.WriteLine("Sum node values: {0}", g.Sum(i => i.Read())); } }

Thank you so much!

UPDATE

Fixed original issue: https://gist.github.com/3742897

Thanks a lot Don Stewart !

+8

concurrency memory haskell lazy-evaluation

josejuan 18 sept '12 at 10:41

source share

1 answer

Don stewart · Accepted Answer · 2012-09-18T12:12:17+0000

You can immediately see that there is a leak when you look at the GC heap and time:

 USDGHTVVH1$ time ./A 1000 10000 +RTS -s 10000000 1,208,298,840 bytes allocated in the heap 1,352,861,868 bytes copied during GC 280,181,684 bytes maximum residency (9 sample(s)) 4,688,680 bytes maximum slop 545 MB total memory in use (0 MB lost due to fragmentation) Tot time (elapsed) Avg pause Max pause Gen 0 1677 colls, 0 par 2.27s 2.22s 0.0013s 0.0063s Gen 1 9 colls, 0 par 1.66s 1.77s 0.1969s 1.0273s INIT time 0.00s ( 0.00s elapsed) MUT time 0.70s ( 0.77s elapsed) GC time 3.92s ( 4.00s elapsed) EXIT time 0.00s ( 0.00s elapsed) Total time 4.62s ( 4.77s elapsed) %GC time 84.8% (83.8% elapsed) Alloc rate 1,718,469,461 bytes per MUT second Productivity 15.2% of total user, 14.7% of total elapsed real 0m4.752s user 0m0.015s sys 0m0.046s

280M residence and 89% GC. A lot of noise is emitted and discarded.

The heap profile makes it obivous.

The key is that this is a "stg_app *" thing (that is, the STG machine uses thunks).

A subtle but wonderful problem with the family of modification functions is the problem here - when you have a lazy atomModify, there is simply no way to strictly update the field without requiring a value.

Thus, all your careful use of atomicModifyIORef r (\ !x' -> (x' + k, ())) is to build a chain of applications of the function (+) , so that the result of the chain (i.e. the value in the cell) is observed Each addition will be strictly his argument. Not what you want! None of your annotations on the severity of the argument to change the IORef will have any effect on the cell itself. Now, as a rule, a lazy change is what you want - it's just a collection of pointers, so you can have very short atomic sections.

But sometimes this is not what you want.

(For the background of this issue, see GHC ticket # 5926 , however the issue was known at least in 2007 when I wrote strict-concurrency to avoid this problem with MVars. Was discussed in 2009 , and now we have strict version in 2012).

By first requiring a value, you can remove the problem. For example.

 shW !k !r = --atomicModifyIORef r (\x -> (x + k, ())) do x <- shR r writeIORef r $! (x+1)

Note that this problem is now documented in libraries , and you can use atomicModifyIORef' to avoid it.

And we get:

 USDGHTVVH1$ time ./A 1000 10000 +RTS -s 10000000 120,738,836 bytes allocated in the heap 3,758,476 bytes copied during GC 73,020 bytes maximum residency (1 sample(s)) 16,348 bytes maximum slop 1 MB total memory in use (0 MB lost due to fragmentation) Tot time (elapsed) Avg pause Max pause Gen 0 230 colls, 0 par 0.00s 0.01s 0.0000s 0.0001s Gen 1 1 colls, 0 par 0.00s 0.00s 0.0002s 0.0002s INIT time 0.00s ( 0.00s elapsed) MUT time 0.17s ( 0.17s elapsed) GC time 0.00s ( 0.01s elapsed) EXIT time 0.00s ( 0.00s elapsed) Total time 0.19s ( 0.17s elapsed) %GC time 0.0% (3.9% elapsed) Alloc rate 643,940,458 bytes per MUT second Productivity 100.0% of total user, 108.3% of total elapsed real 0m0.218s user 0m0.000s sys 0m0.015s

That is, 22x acceleration and memory usage become permanent. For laughs, here's a new heap profile:

Speed ​​up Haskell concurrency

More articles:

Speed up Haskell concurrency