Deploy pmap from scratch. Why is my implementation slow?

I am new to Erlang, so for training I am trying to implement standard functions from scratch. I tried to create a parallel implementation of the map / 2 function from the list . But my implementation is very slow. Could you please tell me if I made the main mistakes in my implementation:

enter image description here

-module( my_pmap ). -export([ pmap/2 ]). -export([ map/4, collect/3 ]). map( F, Value, Indx, SenderPid ) -> SenderPid ! { Indx, F( Value ) }. pmap( F, List ) -> CollectorPid = spawn_link( my_pmap, collect, [ length( List ), [], self() ] ), lists:foldl( fun( X, Indx ) -> spawn_link( my_pmap, map, [ F, X, Indx, CollectorPid ] ), Indx + 1 end, 1, List ), Mapped = receive { collected, M } -> M end, Sorted = lists:sort( fun( { Indx1, _ }, { Indx2, _ } ) -> Indx1 < Indx2 end, Mapped ), [ Val || { _Indx, Val } <- Sorted ]. collect( 0, List, SenderPid ) -> SenderPid ! { collected, List }; collect( N, List, SenderPid ) when N > 0 -> receive Mapped -> collect( N - 1, [ Mapped | List ], SenderPid ) end. 

And here are the test results:

 1> c(my_pmap). {ok,my_pmap} 2> timer:tc( my_pmap, pmap, [ fun(X) -> X*X*X*X end, lists:seq( 1, 10000 ) ] ). {137804, [1,16,81,256,625,1296,2401,4096,6561,10000,14641,20736, 28561,38416,50625,65536,83521,104976,130321,160000,194481, 234256,279841,331776,390625,456976,531441|...]} 3> timer:tc( lists, map, [ fun(X) -> X*X*X*X end, lists:seq( 1, 10000 ) ] ). {44136, [1,16,81,256,625,1296,2401,4096,6561,10000,14641,20736, 28561,38416,50625,65536,83521,104976,130321,160000,194481, 234256,279841,331776,390625,456976,531441|...]} 

As you could see 0.137804 sec. vs 0.044136 seconds

thanks

+4
source share
1 answer

Comments are correct. The problem is that spawning processes are cheap, but they have a cost. Multiplying the Number three times is very fast, and the overhead of creating a new process kills your productivity.

Dividing the list into fragments and processing each fragment in a separate process is likely to be faster. If you know that you have 8 cores, you can try to break it into 8 fragments. Things like pmap can be implemented in Erlang, but that is not the power of Erlang. A system like the Haskell GHC has sparks, which is the best tool for fine-grained parallelism like this. In addition, such an increase is an obvious candidate for SIMD instructions in the SSE or GPU. Erlang has no solution for this, but again, the GHC has accelerate and repa , which are libraries to handle this situation.

On the other hand, you can get good acceleration in Erlang just by using processes to process multiple fragments as a hint. Also note that parallel computing often does not work well at low N (for example, 10,000) due to communication overhead. You need more problems to take advantage.

+5
source

All Articles