Why is using replicate much slower than serial execution?

I have a problem. I wanted to use scala.concurrent.ops.replicate to parallelize my program, but I found that the algorithm actually became much slower. So I wrote a little test and got the same result. Here are both versions.

Serial code: takes about 63 seconds to complete

    object SerTest {
      def main(args: Array[String]) {
        for (x <- 1 to 10) {
          for (i <- 1 to 4) {
            for (j <- 1 to 100000) {
              val a = BigInt(j).isProbablePrime(1000)
              if (!a && j == 100000) println(i + " is ready")
            }
          }
        }
      }
    }

Parallel code: takes about 161 seconds to complete

    import scala.concurrent.ops.replicate

    object ParTest {
      def main(args: Array[String]) {
        for (x <- 1 to 10) {
          replicate(1, 5) { i =>
            for (j <- 1 to 100000) {
              val a = BigInt(j).isProbablePrime(1000)
              if (!a && j == 100000) println(i + " is ready")
            }
          }
        }
      }
    }

So where is the completely obvious and embarrassing mistake I made? :)

Edit: Oh, and I'm running this on a quad-core CPU, so it should be faster :)

Edit 2: Following Kevin Wright's answer, I changed the programs a bit so they run longer.

2 answers

Take a look at the source of BigInteger.isProbablePrime (BigInt delegates to the Java library). It allocates a serious number of new BigInteger instances, since it's an immutable class.

My guess is that the memory allocation causes too much contention to benefit from parallelization. You could probably confirm this by substituting a simple calculation (for example, multiplying numbers 100 million times) for your primality test, or by rewriting the primality test using Longs and vars instead of BigInt.
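As a sketch of that last suggestion, here is what an allocation-free primality test with Longs and vars might look like. The object and method names are my own, and plain trial division stands in for the probabilistic test:

```scala
// Allocation-free trial division using Longs and vars, standing in for
// BigInt.isProbablePrime in the benchmark's inner loop.
object LongPrime {
  def isPrime(n: Long): Boolean = {
    if (n < 2) false
    else if (n < 4) true               // 2 and 3 are prime
    else if (n % 2 == 0) false
    else {
      var i = 3L
      while (i * i <= n && n % i != 0) i += 2
      i * i > n                        // no divisor found below sqrt(n)
    }
  }

  // Count primes up to limit without allocating any objects in the loop.
  def countPrimes(limit: Int): Int = {
    var count = 0
    var j = 2
    while (j <= limit) {
      if (isPrime(j)) count += 1
      j += 1
    }
    count
  }

  def main(args: Array[String]): Unit =
    println(countPrimes(100000) + " primes up to 100000")  // prints "9592 primes up to 100000"
}
```

If the parallel version speeds up once the BigInt allocation is gone, that points squarely at allocation contention.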

In addition, ops.replicate runs its operations on new threads rather than using some kind of thread pool. Creating a thread has a certain amount of overhead, though not enough to be the problem in this case. Personally, I prefer sticking to the more robust java.util.concurrent libraries.
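For comparison, here is a sketch of the same workload on a java.util.concurrent fixed thread pool instead of fresh threads. The object name, the counting variation, and the pool sizing are my own choices, not from the question:

```scala
import java.util.concurrent.{Callable, Executors}

// The question's workload submitted to a fixed thread pool instead of
// spawning a new thread per task. Each of `workers` tasks counts the
// probable primes up to `limit`.
object PoolTest {
  def run(workers: Int, limit: Int): Long = {
    val pool = Executors.newFixedThreadPool(Runtime.getRuntime.availableProcessors)
    val futures = (1 to workers).map { _ =>
      pool.submit(new Callable[Long] {
        def call(): Long =
          (1 to limit).count(j => BigInt(j).isProbablePrime(1000)).toLong
      })
    }
    val total = futures.map(_.get()).sum  // block until every worker finishes
    pool.shutdown()
    total
  }

  def main(args: Array[String]): Unit =
    println(run(4, 100000) + " probable primes counted in total")
}
```

The pool is created once and reused, so thread-startup cost is paid only once rather than on every call.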


Looking at your sample code, I assume you're launching the main method directly from the command line. This is the worst possible way to do micro-benchmarking on the JVM!

You should first run your test several times (within the same VM invocation), at least enough for the JVM to warm up properly, and let it run for 30 seconds before you even think about starting to measure anything. This ensures that it's executing compiled (not interpreted) code and is fully optimized.
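A minimal sketch of that warm-up-then-measure discipline; the `Bench`, `workload`, and `timed` names are hypothetical:

```scala
// Warm-up-then-measure harness sketch. The workload here mirrors the
// question's inner loop on a smaller range.
object Bench {
  // The code under test: count probable primes up to 10000.
  def workload(): Int =
    (1 to 10000).count(j => BigInt(j).isProbablePrime(1000))

  // Run body once; return its result plus elapsed milliseconds.
  def timed[A](body: => A): (A, Long) = {
    val t0 = System.nanoTime()
    val result = body
    (result, (System.nanoTime() - t0) / 1000000)
  }

  def main(args: Array[String]): Unit = {
    // Warm up first so the JIT can compile and optimize the hot path.
    (1 to 10).foreach(_ => workload())
    // Only then start measuring, and take several samples.
    for (run <- 1 to 5) {
      val (_, ms) = timed(workload())
      println("run " + run + ": " + ms + " ms")
    }
  }
}
```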

You also need to be aware of the cost of starting threads. For short loops this will be excessive overhead and will consume more time than the loop itself!
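To get a feel for that overhead, here is a small sketch that just starts and joins threads whose bodies do nothing and reports the elapsed time (object and method names are my own):

```scala
// Measure the raw cost of starting and joining no-op threads, i.e. the
// overhead paid before any useful work happens.
object ThreadCost {
  def spawnAndJoin(n: Int): Long = {
    val t0 = System.nanoTime()
    val threads = (1 to n).map { _ =>
      val t = new Thread(new Runnable { def run(): Unit = () })
      t.start()
      t
    }
    threads.foreach(_.join())
    (System.nanoTime() - t0) / 1000000  // elapsed milliseconds
  }

  def main(args: Array[String]): Unit =
    println("started and joined 1000 no-op threads in " + spawnAndJoin(1000) + " ms")
}
```

Compare that figure with the runtime of your inner loop: if they are of the same order, the threads are costing you more than the work they carry.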

Update

The following definitions are taken from ops.scala:

    val defaultRunner: FutureTaskRunner = TaskRunners.threadRunner

    def spawn(p: => Unit)(implicit runner: TaskRunner = defaultRunner): Unit = {...}

    def replicate(start: Int, end: Int)(p: Int => Unit) {...}

So the actual runner used is passed as an implicit parameter, defaulting to TaskRunners.threadRunner.

You can try changing this to use a thread pool by prefixing your code with:

 implicit val runner = TaskRunners.threadPoolRunner 

Or I believe the following will also work:

 import concurrent.TaskRunners.threadPoolRunner 

See which one works best for you.


Secondly...

I don't believe this parameter will actually be carried through the nested spawn calls, so you may be better off just duplicating the method yourself (I currently have a question about this posted on the mailing lists).

For your convenience, here is the method in full, scary, glory:

    def replicate(start: Int, end: Int)(p: Int => Unit) {
      if (start == end)
        ()
      else if (start + 1 == end)
        p(start)
      else {
        val mid = (start + end) / 2
        spawn { replicate(start, mid)(p) }
        replicate(mid, end)(p)
      }
    }

(you'll still need to define an implicit runner...)

