Is there a “threshold” that justifies multithreaded computation?

So basically, I needed to optimize this piece of code today. It tries to find the longest sequence produced by a certain function for the first million start numbers:

public static void main(String[] args) {
    int mostLen = 0;
    int mostInt = 0;
    long currTime = System.currentTimeMillis();
    for (int j = 2; j <= 1000000; j++) {
        long i = j;
        int len = 0;
        while ((i = next(i)) != 1) {
            len++;
        }
        if (len > mostLen) {
            mostLen = len;
            mostInt = j;
        }
    }
    System.out.println(System.currentTimeMillis() - currTime);
    System.out.println("Most len is " + mostLen + " for " + mostInt);
}

static long next(long i) {
    if (i % 2 == 0) {
        return i / 2;
    } else {
        return i * 3 + 1;
    }
}

My mistake was to try to introduce multithreading:

void doSearch() throws ExecutionException, InterruptedException {
    final int numProc = Runtime.getRuntime().availableProcessors();
    System.out.println("numProc = " + numProc);
    ExecutorService executor = Executors.newFixedThreadPool(numProc);
    long currTime = System.currentTimeMillis();
    List<Future<ValueBean>> list = new ArrayList<Future<ValueBean>>();
    for (int j = 2; j <= 1000000; j++) {
        MyCallable<ValueBean> worker = new MyCallable<ValueBean>();
        worker.setBean(new ValueBean(j, 0));
        Future<ValueBean> f = executor.submit(worker);
        list.add(f);
    }
    System.out.println(System.currentTimeMillis() - currTime);

    int mostLen = 0;
    int mostInt = 0;
    for (Future<ValueBean> f : list) {
        final int len = f.get().getLen();
        if (len > mostLen) {
            mostLen = len;
            mostInt = f.get().getNum();
        }
    }
    executor.shutdown();
    System.out.println(System.currentTimeMillis() - currTime);
    System.out.println("Most len is " + mostLen + " for " + mostInt);
}

public class MyCallable<T> implements Callable<ValueBean> {
    public ValueBean bean;

    public void setBean(ValueBean bean) {
        this.bean = bean;
    }

    public ValueBean call() throws Exception {
        long i = bean.getNum();
        int len = 0;
        while ((i = next(i)) != 1) {
            len++;
        }
        return new ValueBean(bean.getNum(), len);
    }
}

public class ValueBean {
    int num;
    int len;

    public ValueBean(int num, int len) {
        this.num = num;
        this.len = len;
    }

    public int getNum() {
        return num;
    }

    public int getLen() {
        return len;
    }
}

long next(long i) {
    if (i % 2 == 0) {
        return i / 2;
    } else {
        return i * 3 + 1;
    }
}

Unfortunately, the multithreaded version ran 5 times slower than the single-threaded version on 4 processors (cores).

Then I tried a slightly rougher approach:

static int mostLen = 0;
static int mostInt = 0;

synchronized static void updateIfMore(int len, int intgr) {
    if (len > mostLen) {
        mostLen = len;
        mostInt = intgr;
    }
}

public static void main(String[] args) throws InterruptedException {
    long currTime = System.currentTimeMillis();
    final int numProc = Runtime.getRuntime().availableProcessors();
    System.out.println("numProc = " + numProc);
    ExecutorService executor = Executors.newFixedThreadPool(numProc);
    for (int i = 2; i <= 1000000; i++) {
        final int j = i;
        executor.execute(new Runnable() {
            public void run() {
                long l = j;
                int len = 0;
                while ((l = next(l)) != 1) {
                    len++;
                }
                updateIfMore(len, j);
            }
        });
    }
    executor.shutdown();
    executor.awaitTermination(30, TimeUnit.SECONDS);
    System.out.println(System.currentTimeMillis() - currTime);
    System.out.println("Most len is " + mostLen + " for " + mostInt);
}

static long next(long i) {
    if (i % 2 == 0) {
        return i / 2;
    } else {
        return i * 3 + 1;
    }
}

and it ran much faster, but it was still slower than the single-threaded approach.

I hope this is not because I messed up the multithreading, but rather because this particular calculation/algorithm is not a good fit for parallel computation. If I make the calculation more CPU-intensive by replacing the next method with:

long next(long i) {
    Random r = new Random();
    for (int j = 0; j < 10; j++) {
        r.nextLong();
    }
    if (i % 2 == 0) {
        return i / 2;
    } else {
        return i * 3 + 1;
    }
}

both multithreaded versions run more than twice as fast as the single-threaded version on a 4-core machine.

So it seems there should be some threshold you can use to decide whether multithreading is worth introducing, and my question is:

What is the rule of thumb that helps decide whether a given calculation is intensive enough to be sped up effectively by running it in parallel (without wasting effort on actually implementing it)?

+7
4 answers

I think there is another component you are not considering. Parallelization works best when the units of work are independent of each other. A parallel computation is suboptimal when later results depend on earlier ones. The dependency can be strong, in the sense of "I need the first value to compute the second value." In that case the task is completely sequential, and later values cannot be computed without waiting for the earlier computations. The dependency can also be weaker, in the sense of "If I had the first value, I could compute the second value faster." In that case the cost of parallelization is that some work may be duplicated.

This problem can be optimized without multithreading, because some of the later values can be computed faster if you already have the earlier results. Take j == 4 for example: after one pass through the inner loop, i == 2, but you computed the result for j == 2 just two iterations earlier. If you had saved that len value, you could compute len(4) = 1 + len(2).

Using an array to store previously computed len values, plus a small tweak around the next method, you can complete the task more than 50 times faster.
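
To make that memoization idea concrete, here is a minimal single-threaded sketch. It is my own illustration, not the answerer's code; the class name CollatzMemo and the helper lengthOf are invented for this example. Chain lengths for start values up to the limit are cached, so later chains stop as soon as they hit an already-computed value.

public class CollatzMemo {

    static final int LIMIT = 1000000;
    // cache[j] == number of next() steps from j down to 1, or 0 if not yet computed
    static final int[] cache = new int[LIMIT + 1];

    public static void main(String[] args) {
        long currTime = System.currentTimeMillis();
        int mostLen = 0;
        int mostInt = 0;
        for (int j = 2; j <= LIMIT; j++) {
            // the original loop's len is one less than the full step count
            int len = lengthOf(j) - 1;
            if (len > mostLen) {
                mostLen = len;
                mostInt = j;
            }
        }
        System.out.println(System.currentTimeMillis() - currTime);
        System.out.println("Most len is " + mostLen + " for " + mostInt);
    }

    // number of next() applications needed to reach 1, memoized for small values
    static int lengthOf(long i) {
        if (i == 1) {
            return 0;
        }
        if (i <= LIMIT && cache[(int) i] != 0) {
            return cache[(int) i];
        }
        int len = 1 + lengthOf(next(i));
        if (i <= LIMIT) {
            cache[(int) i] = len;
        }
        return len;
    }

    static long next(long i) {
        if (i % 2 == 0) {
            return i / 2;
        } else {
            return i * 3 + 1;
        }
    }
}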

+2

The key to efficient multithreading is keeping the overhead from becoming too high. There are no fixed rules, since this depends heavily on your hardware.

Starting and stopping threads is expensive. Of course, you are already using an executor service, which greatly reduces that cost because it reuses a pool of worker threads to execute your Runnables. However, each Runnable still carries some overhead. Reducing the number of Runnables and increasing the amount of work each one does will improve performance, but you still want enough Runnables for the executor service to distribute them efficiently across the worker threads.

You chose to create one Runnable for each start value, so you end up with 1,000,000 Runnables. You would probably get much better results by having each Runnable process a batch of, say, 1000 start values; that means you only need 1000 Runnables, which greatly reduces the overhead.
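
Here is a hedged sketch of that batching approach, built on the question's second (Runnable-based) version. The class name BatchedSearch and the batch size of 1000 are my own choices for illustration, not something prescribed by the answer.

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class BatchedSearch {

    static int mostLen = 0;
    static int mostInt = 0;

    synchronized static void updateIfMore(int len, int intgr) {
        if (len > mostLen) {
            mostLen = len;
            mostInt = intgr;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        final int numProc = Runtime.getRuntime().availableProcessors();
        ExecutorService executor = Executors.newFixedThreadPool(numProc);
        final int batchSize = 1000;
        long currTime = System.currentTimeMillis();
        for (int start = 2; start <= 1000000; start += batchSize) {
            final int from = start;
            final int to = Math.min(start + batchSize - 1, 1000000);
            executor.execute(new Runnable() {
                public void run() {
                    // find the best result within this batch, then publish it
                    // once, so the synchronized method is hit only rarely
                    int bestLen = 0;
                    int bestInt = from;
                    for (int j = from; j <= to; j++) {
                        long i = j;
                        int len = 0;
                        while ((i = next(i)) != 1) {
                            len++;
                        }
                        if (len > bestLen) {
                            bestLen = len;
                            bestInt = j;
                        }
                    }
                    updateIfMore(bestLen, bestInt);
                }
            });
        }
        executor.shutdown();
        executor.awaitTermination(30, TimeUnit.SECONDS);
        System.out.println(System.currentTimeMillis() - currTime);
        System.out.println("Most len is " + mostLen + " for " + mostInt);
    }

    static long next(long i) {
        if (i % 2 == 0) {
            return i / 2;
        } else {
            return i * 3 + 1;
        }
    }
}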

+4

"Will the performance increase be greater than the cost of context switching and thread creation?"

That cost is highly OS-, language- and hardware-dependent; this question has some discussion of the cost in Java, with some numbers and some pointers on how to estimate it.

You also want at most one thread per processor for CPU-bound work. Thanks to David Harkness for the pointer on how to work out that number.
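
For a rough feel of the numbers involved, here is an unscientific sketch of my own (not from the linked discussion) that compares starting a fresh thread per task with reusing a fixed pool sized to the CPU count. Treat the output only as an order-of-magnitude estimate; it varies widely across OS, JVM and hardware.

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ThreadCostEstimate {

    public static void main(String[] args) throws Exception {
        final int tasks = 10000;
        final Runnable noop = new Runnable() {
            public void run() {
            }
        };

        // one fresh thread per task
        long t0 = System.nanoTime();
        for (int i = 0; i < tasks; i++) {
            Thread t = new Thread(noop);
            t.start();
            t.join();
        }
        long rawNs = System.nanoTime() - t0;

        // reusing a fixed pool sized to the CPU count
        ExecutorService pool = Executors.newFixedThreadPool(
                Runtime.getRuntime().availableProcessors());
        long t1 = System.nanoTime();
        for (int i = 0; i < tasks; i++) {
            pool.execute(noop);
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        long pooledNs = System.nanoTime() - t1;

        System.out.println("fresh thread per task: " + rawNs / tasks + " ns/task");
        System.out.println("pooled:                " + pooledNs / tasks + " ns/task");
    }
}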

+2

Estimate the amount of work a thread can do without interacting with other threads (directly or through shared data). If that unit of work completes in 1 microsecond or less, the overhead is too high and multithreading is of no use. If it takes 1 millisecond or more, multithreading should work well. In between, experimental testing is required.
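
As one way to apply that guideline, here is a small sketch of my own that estimates the per-start-value work quantum by timing a single-threaded sample run; the class name, the sample size and the sink variable are arbitrary choices for illustration.

public class WorkQuantumEstimate {

    public static void main(String[] args) {
        final int sample = 100000;
        long t0 = System.nanoTime();
        long sink = 0; // keep the JIT from discarding the work
        for (int j = 2; j <= sample; j++) {
            long i = j;
            while ((i = next(i)) != 1) {
                sink += i;
            }
        }
        long elapsed = System.nanoTime() - t0;
        System.out.println("sink = " + sink);
        System.out.println("per start value: " + (elapsed / (sample - 1)) + " ns");
        // If this is well below ~1,000 ns (1 microsecond) per task, submitting
        // one task per start value is likely dominated by overhead; batch them.
    }

    static long next(long i) {
        if (i % 2 == 0) {
            return i / 2;
        } else {
            return i * 3 + 1;
        }
    }
}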

+1
