So basically, I needed to optimize this piece of code today. It tries to find the longest sequence generated by some function over the first million starting numbers:

    public static void main(String[] args) {
        int mostLen = 0;
        int mostInt = 0;
        long currTime = System.currentTimeMillis();
        for (int j = 2; j <= 1000000; j++) {
            long i = j;
            int len = 0;
            while ((i = next(i)) != 1) {
                len++;
            }
            if (len > mostLen) {
                mostLen = len;
                mostInt = j;
            }
        }
        System.out.println(System.currentTimeMillis() - currTime);
        System.out.println("Most len is " + mostLen + " for " + mostInt);
    }

    static long next(long i) {
        if (i % 2 == 0) {
            return i / 2;
        } else {
            return i * 3 + 1;
        }
    }

My mistake was to try to introduce multithreading:

    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.Callable;
    import java.util.concurrent.ExecutionException;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.Future;

    void doSearch() throws ExecutionException, InterruptedException {
        final int numProc = Runtime.getRuntime().availableProcessors();
        System.out.println("numProc = " + numProc);
        ExecutorService executor = Executors.newFixedThreadPool(numProc);
        long currTime = System.currentTimeMillis();
        List<Future<ValueBean>> list = new ArrayList<Future<ValueBean>>();
        // One task (and one Future) per start number
        for (int j = 2; j <= 1000000; j++) {
            MyCallable<ValueBean> worker = new MyCallable<ValueBean>();
            worker.setBean(new ValueBean(j, 0));
            Future<ValueBean> f = executor.submit(worker);
            list.add(f);
        }
        // Time spent just submitting the tasks
        System.out.println(System.currentTimeMillis() - currTime);
        int mostLen = 0;
        int mostInt = 0;
        for (Future<ValueBean> f : list) {
            final int len = f.get().getLen();
            if (len > mostLen) {
                mostLen = len;
                mostInt = f.get().getNum();
            }
        }
        executor.shutdown();
        // Total time including collecting the results
        System.out.println(System.currentTimeMillis() - currTime);
        System.out.println("Most len is " + mostLen + " for " + mostInt);
    }

    public class MyCallable<T> implements Callable<ValueBean> {
        public ValueBean bean;

        public void setBean(ValueBean bean) {
            this.bean = bean;
        }

        public ValueBean call() throws Exception {
            long i = bean.getNum();
            int len = 0;
            while ((i = next(i)) != 1) {
                len++;
            }
            return new ValueBean(bean.getNum(), len);
        }
    }

    public class ValueBean {
        int num;
        int len;

        public ValueBean(int num, int len) {
            this.num = num;
            this.len = len;
        }

        public int getNum() {
            return num;
        }

        public int getLen() {
            return len;
        }
    }

    long next(long i) {
        if (i % 2 == 0) {
            return i / 2;
        } else {
            return i * 3 + 1;
        }
    }

Unfortunately, the multi-threaded version ran about 5 times slower than the single-threaded version on 4 processors (cores).
Then I tried a slightly cruder approach:

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.TimeUnit;

    static int mostLen = 0;
    static int mostInt = 0;

    // Shared result, updated under a lock by every task
    synchronized static void updateIfMore(int len, int intgr) {
        if (len > mostLen) {
            mostLen = len;
            mostInt = intgr;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        long currTime = System.currentTimeMillis();
        final int numProc = Runtime.getRuntime().availableProcessors();
        System.out.println("numProc = " + numProc);
        ExecutorService executor = Executors.newFixedThreadPool(numProc);
        // Still one task per start number, but no Future or bean objects
        for (int i = 2; i <= 1000000; i++) {
            final int j = i;
            executor.execute(new Runnable() {
                public void run() {
                    long l = j;
                    int len = 0;
                    while ((l = next(l)) != 1) {
                        len++;
                    }
                    updateIfMore(len, j);
                }
            });
        }
        executor.shutdown();
        executor.awaitTermination(30, TimeUnit.SECONDS);
        System.out.println(System.currentTimeMillis() - currTime);
        System.out.println("Most len is " + mostLen + " for " + mostInt);
    }

    static long next(long i) {
        if (i % 2 == 0) {
            return i / 2;
        } else {
            return i * 3 + 1;
        }
    }

and it ran much faster, but it was still slower than the single-threaded approach.
I hope this is not because I messed up the multithreading, but rather because this particular calculation / algorithm is not well suited to parallel computation. If I make the calculation more CPU-intensive by replacing the next method with:
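For comparison, both versions above submit one task per start number, so a million tiny tasks pass through the executor's queue. A coarser-grained variant, with roughly one task per core that scans a contiguous range, could be sketched as below. The class name `ChunkedSearch`, the helpers `searchRange` and `search`, and the chunking scheme itself are assumptions for illustration and are not part of the code above; no timing is claimed for it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ChunkedSearch {

    static long next(long i) {
        return (i % 2 == 0) ? i / 2 : i * 3 + 1;
    }

    // Hypothetical helper: scans [from, to] sequentially, using the same
    // counting as the single-threaded version. Returns {start number, len}.
    static long[] searchRange(int from, int to) {
        int mostLen = 0;
        int mostInt = 0;
        for (int j = from; j <= to; j++) {
            long i = j;
            int len = 0;
            while ((i = next(i)) != 1) {
                len++;
            }
            if (len > mostLen) {
                mostLen = len;
                mostInt = j;
            }
        }
        return new long[] { mostInt, mostLen };
    }

    // Hypothetical helper: splits 2..limit into at most numTasks contiguous
    // chunks, runs one task per chunk, then merges the partial results.
    static long[] search(int limit, int numTasks)
            throws InterruptedException, ExecutionException {
        ExecutorService executor = Executors.newFixedThreadPool(numTasks);
        try {
            List<Future<long[]>> futures = new ArrayList<>();
            int total = limit - 1;                          // count of numbers 2..limit
            int chunk = (total + numTasks - 1) / numTasks;  // ceiling division
            for (int from = 2; from <= limit; from += chunk) {
                final int lo = from;
                final int hi = Math.min(limit, from + chunk - 1);
                Callable<long[]> task = () -> searchRange(lo, hi);
                futures.add(executor.submit(task));
            }
            // Merge in chunk order; strict '>' keeps the smallest start number on ties
            long[] best = { 0, 0 };
            for (Future<long[]> f : futures) {
                long[] r = f.get();
                if (r[1] > best[1]) {
                    best = r;
                }
            }
            return best;
        } finally {
            executor.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        int numProc = Runtime.getRuntime().availableProcessors();
        long currTime = System.currentTimeMillis();
        long[] best = search(1000000, numProc);
        System.out.println(System.currentTimeMillis() - currTime);
        System.out.println("Most len is " + best[1] + " for " + best[0]);
    }
}
```

Here each worker keeps its own local maximum and the results are merged once at the end, so there is no shared `synchronized` update per number. Equal-sized ranges are not equal amounts of work, so a real version might use more chunks than cores.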

    long next(long i) {
        Random r = new Random();
        for (int j = 0; j < 10; j++) {
            r.nextLong();
        }
        if (i % 2 == 0) {
            return i / 2;
        } else {
            return i * 3 + 1;
        }
    }

both multi-threaded versions run more than twice as fast as the single-threaded version on a 4-core computer.
So there seems to be some threshold that determines whether multithreading is worth introducing, and my question is:
What is the rule of thumb for deciding whether a calculation is intensive enough to benefit from running in parallel (without first spending the effort to implement it)?