Why is my multi-threaded sorting algorithm no faster than my single-threaded merge

There are certain algorithms whose work time can be significantly reduced when you share a task and each of them performs in parallel. One of these algorithms is merge sort, where the list is divided into infinitely smaller parts, and then recombined in sorted order. I decided to conduct an experiment to check if I can increase the speed of this type using multiple threads. I perform the following functions in Java on a Dell quad-core processor with Windows Vista.

One function (control register) is simply recursive:

// x is an array of N elements in random order public int[] mergeSort(int[] x) { if (x.length == 1) return x; // Dividing the array in half int[] a = new int[x.length/2]; int[] b = new int[x.length/2+((x.length%2 == 1)?1:0)]; for(int i = 0; i < x.length/2; i++) a[i] = x[i]; for(int i = 0; i < x.length/2+((x.length%2 == 1)?1:0); i++) b[i] = x[i+x.length/2]; // Sending them off to continue being divided mergeSort(a); mergeSort(b); // Recombining the two arrays int ia = 0, ib = 0, i = 0; while(ia != a.length || ib != b.length) { if (ia == a.length) { x[i] = b[ib]; ib++; } else if (ib == b.length) { x[i] = a[ia]; ia++; } else if (a[ia] < b[ib]) { x[i] = a[ia]; ia++; } else { x[i] = b[ib]; ib++; } i++; } return x; } 

The other is in the "run" function of the class, which extends the stream and recursively creates two new threads each time it is called:

 public class Merger extends Thread { int[] x; boolean finished; public Merger(int[] x) { this.x = x; } public void run() { if (x.length == 1) { finished = true; return; } // Divide the array in half int[] a = new int[x.length/2]; int[] b = new int[x.length/2+((x.length%2 == 1)?1:0)]; for(int i = 0; i < x.length/2; i++) a[i] = x[i]; for(int i = 0; i < x.length/2+((x.length%2 == 1)?1:0); i++) b[i] = x[i+x.length/2]; // Begin two threads to continue to divide the array Merger ma = new Merger(a); ma.run(); Merger mb = new Merger(b); mb.run(); // Wait for the two other threads to finish while(!ma.finished || !mb.finished) ; // Recombine the two arrays int ia = 0, ib = 0, i = 0; while(ia != a.length || ib != b.length) { if (ia == a.length) { x[i] = b[ib]; ib++; } else if (ib == b.length) { x[i] = a[ia]; ia++; } else if (a[ia] < b[ib]) { x[i] = a[ia]; ia++; } else { x[i] = b[ib]; ib++; } i++; } finished = true; } } 

It turns out that a function that does not use multithreading is faster. What for? Is the operating system and the Java virtual machine working efficiently enough to host different threads on different cores? Or am I missing something obvious?

+5
java multithreading parallel-processing
source share
4 answers

The problem is not multithreading: I wrote correctly multithreaded QuickSort in Java, and it owns standard Java sorting. I did this when I saw the giant dataset being a process, and had only one core of a 16-core machine.

One of your problems (huge) is that you are busy with the loop:

  // Wait for the two other threads to finish while(!ma.finished || !mb.finished) ; 

This is HUGE no-no: it is called a busy loop, and you destroy perfs.

(Another problem is that your code does not create new threads, as already indicated to you)

You need to use a different synchronization method: an example would be to use a CountDownLatch .

Another thing: there is no need to create two new threads when dividing the workload: generate only one new thread and do the second half of the current thread.

In addition, you probably do not want to create more threads than existing materials.

See my question here (asking for good open-source multi-threaded merges / quicksort / whatever). The one I use is property, I cannot insert it.

Multi-user sorting or merging

I did not implement Mergesort, but QuickSort, and I can tell you that copying arrays does not occur.

What am I doing:

  • choose a fulcrum
  • exchange values ​​as needed
  • Have we reached the limit of flow? (depending on the number of cores)
    • yes: sort the first part in this thread
    • no: create a new thread
  • sort the second part in the current topic
  • wait for the first part to complete, if it has not already been completed (using CountDownLatch).

The code creating the new thread and creating the CountDownLatch might look like this:

  final CountDownLatch cdl = new CountDownLatch( 1 ); final Thread t = new Thread( new Runnable() { public void run() { quicksort(a, i+1, r ); cdl.countDown(); } } }; 

The advantage of using synchronization tools such as CountDownLatch is that it is very efficient and that you do not waste time working with the low-level Java synchronization idiosync.

In your case, “split” might look like this (untested, this just gives an idea):

 if ( threads.getAndIncrement() < 4 ) { final CountDownLatch innerLatch = new CountDownLatch( 1 ); final Thread t = new Merger( innerLatch, b ); t.start(); mergeSort( a ); while ( innerLatch.getCount() > 0 ) { try { innerLatch.await( 1000, TimeUnit.SECONDS ); } catch ( InterruptedException e ) { // Up to you to decide what to do here } } } else { mergeSort( a ); mergeSort( b ); } 

(don’t forget to “count” the latches at each merger)

If you replaced the number of threads (up to 4 here) with the number of available cores. You can use the following (once, say, initialize some static variable at the beginning of your program: the number of cores is unlikely to change [unless you are on a machine that allows the CPU to be pumped out, as some Sun systems do):

 Runtime.getRuntime().availableProcessors() 
+12
source share

As others have said; This code will not work because it does not start new threads. You need to call the start () method instead of the run () method to create new threads. It also has concurrency errors: checks on the finished variable are not thread safe.

Parallel programming can be quite difficult if you do not understand the basics. You can read the Java concurrency book on the practice of Brian Goetz . He explains the basics and explains the constructs (e.g. Latch, etc.) to facilitate the creation of parallel programs.

+3
source share

Synchronization overhead can be relatively large and prevent many optimizations.

In addition, you create too many threads.

The other is in the "run" function of the class, which extends the thread, and recursively creates two new threads each time it is called .

You would be better off with a fixed number of threads, presumably 4 on the quad core. This can be implemented using a thread pool ( tutorial ), and the template will be a "bag of tasks." But, perhaps, it would be even better to first divide the task into four equally large tasks and do a “single-threaded” sorting by these tasks. This will allow you to use caches much better.


Instead of the "busy loop" waiting for threads to finish (stealing processor cycles), you should take a look at Thread.join() .

+1
source share

How many elements in the array do you need to sort? If there are too few elements, the time for synchronizing and switching the CPU will exceed the time that you save to divide the task into parallel

0
source share

All Articles