Multithreading is no faster than a single thread (simple test loop)

I am experimenting with some multithreaded constructs, but for some reason multithreading does not seem to be faster than a single thread. I narrowed it down to a very simple test with a nested loop (1000x1000) in which the system does nothing but count.
Below I have posted the code for both the single-threaded and the multithreaded version, and how they are executed.
The result is that the single thread completes the loop in about 110 ms, while the two threads also take about 112 ms.
I don't think the problem is the overhead of multithreading. If I submit only one of the two Runnables to a ThreadPoolExecutor, it runs in half the time of the single thread, which makes sense. But adding the second Runnable makes it 10 times slower. Both 3.00 GHz cores run at 100%.
I think this may be PC-specific, since someone else's PC showed double the speed with multithreading. But then, what can I do about it? I have an Intel Pentium 4 3.00 GHz (2 CPUs) and Java jre6.

Test code:

    // Single thread:
    long start = System.nanoTime(); // Start timer
    final int[] i0 = new int[1];    // This is to keep the test fair (see below)
    int i = 0;
    for (int x = 0; x < 10000; x++) {
        for (int y = 0; y < 10000; y++) {
            i++; // Just counting...
        }
    }
    i0[0] = i; // Store the result once, after the loop
    long end = System.nanoTime(); // Stop timer

This code runs in approximately 110 ms.

    // Two threads:
    start = System.nanoTime(); // Start timer

    // Two of the same kind of variables to count with as in the single thread.
    final int[] i1 = new int[1];
    final int[] i2 = new int[1];

    // First partial task (0-5000)
    Thread t1 = new Thread() {
        @Override
        public void run() {
            int i = 0;
            for (int x = 0; x < 5000; x++)
                for (int y = 0; y < 10000; y++)
                    i++;
            i1[0] = i;
        }
    };

    // Second partial task (5000-10000)
    Thread t2 = new Thread() {
        @Override
        public void run() {
            int i = 0;
            for (int x = 5000; x < 10000; x++)
                for (int y = 0; y < 10000; y++)
                    i++;
            i2[0] = i;
        }
    };

    // Start threads
    t1.start();
    t2.start();

    // Wait for completion
    try {
        t1.join();
        t2.join();
    } catch (InterruptedException e) {
        e.printStackTrace();
    }
    end = System.nanoTime(); // Stop timer

This code executes in approximately 112 ms.

Edit: I replaced Runnables with Threads and got rid of ExecutorService (for simplicity of the problem).

Edit: tried some suggestions

+6
java performance multithreading intel multicore
6 answers

You definitely do not want to keep polling Thread.isAlive(): it burns a lot of CPU cycles for no good reason. Use Thread.join() instead.

Also, it is probably not a good idea to have the threads increment the result arrays directly, because of cache-line contention and the like. Update local variables and do a single store to the array once the computation is done.
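To illustrate the "local variable, single store" advice, here is a minimal sketch. The class name `LocalCounterDemo` and the helper `countInto` are made up for this example, and it uses Java 8 lambdas for brevity (on the asker's jre6 you would write anonymous `Thread` subclasses as in the question).

```java
class LocalCounterDemo {
    // Hypothetical helper: counts in a local variable and writes the
    // shared result slot exactly once, after the loops have finished.
    static void countInto(int[] result, int rows, int cols) {
        int local = 0;
        for (int x = 0; x < rows; x++) {
            for (int y = 0; y < cols; y++) {
                local++; // touches only the local variable
            }
        }
        result[0] = local; // single store to the shared array
    }

    public static void main(String[] args) throws InterruptedException {
        final int[] r1 = new int[1];
        final int[] r2 = new int[1];

        // Each thread handles half of the rows.
        Thread t1 = new Thread(() -> countInto(r1, 5000, 10000));
        Thread t2 = new Thread(() -> countInto(r2, 5000, 10000));

        t1.start();
        t2.start();
        t1.join(); // block until done instead of polling isAlive()
        t2.join();

        System.out.println("total = " + (r1[0] + r2[0]));
    }
}
```

Whether this changes the timing on a given machine is something you would have to measure; the sketch only shows the access pattern.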

EDIT:

I completely failed to notice that you are using a Pentium 4. As far as I know, there are no multi-core versions of the P4. To create the illusion of multiple cores, it has Hyper-Threading: two logical cores share the execution units of one physical core. If your threads depend on the same execution units, your performance will be the same as (or worse than!) single-threaded performance. You would need, for example, floating-point calculations in one thread and integer calculations in the other to see an improvement.

The P4 HT implementation has been criticized a lot; newer implementations (latest Core2) should be better.

+11

Try increasing the size of the array. No, really.

Small objects allocated one after another in a single thread will, as a rule, initially be laid out one after another in memory, probably on the same cache line. If two cores access the same cache line (and the micro-benchmark is, in effect, just a sequence of writes to the same address), they will have to fight for access.

There is a class in java.util.concurrent that contains a bunch of unused long fields. Their purpose is to keep objects that may be used frequently by different threads on different cache lines.
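A rough sketch of that padding idea follows. Everything here is an assumption for illustration: the names `PaddingDemo` and `PaddedSlot` are invented, a 64-byte cache line is assumed, and the JVM is free to reorder or lay out fields differently, so the padding is a hint rather than a guarantee.

```java
class PaddingDemo {
    // Hypothetical padded slot: the unused long fields are only there to
    // push 'value' away from neighbouring objects (assumes 64-byte lines).
    static final class PaddedSlot {
        long p1, p2, p3, p4, p5, p6, p7; // unused padding before
        volatile long value;             // the field each thread writes
        long q1, q2, q3, q4, q5, q6, q7; // unused padding after
    }

    // Runs two threads, each incrementing its own padded slot.
    static long[] run(final int iterations) {
        final PaddedSlot a = new PaddedSlot();
        final PaddedSlot b = new PaddedSlot();

        Thread t1 = new Thread(() -> {
            for (int i = 0; i < iterations; i++) a.value++;
        });
        Thread t2 = new Thread(() -> {
            for (int i = 0; i < iterations; i++) b.value++;
        });

        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return new long[] { a.value, b.value };
    }

    public static void main(String[] args) {
        long[] r = run(1_000_000);
        System.out.println(r[0] + " " + r[1]);
    }
}
```

Each slot has exactly one writer, so the non-atomic `value++` on a volatile field is safe in this sketch; it would not be if both threads shared a slot.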

+4

I am not at all surprised by the difference. You are using the Java concurrency framework to create your threads (although I don't see any guarantee that two threads are even created, since the first task might complete before the second begins).

All kinds of locks and synchronizations are likely happening behind the scenes that you really don't need for your simple test. In short, I think the problem is with the overhead of multithreading.

+2

You are not doing anything with i, so your loop is probably just being optimized away.
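One common way to make that less likely is to actually use the result, for example by printing it after the timer stops. A minimal sketch, with the class name `UseResultDemo` and the helper `count` invented for this example:

```java
class UseResultDemo {
    // Hypothetical helper: the same nested counting loop as in the question,
    // but the count is returned so the caller can consume it.
    static int count(int rows, int cols) {
        int i = 0;
        for (int x = 0; x < rows; x++) {
            for (int y = 0; y < cols; y++) {
                i++;
            }
        }
        return i;
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        int result = count(10000, 10000);
        long end = System.nanoTime();

        // Printing the result gives the loop an observable effect.
        System.out.println("result = " + result
                + ", elapsed ms = " + (end - start) / 1_000_000);
    }
}
```

This is no guarantee: a JIT compiler may still reduce such a trivial loop to a constant, so the timing of a loop this simple should be read with caution.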

+1

Have you checked the number of available cores on your PC using Runtime.getRuntime().availableProcessors()?
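A quick way to run that check (the class name `CoreCount` and the wrapper method are just for illustration):

```java
class CoreCount {
    // Thin wrapper around the standard Runtime call.
    static int logicalProcessors() {
        return Runtime.getRuntime().availableProcessors();
    }

    public static void main(String[] args) {
        System.out.println("Available processors: " + logicalProcessors());
    }
}
```

Keep in mind that this reports the processors available to the JVM, which are logical processors, so a Hyper-Threading CPU can report 2 even with a single physical core.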

+1

Your code just increments a variable, which is a very fast operation. You do not gain much from using multiple threads here. The performance improvement is more pronounced when thread 1 has to wait for some external response or perform some more complicated calculation, while your main thread or some other thread can continue processing and is not held up. You may see more of a benefit if you count further or use more threads (a safe number is probably the number of CPUs/cores in your machine).
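As a sketch of "one thread per core", the counting work could be split into chunks and handed to a fixed-size pool. The class `SplitCount` and the method `countInParallel` are invented for this example, and it assumes Java 8 or later.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class SplitCount {
    // Hypothetical helper: splits the outer loop into one chunk per thread
    // and sums the partial counts.
    static long countInParallel(final int rows, final int cols, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Long>> parts = new ArrayList<>();
            int chunk = (rows + threads - 1) / threads; // rows per task, rounded up
            for (int t = 0; t < threads; t++) {
                final int from = Math.min(rows, t * chunk);
                final int to = Math.min(rows, from + chunk);
                Callable<Long> task = () -> {
                    long local = 0; // count locally, return once
                    for (int x = from; x < to; x++)
                        for (int y = 0; y < cols; y++)
                            local++;
                    return local;
                };
                parts.add(pool.submit(task));
            }
            long total = 0;
            for (Future<Long> part : parts) {
                total += part.get(); // waits for that task to finish
            }
            return total;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        int threads = Runtime.getRuntime().availableProcessors();
        long start = System.nanoTime();
        long total = countInParallel(10000, 10000, threads);
        long end = System.nanoTime();
        System.out.println(threads + " threads, total = " + total
                + ", elapsed ms = " + (end - start) / 1_000_000);
    }
}
```

For a loop body this cheap, the cost of creating the pool and tasks may well outweigh any gain, which is the point made above.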

0