I am studying how Java code scales across many processors. To do this, I wrote a simple program that computes fibo(25) 50,000 times on one thread, then 2 × 50,000 times on two threads, 3 × 50,000 times on three threads, and so on, until the number of threads reaches the number of CPUs on the target node.
Here is my code:
```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class MultiThreadScalability {
    static final int MAX_THREADS = 4;           // not used below
    static final int NB_RUN_PER_THREAD = 50000; // fibo() calls per thread
    static final int FIBO_VALUE = 25;           // argument passed to fibo()

    public static void main(String[] args) {
        MultiThreadScalability multiThreadScalability = new MultiThreadScalability();
        multiThreadScalability.runTest();
    }

    // Runs the test for 1, 2, ..., availableProcessors() threads.
    private void runTest() {
        int availableProcs = Runtime.getRuntime().availableProcessors();
        System.out.println(availableProcs + " processors available");
        for (int i = 1; i <= availableProcs; i++) {
            System.out.println("Running scalability test for " + i + " threads");
            long timeInMillisecs = runTestForThreads(i);
            System.out.println("=> " + timeInMillisecs + " milli-seconds");
        }
    }

    // Submits threadsNumber * NB_RUN_PER_THREAD tasks to a pool of threadsNumber
    // threads and returns the elapsed wall-clock time in milliseconds.
    private long runTestForThreads(int threadsNumber) {
        final int nbRun = NB_RUN_PER_THREAD * threadsNumber;
        ExecutorService executor = Executors.newFixedThreadPool(threadsNumber);
        long startTime = System.currentTimeMillis();
        for (int i = 0; i < nbRun; i++) {
            Runnable worker = new Runnable() {
                public void run() {
                    fibo(FIBO_VALUE);
                }
            };
            executor.execute(worker);
        }
        executor.shutdown();
        // Busy-wait until every submitted task has finished.
        while (!executor.isTerminated()) {
        }
        return (System.currentTimeMillis() - startTime);
    }

    // Naive recursive Fibonacci, used only as CPU-bound work.
    private static long fibo(int n) {
        if (n < 2) {
            return (n);
        }
        return (fibo(n - 1) + fibo(n - 2));
    }
}
```
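Two details of the measurement loop above may affect the numbers on their own. The `while (!executor.isTerminated()) {}` loop keeps the main thread spinning on a CPU for the whole run, so near the CPU limit it competes with the workers; and the result of `fibo()` is discarded, which in principle allows the JIT to drop part of the work. A minimal sketch of a variant harness, assuming you want to rule those effects out, blocks in `awaitTermination` and accumulates every result. The class name `ScalabilityHarness`, the `sink` field and the one-hour timeout are hypothetical choices for this sketch, not part of the original code, and it is not a claim that this alone explains the slowdown.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical harness: a variant of MultiThreadScalability, not the original code.
public class ScalabilityHarness {
    static final int NB_RUN_PER_THREAD = 50000;
    static final int FIBO_VALUE = 25;

    // Hypothetical sink: accumulates every result so fibo() calls are not dead code.
    static final LongAdder sink = new LongAdder();

    static long fibo(int n) {
        return n < 2 ? n : fibo(n - 1) + fibo(n - 2);
    }

    // Runs threads * runsPerThread tasks on a pool of 'threads' workers; returns elapsed ms.
    static long runTestForThreads(int threads, int runsPerThread, int fiboValue)
            throws InterruptedException {
        int nbRun = runsPerThread * threads;
        ExecutorService executor = Executors.newFixedThreadPool(threads);
        long start = System.nanoTime(); // monotonic clock, better suited to intervals
        for (int i = 0; i < nbRun; i++) {
            executor.execute(() -> sink.add(fibo(fiboValue)));
        }
        executor.shutdown();
        // Parks the main thread instead of spinning, so it does not occupy a CPU.
        // The one-hour timeout is an arbitrary assumption.
        executor.awaitTermination(1, TimeUnit.HOURS);
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }

    public static void main(String[] args) throws InterruptedException {
        int procs = Runtime.getRuntime().availableProcessors();
        System.out.println(procs + " processors available");
        for (int t = 1; t <= procs; t++) {
            long ms = runTestForThreads(t, NB_RUN_PER_THREAD, FIBO_VALUE);
            System.out.println(t + " threads => " + ms + " ms");
        }
        // Printing the sum keeps the accumulated results observable.
        System.out.println("checksum " + sink.sum());
    }
}
```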
Under these conditions I expected the run time to stay constant regardless of the number of threads, since every thread does the same amount of work.
I ran it on the target node and got the following output:
48 processors available
Running scalability test for 1 threads
=> 34199 milli-seconds
Running scalability test for 2 threads
=> 34141 milli-seconds
Running scalability test for 3 threads
=> 34009 milli-seconds
Running scalability test for 4 threads
=> 34000 milli-seconds
Running scalability test for 5 threads
=> 34034 milli-seconds
Running scalability test for 6 threads
=> 34086 milli-seconds
Running scalability test for 7 threads
=> 34094 milli-seconds
Running scalability test for 8 threads
=> 34673 milli-seconds
Running scalability test for 9 threads
=> 35297 milli-seconds
Running scalability test for 10 threads
=> 35486 milli-seconds
Running scalability test for 11 threads
=> 35913 milli-seconds
Running scalability test for 12 threads
=> 36324 milli-seconds
Running scalability test for 13 threads
=> 35722 milli-seconds
Running scalability test for 14 threads
=> 35750 milli-seconds
Running scalability test for 15 threads
=> 35634 milli-seconds
Running scalability test for 16 threads
=> 35970 milli-seconds
Running scalability test for 17 threads
=> 37914 milli-seconds
Running scalability test for 18 threads
=> 36560 milli-seconds
Running scalability test for 19 threads
=> 36720 milli-seconds
Running scalability test for 20 threads
=> 37028 milli-seconds
Running scalability test for 21 threads
=> 37381 milli-seconds
Running scalability test for 22 threads
=> 37529 milli-seconds
Running scalability test for 23 threads
=> 37632 milli-seconds
Running scalability test for 24 threads
=> 39942 milli-seconds
Running scalability test for 25 threads
=> 40090 milli-seconds
Running scalability test for 26 threads
=> 41238 milli-seconds
Running scalability test for 27 threads
=> 42336 milli-seconds
Running scalability test for 28 threads
=> 43377 milli-seconds
Running scalability test for 29 threads
=> 44394 milli-seconds
Running scalability test for 30 threads
=> 46245 milli-seconds
Running scalability test for 31 threads
=> 45928 milli-seconds
Running scalability test for 32 threads
=> 47490 milli-seconds
Running scalability test for 33 threads
=> 47674 milli-seconds
Running scalability test for 34 threads
=> 48775 milli-seconds
Running scalability test for 35 threads
=> 56456 milli-seconds
Running scalability test for 36 threads
=> 50557 milli-seconds
Running scalability test for 37 threads
=> 51393 milli-seconds
Running scalability test for 38 threads
=> 52971 milli-seconds
Running scalability test for 39 threads
=> 53077 milli-seconds
Running scalability test for 40 threads
=> 54015 milli-seconds
Running scalability test for 41 threads
=> 55924 milli-seconds
Running scalability test for 42 threads
=> 55560 milli-seconds
Running scalability test for 43 threads
=> 56554 milli-seconds
Running scalability test for 44 threads
=> 57073 milli-seconds
Running scalability test for 45 threads
=> 65193 milli-seconds
Running scalability test for 46 threads
=> 58549 milli-seconds
Running scalability test for 47 threads
=> 59302 milli-seconds
Running scalability test for 48 threads
=> 60662 milli-seconds
The time stays nearly constant (34 to 37.6 seconds) up to 23 threads. From 24 threads on, it grows steadily, reaching 60.7 seconds at 48 threads. You can see this on the graph.
I would appreciate help understanding why this break occurs.
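One way to quantify the break is weak-scaling efficiency: with a fixed amount of work per thread the ideal run time is constant, so E(N) = T(1) / T(N), where 1.0 means perfect scaling. The short sketch below computes it for a few of the measurements listed above; the class name `WeakScalingEfficiency` is hypothetical and the selection of data points is arbitrary.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: computes weak-scaling efficiency from measured run times.
public class WeakScalingEfficiency {
    // With constant work per thread, the ideal T(N) equals T(1), so E(N) = T(1) / T(N).
    static double efficiency(long t1Millis, long tnMillis) {
        return (double) t1Millis / tnMillis;
    }

    public static void main(String[] args) {
        // (threads -> milliseconds) pairs copied from the program output above.
        Map<Integer, Long> measured = new LinkedHashMap<>();
        measured.put(1, 34199L);
        measured.put(12, 36324L);
        measured.put(23, 37632L);
        measured.put(24, 39942L);
        measured.put(36, 50557L);
        measured.put(48, 60662L);

        long t1 = measured.get(1);
        for (Map.Entry<Integer, Long> e : measured.entrySet()) {
            System.out.printf("%2d threads: E = %.2f%n", e.getKey(), efficiency(t1, e.getValue()));
        }
    }
}
```

With these numbers the efficiency is about 0.91 at 23 threads, 0.86 at 24 threads and 0.56 at 48 threads.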
Here is the CPU information for the node:
```
$ cat /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 46
model name : Intel(R) Xeon(R) CPU E7540 @ 2.00GHz
stepping : 6
cpu MHz : 1997.885
cache size : 18432 KB
physical id : 0
siblings : 12
core id : 0
cpu cores : 6
apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 11
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall nx rdtscp lm constant_tsc ida nonstop_tsc pni monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr sse4_1 sse4_2 popcnt lahf_lm
bogomips : 3995.77
clflush size : 64
cache_alignment : 64
address sizes : 44 bits physical, 48 bits virtual
power management: [8]
```
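As the listing shows, each physical CPU has 6 cores (`cpu cores : 6`). `Runtime.getRuntime().availableProcessors()` does not return the number of physical cores but the number of "logical" processors: 48.
So are there actually only 24 "physical" cores?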
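The listing already contains the numbers needed to answer this: `siblings : 12` against `cpu cores : 6` means two hardware threads per core (the `ht` flag is also set), 48 logical processors divided by 12 per package gives 4 packages, and 4 × 6 gives 24 physical cores. That count is consistent with the slowdown starting around 24 threads. A minimal sketch, assuming Linux and the `/proc/cpuinfo` field names shown above, counts logical CPUs, sockets and physical cores directly; the class name `CpuTopology` is hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper: assumes Linux and /proc/cpuinfo fields like the listing above.
public class CpuTopology {
    // Returns {logical CPUs, sockets, physical cores} from /proc/cpuinfo-style lines.
    static int[] count(List<String> lines) {
        int logical = 0;
        String physicalId = null;
        Set<String> sockets = new HashSet<>();
        Set<String> cores = new HashSet<>(); // keyed by "physical id/core id"
        for (String line : lines) {
            int colon = line.indexOf(':');
            if (colon < 0) continue;
            String key = line.substring(0, colon).trim();
            String value = line.substring(colon + 1).trim();
            switch (key) {
                case "processor":   logical++; break;
                case "physical id": physicalId = value; sockets.add(value); break;
                case "core id":     cores.add(physicalId + "/" + value); break;
                default:            break;
            }
        }
        return new int[] {logical, sockets.size(), cores.size()};
    }

    public static void main(String[] args) throws IOException {
        int[] c = count(Files.readAllLines(Paths.get("/proc/cpuinfo")));
        System.out.println("logical CPUs:   " + c[0]);
        System.out.println("sockets:        " + c[1]);
        System.out.println("physical cores: " + c[2]);
        System.out.println("availableProcessors(): " + Runtime.getRuntime().availableProcessors());
    }
}
```

Core ids repeat on every socket, which is why the sketch keys cores by the `physical id`/`core id` pair rather than by `core id` alone.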