I have my own multithreaded C program that scales seamlessly with the number of processor cores. I can run it with 1, 2, 3, etc. threads and get linear speedup: up to 5.5x on a 6-core CPU in an Ubuntu Linux box.
I had the opportunity to run the program on a very high-end Sunfire x4450 with four quad-core Xeon processors, running Red Hat Enterprise Linux. I was looking forward to seeing how fast 16 cores could run my program with 16 threads. But it runs at the same speed as with just two threads!
A lot of hair pulling and debugging later, I see that my program really does create all the threads and they really do run simultaneously, but the threads themselves are slower than they should be. 2 threads run about 1.7x faster than 1, but 3, 4, 8, 10, or 16 threads all run only about 1.9x faster! I can see that all the threads are running (not blocked or sleeping); they are just slow.
To verify that the HARDWARE was not at fault, I ran SIXTEEN copies of my program simultaneously as independent processes. They all ran at full speed. There really are 16 cores, they really do run at full speed, and there really is enough RAM (in fact, this machine has 64 GB, and I use only 1 GB per process).
So my question is: is there some OPERATING SYSTEM explanation, maybe some per-process limit that automatically throttles thread scheduling so that a single process cannot hog the machine?
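One per-process restriction that can be checked directly is the CPU affinity mask. Below is a minimal sketch, assuming Linux with glibc; the function names `allowed_cpus` and `online_cpus` are made up for illustration. If the first number is smaller than the second, the process is being confined to a subset of the cores.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <unistd.h>

/* Number of CPUs the calling process is allowed to run on,
 * taken from its affinity mask; returns -1 on error. */
int allowed_cpus(void)
{
    cpu_set_t set;

    CPU_ZERO(&set);
    if (sched_getaffinity(0, sizeof(set), &set) != 0)  /* pid 0 = caller */
        return -1;
    return CPU_COUNT(&set);
}

/* Number of CPUs the OS currently reports as online. */
long online_cpus(void)
{
    return sysconf(_SC_NPROCESSORS_ONLN);
}
```

Calling both from `main` and printing the results on the 16-core box would show whether all 16 cores are available to the process. This only rules affinity in or out; it says nothing about other possible limits.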
Clues:
- My program does not access the disk or network. It is CPU-bound. Its speed scales linearly with 1-6 threads on a single-CPU Ubuntu Linux box with a hexacore i7. 6 threads give effectively a 6x speedup.
- My program never gets more than a 2x speedup on this 16-core Sunfire Xeon, for any number of threads from 2 to 16.
- Running 16 single-threaded copies of my program works perfectly: all 16 run at once at full speed.
- top shows 1600% CPU allocated. /proc/cpuinfo shows all 16 cores running at the full 2.9 GHz (not the low idle frequency of 1.6 GHz).