How well does the Linux kernel handle multi-threaded applications on the new quad-core processors?

Does anyone here have experience with how the Linux thread scheduler handles multi-threaded applications on the new quad-core processors? If so, could you describe how well the kernel manages the different threads? Have you seen any thread starvation, or one of the cores being starved of work?

Thanks.

+4
4 answers

Given that kernel developers such as Christoph Lameter (and Ingo Molnar on the scheduler) have tuned the kernel to work well on 4096 processors, and given the number of optimizations Intel itself has put into the problem, with multi-core tuning for both performance and power saving, I'm sure that the kernel is far better optimized than anything any of us could write in user space.

The same goes for the thread library. There is currently only one thread library for Linux 2.6: NPTL. LinuxThreads was removed from glibc in version 2.4, and NPTL was released before kernel 2.6. It is very fast.

Just make sure you are not running an old kernel; the latest one from your distribution or from kernel.org is best. Before deploying to production, measure the performance difference and consider whether the additional support costs (if any) are worth it.
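To check which kernel and thread library a given machine is actually running, something like the following sketch may help. The function names `kernel_release` and `thread_library` are made up for this illustration. `uname` is standard POSIX, while `_CS_GNU_LIBPTHREAD_VERSION` is a glibc extension, so the second function assumes glibc and simply reports failure elsewhere.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/utsname.h>
#include <unistd.h>

/* Hypothetical helper: copies the running kernel release (the same
   string `uname -r` prints) into buf. Returns 0 on success, -1 on failure. */
int kernel_release(char *buf, size_t len)
{
    struct utsname u;

    assert(buf != NULL);
    if (len == 0 || uname(&u) != 0)
        return -1;
    snprintf(buf, len, "%s", u.release);
    return 0;
}

/* Hypothetical helper: copies the thread library name and version into buf.
   Assumes glibc; on other C libraries the constant is missing and the
   function returns -1. Returns 0 on success. */
int thread_library(char *buf, size_t len)
{
    assert(buf != NULL);
#ifdef _CS_GNU_LIBPTHREAD_VERSION
    size_t needed = confstr(_CS_GNU_LIBPTHREAD_VERSION, buf, len);
    /* confstr returns 0 on error, or the size needed including the NUL */
    return (needed > 0 && needed <= len) ? 0 : -1;
#else
    (void)len;
    return -1;
#endif
}
```

Calling both from `main` with a buffer of a few hundred bytes and printing the results is enough to confirm what you are running before you start measuring.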

+7

Linux uses multiple processors very well. If I remember correctly, SMP Linux supports up to 4096 processors. What really matters is whether your applications are written to take advantage of multiple processors.
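As a rough illustration of what "written to take advantage of multiple processors" can mean, here is a minimal sketch that splits an array sum across POSIX threads. All names (`parallel_sum`, `online_cpus`, `MAX_WORKERS`) are invented for this example, and `_SC_NPROCESSORS_ONLN` is a widely available extension rather than strict POSIX.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <unistd.h>

#define MAX_WORKERS 64   /* arbitrary cap chosen for this sketch */

struct slice {
    const int *data;   /* first element of this worker's share */
    size_t     count;  /* number of elements in the share */
    long long  sum;    /* result, written once by the worker */
};

static void *sum_slice(void *arg)
{
    struct slice *s = arg;
    long long total = 0;

    for (size_t i = 0; i < s->count; i++)
        total += s->data[i];
    s->sum = total;
    return NULL;
}

/* Number of CPUs currently online, or 1 if it cannot be determined. */
int online_cpus(void)
{
    long n = sysconf(_SC_NPROCESSORS_ONLN);
    return n < 1 ? 1 : (int)n;
}

/* Sums data[0..n-1] using up to nthreads threads (clamped to 1..MAX_WORKERS). */
long long parallel_sum(const int *data, size_t n, int nthreads)
{
    pthread_t    tids[MAX_WORKERS];
    int          started[MAX_WORKERS];
    struct slice parts[MAX_WORKERS];
    long long    total = 0;

    assert(data != NULL || n == 0);
    if (nthreads < 1) nthreads = 1;
    if (nthreads > MAX_WORKERS) nthreads = MAX_WORKERS;

    size_t chunk = n / (size_t)nthreads;
    size_t extra = n % (size_t)nthreads;
    size_t start = 0;

    for (int i = 0; i < nthreads; i++) {
        size_t count = chunk + ((size_t)i < extra ? 1 : 0);
        parts[i].data = data + start;
        parts[i].count = count;
        parts[i].sum = 0;
        start += count;
        started[i] = pthread_create(&tids[i], NULL, sum_slice, &parts[i]) == 0;
        if (!started[i])
            sum_slice(&parts[i]);   /* fall back to the calling thread */
    }
    for (int i = 0; i < nthreads; i++) {
        if (started[i])
            pthread_join(tids[i], NULL);
        total += parts[i].sum;
    }
    return total;
}
```

A typical call would be `parallel_sum(data, n, online_cpus())`. Each thread works on its own slice and writes its result only once, so the threads never contend for a lock; the kernel is then free to spread them over the available cores.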

+5

It works very well on the dual quad-core system (V8) we have in production ... bloody fast.

But be very careful about Linux's tendency to starve threads when locks (mutexes) are heavily contended. Imagine a scenario where 10 threads share one lock, the lock is acquired very often but held only for very short periods, and the work done outside the lock at any given point takes even less time. Linux will very often hand the lock to the same thread almost every time, starving all the others.

This depends on the specific threading package used with the kernel; I believe there are several.
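The scenario above is easy to try for yourself. The following is only a sketch of such an experiment, with invented names (`run_contention`, `sum_counts`, `MAX_THREADS`): several threads fight over one default pthread mutex, hold it very briefly, and do almost nothing outside it. Each thread counts how often it got the lock, so a very uneven distribution would point to the starvation described here.

```c
#include <assert.h>
#include <pthread.h>

#define MAX_THREADS 64   /* arbitrary cap chosen for this sketch */

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static long remaining;               /* acquisitions left, guarded by lock */
static long acquired[MAX_THREADS];   /* per-thread acquisition counts */

static void *lock_worker(void *arg)
{
    long *mine = arg;

    for (;;) {
        pthread_mutex_lock(&lock);
        if (remaining == 0) {
            pthread_mutex_unlock(&lock);
            return NULL;
        }
        remaining--;      /* very short critical section */
        (*mine)++;
        pthread_mutex_unlock(&lock);
        /* deliberately no work outside the lock */
    }
}

/* Shares `total` lock acquisitions among nthreads threads and stores how
   many each thread won in counts[0..nthreads-1]. Returns 0 on success,
   -1 if a thread could not be created. */
int run_contention(int nthreads, long total, long *counts)
{
    pthread_t tids[MAX_THREADS];
    int started, rc = 0;

    assert(nthreads >= 1 && nthreads <= MAX_THREADS && total >= 0);
    for (int i = 0; i < nthreads; i++)
        acquired[i] = 0;
    remaining = total;   /* no other thread exists yet */

    for (started = 0; started < nthreads; started++) {
        if (pthread_create(&tids[started], NULL, lock_worker,
                           &acquired[started]) != 0) {
            /* stop the threads that are already running */
            pthread_mutex_lock(&lock);
            remaining = 0;
            pthread_mutex_unlock(&lock);
            rc = -1;
            break;
        }
    }
    for (int i = 0; i < started; i++)
        pthread_join(tids[i], NULL);
    for (int i = 0; i < nthreads; i++)
        counts[i] = acquired[i];
    return rc;
}

/* Adds up the per-thread counts; should equal `total` after a clean run. */
long sum_counts(int n, const long *counts)
{
    long sum = 0;
    for (int i = 0; i < n; i++)
        sum += counts[i];
    return sum;
}
```

Running `run_contention(10, total, counts)` with a large `total` (so that thread start-up order matters less) and printing `counts` shows how evenly the lock was shared on your kernel and thread library. The result will vary between systems and runs, so treat it as a measurement, not a proof.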

+1

I have had absolutely stunning results on our Intel Q6600 for a range of parallel applications, but I took care to avoid excessive parallelism: I usually design for four to eight threads, so there is not too much contention. If you have a lot of threads, you will see noticeable overhead, especially if they fight over the same semaphores. My guess is that thousands of threads are probably too many, and dozens of threads are probably fine. But this is just a hunch; if you want to know, you will have to find someone who has measured it, or run the experiment yourself.

But for a dozen threads, our results were incredible.
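For anyone who wants to run that experiment, here is one possible starting point, offered as a sketch only. The names (`time_semaphore_run`, `gate`, `MAX_THREADS`) are invented for this example. It divides a fixed number of semaphore-protected increments among a chosen number of threads and reports the wall-clock time, so you can compare, say, 4, 12, 100 and 1000 threads on your own machine.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <semaphore.h>
#include <time.h>

#define MAX_THREADS 1024   /* arbitrary cap chosen for this sketch */

static sem_t gate;            /* binary semaphore all threads fight over */
static long  shared_counter;  /* only touched while holding gate */

static void *sem_worker(void *arg)
{
    long ops = *(long *)arg;

    for (long i = 0; i < ops; i++) {
        while (sem_wait(&gate) != 0)
            ;   /* retry if interrupted by a signal */
        shared_counter++;
        sem_post(&gate);
    }
    return NULL;
}

/* Splits total_ops increments across nthreads threads and stores the
   elapsed wall-clock time (thread creation included) in *seconds.
   Returns the final counter value, or -1 on failure. */
long time_semaphore_run(int nthreads, long total_ops, double *seconds)
{
    pthread_t tids[MAX_THREADS];
    long ops[MAX_THREADS];
    struct timespec t0, t1;
    int started;
    long result;

    assert(nthreads >= 1 && nthreads <= MAX_THREADS && total_ops >= 0);
    assert(seconds != NULL);
    if (sem_init(&gate, 0, 1) != 0)
        return -1;
    shared_counter = 0;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (started = 0; started < nthreads; started++) {
        /* spread the remainder over the first few threads */
        ops[started] = total_ops / nthreads
                     + (started < total_ops % nthreads ? 1 : 0);
        if (pthread_create(&tids[started], NULL, sem_worker,
                           &ops[started]) != 0)
            break;
    }
    for (int i = 0; i < started; i++)
        pthread_join(tids[i], NULL);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    *seconds = (double)(t1.tv_sec - t0.tv_sec)
             + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
    result = (started == nthreads) ? shared_counter : -1;
    sem_destroy(&gate);
    return result;
}
```

Keeping `total_ops` fixed while varying `nthreads` shows how much of the run time is pure contention overhead. This is a worst case by design, since the threads do nothing but fight over the semaphore; a real application with useful work between acquisitions should behave better.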

+1
