Kernel developers such as Christoph Lameter (and Ingo Molnar on the scheduler) have tuned the kernel to work well on 4096 processors, and Intel itself has put a large number of optimizations into the problem, with multi-mode tuning for both performance and power saving. Given that, I'm sure the kernel is much better optimized than anything any of us can write in user space.
The same goes for the thread library. On Linux 2.6 there is currently only one thread library, NPTL. LinuxThreads was removed from glibc in version 2.4, and NPTL was released before the 2.6 kernel. And it is very fast.
Just make sure you are not running an old kernel; the latest version from your distribution or from kernel.org is best. Before deploying to production, measure the performance difference and consider whether the additional support costs (if any) are worth it.
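If you want to confirm what you are actually running, here is a minimal sketch in Python. It assumes a glibc-based system, where the `CS_GNU_LIBPTHREAD_VERSION` configuration string is available; on other C libraries it simply reports "unknown". The function names `kernel_release` and `thread_library` are my own, purely for illustration.

```python
import os
import platform


def kernel_release() -> str:
    """Return the running kernel release string (what `uname -r` prints)."""
    return platform.release()


def thread_library() -> str:
    """Return the pthread implementation reported by glibc, or 'unknown'.

    Assumption: glibc exposes this as the confstr name
    CS_GNU_LIBPTHREAD_VERSION; other libcs may not define it.
    """
    name = "CS_GNU_LIBPTHREAD_VERSION"
    # os.confstr_names does not exist on every platform, so look it up defensively.
    if name not in getattr(os, "confstr_names", {}):
        return "unknown"
    try:
        # os.confstr returns None when the value is not defined.
        return os.confstr(name) or "unknown"
    except (OSError, ValueError):
        return "unknown"


if __name__ == "__main__":
    print("kernel:", kernel_release())
    print("threads:", thread_library())
```

On a glibc system the second line should name NPTL along with a version number. The same information is available from the shell with `uname -r` and `getconf GNU_LIBPTHREAD_VERSION`.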