Does placing per-CPU data structures on separate pages improve performance?

I have a small data structure for each processor in a Linux kernel module, and each processor frequently reads and writes its own copy. I know I need to make sure these structures are not in the same cache line, because if they were, the cores would constantly invalidate each other's cached copies. However, is there anything at the page level I need to worry about for SMP performance? That is, would padding each processor's structure out to 4096 bytes and page-aligning it have any effect on performance?

This is on Linux 2.6 on x86_64.

(I'm not asking whether this is worth optimizing, and I don't need suggestions along those lines. What I'm looking for is whether there is any theoretical basis for worrying about page alignment.)


Within a single NUMA node, separate pages are only useful if you want to apply different permissions to the structures or map them individually into processes. For performance, it is enough for the structures to be in different cache lines.
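To make the size trade-off concrete, here is a minimal userspace sketch (not kernel code) comparing a cache-line-aligned per-CPU record with the page-aligned one the question proposes. It assumes 64-byte cache lines and 4096-byte pages, as on typical x86_64; `struct pcpu_stats_line`, `struct pcpu_stats_page`, `DEMO_NR_CPUS` and the helper functions are hypothetical names for illustration only.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE 64    /* assumed x86_64 cache-line size */
#define PAGE_BYTES 4096  /* assumed x86_64 base page size */
#define DEMO_NR_CPUS 8   /* hypothetical CPU count for this sketch */

/* Hypothetical per-CPU record, cache-line aligned. Aligning the first member
 * aligns the whole struct, and sizeof is rounded up to a multiple of that
 * alignment, so neighbouring array elements never share a cache line. */
struct pcpu_stats_line {
    alignas(CACHE_LINE) unsigned long counter;
    unsigned long last_seen;
};

/* The same hypothetical record, padded and aligned to a whole page. */
struct pcpu_stats_page {
    alignas(PAGE_BYTES) unsigned long counter;
    unsigned long last_seen;
};

static_assert(sizeof(struct pcpu_stats_line) % CACHE_LINE == 0,
              "each slot must occupy whole cache lines");

static struct pcpu_stats_line line_slots[DEMO_NR_CPUS];
static struct pcpu_stats_page page_slots[DEMO_NR_CPUS];

/* Index of the cache line that holds a CPU's line-aligned slot. */
static uintptr_t slot_line(int cpu)
{
    return (uintptr_t)&line_slots[cpu] / CACHE_LINE;
}

/* Total memory used for all CPUs under each layout. */
static size_t line_layout_bytes(void) { return sizeof line_slots; }
static size_t page_layout_bytes(void) { return sizeof page_slots; }
```

Under these assumptions each CPU still gets its own cache line with the line-aligned layout, and all eight slots fit in 512 bytes of a single page; the page-aligned layout uses 32 KiB for the same data without separating the CPUs any further at the cache level.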

On NUMA architectures, you might want to place each CPU's structure on a page that is local to that CPU's node. Even so, you still wouldn't pad the structure to the page size to achieve this, because the structures for several CPUs on the same NUMA node can share one page.
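The "one page per node, not per CPU" idea can be sketched as an offset calculation. This is a hypothetical userspace illustration: the CPU-to-node table stands in for what the kernel would get from `cpu_to_node()`, and in a module each node's page would be allocated from that node's memory (for example with `kmalloc_node()`). `demo_cpu_node`, `slot_offset` and `slot_page` are invented names, and the 64-byte line and 4096-byte page sizes are assumptions.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_BYTES 4096  /* assumed page size */
#define CACHE_LINE 64    /* assumed cache-line size */
#define DEMO_NR_CPUS 8   /* hypothetical CPU count */

/* Hypothetical topology: CPUs 0-3 on node 0, CPUs 4-7 on node 1. */
static const int demo_cpu_node[DEMO_NR_CPUS] = {0, 0, 0, 0, 1, 1, 1, 1};

static_assert(PAGE_BYTES % CACHE_LINE == 0, "a page holds whole cache lines");

/* Byte offset of a CPU's slot in a buffer laid out as one page per node.
 * CPUs on the same node are packed into consecutive cache lines of their
 * node's page instead of each getting a page of its own. */
static size_t slot_offset(int cpu)
{
    int node = demo_cpu_node[cpu];
    size_t index_in_node = 0;

    /* Count earlier CPUs on the same node to find this CPU's line. */
    for (int other = 0; other < cpu; other++)
        if (demo_cpu_node[other] == node)
            index_in_node++;

    assert(index_in_node < PAGE_BYTES / CACHE_LINE); /* node's page is full */
    return (size_t)node * PAGE_BYTES + index_in_node * CACHE_LINE;
}

/* Which page (and therefore which node's memory) holds a CPU's slot. */
static size_t slot_page(int cpu)
{
    return slot_offset(cpu) / PAGE_BYTES;
}
```

With this hypothetical topology, CPUs 0 to 3 share page 0 in separate cache lines and CPUs 4 to 7 share page 1, so every CPU's data is node-local while using two pages rather than eight.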


Even on a NUMA system, you probably won't gain much by allocating node-local pages for each CPU (use kmalloc_node() if you're interested).

Node-local memory will be faster, but only for accesses that miss every level of the cache. For anything that is used with any regularity, you probably won't be able to tell the difference. If you allocate megabytes of per-CPU data, then it probably makes sense to allocate node-local pages for each CPU.
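The "only on a cache miss" argument is just an expected-value calculation. The sketch below uses made-up latencies (5 ns for a cache hit, 80 ns for local DRAM, 130 ns for remote DRAM); these numbers are assumptions for illustration, not measurements, and the function names are hypothetical.

```c
#include <assert.h>

/* Hypothetical latencies for illustration only; real values vary by machine. */
#define CACHE_HIT_NS   5.0
#define LOCAL_MEM_NS  80.0
#define REMOTE_MEM_NS 130.0

/* Expected cost of one access: hits are served from cache regardless of
 * which node owns the page; only misses pay the memory latency. */
static double avg_access_ns(double hit_rate, double cache_ns, double mem_ns)
{
    assert(hit_rate >= 0.0 && hit_rate <= 1.0);
    return hit_rate * cache_ns + (1.0 - hit_rate) * mem_ns;
}

/* Extra cost per access of keeping the data on a remote node,
 * which works out to (1 - hit_rate) * (REMOTE_MEM_NS - LOCAL_MEM_NS). */
static double remote_penalty_ns(double hit_rate)
{
    return avg_access_ns(hit_rate, CACHE_HIT_NS, REMOTE_MEM_NS)
         - avg_access_ns(hit_rate, CACHE_HIT_NS, LOCAL_MEM_NS);
}
```

With these assumed numbers, a small structure that hits in cache 99.9% of the time pays roughly 0.05 ns extra per access for living on a remote node, while data that is almost always fetched from memory (the "megabytes of per-CPU data" case) pays the full 50 ns difference.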


OK, I've since read a bit about Linux on NUMA. In a NUMA configuration, it would help if each CPU's data were located on a page that is local to that CPU's node.


The percpu machinery normally ensures that the copies for different CPUs do not share a cache line; otherwise, commits such as 7489aec8eed4f2f1eb3b4d35763bd3ea30b32ef5 would be pointless.

