Since one of Python's Achilles' heels is the GIL, it has rather good support for multiprocessing. There are queues, pipes, locks, shared values, and shared arrays. There is also something called a Manager that lets you wrap many Python data structures and share them in an IPC-friendly way. I imagine most of these work through pipes or sockets, but I have not delved too deeply into the internals.
http://docs.python.org/2/library/multiprocessing.html
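To make the primitives above a bit more concrete, here is a minimal sketch that uses a Queue, a Lock, a shared Value, and a shared Array together. It assumes Python 3 on Linux and explicitly asks for the "fork" start method; the names worker and run are mine and purely illustrative.

```python
import multiprocessing as mp

# Assumption: Linux with the "fork" start method, so children inherit
# the parent's memory image and nothing has to be re-imported.
ctx = mp.get_context("fork")


def worker(worker_id, counter, squares, lock, results):
    # Each process writes only its own slot in the shared array.
    squares[worker_id] = worker_id * worker_id
    # Incrementing is a read-modify-write, so guard it with the lock.
    with lock:
        counter.value += 1
    # The queue pickles the item and ships it back to the parent.
    results.put((worker_id, squares[worker_id]))


def run(n_workers=4):
    counter = ctx.Value("i", 0)           # shared C int
    squares = ctx.Array("i", n_workers)   # shared C int array, zero-filled
    lock = ctx.Lock()
    results = ctx.Queue()

    procs = [
        ctx.Process(target=worker, args=(i, counter, squares, lock, results))
        for i in range(n_workers)
    ]
    for p in procs:
        p.start()
    # Drain the queue before joining so no child blocks on a full pipe.
    collected = sorted(results.get() for _ in procs)
    for p in procs:
        p.join()
    return counter.value, list(squares), collected


if __name__ == "__main__":
    print(run())
```

The shared Value and Array live in shared memory, while the Queue moves pickled objects between processes, so the two are suited to different kinds of data.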
How does Linux model NUMA systems?
The kernel detects that it is running on a multi-core machine, then works out how much hardware is present and what the topology is. It then builds a model of that topology using the idea of nodes. A node is a physical socket that contains a CPU (possibly with multiple cores) and the memory attached to it. Why model nodes rather than cores? Because the memory bus is the set of physical wires connecting the RAM to a CPU socket, and all the cores on the CPU in a given socket have the same access time to all the RAM that sits on that memory bus.
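You can look at the node model the kernel built through sysfs. The sketch below assumes the usual Linux layout, where each node appears as a directory /sys/devices/system/node/nodeN containing a cpulist file; the function names are my own, and on a machine without that directory it simply returns an empty mapping.

```python
import os
import re


def parse_cpulist(text):
    """Expand a sysfs CPU list such as '0-3,8-11' into a list of ints."""
    cpus = []
    for part in text.strip().split(","):
        if not part:
            continue  # a memory-only node has an empty list
        if "-" in part:
            lo, hi = part.split("-")
            cpus.extend(range(int(lo), int(hi) + 1))
        else:
            cpus.append(int(part))
    return cpus


def read_topology(root="/sys/devices/system/node"):
    """Return {node_id: [cpu, ...]}; empty if the directory is missing."""
    topology = {}
    if not os.path.isdir(root):
        return topology
    for name in os.listdir(root):
        match = re.fullmatch(r"node(\d+)", name)
        if not match:
            continue  # skip files such as 'online' or 'possible'
        with open(os.path.join(root, name, "cpulist")) as f:
            topology[int(match.group(1))] = parse_cpulist(f.read())
    return topology


if __name__ == "__main__":
    for node, cpus in sorted(read_topology().items()):
        print("node", node, "cpus", cpus)
```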
How is the memory on one memory bus accessed by a core on another memory bus?
On x86 systems, this happens through the caches. Modern operating systems use a piece of hardware called the Translation Lookaside Buffer (TLB) to map virtual addresses to physical addresses. If the memory the cache has been asked to fetch is local, it is read locally. If it is not local, the request goes over the HyperTransport bus on AMD systems, or QuickPath on Intel, to the remote memory. Because this is done at the cache level, in theory you do not need to know about it, and you certainly have no control over it. But for high-performance applications it is incredibly useful to understand, so that you can minimize the number of remote accesses.
Where does the OS actually place the physical pages of virtual memory?
When a process forks, it inherits all of the parent's pages (thanks to COW). The kernel keeps track of which node is "best" for a process, called its "preferred" node. This can be changed, but by default it is the same as the parent's. Memory allocations then come from that same node unless the policy has been changed explicitly.
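If you want to check where a process's pages actually ended up, Linux exposes /proc/<pid>/numa_maps on NUMA-enabled kernels. Each line describes one mapping, and fields of the form N<node>=<pages> give the page count per node. Here is a rough sketch of a parser, assuming that format; the function names are hypothetical and it only sums the N fields.

```python
import re
from collections import Counter

# Matches whitespace-delimited fields such as "N0=42" (node 0, 42 pages).
NODE_FIELD = re.compile(r"(?<!\S)N(\d+)=(\d+)(?!\S)")


def pages_per_node(numa_maps_text):
    """Sum the per-node page counts over every mapping in the text."""
    counts = Counter()
    for line in numa_maps_text.splitlines():
        for node, pages in NODE_FIELD.findall(line):
            counts[int(node)] += int(pages)
    return dict(counts)


def pages_per_node_for_pid(pid="self"):
    """Read and summarize /proc/<pid>/numa_maps (Linux only)."""
    with open("/proc/%s/numa_maps" % pid) as f:
        return pages_per_node(f.read())


if __name__ == "__main__":
    try:
        print(pages_per_node_for_pid())
    except OSError:
        print("numa_maps is not available on this system")
```

Running it in a parent and then in a freshly forked child is a quick way to see whether both are drawing pages from the same node.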
Is there a transparent process that moves memory?
No. Once memory is allocated, it stays on the node it was allocated on. You can make a new allocation on another node, copy the data over, and free the allocation on the first node, but that is a bit more involved.
Is there a way to control the allocation?
By default, allocations come from the local node. If you use libnuma, you can change the allocation policy (say, round-robin or interleave) instead of defaulting to local.
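libnuma is a C library, but to stay with Python here is a hedged sketch that calls it through ctypes. It assumes libnuma version 2 is installed and uses numa_available, numa_max_node, numa_set_preferred, numa_set_localalloc, numa_set_interleave_mask, and the global numa_all_nodes_ptr from its numa.h. The wrapper names are mine, and I have not measured how much the policy matters for memory that the Python interpreter allocates itself.

```python
import ctypes
import ctypes.util


def load_libnuma():
    """Return a libnuma handle, or None if NUMA is not usable here."""
    name = ctypes.util.find_library("numa")
    if name is None:
        return None
    try:
        lib = ctypes.CDLL(name)
    except OSError:
        return None
    # numa_available() returns -1 when the system has no NUMA support.
    if lib.numa_available() < 0:
        return None
    return lib


def prefer_node(lib, node):
    """Ask the kernel to prefer the given node for future allocations."""
    lib.numa_set_preferred.argtypes = [ctypes.c_int]
    lib.numa_set_preferred.restype = None
    lib.numa_set_preferred(node)


def interleave_all(lib):
    """Spread future allocations round-robin across all nodes."""
    all_nodes = ctypes.c_void_p.in_dll(lib, "numa_all_nodes_ptr")
    lib.numa_set_interleave_mask.argtypes = [ctypes.c_void_p]
    lib.numa_set_interleave_mask.restype = None
    lib.numa_set_interleave_mask(all_nodes)


def use_local_allocation(lib):
    """Go back to the default policy of allocating on the local node."""
    lib.numa_set_localalloc.restype = None
    lib.numa_set_localalloc()


if __name__ == "__main__":
    lib = load_libnuma()
    if lib is None:
        print("libnuma or NUMA support not found")
    else:
        print("highest node number:", lib.numa_max_node())
```

If you only need to set the policy for a whole program, launching it under the numactl command (for example with --interleave=all) does the same thing without touching the code.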
I got a lot of information from this blog post:
http://blog.jcole.us/2010/09/28/mysql-swap-insanity-and-the-numa-architecture/
I would definitely recommend you read it in its entirety to get more information.