I assume that you can load the entire dataset into RAM as a numpy array and that you are working on Linux or a Mac. (If you are on Windows, or the array does not fit in RAM, you should probably copy the array to a file on disk and use numpy.memmap to access it. The operating system will cache data from the disk in RAM, and those caches may well be shared between processes, so this is not a terrible solution.)
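For the numpy.memmap fallback mentioned in the parenthesis, here is a minimal sketch. The helper names save_for_memmap and open_memmap, the file name big_data.dat, and the tiny 3x4 array are all made up for illustration; a real dataset would be far larger and would live at a path of your choosing.

```python
import os
import tempfile

import numpy as np


def save_for_memmap(array, path):
    """Write `array` to `path` as raw bytes so it can be memory-mapped later."""
    mm = np.memmap(path, dtype=array.dtype, mode="w+", shape=array.shape)
    mm[:] = array   # copy the data into the file-backed array
    mm.flush()      # make sure everything reaches the file
    del mm          # drop the writable mapping


def open_memmap(path, dtype, shape):
    """Open the file read-only; pages are loaded from disk on demand."""
    return np.memmap(path, dtype=dtype, mode="r", shape=shape)


# Hypothetical small dataset standing in for the real one.
data = np.arange(12, dtype=np.float64).reshape(3, 4)
path = os.path.join(tempfile.mkdtemp(), "big_data.dat")

save_for_memmap(data, path)
view = open_memmap(path, data.dtype, data.shape)
total = float(view.sum())
```

Each process can call open_memmap on the same path. Note that the raw file stores no dtype or shape, so those have to be passed around separately.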
Under these assumptions, if processes created with multiprocessing only need to read the dataset, you can simply create the dataset and then start the other processes, provided they are started by fork. They will be able to read the data from the original namespace. They can also modify that data, but the changes will not be visible to other processes (the memory manager copies each memory segment they modify into the process's own memory map).
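A small sketch of this copy-on-write behaviour, assuming a platform where the fork start method is available (it is requested explicitly here, because it is not the default everywhere). The names child_modify and run_demo are hypothetical.

```python
import multiprocessing as mp

import numpy as np

# Created in the parent before any child process is started.
big_data = np.zeros(5)


def child_modify(queue):
    # The forked child sees the parent's array without any copying or pickling.
    # Writing to it triggers copy-on-write, so only the child's pages change.
    big_data[0] = 42.0
    queue.put(float(big_data[0]))


def run_demo():
    ctx = mp.get_context("fork")  # assumption: fork is supported on this OS
    queue = ctx.Queue()
    proc = ctx.Process(target=child_modify, args=(queue,))
    proc.start()
    child_value = queue.get()     # value as seen inside the child
    proc.join()
    return child_value, float(big_data[0])  # parent's copy is untouched


if __name__ == "__main__":
    child_value, parent_value = run_demo()
    print(child_value, parent_value)
```

The child reports its modified value, while the parent still sees the original zero.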
If your other processes need to modify the original dataset and make these changes visible to the parent process or other processes, you can use something like this:
import multiprocessing
import numpy as np
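A minimal sketch of one way to share a writable array, assuming the fork start method. It uses multiprocessing's RawArray wrapped with np.frombuffer, which may differ from what the snippet above went on to do. The names make_shared_array, as_numpy, worker, and run_demo, and the 3x3 shape, are hypothetical.

```python
import multiprocessing as mp

import numpy as np

SHAPE = (3, 3)  # hypothetical shape; a real dataset would be much larger


def make_shared_array(ctx, shape):
    """Allocate zero-initialised shared memory for a float64 array."""
    n_items = int(np.prod(shape))
    return ctx.RawArray("d", n_items)  # 'd' = C double; RawArray has no lock


def as_numpy(raw, shape):
    """Wrap the shared buffer in a numpy array without copying it."""
    return np.frombuffer(raw, dtype=np.float64).reshape(shape)


def worker(raw, shape, row):
    # Each worker writes to its own row, so no locking is needed here.
    arr = as_numpy(raw, shape)
    arr[row, :] = row + 1


def run_demo():
    ctx = mp.get_context("fork")  # assumption: fork is supported on this OS
    raw = make_shared_array(ctx, SHAPE)
    big_data = as_numpy(raw, SHAPE)

    procs = [ctx.Process(target=worker, args=(raw, SHAPE, r))
             for r in range(SHAPE[0])]
    for p in procs:
        p.start()
    for p in procs:
        p.join()

    # The children's writes are visible in the parent's view of the buffer.
    return big_data.copy()


if __name__ == "__main__":
    print(run_demo())
```

Because RawArray provides no synchronization, processes that write to overlapping regions would need their own locking, for example a multiprocessing.Lock passed to each worker.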