I have a large sparse matrix X in scipy.sparse.csr_matrix format, and I would like to multiply it by a dense numpy array W in parallel. After some research I found that I need to use multiprocessing's Array type to avoid copying X and W between processes (see, for example: How to combine Pool.map with Array (shared memory) in Python multiprocessing? and Is shared readonly data copied to different processes for Python multiprocessing?). Here is my latest attempt.
    import multiprocessing
    import numpy
    import scipy.sparse
    import time


    def initProcess(data, indices, indptr, shape, Warr, Wshp):
        # Pool initializer: stash the shared buffers for X (its CSR components)
        # and for W in module-level globals so each worker can rebuild them
        # without copying the underlying data.
        global XData
        global XIndices
        global XIndptr
        global Xshape
        XData = data
        XIndices = indices
        XIndptr = indptr
        Xshape = shape

        global WArray
        global WShape
        WArray = Warr
        WShape = Wshp


    def dot2(args):
        rowInds, i = args

        # Wrap the shared buffers as numpy arrays (frombuffer should not copy)
        # and rebuild the sparse matrix and W inside the worker.
        data = numpy.frombuffer(XData, dtype=numpy.float64)
        indices = numpy.frombuffer(XIndices, dtype=numpy.int32)
        indptr = numpy.frombuffer(XIndptr, dtype=numpy.int32)
        Xr = scipy.sparse.csr_matrix((data, indices, indptr), shape=Xshape)

        W = numpy.frombuffer(WArray, dtype=numpy.float64).reshape(WShape)

        # Multiply only this job's block of rows by W and return the dense block.
        return Xr[rowInds[i]:rowInds[i+1], :].dot(W)


    def getMatmat(X):
        numJobs = multiprocessing.cpu_count()
        # Row boundaries splitting X into one block of rows per job.
        rowInds = numpy.array(numpy.linspace(0, X.shape[0], numJobs+1), numpy.int64)
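getMatmat is cut off above; the remainder wraps the CSR components of X and the flattened W in shared arrays, starts a Pool with initProcess as its initializer, maps dot2 over the row blocks, and times the result against a plain X.dot(W). A minimal sketch of that driver (the RawArray wrapping, the explicit W argument, and the test sizes are illustrative rather than an exact copy of my code):

    def getMatmat(X, W):
        numJobs = multiprocessing.cpu_count()
        rowInds = numpy.array(numpy.linspace(0, X.shape[0], numJobs+1), numpy.int64)

        # Copy the CSR components of X and the flattened W into shared arrays
        # once (this element-by-element copy is itself not cheap for a big X).
        data = multiprocessing.RawArray("d", X.data)
        indices = multiprocessing.RawArray("i", X.indices)
        indptr = multiprocessing.RawArray("i", X.indptr)
        Warr = multiprocessing.RawArray("d", W.flatten())

        # The shared buffers are handed to each worker through the initializer,
        # so they are not pickled again for every dot2 call.
        pool = multiprocessing.Pool(processes=numJobs, initializer=initProcess,
                                    initargs=(data, indices, indptr, X.shape,
                                              Warr, W.shape))
        results = pool.map(dot2, [(rowInds, i) for i in range(numJobs)])
        pool.close()
        pool.join()
        return numpy.vstack(results)

    if __name__ == '__main__':
        X = scipy.sparse.rand(10000, 1000, density=0.01, format="csr")
        W = numpy.random.rand(1000, 100)

        startTime = time.time()
        Z = getMatmat(X, W)
        parallelTime = time.time() - startTime

        startTime = time.time()
        Z2 = X.dot(W)
        nonParallelTime = time.time() - startTime

        print((parallelTime, nonParallelTime))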
However, the output looks something like (4.431, 0.165): the parallel version is far slower than the plain, single-process multiplication.
I know that this kind of slowdown can happen when large data is copied into the worker processes, but that should not be the case here, since I use Array to hold the shared variables (unless the copying happens inside numpy.frombuffer or when the csr_matrix is created, but I could not find another way to share a csr_matrix). Another possible cause is that each process returns a large result from its block of the multiplication, but I am not sure about that either.
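One way to check the first suspicion, i.e. whether numpy.frombuffer or the csr_matrix constructor silently copies the shared buffers, is a quick single-process test along these lines (a sketch, not part of my benchmark):

    import multiprocessing
    import numpy
    import scipy.sparse

    # Does frombuffer copy the shared buffer?
    raw = multiprocessing.RawArray("d", [1.0, 2.0, 3.0, 0.0])
    view = numpy.frombuffer(raw, dtype=numpy.float64)
    raw[0] = 42.0
    print(view[0])  # 42.0, so the numpy array is a view, not a copy

    # Does building a csr_matrix from (data, indices, indptr) copy the data?
    indices = numpy.array([0, 1, 2], dtype=numpy.int32)
    indptr = numpy.array([0, 2, 3], dtype=numpy.int32)
    Xr = scipy.sparse.csr_matrix((view[:3], indices, indptr), shape=(2, 3))
    print(numpy.may_share_memory(Xr.data, view))  # True if no copy was made

If both prints come out as expected, the shared arrays really are used in place and copying can be ruled out as the cause.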
Can anyone see where I am going wrong? Thanks for any help!
Update: I cannot be sure, but I suspect that passing large amounts of data between processes is simply not very efficient, and that ideally I should be using multithreading (although the Global Interpreter Lock (GIL) makes that difficult). One way around this is to release the GIL using Cython, for example (see http://docs.cython.org/src/userguide/parallelism.html), although many of the numpy functions need to go through the GIL.
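If the real bottleneck is shipping the per-block results back through the Pool (each returned dense block gets pickled), one variation worth trying before giving up on multiprocessing is to have every worker write its rows straight into a shared output buffer. A rough sketch that reuses initProcess and the globals above (initProcess2, dot3 and ZArray are names I made up for illustration):

    def initProcess2(data, indices, indptr, shape, Warr, Wshp, Zarr):
        # Same as initProcess, plus a shared buffer for the result matrix.
        initProcess(data, indices, indptr, shape, Warr, Wshp)
        global ZArray
        ZArray = Zarr

    def dot3(args):
        rowInds, i = args
        data = numpy.frombuffer(XData, dtype=numpy.float64)
        indices = numpy.frombuffer(XIndices, dtype=numpy.int32)
        indptr = numpy.frombuffer(XIndptr, dtype=numpy.int32)
        Xr = scipy.sparse.csr_matrix((data, indices, indptr), shape=Xshape)
        W = numpy.frombuffer(WArray, dtype=numpy.float64).reshape(WShape)

        # Write this job's rows directly into the shared result, so nothing
        # large is pickled on the way back to the parent process.
        Z = numpy.frombuffer(ZArray, dtype=numpy.float64).reshape((Xshape[0], WShape[1]))
        Z[rowInds[i]:rowInds[i+1], :] = Xr[rowInds[i]:rowInds[i+1], :].dot(W)

getMatmat would then also allocate Zarr = multiprocessing.RawArray("d", X.shape[0] * W.shape[1]), pass it as the extra initializer argument, map dot3 instead of dot2, and read the result back with numpy.frombuffer(Zarr, dtype=numpy.float64).reshape((X.shape[0], W.shape[1])) rather than stacking the returned blocks.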
python scipy parallel-processing sparse-matrix
Charanpal