I am developing a small neural network whose parameters need a lot of optimization, and therefore a lot of processing time. I profiled my script with cProfile: 80% of the processor time is spent in NumPy's dot function, and the rest in matrix inversion via numpy.linalg.solve. My current version of NumPy seems to use BLAS, since numpy.core._dotblas.dot shows up as the function taking 80% of the total processing time.
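For reference, a quick way to check which BLAS/LAPACK implementation NumPy was built against is numpy.show_config() (a generic check, not anything specific to my setup):

```python
import numpy as np

# Show which BLAS/LAPACK implementation NumPy was built against
# (ATLAS, OpenBLAS, MKL, ...). On older NumPy versions the presence
# of numpy.core._dotblas also indicates an optimized BLAS is in use.
np.show_config()
```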
Since this is the core of my neural network and I have to run it many times, any small speed gain would save me a lot of time across the repeated parameter optimizations.
More precisely: the matrix multiplications involve matrices ranging from 100*100 up to 500*500. I have a 12-core machine and already use the cores to run different neural-network parameter optimizations in parallel, but could the matrix multiplication itself also run in parallel?
Thank you for your time!
Answer:
I spent several days testing, installing and uninstalling libraries... Here is the result of what I tested. By default, on my version of Ubuntu (12.04) with NumPy installed from the repositories, the BLAS libraries are the ATLAS libraries. I ran several tests that reflect the improvement specifically for the computations that interest me, so these results should not be interpreted as a definitive answer. The computations are matrix multiplications (dot products) in a loop of 55,000 iterations, with 500*500 and 1000*1000 matrices. I use an HP Z800 workstation with a Xeon X5675 @ 3.07 GHz and 12 cores. All results (in percent) are a comparison between the described configuration and the reference, which is the packaged ATLAS library.
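I am not including my full benchmark script; a reduced sketch of the timing loop (with far fewer iterations, and illustrative names) looks roughly like this:

```python
import time
import numpy as np

def bench_dot(n, iterations):
    """Time `iterations` products of two random n*n matrices."""
    a = np.random.rand(n, n)
    b = np.random.rand(n, n)
    start = time.time()
    for _ in range(iterations):
        np.dot(a, b)
    return time.time() - start

for n in (500, 1000):
    # Far fewer iterations than the 55,000 used in the actual tests,
    # just to illustrate the structure of the timing loop.
    print(n, bench_dot(n, iterations=100))
```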
- With OpenBLAS I get a speed increase of 33% for 500*500 matrices, but 160% for 1000*1000. However, with OpenBLAS the scipy.sparse module does not perform better; it is actually worse.
- scipy.sparse module: I don't know if I set it up correctly, but with 10% sparsity this module starts to become useful from about 1500*1500 matrices, with both OpenBLAS and MKL. If you have a suggestion on how to use it correctly, I'm interested! (A minimal setup sketch follows after this list.)
- The big winner here is MKL. The speed-up reaches 230% for 1000*1000 matrices compared to the original ATLAS libraries! For 500*500 matrices the speed-up is more modest (100%), but still very good. In addition, when compiled with OpenMP, matrix multiplication can run on my 12 cores, and it is then twice as fast as on a single core with the MKL libraries. But that is a waste of computing power; it is much more efficient to use the multiprocessing module to run scripts/matrix multiplications in parallel (see the second sketch below).
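I don't have my exact sparse benchmark at hand; a minimal sketch of the kind of setup I mean (sizes and names are illustrative) is:

```python
import numpy as np
import scipy.sparse as sp

n = 1500           # size where the sparse path started to pay off in my tests
density = 0.10     # 10% non-zero entries

# Random sparse matrix in CSR format and a dense matrix to multiply with.
a_sparse = sp.rand(n, n, density=density, format='csr')
b = np.random.rand(n, n)

# Sparse * dense product; this goes through scipy.sparse routines
# rather than the BLAS dgemm used for dense np.dot.
c = a_sparse.dot(b)
```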
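To make the comparison between MKL threading and multiprocessing concrete, here is a sketch of the multiprocessing approach, assuming MKL honours the MKL_NUM_THREADS/OMP_NUM_THREADS environment variables; the worker function is a placeholder, not my actual optimization code:

```python
import os

# Limit BLAS threading so the parallelism comes from the process pool,
# not from MKL/OpenMP inside each dot product. These variables must be
# set before NumPy is imported.
os.environ['MKL_NUM_THREADS'] = '1'
os.environ['OMP_NUM_THREADS'] = '1'

import numpy as np
from multiprocessing import Pool

def evaluate(seed):
    """Stand-in for one parameter evaluation: a few 500*500 products."""
    rng = np.random.RandomState(seed)
    a = rng.rand(500, 500)
    b = rng.rand(500, 500)
    for _ in range(10):
        a = a.dot(b) / 500.0   # rescale to keep values bounded
    return float(a.sum())

if __name__ == '__main__':
    pool = Pool(processes=12)          # one worker per core
    results = pool.map(evaluate, range(48))
    pool.close()
    pool.join()
    print(len(results), 'evaluations done')
```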
PierreE