You might need to be a little careful in assuming your code is actually using multithreaded BLAS calls. Relatively few numpy operations actually call into BLAS at all, and relatively few BLAS calls are actually multithreaded. numpy.dot uses BLAS dot, gemv or gemm, depending on the operation, but of those only gemm is usually multithreaded, because there is rarely any performance advantage to threading the O(N) and O(N^2) BLAS calls. If you limit yourself to BLAS Level 1 and Level 2 operations, I doubt that you are actually using multithreaded BLAS calls, even if your numpy is built against a multithreaded BLAS like ATLAS or MKL.
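To make the Level 1 / 2 / 3 distinction concrete, here is a rough sketch that maps the operand shapes passed to numpy.dot to the BLAS routine it typically ends up calling, together with an approximate flop count. The helper name likely_blas_routine is hypothetical, and the mapping is an approximation: it assumes float64, contiguous operands and a numpy build linked against a CBLAS. Real dispatch also depends on dtype, memory layout and special cases, so don't treat it as numpy's actual dispatch logic.

```python
# Hypothetical helper (not part of numpy): approximate mapping from the
# operand shapes of numpy.dot(a, b) to the BLAS routine it typically calls.
# Assumes float64, contiguous operands and a CBLAS-backed numpy build; real
# dispatch also depends on dtype, memory layout and special cases.

def likely_blas_routine(shape_a, shape_b):
    """Return (routine, blas_level, approx_flops) for numpy.dot(a, b).

    shape_a, shape_b: shape tuples such as (n,) or (m, n).
    Inner dimensions are not validated in this sketch.
    """
    nd = (len(shape_a), len(shape_b))
    if nd == (1, 1):
        (n,) = shape_a
        return "dot", 1, 2 * n           # O(N) work on O(N) data
    if nd == (2, 1):
        m, n = shape_a
        return "gemv", 2, 2 * m * n      # O(N^2) work on O(N^2) data
    if nd == (1, 2):
        m, n = shape_b
        return "gemv", 2, 2 * m * n      # vector-matrix is still a gemv
    if nd == (2, 2):
        m, k = shape_a
        _, n = shape_b
        return "gemm", 3, 2 * m * k * n  # O(N^3) work on O(N^2) data
    raise ValueError("only 1-D and 2-D operands are handled in this sketch")


print(likely_blas_routine((1000, 1000), (1000,)))      # matrix-vector
print(likely_blas_routine((1000, 1000), (1000, 1000))) # matrix-matrix
```

For 1000 x 1000 operands, the gemv call does roughly 2 million flops while the gemm call does roughly 2 billion on the same amount of input data. Only in the second case is there enough work per byte loaded for extra threads to pay for themselves; the Level 1 and Level 2 calls are mostly limited by memory bandwidth.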