I am trying to run a loop in python to work as fast as possible. So I dived into NumPy and Cython. Here is the Python source code:
def calculate_bsf_u_loop(uvel,dy,dz): """ Calculate barotropic stream function from zonal velocity uvel (t,z,y,x) dy (y,x) dz (t,z,y,x) bsf (t,y,x) """ nt = uvel.shape[0] nz = uvel.shape[1] ny = uvel.shape[2] nx = uvel.shape[3] bsf = np.zeros((nt,ny,nx)) for jn in range(0,nt): for jk in range(0,nz): for jj in range(0,ny): for ji in range(0,nx): bsf[jn,jj,ji] = bsf[jn,jj,ji] + uvel[jn,jk,jj,ji] * dz[jn,jk,jj,ji] * dy[jj,ji] return bsf
This is just the sum of the k indices. Array sizes: nt = 12, nz = 75, ny = 559, nx = 1442, so ~ 725 million elements. It took 68 seconds. Now I did it in cython as
import numpy as np cimport numpy as np cimport cython @cython.boundscheck(False)
and it took 49 seconds. However, replacing the loop for
for jn in range(0,nt): for jk in range(0,nz): bsf[jn,:,:] = bsf[jn,:,:] + uvel[jn,jk,:,:] * dz[jn,jk,:,:] * dy[:,:]
takes only 0.29 seconds! Unfortunately, I cannot do this in my full code.
Why does NumPy cut much faster than the Cython loop? I thought NumPy was fast because it is Cython under the hood. Shouldn't they be the same speed?
As you can see, I turned off border checking in the ziton, and I also compiled using โfast mathโ. However, this gives only slight acceleration. Is there anyway that the loop has the same speed as slicing NumPy, or is the loop always slower than slicing?
Any help is much appreciated! / Joachim