In my project, I need to calculate the Euclidean distance between every pair of points stored in an array. The input array is a 2D numpy array with three columns, which are the coordinates (x, y, z), and each row is a new point.
I usually work with 5000-6000 points in tests.
My first algorithm uses Cython and my second uses numpy. I found that my numpy algorithm is faster than my Cython one.
edit: with 6000 points:
numpy 1.76 s / cython 4.36 s
Here is my cython code:
    cimport cython
    from libc.math cimport sqrt

    @cython.boundscheck(False)
    @cython.wraparound(False)
    cdef void calcul1(double[::1] M, double[::1] R):
        cdef int i = 0
        cdef int n = M.shape[0]
        cdef int x, y
        cdef int start = 3
        for x in range(0, n, 3):
            for y in range(start, n, 3):  # only the pairs after point x
                R[i] = sqrt((M[y] - M[x])**2 + (M[y+1] - M[x+1])**2 + (M[y+2] - M[x+2])**2)
                i += 1
            start += 3  # step by 3 to skip to the next point's x coordinate
M is a memoryview of the initial array of points, flattened with numpy's flatten() before calling calcul1(); R is a 1D memoryview of the output array that stores all the results.
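For reference, here is a plain-Python sketch of what I understand calcul1() to compute (my own tiny 3-point example, not from the original code), showing how the flattened input maps to the condensed 1D output:

```python
import numpy as np

def calcul1_py(M, R):
    """Pure-Python equivalent of the Cython loop: M holds the flattened
    (x, y, z) coordinates, R receives the n*(n-1)/2 pairwise distances."""
    i = 0
    n = M.shape[0]
    start = 3
    for x in range(0, n, 3):
        for y in range(start, n, 3):
            R[i] = ((M[y] - M[x])**2 + (M[y+1] - M[x+1])**2
                    + (M[y+2] - M[x+2])**2) ** 0.5
            i += 1
        start += 3

pts = np.array([[0.0, 0.0, 0.0],
                [3.0, 4.0, 0.0],
                [0.0, 0.0, 5.0]])
out = np.empty(3)            # 3 points -> 3 pairs
calcul1_py(pts.flatten(), out)
print(out)                   # distances 0-1, 0-2, 1-2
```

R ends up holding the pairs in row order: (0,1), (0,2), ..., (1,2), ...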
Here is my numpy code:
    def calcul2(M):
        return np.sqrt(((M[:,:,np.newaxis] - M[:,np.newaxis,:])**2).sum(axis=0))
Here M is the initial array of points, transposed with numpy's transpose() before the call, so that the coordinates (x, y, z) are the rows and the points are the columns.
Also, this numpy function is pretty handy because the array it returns is well organized: it is an n × n array, with n the number of points, and each point gets a row and a column. So, for example, the distance AB is stored at the intersection of row A and column B.
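As a quick sanity check of that layout (my own 3-point example, not from the original post), the broadcasting version returns a matrix whose entry [A, B] is the distance AB:

```python
import numpy as np

def calcul2(M):
    # M has coordinates as rows: shape (3, n) for n points
    return np.sqrt(((M[:, :, np.newaxis] - M[:, np.newaxis, :]) ** 2).sum(axis=0))

pts = np.array([[0.0, 0.0, 0.0],
                [3.0, 4.0, 0.0],
                [0.0, 0.0, 5.0]])
D = calcul2(pts.transpose())   # (3, 3) matrix, D[A, B] = distance AB
print(D[0, 1])                 # 5.0 (the 3-4-5 triangle)
```

The matrix is symmetric with zeros on the diagonal, so every distance is stored twice.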
This is how I call them (test is a Cython function):
    cpdef test():
        cdef double[::1] Mf
        cdef double[::1] out = np.empty(17997000, dtype=np.float64)  # (6000**2 - 6000) / 2
        M = np.arange(6000*3, dtype=np.float64).reshape(6000, 3)  # example array with 6000 points
        Mf = M.flatten()    # because my Cython algorithm needs a 1D array
        Mt = M.transpose()  # because my numpy algorithm needs coordinates as rows
        calcul2(Mt)
        calcul1(Mf, out)
Am I doing something wrong here? Neither is fast enough for my project.
1: Is there a way to improve my cython code to outperform numpy speed?
2: Is there a way to improve my numpy code to calculate even faster?
3: Or is there any other solution, as long as it stays in Python/Cython (e.g. parallel computing)?
Thanks.