Curve distance matrix in Python

I have a set of curves defined as 2D arrays (number of points, number of coordinates). I calculate the distance matrix for them using the Hausdorff distance. My current code is as follows. Unfortunately, it is too slow with 500-600 curves, each of which has 50-100 3D points. Is there a faster way to do this?

```python
import numpy as np
import scipy.spatial

def distanceBetweenCurves(C1, C2):
    D = scipy.spatial.distance.cdist(C1, C2, 'euclidean')

    # non-symmetric Hausdorff distances
    H1 = np.max(np.min(D, axis=1))
    H2 = np.max(np.min(D, axis=0))

    return (H1 + H2) / 2.

def distanceMatrixOfCurves(Curves):
    numC = len(Curves)

    D = np.zeros((numC, numC))
    for i in range(0, numC-1):
        for j in range(i+1, numC):
            D[i, j] = D[j, i] = distanceBetweenCurves(Curves[i], Curves[j])

    return D
```
3 answers

Your question may also be related to this.

This is a difficult problem. One possible way would be to implement the Euclidean distance yourself, drop scipy entirely, and use the PyPy JIT compiler. But most likely this will not gain you much.

Personally, I would recommend writing the routine in C.

The problem is not so much the implementation as how you approach it. You chose the brute-force method: computing the Euclidean distance for every individual pair of points across every possible pair of curves. The computational cost adds up quickly:

  • Suppose you have 500 curves, each with 75 points. With the brute-force approach you end up evaluating the Euclidean distance 500 * 499 * 75 * 75 = 1,403,437,500 times. No wonder this approach takes forever.
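The arithmetic behind that count, as a quick sanity check (note that the question's double loop already exploits the symmetry D[i, j] == D[j, i], so it actually performs half of these evaluations):

```python
# Operation count for the brute-force approach described above.
n_curves, pts = 500, 75

# every ordered pair of distinct curves, times every pair of points
ordered_pairs = n_curves * (n_curves - 1)
distance_evals = ordered_pairs * pts * pts
print(distance_evals)       # 1403437500

# the matrix is symmetric, so the loop in the question only
# computes the upper triangle, i.e. half of these
print(distance_evals // 2)  # 701718750
```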

I am not an expert on this, but I know that the Hausdorff distance is widely used in image processing, so I suggest browsing that literature for speed-optimized algorithms. A starting point could be this, or this document, which is often cited in connection with computing the Hausdorff distance via Voronoi diagrams.
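For reference, newer SciPy releases (0.19+) ship scipy.spatial.distance.directed_hausdorff, which implements an early-break algorithm from exactly this line of literature. A minimal sketch of using it for the symmetrized distance defined in the question (the function name is mine, not from the original code):

```python
import numpy as np
from scipy.spatial.distance import directed_hausdorff

def distanceBetweenCurves_fast(C1, C2):
    # directed_hausdorff returns (distance, index_in_u, index_in_v);
    # it shuffles the points and breaks out of the inner loop early,
    # avoiding the full cdist matrix in the common case
    H1 = directed_hausdorff(C1, C2)[0]
    H2 = directed_hausdorff(C2, C1)[0]
    return (H1 + H2) / 2.

if __name__ == "__main__":
    C1 = np.random.rand(75, 3)
    C2 = np.random.rand(75, 3)
    print(distanceBetweenCurves_fast(C1, C2))
```

This is a drop-in replacement for distanceBetweenCurves in the question's distanceMatrixOfCurves loop.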

I hope these links can help you with this problem.


I recently answered a similar question here: Hausdorff distance between three-dimensional grids

I hope this helps. I had to run pairwise comparisons over 25 x 25,000 points (25 x 25 x 25,000 points in total), and my code runs anywhere from 1 minute to 3-4 hours, depending on the number of points. Mathematically, I don't see many options for making it faster.

An alternative could be to use a different programming language (C/C++) or to port the calculations to the GPU (CUDA). I am currently playing with the CUDA approach.

Edit 12/03/2015:

I was able to speed up this comparison by doing parallel CPU-based computation; it turned out to be the fastest way. I used a nice example from the pp package (parallel python) and ran it across a combination of three different computers. Unfortunately, I kept hitting memory errors with 32-bit Python 2.7, so I installed 64-bit WinPython 2.7 along with some experimental 64-bit numpy packages.
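The pp package is no longer actively maintained; a minimal sketch of the same CPU-parallel idea using only the standard library's multiprocessing module instead (the distance function is the symmetrized Hausdorff from the question, and the helper names are mine):

```python
import numpy as np
from multiprocessing import Pool
from scipy.spatial.distance import cdist

def distance_between_curves(C1, C2):
    # symmetrized Hausdorff distance, as in the question
    D = cdist(C1, C2, 'euclidean')
    return (D.min(axis=1).max() + D.min(axis=0).max()) / 2.

def _pair_distance(args):
    # worker: unpack one (i, j, curve_i, curve_j) task
    i, j, Ci, Cj = args
    return i, j, distance_between_curves(Ci, Cj)

def distance_matrix_parallel(curves, processes=2):
    n = len(curves)
    D = np.zeros((n, n))
    # only the upper triangle; the matrix is symmetric
    tasks = [(i, j, curves[i], curves[j])
             for i in range(n - 1) for j in range(i + 1, n)]
    with Pool(processes) as pool:
        for i, j, d in pool.imap_unordered(_pair_distance, tasks,
                                           chunksize=64):
            D[i, j] = D[j, i] = d
    return D

if __name__ == "__main__":
    curves = [np.random.rand(75, 3) for _ in range(20)]
    print(distance_matrix_parallel(curves).shape)
```

For very large inputs it is cheaper to pass curve indices into shared memory rather than shipping the arrays themselves to each worker, but the sketch above keeps things simple.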


So this was very useful for me, and it was easier for me than CUDA... Good luck.


There are several things you could try:

  • Using numpy-MKL, a build of numpy linked against Intel's high-performance MKL math library;
  • Using Bottleneck for array functions;
  • Using Cython for the calculation.
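To check whether your numpy build is already linked against MKL (or another optimized BLAS), you can inspect its build configuration:

```python
import numpy as np

# prints the BLAS/LAPACK libraries numpy was compiled against;
# an MKL-linked build (such as numpy-MKL) mentions "mkl" here
np.show_config()
```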
