Fast fuse near points in numpy-2d (no loop)

Question

Fast fuse near points in numpy-2d (no loop)

I have a question similar to the question asked here: an easy way to merge several close points . I want to replace the points located close to each other with the average value of their coordinates. The proximity in the cells is indicated by the user (I'm talking about the Euclidean distance).

In my case, I have a lot of points (about 1 million). This method works, but it takes a lot of time, because it uses a double for loop.

Is there a faster way to detect and smooth close points in a numpy 2d array?

To be complete, I added an example:

points=array([[ 382.49056159, 640.1731949 ], [ 496.44669161, 655.8583119 ], [ 1255.64762859, 672.99699399], [ 1070.16520917, 688.33538171], [ 318.89390168, 718.05989421], [ 259.7106383 , 822.2 ], [ 141.52574427, 28.68594436], [ 1061.13573287, 28.7094536 ], [ 820.57417943, 84.27702407], [ 806.71416007, 108.50307828]])

The dot scatter plot is shown below. The red circle indicates points that are close to each other (in this case, the distance is 27.91 between the last two points in the array). Therefore, if the user will indicate a minimum distance of 30, these points should be fused.

At the output of the fuse function, the last points merge. It will look like this:

 #output array([[ 382.49056159, 640.1731949 ], [ 496.44669161, 655.8583119 ], [ 1255.64762859, 672.99699399], [ 1070.16520917, 688.33538171], [ 318.89390168, 718.05989421], [ 259.7106383 , 822.2 ], [ 141.52574427, 28.68594436], [ 1061.13573287, 28.7094536 ], [ 813.64416975, 96.390051175]])

+5

python arrays numpy scipy distance

Wilmar van ommren May 02, '16 at 14:41

source share

2 answers

You can use scipy distance functions like pdist to quickly find which points should be combined:

 import numpy as np from scipy.spatial.distance import pdist, squareform d = squareform(pdist(a)) d = np.ma.array(d, mask=np.isclose(d, 0)) a[d.min(axis=1) < 30] #array([[ 820.57417943, 84.27702407], # [ 806.71416007, 108.50307828]])

Note

For large samples, this method can cause memory errors because it stores a complete matrix containing relative distances.

+1

Saullo castro May 02, '16 at 15:23

source share

ali_m · Accepted Answer · 2016-05-02T18:46:09+0000

If you have a lot of points, it might be faster to create a kD tree using scipy.spatial.cKDTree , then query it for pairs of points that are closer than some threshold values:

 import numpy as np from scipy.spatial import cKDTree tree = cKDTree(points) rows_to_fuse = tree.query_pairs(r=30) print(repr(rows_to_fuse)) # {(8, 9)} print(repr(points[list(rows_to_fuse)])) # array([[ 820.57417943, 84.27702407], # [ 806.71416007, 108.50307828]])

The main advantage of this approach is that you do not need to calculate the distance between each pair of points in your dataset.

Fast fuse near points in numpy-2d (no loop)

Note

More articles: