NumPy: executing a function on each ndarray element

Question


I have a three-dimensional ndarray of two-dimensional coordinates, for example:

    [[[1704 1240]
      [1745 1244]
      [1972 1290]
      [2129 1395]
      [1989 1332]]

     [[1712 1246]
      [1750 1246]
      [1964 1286]
      [2138 1399]
      [1989 1333]]

     [[1721 1249]
      [1756 1249]
      [1955 1283]
      [2145 1399]
      [1990 1333]]]

The ultimate goal is to remove the point closest to a given point ([1989, 1332]) from each "group" of 5 coordinates. My thought was to create a similarly shaped array of distances and then use argmin to determine the indices of the values to remove. However, I'm not sure how to apply a function, such as one that calculates the distance to a given point, to each element of an ndarray, at least not in a NumPythonic way.

+4

python arrays numpy multidimensional-array

OneTrickyPony Jun 15 '12 at 23:21


3 answers

If I understand your question correctly, I think you're looking for apply_along_axis. First, using numpy's built-in broadcasting, we can simply subtract the point from the array:

    >>> a - numpy.array([1989, 1332])
    array([[[-285,  -92],
            [-244,  -88],
            [ -17,  -42],
            [ 140,   63],
            [   0,    0]],

           [[-277,  -86],
            [-239,  -86],
            [ -25,  -46],
            [ 149,   67],
            [   0,    1]],

           [[-268,  -83],
            [-233,  -83],
            [ -34,  -49],
            [ 156,   67],
            [   1,    1]]])

Then we can apply numpy.linalg.norm to it:

    >>> dist = a - numpy.array([1989, 1332])
    >>> normed = numpy.apply_along_axis(numpy.linalg.norm, 2, dist)
    >>> normed
    array([[ 299.48121811,  259.38388539,   45.31004304,  153.5219854 ,    0.        ],
           [ 290.04310025,  254.0019685 ,   52.35456045,  163.37074401,    1.        ],
           [ 280.55837182,  247.34186868,   59.6405902 ,  169.77926846,    1.41421356]])
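As an aside, reasonably recent NumPy versions let numpy.linalg.norm take an axis argument directly, which avoids the Python-level loop inside apply_along_axis. The following is a minimal sketch under that assumption, reusing the array from the question; the variable names are only illustrative:

```python
import numpy as np

a = np.array([[[1704, 1240], [1745, 1244], [1972, 1290], [2129, 1395], [1989, 1332]],
              [[1712, 1246], [1750, 1246], [1964, 1286], [2138, 1399], [1989, 1333]],
              [[1721, 1249], [1756, 1249], [1955, 1283], [2145, 1399], [1990, 1333]]])
point = np.array([1989, 1332])

# Broadcasting subtracts the point from every (x, y) pair.
dist = a - point

# Euclidean norm over the last axis: one distance per coordinate pair.
normed = np.linalg.norm(dist, axis=2)
print(normed.shape)
```

The result should have one distance per point, so its shape matches the first two axes of the input.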

Finally, some boolean mask indexing, plus a couple of reshape calls:

    >>> a[normed != normed.min(axis=1).reshape((-1, 1))].reshape((3, 4, 2))
    array([[[1704, 1240],
            [1745, 1244],
            [1972, 1290],
            [2129, 1395]],

           [[1712, 1246],
            [1750, 1246],
            [1964, 1286],
            [2138, 1399]],

           [[1721, 1249],
            [1756, 1249],
            [1955, 1283],
            [2145, 1399]]])

Joe Kington's answer is faster. Oh well. I'll leave this here for posterity.

    def joes(data, point):
        dist = data.reshape((-1,2)) - point
        dist = np.hypot(*dist.T)
        dist = dist.reshape(data.shape[0], data.shape[1], 1)
        mask = np.squeeze(dist) != dist.min(axis=1)
        return data[mask].reshape((3, 4, 2))

    def mine(a, point):
        dist = a - point
        normed = numpy.apply_along_axis(numpy.linalg.norm, 2, dist)
        return a[normed != normed.min(axis=1).reshape((-1, 1))].reshape((3, 4, 2))

    >>> %timeit mine(data, point)
    1000 loops, best of 3: 586 us per loop
    >>> %timeit joes(data, point)
    10000 loops, best of 3: 48.9 us per loop

+1

senderle Jun 16 '12 at 2:20


There are several ways to do this, but here is one using list comprehensions:

Distance function:

    In [35]: from numpy.linalg import norm
    In [36]: dist = lambda x, y: norm(x - y)

Input data:

    In [39]: GivenMatrix = scipy.rand(3, 5, 2)
    In [40]: GivenMatrix
    Out[40]:
    array([[[ 0.83798666,  0.90294439],
            [ 0.8706959 ,  0.88397176],
            [ 0.91879085,  0.93512921],
            [ 0.15989245,  0.57311869],
            [ 0.82896003,  0.53589968]],

           [[ 0.0207089 ,  0.9521768 ],
            [ 0.94523963,  0.31079109],
            [ 0.41929482,  0.88559614],
            [ 0.87885236,  0.45227422],
            [ 0.58365369,  0.62095507]],

           [[ 0.14757177,  0.86101539],
            [ 0.58081214,  0.12632764],
            [ 0.89958321,  0.73660852],
            [ 0.3408943 ,  0.45420989],
            [ 0.42656333,  0.42770216]]])
    In [41]: q = scipy.rand(2)
    In [42]: q
    Out[42]: array([ 0.03280889,  0.71057403])

Calculate the distances:

    In [44]: distances = [[dist(x, q) for x in SubMatrix] for SubMatrix in GivenMatrix]
    In [45]: distances
    Out[45]:
    [[0.82783910695733931,
      0.85564093542511577,
      0.91399620574915652,
      0.18720096539588818,
      0.81508758596405939],
     [0.24190557184498068,
      0.99617079746515047,
      0.42426891258164884,
      0.88459501973012633,
      0.55808740166908177],
     [0.18921712490174292,
      0.80103146210692744,
      0.86716521557255788,
      0.40079819635686459,
      0.48482888965287363]]

To rank the results for each submatrix:

    In [46]: scipy.argsort(distances)
    Out[46]:
    array([[3, 4, 0, 1, 2],
           [0, 2, 4, 3, 1],
           [0, 3, 4, 1, 2]])

Regarding deletion, I personally find it easiest to convert GivenMatrix to a list and then use del:

    >>> GivenList = GivenMatrix.tolist()
    >>> del GivenList[1][2] # delete third row from the second 5-by-2 submatrix
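To tie the steps of this answer together, here is a rough sketch that deletes the closest point from every submatrix rather than a single hand-picked row. It assumes np.random.default_rng is available (newer NumPy) in place of scipy.rand, and the names given, closest and result are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)       # fixed seed so the run is repeatable
given = rng.random((3, 5, 2))        # three groups of five 2-D points
q = rng.random(2)                    # the query point

# Distance from every point to q, built with nested list comprehensions.
distances = [[np.linalg.norm(x - q) for x in sub] for sub in given]

# Index of the closest point within each group.
closest = np.argmin(distances, axis=1)

# Convert to nested lists and delete one row per group.
given_list = given.tolist()
for sub, idx in zip(given_list, closest):
    del sub[int(idx)]

result = np.array(given_list)
print(result.shape)
```

Because exactly one row is removed from each group, the nested list stays rectangular and converts back to an array cleanly.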

0

Steve Tjoa Jun 15 '12 at 23:51


Joe Kington · Accepted Answer · 2012-06-16T02:04:19+0000

List comprehensions are a very inefficient way to deal with numpy arrays. They're an especially poor choice for the distance calculation.

To find the difference between your data and a point, you simply compute data - point. You can then calculate the distance using np.hypot or, if you prefer, square the differences, sum them, and take the square root.

It's a bit easier if you reshape the data into an Nx2 array for the purposes of the calculation.

Basically, you want something like this:

    import numpy as np

    data = np.array([[[1704, 1240],
                      [1745, 1244],
                      [1972, 1290],
                      [2129, 1395],
                      [1989, 1332]],

                     [[1712, 1246],
                      [1750, 1246],
                      [1964, 1286],
                      [2138, 1399],
                      [1989, 1333]],

                     [[1721, 1249],
                      [1756, 1249],
                      [1955, 1283],
                      [2145, 1399],
                      [1990, 1333]]])

    point = [1989, 1332]

    #-- Calculate distance ------------
    # The reshape is to make it a single, Nx2 array to make calling `hypot` easier
    dist = data.reshape((-1,2)) - point
    dist = np.hypot(*dist.T)

    # We can then reshape it back to AxBx1 array, similar to the original shape
    dist = dist.reshape(data.shape[0], data.shape[1], 1)
    print(dist)

This gives:

    array([[[ 299.48121811],
            [ 259.38388539],
            [  45.31004304],
            [ 153.5219854 ],
            [   0.        ]],

           [[ 290.04310025],
            [ 254.0019685 ],
            [  52.35456045],
            [ 163.37074401],
            [   1.        ]],

           [[ 280.55837182],
            [ 247.34186868],
            [  59.6405902 ],
            [ 169.77926846],
            [   1.41421356]]])
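For comparison, the "square, sum, square root" alternative to np.hypot can be sketched as follows. This is only an illustration of the equivalence; the names diff, d_hypot and d_manual are not part of the answer's code:

```python
import numpy as np

data = np.array([[[1704, 1240], [1745, 1244], [1972, 1290], [2129, 1395], [1989, 1332]],
                 [[1712, 1246], [1750, 1246], [1964, 1286], [2138, 1399], [1989, 1333]],
                 [[1721, 1249], [1756, 1249], [1955, 1283], [2145, 1399], [1990, 1333]]])
point = np.array([1989, 1332])

diff = data - point                              # shape (3, 5, 2)

# Option 1: hypot on the x and y components.
d_hypot = np.hypot(diff[..., 0], diff[..., 1])   # shape (3, 5)

# Option 2: square, sum over the coordinate axis, take the square root.
d_manual = np.sqrt((diff ** 2).sum(axis=-1))     # shape (3, 5)

print(np.allclose(d_hypot, d_manual))
```

Indexing with `...` keeps the 3-D layout intact, so no reshape to Nx2 and back is needed in this variant.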

Deleting the closest element is a bit more complicated than just finding it.

With numpy, you can use boolean indexing to make this pretty easy.

However, you need to worry a bit about aligning your axes.

The key is to understand that numpy "broadcasts" operations along the last axis. In this case, we want to broadcast along the middle axis.
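A small sketch of that axis alignment, using a made-up 2x4 distance array rather than the real data: keeping the reduced axis as a length-1 axis (here via keepdims=True, one of several ways to do it) lets each row's minimum line up against that row.

```python
import numpy as np

# Hypothetical distances: 2 groups of 4 points each.
dist = np.array([[3.0, 1.0, 2.0, 9.0],
                 [5.0, 4.0, 6.0, 7.0]])

# keepdims=True leaves the result with shape (2, 1) instead of (2,),
# so it broadcasts across the 4 columns of each row.
row_min = dist.min(axis=1, keepdims=True)

# True wherever a point is NOT the closest one in its group.
mask = dist != row_min
print(mask)
```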

In addition, -1 can be used as a placeholder for the size of an axis. Numpy will work out the appropriate size when -1 is given as the size of an axis.
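A quick illustration of the -1 placeholder on a throwaway array (the names flat, pairs and groups are arbitrary):

```python
import numpy as np

flat = np.arange(24)

# Let numpy infer the number of rows: 24 elements / 2 columns = 12 rows.
pairs = flat.reshape(-1, 2)

# The placeholder can sit on any single axis: 24 / (3 * 2) = 4.
groups = flat.reshape(3, -1, 2)

print(pairs.shape, groups.shape)
```

Only one axis per reshape call can be -1, since numpy needs the remaining sizes to work out the missing one.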

What we need to do will look something like this:

    #-- Remove closest point ---------------------
    mask = np.squeeze(dist) != dist.min(axis=1)
    filtered = data[mask]

    # Once again, let's reshape things back to the original shape...
    filtered = filtered.reshape(data.shape[0], -1, data.shape[2])

You could do all of this in one line; I'm just breaking it down for readability. The key is that dist != something yields a boolean array, which you can then use to index the original array.

So, all together:

    import numpy as np

    data = np.array([[[1704, 1240],
                      [1745, 1244],
                      [1972, 1290],
                      [2129, 1395],
                      [1989, 1332]],

                     [[1712, 1246],
                      [1750, 1246],
                      [1964, 1286],
                      [2138, 1399],
                      [1989, 1333]],

                     [[1721, 1249],
                      [1756, 1249],
                      [1955, 1283],
                      [2145, 1399],
                      [1990, 1333]]])

    point = [1989, 1332]

    #-- Calculate distance ------------
    # The reshape is to make it a single, Nx2 array to make calling `hypot` easier
    dist = data.reshape((-1,2)) - point
    dist = np.hypot(*dist.T)

    # We can then reshape it back to AxBx1 array, similar to the original shape
    dist = dist.reshape(data.shape[0], data.shape[1], 1)

    #-- Remove closest point ---------------------
    mask = np.squeeze(dist) != dist.min(axis=1)
    filtered = data[mask]

    # Once again, let's reshape things back to the original shape...
    filtered = filtered.reshape(data.shape[0], -1, data.shape[2])

    print(filtered)

This yields:

    array([[[1704, 1240],
            [1745, 1244],
            [1972, 1290],
            [2129, 1395]],

           [[1712, 1246],
            [1750, 1246],
            [1964, 1286],
            [2138, 1399]],

           [[1721, 1249],
            [1756, 1249],
            [1955, 1283],
            [2145, 1399]]])

As a side note, if more than one point in a group is equally close, this won't work. Numpy arrays must have the same number of elements along each dimension, so in that case you would need to redo your grouping.
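One possible way around the tie problem, offered as a sketch rather than as part of the approach above, is to pick a single index per group with argmin (which returns the first minimum) and build the mask from that index. The helper name remove_closest and the tied test data are hypothetical:

```python
import numpy as np

def remove_closest(data, point):
    """Drop exactly one closest point per group, even when distances tie."""
    data = np.asarray(data)
    diff = data - np.asarray(point)
    dist = np.hypot(diff[..., 0], diff[..., 1])      # shape (groups, points)
    closest = dist.argmin(axis=1)                    # first minimum in each group
    mask = np.ones(dist.shape, dtype=bool)
    mask[np.arange(dist.shape[0]), closest] = False  # unmark one point per group
    return data[mask].reshape(data.shape[0], -1, data.shape[2])

# The first group has two points at distance 1 from the origin.
tied = np.array([[[0, 1], [1, 0], [5, 5]],
                 [[2, 2], [0, 3], [3, 0]]])
result = remove_closest(tied, [0, 0])
print(result)
```

Since exactly one entry per row of the mask is False, every group keeps the same number of points and the final reshape always succeeds. Which of several tied points gets dropped is simply whichever comes first.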
