Filter numpy array based on highest value

I have a numpy array that contains 4-dimensional vectors that have the following format (x, y, z, w)

The size of the array is 4 x N. The data holds (x, y, z) spatial locations, and w is a specific measurement taken at that location. There can be many measurements associated with the same (x, y, z) position (all stored as floats).

What I would like to do is filter the array, so that I get a new array containing only the maximum measurement for each position (x, y, z).

So, if my data looks like:

    x, y, z, w1
    x, y, z, w2
    x, y, z, w3

where w1 is greater than w2 and w3, the filtered data would be:

    x, y, z, w1

So, more specifically, let's say I have data like:

    [[ 0.7732126   0.48649481  0.29771819  0.91622924]
     [ 0.7732126   0.48649481  0.29771819  1.91622924]
     [ 0.58294263  0.32025559  0.6925856   0.0524125 ]
     [ 0.58294263  0.32025559  0.6925856   0.05      ]
     [ 0.58294263  0.32025559  0.6925856   1.7       ]
     [ 0.3239913   0.7786444   0.41692853  0.10467392]
     [ 0.12080023  0.74853649  0.15356663  0.4505753 ]
     [ 0.13536096  0.60319054  0.82018125  0.10445047]
     [ 0.1877724   0.96060999  0.39697999  0.59078612]]

This should return

    [[ 0.7732126   0.48649481  0.29771819  1.91622924]
     [ 0.58294263  0.32025559  0.6925856   1.7       ]
     [ 0.3239913   0.7786444   0.41692853  0.10467392]
     [ 0.12080023  0.74853649  0.15356663  0.4505753 ]
     [ 0.13536096  0.60319054  0.82018125  0.10445047]
     [ 0.1877724   0.96060999  0.39697999  0.59078612]]
python arrays numpy
5 answers

This is convoluted, but it is probably as good as you are going to get using numpy only...

First we use lexsort to put all entries with the same coordinates next to each other. With a being your sample array:

    >>> perm = np.lexsort(a[:, 3::-1].T)
    >>> a[perm]
    array([[ 0.12080023, 0.74853649, 0.15356663, 0.4505753 ],
           [ 0.13536096, 0.60319054, 0.82018125, 0.10445047],
           [ 0.1877724 , 0.96060999, 0.39697999, 0.59078612],
           [ 0.3239913 , 0.7786444 , 0.41692853, 0.10467392],
           [ 0.58294263, 0.32025559, 0.6925856 , 0.05      ],
           [ 0.58294263, 0.32025559, 0.6925856 , 0.0524125 ],
           [ 0.58294263, 0.32025559, 0.6925856 , 1.7       ],
           [ 0.7732126 , 0.48649481, 0.29771819, 0.91622924],
           [ 0.7732126 , 0.48649481, 0.29771819, 1.91622924]])

Note that by reversing the column axis, we are sorting by x, breaking ties with y, then z, then w.
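To make the key order concrete, here is a small sketch on made-up data (the array pts and its three columns x, y, w are illustrative, not from the question). np.lexsort uses the last key as the primary one, which is why the columns get reversed before transposing:

```python
import numpy as np

# Made-up rows for illustration only: columns are x, y, w
pts = np.array([[2, 1, 5],
                [1, 2, 7],
                [1, 1, 9],
                [1, 1, 3]])

# Reversing the columns gives keys in the order (w, y, x). lexsort treats the
# LAST key as primary, so rows are ordered by x, ties broken by y, then by w.
keys = pts[:, ::-1].T
order = np.lexsort(keys)

print(order)       # indices that sort the rows
print(pts[order])  # rows grouped by (x, y), with w ascending inside each group
```

Within each (x, y) group the row with the largest w ends up last, which is what the next step relies on.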

Since this is the maximum we are looking for, we just need to take the last entry in each group, which is pretty simple:

    >>> a_sorted = a[perm]
    >>> # a row closes its group if ANY coordinate differs from the next row
    >>> last = np.concatenate((np.any(a_sorted[:-1, :3] != a_sorted[1:, :3], axis=1),
    ...                        [True]))
    >>> a_unique_max = a_sorted[last]
    >>> a_unique_max
    array([[ 0.12080023, 0.74853649, 0.15356663, 0.4505753 ],
           [ 0.13536096, 0.60319054, 0.82018125, 0.10445047],
           [ 0.1877724 , 0.96060999, 0.39697999, 0.59078612],
           [ 0.3239913 , 0.7786444 , 0.41692853, 0.10467392],
           [ 0.58294263, 0.32025559, 0.6925856 , 1.7       ],
           [ 0.7732126 , 0.48649481, 0.29771819, 1.91622924]])

If you would rather not have the output sorted, but keep the rows in the order in which they appeared in the original array, you can also get that with the help of perm:

    >>> a_unique_max[np.argsort(perm[last])]
    array([[ 0.7732126 , 0.48649481, 0.29771819, 1.91622924],
           [ 0.58294263, 0.32025559, 0.6925856 , 1.7       ],
           [ 0.3239913 , 0.7786444 , 0.41692853, 0.10467392],
           [ 0.12080023, 0.74853649, 0.15356663, 0.4505753 ],
           [ 0.13536096, 0.60319054, 0.82018125, 0.10445047],
           [ 0.1877724 , 0.96060999, 0.39697999, 0.59078612]])
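Putting the steps together, a minimal sketch of a helper could look like the following. The name max_per_position and the keep_order flag are my own, and it assumes a non-empty N x 4 array. It uses np.any for the group boundary test, so rows that differ in only one coordinate are treated as separate positions.

```python
import numpy as np

def max_per_position(a, keep_order=False):
    """Return one row per (x, y, z), keeping the largest w (hypothetical helper)."""
    a = np.asarray(a)
    # Sort by x, then y, then z, then w (lexsort uses the last key as primary).
    perm = np.lexsort(a[:, ::-1].T)
    a_sorted = a[perm]
    # A row is the last of its group when any coordinate changes in the next row.
    last = np.concatenate((np.any(a_sorted[:-1, :3] != a_sorted[1:, :3], axis=1),
                           [True]))
    result = a_sorted[last]
    if keep_order:
        # Order rows by where each group's maximum sat in the original array.
        result = result[np.argsort(perm[last])]
    return result

a = np.array([[0.7732126, 0.48649481, 0.29771819, 0.91622924],
              [0.7732126, 0.48649481, 0.29771819, 1.91622924],
              [0.58294263, 0.32025559, 0.6925856, 0.0524125],
              [0.58294263, 0.32025559, 0.6925856, 0.05],
              [0.58294263, 0.32025559, 0.6925856, 1.7],
              [0.3239913, 0.7786444, 0.41692853, 0.10467392],
              [0.12080023, 0.74853649, 0.15356663, 0.4505753],
              [0.13536096, 0.60319054, 0.82018125, 0.10445047],
              [0.1877724, 0.96060999, 0.39697999, 0.59078612]])

filtered = max_per_position(a, keep_order=True)
print(filtered)
```

Note that keep_order follows the position of each maximum row, which for the sample data coincides with the order in which the positions first appear.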

This only works for the maximum, and it comes as a by-product of the sorting. If you are after a different function, say the product of all entries with the same coordinates, you could do something like:

    >>> # a row opens a new group if ANY coordinate differs from the previous row
    >>> first = np.concatenate(([True],
    ...                         np.any(a_sorted[:-1, :3] != a_sorted[1:, :3], axis=1)))
    >>> a_unique_prods = np.multiply.reduceat(a_sorted, np.nonzero(first)[0])

And you will have to play around a bit with these results to assemble the returned array.
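One possible way to do that assembling, as a sketch on made-up data: reduce only the w column with reduceat, and take the coordinates from the first row of each group. The sample values below are illustrative, and the boundary test uses np.any so that a change in a single coordinate starts a new group.

```python
import numpy as np

# Made-up sample: three measurements at (1, 2, 3), one at (0, 0, 0)
a = np.array([[1.0, 2.0, 3.0, 2.0],
              [0.0, 0.0, 0.0, 5.0],
              [1.0, 2.0, 3.0, 4.0],
              [1.0, 2.0, 3.0, 0.5]])

# Group identical coordinates together (sorted by x, then y, z, w).
a_sorted = a[np.lexsort(a[:, ::-1].T)]

# True where a row starts a new (x, y, z) group.
first = np.concatenate(([True],
                        np.any(a_sorted[:-1, :3] != a_sorted[1:, :3], axis=1)))
starts = np.nonzero(first)[0]

# Multiply the w values within each group only, leaving coordinates untouched.
prods = np.multiply.reduceat(a_sorted[:, 3], starts)

# Coordinates of each group followed by the product of its w values.
a_unique_prods = np.column_stack((a_sorted[first, :3], prods))
print(a_unique_prods)
```

Swapping np.multiply for another binary ufunc such as np.add or np.minimum gives the per-position sum or minimum in the same way.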


I see that you already got a pointer towards pandas in the comments. FWIW, here's how you can get the desired behavior, assuming you don't care about the final sort order, since groupby changes it.

    In [14]: arr
    Out[14]:
    array([[ 0.7732126 , 0.48649481, 0.29771819, 0.91622924],
           [ 0.7732126 , 0.48649481, 0.29771819, 1.91622924],
           [ 0.58294263, 0.32025559, 0.6925856 , 0.0524125 ],
           [ 0.58294263, 0.32025559, 0.6925856 , 0.05      ],
           [ 0.58294263, 0.32025559, 0.6925856 , 1.7       ],
           [ 0.3239913 , 0.7786444 , 0.41692853, 0.10467392],
           [ 0.12080023, 0.74853649, 0.15356663, 0.4505753 ],
           [ 0.13536096, 0.60319054, 0.82018125, 0.10445047],
           [ 0.1877724 , 0.96060999, 0.39697999, 0.59078612]])

    In [15]: import pandas as pd

    In [16]: pd.DataFrame(arr)
    Out[16]:
              0         1         2         3
    0  0.773213  0.486495  0.297718  0.916229
    1  0.773213  0.486495  0.297718  1.916229
    2  0.582943  0.320256  0.692586  0.052413
    3  0.582943  0.320256  0.692586  0.050000
    4  0.582943  0.320256  0.692586  1.700000
    5  0.323991  0.778644  0.416929  0.104674
    6  0.120800  0.748536  0.153567  0.450575
    7  0.135361  0.603191  0.820181  0.104450
    8  0.187772  0.960610  0.396980  0.590786

    In [17]: pd.DataFrame(arr).groupby([0,1,2]).max().reset_index()
    Out[17]:
              0         1         2         3
    0  0.120800  0.748536  0.153567  0.450575
    1  0.135361  0.603191  0.820181  0.104450
    2  0.187772  0.960610  0.396980  0.590786
    3  0.323991  0.778644  0.416929  0.104674
    4  0.582943  0.320256  0.692586  1.700000
    5  0.773213  0.486495  0.297718  1.916229
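If the original order does matter, a variation that may help is passing sort=False to groupby, which keeps the groups in the order in which they first appear. The sketch below assumes pandas is installed; the column labels x, y, z, w are just names I picked for readability.

```python
import numpy as np
import pandas as pd

arr = np.array([[0.7732126, 0.48649481, 0.29771819, 0.91622924],
                [0.7732126, 0.48649481, 0.29771819, 1.91622924],
                [0.58294263, 0.32025559, 0.6925856, 0.0524125],
                [0.58294263, 0.32025559, 0.6925856, 0.05],
                [0.58294263, 0.32025559, 0.6925856, 1.7],
                [0.3239913, 0.7786444, 0.41692853, 0.10467392],
                [0.12080023, 0.74853649, 0.15356663, 0.4505753],
                [0.13536096, 0.60319054, 0.82018125, 0.10445047],
                [0.1877724, 0.96060999, 0.39697999, 0.59078612]])

# Illustrative column labels; any hashable labels would do.
df = pd.DataFrame(arr, columns=["x", "y", "z", "w"])

# sort=False keeps groups in order of first appearance instead of sorting the keys.
grouped = df.groupby(["x", "y", "z"], sort=False).max().reset_index()

# Back to a plain numpy array with columns x, y, z, max(w).
out = grouped.to_numpy()
print(out)
```

For the sample data this gives the rows in the same order as the expected output in the question.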

You can start by lex-sorting the input array so that entries with identical first three elements end up next to each other. Then create another 2D array to hold the last-column entries, laid out so that the elements belonging to each duplicate triplet fall into the same row. Finally, take the max along axis=1 of this 2D array to get the maximum for each unique triplet. Here's an implementation, assuming A is the input array -

    import numpy as np

    # Lex sort A
    sortedA = A[np.lexsort(A[:,:-1].T)]

    # Mask of start of unique first three columns from A
    start_unqA = np.append(True,~np.all(np.diff(sortedA[:,:-1],axis=0)==0,axis=1))

    # Counts of unique first three columns from A
    counts = np.bincount(start_unqA.cumsum()-1)
    mask = np.arange(counts.max()) < counts[:,None]

    # Group A last column into rows based on uniqueness from first three columns
    grpA = np.empty(mask.shape)
    grpA.fill(np.nan)
    grpA[mask] = sortedA[:,-1]

    # Concatenate unique first three columns from A and
    # corresponding max values for each such unique triplet
    out = np.column_stack((sortedA[start_unqA,:-1],np.nanmax(grpA,axis=1)))

Sample run -

    In [75]: A
    Out[75]:
    array([[ 1,  1,  1, 96],
           [ 1,  2,  2, 48],
           [ 2,  1,  2, 33],
           [ 1,  1,  1, 24],
           [ 1,  1,  1, 94],
           [ 2,  2,  2,  5],
           [ 2,  1,  1, 17],
           [ 2,  2,  2, 62]])

    In [76]: sortedA
    Out[76]:
    array([[ 1,  1,  1, 96],
           [ 1,  1,  1, 24],
           [ 1,  1,  1, 94],
           [ 2,  1,  1, 17],
           [ 2,  1,  2, 33],
           [ 1,  2,  2, 48],
           [ 2,  2,  2,  5],
           [ 2,  2,  2, 62]])

    In [77]: out
    Out[77]:
    array([[  1.,   1.,   1.,  96.],
           [  2.,   1.,   1.,  17.],
           [  2.,   1.,   2.,  33.],
           [  1.,   2.,   2.,  48.],
           [  2.,   2.,   2.,  62.]])
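As an aside, a shorter variant of the same idea (a swapped-in technique, not the padded-grid approach described above) is to let np.maximum.reduceat take the maximum of each run directly. This sketch reuses the sample A from the run above. It avoids allocating the NaN-padded 2D array, which may matter when one triplet has far more entries than the rest, and it keeps the integer dtype.

```python
import numpy as np

A = np.array([[1, 1, 1, 96],
              [1, 2, 2, 48],
              [2, 1, 2, 33],
              [1, 1, 1, 24],
              [1, 1, 1, 94],
              [2, 2, 2, 5],
              [2, 1, 1, 17],
              [2, 2, 2, 62]])

# Lex sort on the first three columns so identical triplets are adjacent.
sortedA = A[np.lexsort(A[:, :-1].T)]

# True at the first row of each run of identical triplets.
start_unqA = np.append(True, np.any(np.diff(sortedA[:, :-1], axis=0) != 0, axis=1))

# Maximum of the last column within each run, computed from the run start indices.
group_max = np.maximum.reduceat(sortedA[:, -1], np.flatnonzero(start_unqA))

# Unique triplets followed by their maximum value.
out = np.column_stack((sortedA[start_unqA, :-1], group_max))
print(out)
```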

You can use logical indexing.

I use random data for an example:

    >>> myarr = np.random.random((6, 4))
    >>> print(myarr)
    [[ 0.7732126   0.48649481  0.29771819  0.91622924]
     [ 0.58294263  0.32025559  0.6925856   0.0524125 ]
     [ 0.3239913   0.7786444   0.41692853  0.10467392]
     [ 0.12080023  0.74853649  0.15356663  0.4505753 ]
     [ 0.13536096  0.60319054  0.82018125  0.10445047]
     [ 0.1877724   0.96060999  0.39697999  0.59078612]]

To get the row or rows where the last column is the largest, do the following:

    >>> greatest = myarr[myarr[:, 3]==myarr[:, 3].max()]
    >>> print(greatest)
    [[ 0.7732126   0.48649481  0.29771819  0.91622924]]

What this does is take the last column of myarr, find the maximum of that column, find all the elements of that column equal to the maximum, and then get the corresponding rows.


You can use np.argmax

x[np.argmax(x[:,3]),:]

    >>> x = np.random.random((5,4))
    >>> x
    array([[ 0.25461146, 0.35671081, 0.54856798, 0.2027313 ],
           [ 0.17079029, 0.66970362, 0.06533572, 0.31704254],
           [ 0.4577928 , 0.69022073, 0.57128696, 0.93995176],
           [ 0.29708841, 0.96324181, 0.78859008, 0.25433235],
           [ 0.58739451, 0.17961551, 0.67993786, 0.73725493]])
    >>> x[np.argmax(x[:,3]),:]
    array([ 0.4577928 , 0.69022073, 0.57128696, 0.93995176])