NumPy: select a fixed number of values among duplicate values in an array

Starting with a simple array with duplicate values:

a = np.array([2,3,2,2,3,3,2,1])

I am trying to select from it at most 2 occurrences of each unique value. The resulting array would look like:

b = np.array([2,3,2,3,1])

regardless of the order of the elements. So far, I have tried finding the unique values with:

In [20]: c = np.unique(a,return_counts=True)

In [21]: c
Out[21]: (array([1, 2, 3]), array([1, 4, 3]))

which is useful because it also returns the frequency of each value, but I'm stuck on filtering by frequency.

3 answers

You can use a list comprehension inside np.concatenate() and limit the number of elements kept per value by slicing:

>>> np.concatenate([a[a==i][:2] for i in np.unique(a)])
array([1, 2, 2, 3, 3])
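
The one-liner above can be wrapped in a small function so the limit becomes a parameter. This is only a sketch: the name limit_duplicates and the empty-input guard are assumptions added for illustration, not part of the original answer. Note that the result is ordered by value, not by position in the input, and that the array is scanned once per unique value.

```python
import numpy as np

def limit_duplicates(a, n):
    """Keep at most n occurrences of each distinct value (hypothetical helper)."""
    a = np.asarray(a)
    if a.size == 0:
        # np.concatenate raises on an empty list, so handle this case separately
        return a.copy()
    # For each unique value, select its occurrences and keep the first n
    return np.concatenate([a[a == v][:n] for v in np.unique(a)])

a = np.array([2, 3, 2, 2, 3, 3, 2, 1])
result = limit_duplicates(a, 2)
print(result)  # [1 2 2 3 3]
```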

Use np.repeat on the unique values with their clipped counts:

import numpy as np

a = np.array([2,3,2,2,3,3,2,1])
uniques, count = np.unique(a,return_counts=True)
np.repeat(uniques, np.clip(count, 0, 2))

array([1, 2, 2, 3, 3])

np.clip limits each value in count to the range 0 to 2, so no value is repeated more than twice.
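
To make the intermediate steps of the np.repeat approach visible, the sketch below prints the raw counts, the capped counts, and the final result for the array from the question. The variable name capped is introduced here for illustration only.

```python
import numpy as np

a = np.array([2, 3, 2, 2, 3, 3, 2, 1])

# Sorted unique values and how often each one occurs
uniques, count = np.unique(a, return_counts=True)

# Cap every count at 2; counts already below 2 are left unchanged
capped = np.clip(count, 0, 2)

# Repeat each unique value as many times as its capped count
result = np.repeat(uniques, capped)

print(count)   # [1 4 3]
print(capped)  # [1 2 2]
print(result)  # [1 2 2 3 3]
```

Since counts are never negative, np.minimum(count, 2) should give the same capped values here.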


Here's an approach that maintains the order of the input array -

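N = 2    # Number of duplicates to keep for each unique element

# Indices that sort a, so equal values become contiguous
sortidx = a.argsort()
# Start position of each unique value's run in the sorted array
_,id_arr = np.unique(a[sortidx],return_index=True)

# Up to N consecutive positions from each run start, clipped to the array bounds and deduplicated
valid_ind = np.unique( (id_arr[:,None] + np.arange(N)).ravel().clip(max=a.size-1) )
# Map back to positions in a and restore the input order
out = a[np.sort(sortidx[valid_ind])]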

Sample run -

In [253]: a
Out[253]: array([ 0, -3,  0,  2,  0,  3,  2,  0,  2,  3,  3,  2,  1,  5,  0,  2])

In [254]: N
Out[254]: 3

In [255]: out
Out[255]: array([ 0, -3,  0,  2,  0,  3,  2,  2,  3,  3,  1,  5])

In [256]: np.unique(out,return_counts=True)[1] # Verify the counts to be <= N
Out[256]: array([1, 3, 1, 3, 3, 1])
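
As a variant on the same idea, the following sketch requests a stable sort explicitly and computes each element's rank within its group of equal values, then keeps the elements whose rank is below the limit. The function name keep_first_n and its parameters are hypothetical and do not come from the answer above. It is one possible way to make sure the first n occurrences are the ones retained, assuming a one-dimensional input.

```python
import numpy as np

def keep_first_n(a, n):
    """Keep the first n occurrences of each value, in input order (hypothetical helper)."""
    a = np.asarray(a)
    # A stable sort keeps equal values in their original relative order
    order = np.argsort(a, kind="stable")
    sorted_a = a[order]
    # Start index and length of each run of equal values in the sorted array
    _, starts, counts = np.unique(sorted_a, return_index=True, return_counts=True)
    # Rank of every sorted element within its own run: 0, 1, 2, ...
    rank = np.arange(a.size) - np.repeat(starts, counts)
    # Mark the original positions whose rank is below n
    mask = np.zeros(a.size, dtype=bool)
    mask[order[rank < n]] = True
    return a[mask]

a = np.array([2, 3, 2, 2, 3, 3, 2, 1])
print(keep_first_n(a, 2))  # [2 3 2 3 1]
```

For the array from the question this reproduces the desired b, with the elements in their original positions relative to each other.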