NumPy: find the index of each item in another array

I have an array (or set) of unique positive integers, e.g.

>>> unique = np.unique(np.random.choice(100, 4, replace=False)) 

And an array containing several elements selected from this previous array, for example

 >>> A = np.random.choice(unique, 100) 

I want to map each value of array A to its position in unique.

So far, the best solution I have found is a lookup table:

 >>> table = np.zeros(unique.max()+1, unique.dtype)
 >>> table[unique] = np.arange(unique.size)

The above stores, for each element of unique, its index in that array, so the table can later be used to map A through fancy indexing:

 >>> table[A]
 array([2, 2, 3, 3, 3, 3, 1, 1, 1, 0, 2, 0, 1, 0, 2, 1, 0, 0, 2, 3, 0, 0, 0, 0, 3, 3, 2, 1, 0, 0, 0, 2, 1, 0, 3, 0, 1, 3, 0, 1, 2, 3, 3, 3, 3, 1, 3, 0, 1, 2, 0, 0, 2, 3, 1, 0, 3, 2, 3, 3, 3, 1, 1, 2, 0, 0, 2, 0, 2, 3, 1, 1, 3, 3, 2, 1, 2, 0, 2, 1, 0, 1, 2, 0, 2, 0, 1, 3, 0, 2, 0, 1, 3, 2, 2, 1, 3, 0, 3, 3], dtype=int32)
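For a reproducible check, here is the same lookup-table technique on small fixed arrays. The values are made up for illustration and are not the question's random output:

```python
import numpy as np

# Hypothetical fixed sample data, sorted and unique like np.unique output
unique = np.array([3, 17, 42, 90])
A = np.array([42, 3, 90, 3, 17])

# Lookup table: table[v] holds the position of v in unique.
# Its length is unique.max() + 1, so memory grows with the largest value,
# not with the number of unique values.
table = np.zeros(unique.max() + 1, dtype=np.intp)
table[unique] = np.arange(unique.size)

B = table[A]
print(B)  # [2 0 3 0 1]
```

Here the table already has 91 entries to store 4 positions, which is exactly the sparsity problem described next.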

This already gives me the right result. However, if the numbers in unique are very sparse and large, this approach means creating a very large table array just to store a few numbers for later lookup.

Is there a better solution?

NOTE: Both A and unique are sample arrays, not the real ones. So the question is not how to generate positional indices, but how to efficiently map the elements of A to their indices in unique. The pseudo-code of what I would like to speed up in NumPy is:

 B = np.zeros_like(A)
 for i in range(A.size):
     B[i] = unique.index(A[i])

(In this pseudo-code, unique is assumed to be a Python list.)

3 answers

The table approach described in your question is the best option if unique is fairly dense, but unique.searchsorted(A) should give the same result and does not require unique to be dense (it does require unique to be sorted, which np.unique guarantees). searchsorted works fine with ints; if someone is trying to do this kind of thing with floats, which have precision limitations, consider something like this.
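A small sketch of the searchsorted approach on made-up sample values. It assumes unique is sorted, as np.unique output is. The membership check at the end is an optional addition, not part of the answer above. It is there because searchsorted returns an insertion point, not an error, for a value that is absent from unique:

```python
import numpy as np

unique = np.array([3, 17, 42, 90])   # sorted, as np.unique guarantees
A = np.array([42, 3, 90, 3, 17])

# Binary search per element: no table proportional to unique.max()
idx = unique.searchsorted(A)
print(idx)  # [2 0 3 0 1]

# Clip insertion points into range, then compare to confirm each hit
safe = np.clip(idx, 0, unique.size - 1)
found = unique[safe] == A
print(found.all())  # True

# A value not present in unique (50) still gets an index, but fails the check
A_bad = np.array([42, 50])
bad = unique[np.clip(unique.searchsorted(A_bad), 0, unique.size - 1)] != A_bad
print(bad.tolist())  # [False, True]
```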


The numpy_indexed package (disclaimer: I am its author) contains a vectorized equivalent of list.index, which needs memory proportional only to the size of the input rather than to its maximum value:

 import numpy_indexed as npi
 npi.indices(unique, A)

Note that it also works for arbitrary dtypes and shapes. In addition, the array being searched does not have to be unique; the first matching index is returned, just as with list.index.
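If adding a dependency is not an option, the first-occurrence behaviour described above can be approximated in plain NumPy with a stable argsort followed by searchsorted. This is a sketch of that general technique, not the numpy_indexed implementation. The function name first_indices is hypothetical, and the sketch only handles non-empty 1-D arrays:

```python
import numpy as np

def first_indices(haystack, needles):
    """Index of the first occurrence of each needle in a 1-D haystack.

    Vectorized analogue of list.index; raises ValueError for missing needles.
    Assumes haystack is non-empty.
    """
    # A stable sort keeps equal values in their original order, so the
    # leftmost match in sorted order is the earliest one in haystack.
    order = np.argsort(haystack, kind="stable")
    pos = np.searchsorted(haystack, needles, side="left", sorter=order)
    pos = np.clip(pos, 0, haystack.size - 1)  # keep insertion points in range
    idx = order[pos]                          # map back to original positions
    if not np.array_equal(haystack[idx], needles):
        raise ValueError("some needles are not in haystack")
    return idx

haystack = np.array([5, 7, 5, 9, 7])          # contains duplicates
print(first_indices(haystack, np.array([7, 5, 9])))  # [1 0 3]
```

The result matches [5, 7, 5, 9, 7].index(v) for each v. Memory use is proportional to the input sizes, at the cost of an O(n log n) sort.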


You can use a standard Python dict with np.vectorize:

 inds = {e: i for i, e in enumerate(unique)}
 B = np.vectorize(inds.get)(A)
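A runnable version of the dict lookup on made-up sample values. Two changes from the one-liner above are suggestions rather than part of the answer. First, inds.__getitem__ raises KeyError for a value missing from unique, whereas inds.get would silently return None. Second, otypes pins the output dtype, which also lets np.vectorize accept an empty A. Keep in mind that np.vectorize is essentially a Python-level loop, so it is a convenience rather than a speed-up over searchsorted:

```python
import numpy as np

unique = np.array([3, 17, 42, 90])
A = np.array([42, 3, 90, 3, 17])

inds = {e: i for i, e in enumerate(unique)}   # value -> position in unique

# otypes fixes the output dtype; without it np.vectorize infers the dtype
# from a trial call and refuses size-0 input.
lookup = np.vectorize(inds.__getitem__, otypes=[np.intp])

print(lookup(A))                            # [2 0 3 0 1]
print(lookup(np.array([], dtype=A.dtype)))  # []
```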
