Classify elements of a numpy array using a second array as a reference

Let's say I have an array with a finite number of unique values, for example:

data = array([30, 20, 30, 10, 20, 10, 20, 10, 30, 20, 20, 30, 30, 10, 30])

I also have a reference array containing all the unique values found in data, without repetition and in a specific order, for example:

reference = array([20, 10, 30])

I want to create an array with the same shape as data whose values are the indexes in reference at which each element of data is found.

In other words, given data and reference, I want to create an array indexes so that the following holds:

data = reference[indexes]
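
For the example arrays above, this relation can be checked directly. The snippet below is only an illustrative sketch: the values of indexes are worked out by hand from the positions of 20, 10 and 30 in reference.

```python
import numpy as np

data = np.array([30, 20, 30, 10, 20, 10, 20, 10, 30, 20, 20, 30, 30, 10, 30])
reference = np.array([20, 10, 30])

# Position of each data element inside reference (20 -> 0, 10 -> 1, 30 -> 2)
indexes = np.array([2, 0, 2, 1, 0, 1, 0, 1, 2, 0, 0, 2, 2, 1, 2])

# Indexing reference with indexes reconstructs data
assert np.array_equal(reference[indexes], data)
```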

A suboptimal approach to computing indexes uses a for loop, like this:

indexes = np.zeros_like(data, dtype=int)
for i in range(data.size):
    indexes[i] = np.where(data[i] == reference)[0][0]

Is there a more elegant (and, ideally, faster!) way to compute indexes?


Take data and reference as given in the question:

In [375]: data
Out[375]: array([30, 20, 30, 10, 20, 10, 20, 10, 30, 20, 20, 30, 30, 10, 30])

In [376]: reference
Out[376]: array([20, 10, 30])

Now consider the sorted version of reference:

In [373]: np.sort(reference)
Out[373]: array([10, 20, 30])

Using np.searchsorted, we can find the position of each element of data in this sorted version:

In [378]: np.searchsorted(np.sort(reference), data, side='left')
Out[378]: array([2, 1, 2, 0, 1, 0, 1, 0, 2, 1, 1, 2, 2, 0, 2], dtype=int64)

However, the desired output is:

In [379]: indexes
Out[379]: array([2, 0, 2, 1, 0, 1, 0, 1, 2, 0, 0, 2, 2, 1, 2])

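Comparing the two, the searchsorted output has 0's where we want 1's and 1's where we want 0's. That is because its positions refer to the sorted version of reference. To change the 0's to 1's and vice versa, we need the indices that sort reference, i.e. np.argsort(reference). Indexing those sorting indices with the searchsorted output gives the desired result. So the implementation would be: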

# Get sorting indices for reference
sort_idx = np.argsort(reference)

# Sort reference and get searchsorted indices for data in reference
pos = np.searchsorted(reference[sort_idx], data, side='left')

# Change pos indices based on sorted indices for reference
out = sort_idx[pos]
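
The steps above can be wrapped into a reusable function. The following is a minimal sketch, not a definitive implementation: the function name indexes_in_reference is hypothetical, and raising an error for values missing from reference is an assumption, since the question guarantees that every element of data appears in reference. It uses the sorter argument of np.searchsorted so that the sorted copy of reference does not have to be built explicitly.

```python
import numpy as np

def indexes_in_reference(data, reference):
    """Return idx such that reference[idx] == data (hypothetical helper)."""
    data = np.asarray(data)
    reference = np.asarray(reference)

    # Indices that would sort reference
    sort_idx = np.argsort(reference)

    # Positions of data in the sorted order of reference; passing sorter
    # avoids materialising reference[sort_idx]
    pos = np.searchsorted(reference, data, side='left', sorter=sort_idx)

    # A value larger than every reference entry yields pos == reference.size;
    # clip so that the lookup below cannot go out of bounds
    pos = np.clip(pos, 0, reference.size - 1)

    # Map positions in the sorted order back to positions in reference
    out = sort_idx[pos]

    # Assumption: a value that is missing from reference is an error
    if not np.array_equal(reference[out], data):
        raise ValueError("data contains values that are not in reference")
    return out
```

Without the final check, a value that is absent from reference would silently be mapped to the index of a neighbouring value, because searchsorted returns an insertion point rather than an exact match.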

Runtime test:

In [396]: data = np.random.randint(0,30000,150000)
     ...: reference = np.unique(data)
     ...: reference = reference[np.random.permutation(reference.size)]
     ...: 
     ...: 
     ...: def org_approach(data,reference):
     ...:     indexes = np.zeros_like(data, dtype=int)
     ...:     for i in range(data.size):
     ...:         indexes[i] = np.where(data[i] == reference)[0]
     ...:     return indexes
     ...: 
     ...: def vect_approach(data,reference):
     ...:     sort_idx = np.argsort(reference)
     ...:     pos = np.searchsorted(reference[sort_idx], data, side='left')       
     ...:     return sort_idx[pos]
     ...: 

In [397]: %timeit org_approach(data,reference)
1 loops, best of 3: 9.86 s per loop

In [398]: %timeit vect_approach(data,reference)
10 loops, best of 3: 32.4 ms per loop

Verify that both approaches produce the same output:

In [399]: np.array_equal(org_approach(data,reference),vect_approach(data,reference))
Out[399]: True

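A dictionary lookup works well here. Build a mapping from each value in reference to its index, then look up every element of data.

For example: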

import numpy

data = numpy.array([30, 20, 30, 10, 20, 10, 20, 10, 30, 20, 20, 30, 30, 10, 30])
reference = numpy.array([20, 10, 30])
reference_index = dict((value, index) for index, value in enumerate(reference))
indexes = [reference_index[value] for value in data]
assert numpy.all(data == reference[indexes])

This should be faster than using numpy.where, because numpy.where is O(n), while a dictionary lookup is O(1).
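
If the result is needed as a NumPy array rather than a Python list, the same dictionary lookup can feed np.fromiter. This is a sketch that assumes data is one-dimensional and that every element of data is present in reference; otherwise the lookup raises KeyError.

```python
import numpy as np

data = np.array([30, 20, 30, 10, 20, 10, 20, 10, 30, 20, 20, 30, 30, 10, 30])
reference = np.array([20, 10, 30])

# Map each reference value to its position; built once, in O(len(reference))
reference_index = {value: index for index, value in enumerate(reference.tolist())}

# One dictionary lookup per element; fromiter fills the array directly
indexes = np.fromiter((reference_index[value] for value in data.tolist()),
                      dtype=np.intp, count=data.size)

assert np.array_equal(reference[indexes], data)
```

Converting with tolist() first means the dictionary keys and the looked-up values are plain Python integers, which tends to be cheaper to hash than NumPy scalar objects.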

import numpy as np

data = np.array([30, 20, 30, 10, 20, 10, 20, 10, 30, 20, 20, 30, 30, 10, 30])
reference = {20:0, 10:1, 30:2}
indexes = np.zeros_like(data, dtype=int)

for i in xrange(data.size):
    indexes[i] = reference[data[i]]

Dictionary lookups are much faster. Using xrange (Python 2; in Python 3, range behaves the same way) also helped marginally.

Using timeit:

Original: 4.01297836938

This version: 1.30972428591
