Numpy: Efficient Substitution of Values in a 2D Array Using a Mapping Dictionary

I have a 2D NumPy array of integers, for example:

 a = np.array([[ 3,   0, 2, -1],
               [ 1, 255, 1,  2],
               [ 0,   3, 2,  2]])

and I have a dictionary with integer keys and values that I would like to use to replace the values of a with new values. It might look like this:

 d = {0: 1, 1: 2, 2: 3, 3: 4, -1: 0, 255: 0} 

I want to replace each value of a that appears as a key in d with the corresponding value in d . In other words, d defines a mapping between the old (current) and new (desired) values in a . For the toy example above, the result would be:

 a_new = np.array([[ 4, 1, 3, 0],
                   [ 2, 0, 2, 3],
                   [ 1, 4, 3, 3]])

What would be an efficient way to implement this?

This is a toy example, but in practice the arrays will be large, with a shape of, say, (1024, 2048) , and the dictionary will have on the order of dozens of entries (34 in my case). While the keys are integers, they are not necessarily contiguous, and some can be negative (as in the example above).

I need to perform this replacement on hundreds of thousands of such arrays, so it needs to be fast. However, the dictionary is known in advance and remains constant, so any time spent preprocessing the dictionary or transforming it into a more suitable data structure is a one-time cost and does not matter.

I'm currently looping over the array entries with two nested for loops (over the rows and columns of a ), but there must be a better way.

If the mapping contained no negative keys (such as the -1 in the example), I would simply build a list or array from the dictionary, with the keys serving as array indices, and then use ordinary NumPy fancy indexing for an efficient lookup. But since there are negative keys, that won't work directly.
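For concreteness, the fancy-indexing idea described above can be sketched like this (a hypothetical variant of the mapping with only non-negative keys; the names d_pos , lut , and a_pos are illustrative, not from the question):

```python
import numpy as np

# Hypothetical mapping containing only non-negative keys
d_pos = {0: 1, 1: 2, 2: 3, 3: 4}

# Build a lookup table where each key becomes an index
lut = np.zeros(max(d_pos) + 1, dtype=int)
for k, v in d_pos.items():
    lut[k] = v

a_pos = np.array([[3, 0, 2, 1],
                  [1, 3, 1, 2]])

# One fancy-indexing operation replaces every value at once
a_new = lut[a_pos]
```

This is exactly the trick that fails once a key like -1 enters the picture, since -1 is not a valid position in a plain front-loaded table.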

+7
python dictionary arrays numpy
4 answers

Here is one way. If the span between the array's min and max is small, this can be quite efficient; you sidestep the negative values by offsetting the array by its minimum:

 In [11]: indexer = np.array([d.get(i, -1) for i in range(a.min(), a.max() + 1)])

 In [12]: indexer[(a - a.min())]
 Out[12]:
 array([[4, 1, 3, 0],
        [2, 0, 2, 3],
        [1, 4, 3, 3]])

Note: this moves the Python for loop into building the lookup table, so as long as the table is significantly smaller than the actual array, it can be much faster.
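One caveat worth noting (my own remark, not part of the answer): indexer and a.min() here depend on each particular array, so the table would be rebuilt for every array. Since the question says the dictionary is fixed, the range can instead be anchored to the dictionary's own keys, making the table a one-time build that is reused across arrays. A sketch under that assumption:

```python
import numpy as np

d = {0: 1, 1: 2, 2: 3, 3: 4, -1: 0, 255: 0}

# Anchor the table to the dictionary's key range, not the array's value range
kmin = min(d)  # smallest key, here -1
indexer = np.array([d.get(i, -1) for i in range(kmin, max(d) + 1)])

a = np.array([[3,   0, 2, -1],
              [1, 255, 1,  2],
              [0,   3, 2,  2]])

# Offset by the dictionary's min key instead of the array's min value
a_new = indexer[a - kmin]
```

This assumes every array value is covered by the dictionary's key range, which the question guarantees.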

+3

Make a copy of the array, then iterate over the dictionary entries and use boolean indexing to assign the new values to the copy:

 import numpy as np

 b = np.copy(a)
 for old, new in d.items():
     b[a == old] = new
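A minimal runnable version of this approach using the toy data from the question:

```python
import numpy as np

a = np.array([[3,   0, 2, -1],
              [1, 255, 1,  2],
              [0,   3, 2,  2]])
d = {0: 1, 1: 2, 2: 3, 3: 4, -1: 0, 255: 0}

b = np.copy(a)
for old, new in d.items():
    # Boolean mask selects every cell of the ORIGINAL array equal to `old`
    b[a == old] = new
```

Because each mask is computed against the original a rather than b , chained mappings (e.g. 0 → 1 while 1 → 2 ) do not interfere with one another. The loop runs once per dictionary entry, so its cost scales with the dictionary size times the array size.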
+2

This post addresses the case where every value in the array has a matching key in the dictionary. The idea is similar to @Andy Hayden's smart solution , but we create a larger lookup array that exploits Python's negative indexing , which lets us index the incoming arrays directly without any offset; that should be a noticeable improvement here.
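To illustrate the negative-indexing trick in isolation (a toy sketch with made-up values, not this answer's code): negative indices count from the end of an array, so negative keys can live in the tail of the lookup table while positive keys occupy the front, with the table sized so the two ranges never collide.

```python
import numpy as np

# Keys 0..3 and -1: table length n = maxv - minv + 1 = 3 - (-1) + 1 = 5
# keeps the positive keys (slots 0..3) clear of the negative key (slot -1, i.e. 4)
val = np.empty(5, dtype=int)
val[[0, 1, 2, 3]] = [1, 2, 3, 4]  # positive keys stored at the front
val[-1] = 0                       # key -1 stored at the last slot

a = np.array([3, -1, 0])
out = val[a]  # -1 transparently indexes the tail; no offset needed
```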

To build the lookup array, which is a one-time cost since the dictionary remains the same, use this -

 def getval_array(d):
     v = np.array(list(d.values()))
     k = np.array(list(d.keys()))
     maxv = k.max()
     minv = k.min()
     n = maxv - minv + 1
     val = np.empty(n, dtype=v.dtype)
     val[k] = v
     return val

 val_arr = getval_array(d)

For the final replacement, simply index. So, for an input array a , do -

 out = val_arr[a] 

Sample run -

 In [8]: a = np.array([[ 3,   0,  2,  -1],
    ...:               [ 1, 255,  1, -16],
    ...:               [ 0,   3,  2,   2]])
    ...: d = {0: 1, 1: 2, 2: 3, 3: 4, -1: 0, 255: 0, -16: 5}

 In [9]: val_arr = getval_array(d)  # one-time operation

 In [10]: val_arr[a]
 Out[10]:
 array([[4, 1, 3, 0],
        [2, 0, 2, 5],
        [1, 4, 3, 3]])

Runtime test on a large array -

 In [141]: a = np.array([[ 3,   0,  2,  -1],
     ...:                [ 1, 255,  1, -16],
     ...:                [ 0,   3,  2,   2]])
     ...: d = {0: 1, 1: 2, 2: 3, 3: 4, -1: 10, 255: 89, -16: 5}

 In [142]: a = np.random.choice(a.ravel(), 1024*2048).reshape(1024, 2048)

 # @Andy Hayden's soln
 In [143]: indexer = np.array([d.get(i, -1) for i in range(a.min(), a.max() + 1)])

 In [144]: %timeit indexer[(a - a.min())]
 100 loops, best of 3: 8.34 ms per loop

 # Proposed in this post
 In [145]: val_arr = getval_array(d)

 In [146]: %timeit val_arr[a]
 100 loops, best of 3: 2.69 ms per loop
+2

NumPy can create vectorized functions that perform mapping operations on arrays. I'm not sure which method has the best performance here, so I timed my approach with timeit. I'd recommend trying a couple of the other approaches as well if you want to find out which performs best.

 # Function to be vectorized
 def map_func(val, dictionary):
     return dictionary[val] if val in dictionary else val

 # Vectorize map_func
 vfunc = np.vectorize(map_func)

 # Run
 print(vfunc(a, d))

You can time it like this:

 from timeit import Timer

 t = Timer('vfunc(a, d)', 'from __main__ import a, d, vfunc')
 print(t.timeit(number=1000))

My result for this approach was about 0.014 s.

Edit: for comparison, I tried this on a (1024, 2048) NumPy array of random numbers from -10 to 10, with the same dictionary. It took about a quarter of a second per array. Unless you are processing many such arrays, you may not need to optimize further, if that is an acceptable level of performance.

0
