Numpy mapping operation performance improvement

I have an array with dimensions (4, X, Y), where the first dimension holds an (R, G, B, A) quadruplet. My goal is to map each of the X*Y RGBA quadruplets to a floating-point value using a lookup dictionary.

My current code is as follows:

    codeTable = {
        (255, 255, 255, 127): 5.5,
        (128, 128, 128, 255): 6.5,
        (0,   0,   0,   0):   7.5,
    }

    for i in range(0, rows):
        for j in range(0, cols):
            new_data[i, j] = codeTable.get(tuple(data[:, i, j]), -9999)

Here, data is an array with shape (4, rows, cols), and new_data has shape (rows, cols).

The code works fine, but takes quite a while. How do I optimize this piece of code?

Here is a complete example:

    import numpy

    codeTable = {
        (253, 254, 255, 127): 5.5,
        (128, 129, 130, 255): 6.5,
        (0,   0,   0,   0):   7.5,
    }

    # test data, shape (4, rows, cols)
    rows = 3
    cols = 2
    data = numpy.array([
        [[253, 0], [128, 0],   [128, 0]],
        [[254, 0], [129, 144], [129, 0]],
        [[255, 0], [130, 243], [130, 5]],
        [[127, 0], [255, 120], [255, 5]],
    ])

    new_data = numpy.zeros((rows, cols), numpy.float32)
    for i in range(0, rows):
        for j in range(0, cols):
            new_data[i, j] = codeTable.get(tuple(data[:, i, j]), -9999)

    # expected result for `new_data`:
    # array([[ 5.50000000e+00,  7.50000000e+00],
    #        [ 6.50000000e+00, -9.99900000e+03],
    #        [ 6.50000000e+00, -9.99900000e+03]], dtype=float32)
2 answers

Here's an approach that returns your expected result, but with so little data it's hard to know whether it will actually be faster for you. However, since it avoids the double for loop, I think you will see a pretty decent speedup.

    import numpy
    import pandas as pd

    codeTable = {
        (253, 254, 255, 127): 5.5,
        (128, 129, 130, 255): 6.5,
        (0,   0,   0,   0):   7.5,
    }

    # test data, shape (4, rows, cols)
    rows = 3
    cols = 2
    data = numpy.array([
        [[253, 0], [128, 0],   [128, 0]],
        [[254, 0], [129, 144], [129, 0]],
        [[255, 0], [130, 243], [130, 5]],
        [[127, 0], [255, 120], [255, 5]],
    ])

    new_data = numpy.zeros((rows, cols), numpy.float32)
    for i in range(0, rows):
        for j in range(0, cols):
            new_data[i, j] = codeTable.get(tuple(data[:, i, j]), -9999)

    def create_output(data):
        # Reshape the input so that each row is one RGBA quadruplet
        reshaped_data = data.reshape((4, -1))
        df = pd.DataFrame(reshaped_data).T

        # Turn the lookup table into a DataFrame with the same four key columns
        reshaped_codeTable = []
        for key in codeTable.keys():
            reshaped = list(key) + [codeTable[key]]
            reshaped_codeTable.append(reshaped)
        ct = pd.DataFrame(reshaped_codeTable)

        # Merge on the key columns, replace missing matches with -9999
        result = df.merge(ct, how='left')
        newest_data = result[4].fillna(-9999)

        # Reshape back to (rows, cols)
        output = newest_data.to_numpy().reshape(rows, cols)
        return output

    output = create_output(data)
    print(output)
    # array([[ 5.50000000e+00,  7.50000000e+00],
    #        [ 6.50000000e+00, -9.99900000e+03],
    #        [ 6.50000000e+00, -9.99900000e+03]])

    print(numpy.array_equal(new_data, output))
    # True
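To get a feel for the speedup on realistically sized input, one way is to time both versions on a larger random array. This is only a rough sketch: the 500x500 size, the random data, and the loop_version helper are illustrative additions, not part of the original answer, and it assumes the code above has already been run (create_output reads the module-level rows and cols, so they are reassigned here):

    import timeit

    # illustrative larger test image, shape (4, rows, cols); values are random,
    # so most pixels will fall through to the -9999 default
    rows, cols = 500, 500
    big_data = numpy.random.randint(0, 256, size=(4, rows, cols))

    def loop_version(data):
        # the original double-loop approach from the question
        out = numpy.zeros((rows, cols), numpy.float32)
        for i in range(rows):
            for j in range(cols):
                out[i, j] = codeTable.get(tuple(data[:, i, j]), -9999)
        return out

    print(timeit.timeit(lambda: loop_version(big_data), number=3))
    print(timeit.timeit(lambda: create_output(big_data), number=3))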

The numpy_indexed package (disclaimer: I am its author) contains a vectorized, nd-array capable version of list.index, which can be used to solve your problem efficiently and concisely:

    import numpy as np
    import numpy_indexed as npi

    # codeTable and data as defined in the question
    map_keys = np.array(list(codeTable.keys()))
    map_values = np.array(list(codeTable.values()))

    # For each pixel, find the index of its RGBA quadruplet in map_keys;
    # quadruplets not present in the table come back masked
    indices = npi.indices(map_keys, data.reshape(4, -1).T, missing='mask')
    remapped = np.where(indices.mask, -9999, map_values[indices.data]).reshape(data.shape[1:])
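As a quick sanity check, the result can be compared against the loop-based new_data from the question's example. This is a small usage sketch, assuming codeTable, data, and new_data from the question are in scope:

    # remapped has shape (rows, cols) and dtype float64
    print(remapped)
    print(np.array_equal(new_data, remapped))   # True for the example data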
