Fast array manipulation based on including an element in a binary matrix

Question

Fast array manipulation based on including an element in a binary matrix

For a large set of randomly distributed points in a two-dimensional lattice, I want to efficiently retrieve a submatrix that contains only those elements that, approximated as indices, are assigned non-zero values in a separate two-dimensional binary matrix. Currently my script is as follows:

lat_len = 100 # lattice length input = np.random.random(size=(1000,2)) * lat_len binary_matrix = np.random.choice(2, lat_len * lat_len).reshape(lat_len, -1) def landed(input): output = [] input_as_indices = np.floor(input) for i in range(len(input)): if binary_matrix[input_as_indices[i,0], input_as_indices[i,1]] == 1: output.append(input[i]) output = np.asarray(output) return output

However, I suspect there should be a better way to do this. The above script can take quite some time for 10,000 iterations.

+5

performance python arrays numpy

neither-nor Jun 21 '15 at 18:25

source share

2 answers

It seems you can squeeze a notable performance boost if you work with linearly indexed arrays. Here's a vectorized implementation to solve our case, similar to @rth answer , but using linear indexing -

 # Get floor-ed indices idx = np.floor(input).astype(np.int) # Calculate linear indices lin_idx = idx[:,0]*lat_len + idx[:,1] # Index raveled/flattened version of binary_matrix with lin_idx # to extract and form the desired output out = input[binary_matrix.ravel()[lin_idx] ==1]

Thus, in short, we have:

 out = input[binary_matrix.ravel()[idx[:,0]*lat_len + idx[:,1]] ==1]

Runtime Tests -

This section compares the proposed approach in this solution against another solution that uses row column indexing.

Case No. 1 (initial data):

 In [62]: lat_len = 100 # lattice length ...: input = np.random.random(size=(1000,2)) * lat_len ...: binary_matrix = np.random.choice(2, lat_len * lat_len). reshape(lat_len, -1) ...: In [63]: idx = np.floor(input).astype(np.int) In [64]: %timeit input[binary_matrix[idx[:,0], idx[:,1]] == 1] 10000 loops, best of 3: 121 µs per loop In [65]: %timeit input[binary_matrix.ravel()[idx[:,0]*lat_len + idx[:,1]] ==1] 10000 loops, best of 3: 103 µs per loop

Case No. 2 (larger data):

 In [75]: lat_len = 1000 # lattice length ...: input = np.random.random(size=(100000,2)) * lat_len ...: binary_matrix = np.random.choice(2, lat_len * lat_len). reshape(lat_len, -1) ...: In [76]: idx = np.floor(input).astype(np.int) In [77]: %timeit input[binary_matrix[idx[:,0], idx[:,1]] == 1] 100 loops, best of 3: 18.5 ms per loop In [78]: %timeit input[binary_matrix.ravel()[idx[:,0]*lat_len + idx[:,1]] ==1] 100 loops, best of 3: 13.1 ms per loop

Thus, the increase in productivity with this linear indexing looks approximately 20% - 30% .

+3

Divakar Jun 28 '15 at 13:20

source share

rth · Accepted Answer · 2015-06-21T19:44:42+0000

You're right. The above calculation can be done more efficiently without the for loop in python using advanced numpy indexing ,

 def landed2(input): idx = np.floor(input).astype(np.int) mask = binary_matrix[idx[:,0], idx[:,1]] == 1 return input[mask] res1 = landed(input) res2 = landed2(input) np.testing.assert_allclose(res1, res2)

this results in an acceleration of ~ 150x.

Fast array manipulation based on including an element in a binary matrix

More articles: