It seems you can squeeze a notable performance boost if you work with linearly indexed arrays. Here's a vectorized implementation to solve our case, similar to @rth answer , but using linear indexing -
# Get floor-ed indices idx = np.floor(input).astype(np.int) # Calculate linear indices lin_idx = idx[:,0]*lat_len + idx[:,1] # Index raveled/flattened version of binary_matrix with lin_idx # to extract and form the desired output out = input[binary_matrix.ravel()[lin_idx] ==1]
Thus, in short, we have:
out = input[binary_matrix.ravel()[idx[:,0]*lat_len + idx[:,1]] ==1]
Runtime Tests -
This section compares the proposed approach in this solution against another solution that uses row column indexing.
Case No. 1 (initial data):
In [62]: lat_len = 100 # lattice length ...: input = np.random.random(size=(1000,2)) * lat_len ...: binary_matrix = np.random.choice(2, lat_len * lat_len). reshape(lat_len, -1) ...: In [63]: idx = np.floor(input).astype(np.int) In [64]: %timeit input[binary_matrix[idx[:,0], idx[:,1]] == 1] 10000 loops, best of 3: 121 µs per loop In [65]: %timeit input[binary_matrix.ravel()[idx[:,0]*lat_len + idx[:,1]] ==1] 10000 loops, best of 3: 103 µs per loop
Case No. 2 (larger data):
In [75]: lat_len = 1000 # lattice length ...: input = np.random.random(size=(100000,2)) * lat_len ...: binary_matrix = np.random.choice(2, lat_len * lat_len). reshape(lat_len, -1) ...: In [76]: idx = np.floor(input).astype(np.int) In [77]: %timeit input[binary_matrix[idx[:,0], idx[:,1]] == 1] 100 loops, best of 3: 18.5 ms per loop In [78]: %timeit input[binary_matrix.ravel()[idx[:,0]*lat_len + idx[:,1]] ==1] 100 loops, best of 3: 13.1 ms per loop
Thus, the increase in productivity with this linear indexing looks approximately 20% - 30% .