Nested Loop Vectorization

I am looking to vectorize a nested loop that will work in a list of 300,000 lists, with each of these lists containing 3 values. The nested loop compares the values โ€‹โ€‹of each of the lists with the corresponding values โ€‹โ€‹in other lists and adds only the list indices for which the corresponding values โ€‹โ€‹have a maximum difference of 0.1 between them. Thus, a list containing [0.234, 0.456, 0.567] and a list containing [0.246, 0.479, 0.580] falls into this category because their respective values โ€‹โ€‹(ie 0.234 and 0.246, 0.456 and 0.479, 0.567 and 0.580 ) have a difference of less than 0.1 between them.

I currently use the following nested loop for this, but it currently takes about 58 hours (a total of 90 trillion iterations);

import numpy as np variable = np.random.random((300000,3)).tolist() out1=list() out2=list() for i in range(0:300000): for j in range(0:300000): if ((i<j) and ((abs(variable[i][0]-variable[j][0]))<0.1) and ((abs(variable[i][1]-variable[j] [1]))<0.1) and ((abs(variable[i][2]-variable[j][2]))<0.1)): out1.append(i) out2.append(j) 
+5
source share
2 answers

Look at scipy.spatial; it has many functions for effectively solving such spatial queries; KDTrees in particular

 import scipy.spatial out = scipy.spatial.cKDTree(variable).query_pairs(r=0.1, p=np.infinity) 
+3
source

Convert to a NumPy array to make it easier to use NumPy functions after that. Then two approaches could be proposed.

Approach No. 1

NumPy broadcast can be used to expand these capabilities to 3D arrays and perform operations in vector form. So we would have an implementation like this:

 th = 0.1 # Threshold arr = np.asarray(variable) out1,out2 = np.where(np.triu((np.abs(arr[:,None,:] - arr) < th).all(-1),1)) 

Approach # 2

An alternative implementation with a focus on memory efficiency that uses selective indices that will be responsible for such iterations -

 th = 0.1 # Threshold arr = np.asarray(variable) R,C = np.triu_indices(arr.shape[0],1) mask = (np.abs(arr[R] - arr[C])<th).all(-1) out1,out2 = R[mask], C[mask] 
+3
source

All Articles