Nested Loop Vectorization

Question

Nested Loop Vectorization

I am looking to vectorize a nested loop that will work in a list of 300,000 lists, with each of these lists containing 3 values. The nested loop compares the values of each of the lists with the corresponding values in other lists and adds only the list indices for which the corresponding values have a maximum difference of 0.1 between them. Thus, a list containing [0.234, 0.456, 0.567] and a list containing [0.246, 0.479, 0.580] falls into this category because their respective values (ie 0.234 and 0.246, 0.456 and 0.479, 0.567 and 0.580 ) have a difference of less than 0.1 between them.

I currently use the following nested loop for this, but it currently takes about 58 hours (a total of 90 trillion iterations);

import numpy as np variable = np.random.random((300000,3)).tolist() out1=list() out2=list() for i in range(0:300000): for j in range(0:300000): if ((i<j) and ((abs(variable[i][0]-variable[j][0]))<0.1) and ((abs(variable[i][1]-variable[j] [1]))<0.1) and ((abs(variable[i][2]-variable[j][2]))<0.1)): out1.append(i) out2.append(j)

+5

python vectorization numpy

Jborg Aug 3 '16 at 14:12

source share

2 answers

Convert to a NumPy array to make it easier to use NumPy functions after that. Then two approaches could be proposed.

Approach No. 1

NumPy broadcast can be used to expand these capabilities to 3D arrays and perform operations in vector form. So we would have an implementation like this:

 th = 0.1 # Threshold arr = np.asarray(variable) out1,out2 = np.where(np.triu((np.abs(arr[:,None,:] - arr) < th).all(-1),1))

Approach # 2

An alternative implementation with a focus on memory efficiency that uses selective indices that will be responsible for such iterations -

 th = 0.1 # Threshold arr = np.asarray(variable) R,C = np.triu_indices(arr.shape[0],1) mask = (np.abs(arr[R] - arr[C])<th).all(-1) out1,out2 = R[mask], C[mask]

+3

Divakar Aug 3 '16 at 14:19

source share

Eelco hoogendoorn · Accepted Answer · 2016-08-03T14:30:44+0000

Look at scipy.spatial; it has many functions for effectively solving such spatial queries; KDTrees in particular

 import scipy.spatial out = scipy.spatial.cKDTree(variable).query_pairs(r=0.1, p=np.infinity)

Nested Loop Vectorization

More articles: