How can I increase the efficiency of this numpy loop

I have a numpy array containing labels. For each label I would like to compute a value based on its size and bounding box. How can I write this more efficiently so that it is realistic to use on large arrays (~15000 labels)?

    A = array([[1, 1, 0, 3, 3],
               [1, 1, 0, 0, 0],
               [1, 0, 0, 2, 2],
               [1, 0, 2, 2, 2]])
    B = zeros(4)
    for label in range(1, 4):
        # get the bounding box of the label
        label_points = argwhere(A == label)
        (y0, x0), (y1, x1) = label_points.min(0), label_points.max(0) + 1
        # assume the size of each label has been precomputed in the array size_A
        B[label] = myfunc(y0, x0, y1, x1, size_A[label])
optimization python numpy
5 answers

I could not come up with an efficient implementation using NumPy's vectorized functions, so perhaps a clever pure-Python implementation will be faster.

    def first_row(a, labels):
        d = {}
        d_setdefault = d.setdefault
        len_ = len
        num_labels = len_(labels)
        for i, row in enumerate(a):
            for label in row:
                d_setdefault(label, i)
            if len_(d) == num_labels:
                break
        return d

This function returns a dictionary mapping each label to the index of the first row in which it appears. Applying the function to A, A.T, A[::-1] and A.T[::-1] also gives you the first column, as well as the last row and column.

If you prefer a list to a dictionary, you can turn the dictionary into a list with map(d.get, labels). Alternatively, you could use a NumPy array instead of the dictionary from the start, but then you would lose the ability to leave the loop early once all the labels have been found.
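To make the four-view trick concrete, here is a self-contained sketch (repeating first_row for completeness, and combining the four calls into a bounding box myself; the example array is the one from the question):

```python
import numpy as np

def first_row(a, labels):
    # index of the first row in which each label appears
    d = {}
    num_labels = len(labels)
    for i, row in enumerate(a):
        for label in row:
            d.setdefault(label, i)
        if len(d) == num_labels:
            break
    return d

A = np.array([[1, 1, 0, 3, 3],
              [1, 1, 0, 0, 0],
              [1, 0, 0, 2, 2],
              [1, 0, 2, 2, 2]])
labels = [0, 1, 2, 3]
h, w = A.shape

top    = first_row(A, labels)          # first row per label
left   = first_row(A.T, labels)        # first column per label
bottom = first_row(A[::-1], labels)    # first row, counted from the bottom
right  = first_row(A.T[::-1], labels)  # first column, counted from the right

# bounding box of label 2: rows [y0, y1), columns [x0, x1)
y0, y1 = top[2], h - bottom[2]
x0, x1 = left[2], w - right[2]
print(y0, x0, y1, x1)  # 2 2 4 5
```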

I'd be curious whether (and by how much) this speeds up your code, but I'm fairly sure it is faster than your original solution.


Algorithm:

  • reshape the array to one dimension
  • get the sort index with argsort()
  • take the sorted version of the flattened array as sorted_A
  • use where() and diff() to find the positions in sorted_A where the label changes
  • use the change positions and the sort index to recover each label's original flat positions
  • convert the flat positions back to 2D coordinates using the array width

For a large array such as (7000, 9000), it can complete the calculation in about 30 seconds.

here is the code:

    import numpy as np
    from itertools import zip_longest

    A = np.array([[1, 1, 0, 3, 3],
                  [1, 1, 0, 0, 0],
                  [1, 0, 0, 2, 2],
                  [1, 0, 2, 2, 2]])

    def label_range(A):
        h, w = A.shape
        tmp = A.reshape(-1)
        index = np.argsort(tmp)
        sorted_A = tmp[index]
        # positions in sorted_A where the label value changes
        pos = np.where(np.diff(sorted_A))[0] + 1
        # starting at pos[0] skips the smallest label (the background 0 here)
        for p1, p2 in zip_longest(pos, pos[1:]):
            label_index = index[p1:p2]
            y = label_index // w
            x = label_index % w
            x0 = np.min(x)
            x1 = np.max(x) + 1
            y0 = np.min(y)
            y1 = np.max(y) + 1
            label = tmp[label_index[0]]
            yield label, x0, y0, x1, y1

    for label, x0, y0, x1, y1 in label_range(A):
        print("%d:(%d,%d)-(%d,%d)" % (label, x0, y0, x1, y1))

    #B = np.random.randint(0, 100, (7000, 9000))
    #list(label_range(B))

Another method:

Use bincount() to count the labels in each row and each column, and store which labels occur where in the boolean arrays rows and cols.
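As a quick illustration of the idea (my own example, using the array from the question), a single bincount call turns one row into a presence mask over the labels 0..3:

```python
import numpy as np

A = np.array([[1, 1, 0, 3, 3],
              [1, 1, 0, 0, 0],
              [1, 0, 0, 2, 2],
              [1, 0, 2, 2, 2]])

# which of the labels 0..3 occur in the first row?
mask = np.bincount(A[0, :], minlength=4) > 0
print(mask)  # [ True  True False  True]
```

Stacking one such mask per row (and per column) gives exactly the rows and cols arrays the answer describes.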

For each label you then only need to search for its range in rows and cols. This is faster than sorting; on my computer it can do the calculation in a few seconds.

    def label_range2(A):
        maxlabel = np.max(A) + 1
        h, w = A.shape
        # rows[i, label] is True if the label occurs in row i
        rows = np.zeros((h, maxlabel), bool)
        for row in range(h):
            rows[row, :] = np.bincount(A[row, :], minlength=maxlabel) > 0
        # cols[j, label] is True if the label occurs in column j
        cols = np.zeros((w, maxlabel), bool)
        for col in range(w):
            cols[col, :] = np.bincount(A[:, col], minlength=maxlabel) > 0
        for label in range(1, maxlabel):
            y = np.where(rows[:, label])[0]
            x = np.where(cols[:, label])[0]
            x0 = np.min(x)
            x1 = np.max(x) + 1
            y0 = np.min(y)
            y1 = np.max(y) + 1
            yield label, x0, y0, x1, y1

The execution bottleneck is apparently the call to argwhere. It can be avoided by rewriting the loop as follows (computing only y0 and y1, but this is easy to generalize to x0 and x1):

    for label in range(1, 4):
        comp = (A == label)
        yminind = comp.argmax(0)
        ymin = comp.max(0)
        ymaxind = comp.shape[0] - comp[::-1].argmax(0)
        y0 = yminind[ymin].min()
        y1 = ymaxind[ymin].max()

I'm not sure about the performance difference, but one reason may be that operations like ==, argmax and max can pre-allocate their output array directly from the shape of the input array, which is impossible for argwhere.
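A small self-contained sketch (my own cross-check, using the question's example array) that verifies the argmax-based loop against the argwhere result:

```python
import numpy as np

A = np.array([[1, 1, 0, 3, 3],
              [1, 1, 0, 0, 0],
              [1, 0, 0, 2, 2],
              [1, 0, 2, 2, 2]])

for label in range(1, 4):
    comp = (A == label)
    yminind = comp.argmax(0)                        # first True row per column (0 where absent)
    ymin = comp.max(0)                              # mask: which columns contain the label at all
    ymaxind = comp.shape[0] - comp[::-1].argmax(0)  # last True row per column, plus one
    y0 = yminind[ymin].min()
    y1 = ymaxind[ymin].max()
    # cross-check against the argwhere-based version from the question
    pts = np.argwhere(comp)
    assert (y0, y1) == (pts[:, 0].min(), pts[:, 0].max() + 1)
```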


Using PyPy, you can just run the loop as-is and not worry about vectorization. It should be fast.
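A pure-Python loop of the kind this answer has in mind might look as follows (my own sketch, not code from the answer): one pass over nested lists, no NumPy, so a JIT such as PyPy can compile it directly.

```python
def bounding_boxes(a):
    # one scan over a 2D sequence; boxes[label] = [y0, x0, y1, x1]
    boxes = {}
    for y, row in enumerate(a):
        for x, label in enumerate(row):
            if label == 0:               # skip the background
                continue
            b = boxes.get(label)
            if b is None:
                boxes[label] = [y, x, y + 1, x + 1]
            else:
                if x < b[1]:
                    b[1] = x
                if x + 1 > b[3]:
                    b[3] = x + 1
                b[2] = y + 1             # rows are visited in order, so y only grows
    return boxes

A = [[1, 1, 0, 3, 3],
     [1, 1, 0, 0, 0],
     [1, 0, 0, 2, 2],
     [1, 0, 2, 2, 2]]
print(bounding_boxes(A))  # {1: [0, 0, 4, 2], 3: [0, 3, 1, 5], 2: [2, 2, 4, 5]}
```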

