A quick way to turn a labeled image into a dictionary {label: [coordinates]}

Let's say I labeled an image with scipy.ndimage.measurements.label as follows:

    [[0, 1, 0, 0, 0, 0],
     [0, 1, 0, 0, 0, 0],
     [0, 1, 0, 0, 0, 0],
     [0, 0, 0, 0, 3, 0],
     [2, 2, 0, 0, 0, 0],
     [2, 2, 0, 0, 0, 0]]

What is a quick way to collect the coordinates belonging to each label? That is, something like:

    {1: [[0, 1], [1, 1], [2, 1]],
     2: [[4, 0], [4, 1], [5, 0], [5, 1]],
     3: [[3, 4]]}

I work with images of roughly 15,000 x 5,000 pixels, and about half of the pixels in each image are labeled (i.e., nonzero).

Instead of iterating over the whole image with nditer, would it be faster to do something like np.where(img == label) for each label?

EDIT:

Which algorithm is fastest depends on the size of the labeled image relative to the number of labels it contains. The methods of Warren Weckesser and of Salvador Dali / BHAT IRSHAD (based on np.nonzero and np.where) seem to scale linearly with the number of labels, while iterating over each image element with nditer obviously scales linearly with the size of the image.

Results of a small test:

    size: 1000 x 1000, num_labels: 10
      weckesser ... 0.214357852936s
      dali      ... 0.650229930878s
      nditer    ... 6.53645992279s

    size: 1000 x 1000, num_labels: 100
      weckesser ... 0.936990022659s
      dali      ... 1.33582305908s
      nditer    ... 6.81486487389s

    size: 1000 x 1000, num_labels: 1000
      weckesser ... 8.43906402588s
      dali      ... 9.81333303452s
      nditer    ... 7.47897100449s

    size: 1000 x 1000, num_labels: 10000
      weckesser ... 100.405524015s
      dali      ... 118.17239809s
      nditer    ... 9.14583897591s

So the question becomes more specific:

For labeled images in which the number of labels is of order sqrt(size(image)), is there an algorithm for collecting the coordinates of each label that is faster than iterating over every image element (i.e., with nditer)?
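For reference, here is a minimal sketch of the nditer baseline I am timing against (the function name nditer_based is my own):

    import numpy as np

    def nditer_based(img):
        # Visit every pixel once: the cost is linear in the image size,
        # regardless of the number of labels.
        res = {}
        it = np.nditer(img, flags=['multi_index'])
        for val in it:
            label = int(val)
            if label:  # skip the background (zeros)
                res.setdefault(label, []).append(list(it.multi_index))
        return res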

4 answers

Here is one possibility:

    import numpy as np

    a = np.array([[0, 1, 0, 0, 0, 0],
                  [0, 1, 0, 0, 0, 0],
                  [0, 1, 0, 0, 0, 0],
                  [0, 0, 0, 0, 3, 0],
                  [2, 2, 0, 0, 0, 0],
                  [2, 2, 0, 0, 0, 0]])

    # If the array was computed using scipy.ndimage.measurements.label, you
    # already know how many labels there are.
    num_labels = 3

    nz = np.nonzero(a)
    coords = np.column_stack(nz)
    nzvals = a[nz[0], nz[1]]
    res = {k: coords[nzvals == k] for k in range(1, num_labels + 1)}

I called this script get_label_indices.py. Here is an example run:

    In [97]: import pprint

    In [98]: run get_label_indices.py

    In [99]: pprint.pprint(res)
    {1: array([[0, 1],
           [1, 1],
           [2, 1]]),
     2: array([[4, 0],
           [4, 1],
           [5, 0],
           [5, 1]]),
     3: array([[3, 4]])}
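Note that the values in res are NumPy arrays rather than the plain lists shown in the question. If the exact list-of-lists format is needed, a small post-processing step (my addition, not part of the original answer) converts them:

    res_lists = {k: v.tolist() for k, v in res.items()}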

You can do something like this (let img be your original ndarray):

    res = {}
    for i in np.unique(img)[1:]:  # skip the background label 0
        x, y = np.where(img == i)
        res[i] = list(zip(x, y))  # list() needed on Python 3, where zip is lazy

which will give you what you want:

    {1: [(0, 1), (1, 1), (2, 1)],
     2: [(4, 0), (4, 1), (5, 0), (5, 1)],
     3: [(3, 4)]}

Whether this is faster, you will have to benchmark.

Per Warren's suggestion, you do not need unique and can just do:

    res = {}
    for i in range(1, num_labels + 1):
        x, y = np.where(img == i)
        res[i] = list(zip(x, y))

Try the following (Python 2 shown; on Python 3, wrap the zip in list, since zip returns an iterator):

    >>> z
    array([[0, 1, 0, 0, 0, 0],
           [0, 1, 0, 0, 0, 0],
           [0, 1, 0, 0, 0, 0],
           [0, 0, 0, 0, 3, 0],
           [2, 2, 0, 0, 0, 0],
           [2, 2, 0, 0, 0, 0]])
    >>> {i: zip(*np.where(z == i)) for i in np.unique(z) if i}
    {1: [(0, 1), (1, 1), (2, 1)], 2: [(4, 0), (4, 1), (5, 0), (5, 1)], 3: [(3, 4)]}

This is basically an argsort operation, with some extra work to get the format you want:

    import numpy as np

    def sorting_based(img, nlabels):
        img_flat = img.ravel()
        # Count how many pixels carry each label (index 0 = background).
        label_counts = np.bincount(img_flat)
        # Sort the pixel indices by label, then drop the background block.
        lin_idx = np.argsort(img_flat)[label_counts[0]:]
        coor = np.column_stack(np.unravel_index(lin_idx, img.shape))
        # Split points between consecutive labels.
        ptr = np.cumsum(label_counts[1:-1])
        out = dict(enumerate(np.split(coor, ptr), start=1))
        return out
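A quick sanity check on the example array from the question (my addition; the coordinate order within each label is not guaranteed, since argsort is not stable by default):

    import numpy as np

    a = np.array([[0, 1, 0, 0, 0, 0],
                  [0, 1, 0, 0, 0, 0],
                  [0, 1, 0, 0, 0, 0],
                  [0, 0, 0, 0, 3, 0],
                  [2, 2, 0, 0, 0, 0],
                  [2, 2, 0, 0, 0, 0]])

    out = sorting_based(a, nlabels=3)
    # out[1] -> coordinates of label 1, e.g. [[0, 1], [1, 1], [2, 1]]
    # out[2] -> [[4, 0], [4, 1], [5, 0], [5, 1]]
    # out[3] -> [[3, 4]]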

As you have noticed, doing np.where(img == label) for each label has quadratic complexity O(m*n), with m = n_pixels and n = n_labels. The sorting-based method reduces this to O(m*log(m) + n).

You can perform this operation in linear time, but I don't think it is possible to vectorize it with NumPy. You could abuse scipy.sparse.csr_matrix, similar to this answer, but at that point you are probably better off writing code that really makes sense, in Numba, Cython, etc.
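For illustration, a hedged sketch of that csr_matrix trick (the function name csr_based and the +1 offset are my own choices; the offset avoids storing explicit zeros, which sparse conversions may drop):

    import numpy as np
    from scipy.sparse import csr_matrix

    def csr_based(img, nlabels):
        img_flat = img.ravel()
        n = img_flat.size
        # One stored entry per pixel: row = pixel, column = its label,
        # stored value = linear index + 1.
        M = csr_matrix((np.arange(1, n + 1), img_flat, np.arange(n + 1)),
                       shape=(n, nlabels + 1))
        # Converting to CSC groups the stored values by column, i.e. by label.
        M = M.tocsc()
        out = {}
        for label in range(1, nlabels + 1):
            linear = M.data[M.indptr[label]:M.indptr[label + 1]] - 1
            out[label] = np.column_stack(np.unravel_index(linear, img.shape))
        return out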

