A quick way to turn a labeled image into a dictionary {label: [coordinates]}

Let's say I labeled an image with scipy.ndimage.measurements.label as follows:

    [[0, 1, 0, 0, 0, 0],
     [0, 1, 0, 0, 0, 0],
     [0, 1, 0, 0, 0, 0],
     [0, 0, 0, 0, 3, 0],
     [2, 2, 0, 0, 0, 0],
     [2, 2, 0, 0, 0, 0]]

What is a quick way to collect the coordinates belonging to each label? That is, something like:

    {1: [[0, 1], [1, 1], [2, 1]],
     2: [[4, 0], [4, 1], [5, 0], [5, 1]],
     3: [[3, 4]]}

I work with images of roughly 15,000 x 5,000 pixels, and about half of the pixels in each image are labeled (i.e., nonzero).

Instead of iterating over the whole image with nditer, would it be faster to do something like np.where(img == label) for each label?

EDIT:

Which algorithm is fastest depends on the size of the labeled image relative to the number of labels it contains. The methods of Warren Weckesser and of Salvador Dali / BHAT IRSHAD (based on np.nonzero and np.where) seem to scale linearly with the number of labels, while iterating over each image element with nditer obviously scales linearly with the size of the image.

Results of a small test:

    size: 1000 x 1000, num_labels: 10
      weckesser ... 0.214357852936s
      dali      ... 0.650229930878s
      nditer    ... 6.53645992279s

    size: 1000 x 1000, num_labels: 100
      weckesser ... 0.936990022659s
      dali      ... 1.33582305908s
      nditer    ... 6.81486487389s

    size: 1000 x 1000, num_labels: 1000
      weckesser ... 8.43906402588s
      dali      ... 9.81333303452s
      nditer    ... 7.47897100449s

    size: 1000 x 1000, num_labels: 10000
      weckesser ... 100.405524015s
      dali      ... 118.17239809s
      nditer    ... 9.14583897591s

So the question becomes more specific:

For labeled images in which the number of labels is of order sqrt(size(image)), is there an algorithm for collecting the coordinates of each label that is faster than iterating over every image element (i.e., with nditer)?
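For reference, here is a minimal sketch of the nditer baseline I am timing against (the function name nditer_based is my own):

    import numpy as np

    def nditer_based(img):
        # Visit every pixel once: the cost is linear in the image size,
        # regardless of the number of labels.
        res = {}
        it = np.nditer(img, flags=['multi_index'])
        for val in it:
            label = int(val)
            if label:  # skip the background (zeros)
                res.setdefault(label, []).append(list(it.multi_index))
        return res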

4 answers

Here is one possibility:

    import numpy as np

    a = np.array([[0, 1, 0, 0, 0, 0],
                  [0, 1, 0, 0, 0, 0],
                  [0, 1, 0, 0, 0, 0],
                  [0, 0, 0, 0, 3, 0],
                  [2, 2, 0, 0, 0, 0],
                  [2, 2, 0, 0, 0, 0]])

    # If the array was computed using scipy.ndimage.measurements.label, you
    # already know how many labels there are.
    num_labels = 3

    nz = np.nonzero(a)
    coords = np.column_stack(nz)
    nzvals = a[nz[0], nz[1]]
    res = {k: coords[nzvals == k] for k in range(1, num_labels + 1)}

I called this script get_label_indices.py. Here is an example run:

    In [97]: import pprint

    In [98]: run get_label_indices.py

    In [99]: pprint.pprint(res)
    {1: array([[0, 1],
           [1, 1],
           [2, 1]]),
     2: array([[4, 0],
           [4, 1],
           [5, 0],
           [5, 1]]),
     3: array([[3, 4]])}
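Note that the values in res are NumPy arrays rather than the plain lists shown in the question. If the exact list-of-lists format is needed, a small post-processing step (my addition, not part of the original answer) converts them:

    res_lists = {k: v.tolist() for k, v in res.items()}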

You can do something like this (let img be your original ndarray):

    res = {}
    for i in np.unique(img)[1:]:  # skip the background label 0
        x, y = np.where(img == i)
        res[i] = list(zip(x, y))  # list() needed on Python 3, where zip is lazy

which will give you what you want:

    {1: [(0, 1), (1, 1), (2, 1)],
     2: [(4, 0), (4, 1), (5, 0), (5, 1)],
     3: [(3, 4)]}

Whether this is faster, you will have to benchmark.

Per Warren's suggestion, you do not need unique and can just do:

    res = {}
    for i in range(1, num_labels + 1):
        x, y = np.where(img == i)
        res[i] = list(zip(x, y))

Try the following (Python 2 shown; on Python 3, wrap the zip in list, since zip returns an iterator):

    >>> z
    array([[0, 1, 0, 0, 0, 0],
           [0, 1, 0, 0, 0, 0],
           [0, 1, 0, 0, 0, 0],
           [0, 0, 0, 0, 3, 0],
           [2, 2, 0, 0, 0, 0],
           [2, 2, 0, 0, 0, 0]])
    >>> {i: zip(*np.where(z == i)) for i in np.unique(z) if i}
    {1: [(0, 1), (1, 1), (2, 1)], 2: [(4, 0), (4, 1), (5, 0), (5, 1)], 3: [(3, 4)]}

This is basically an argsort operation, with some extra work to get the format you want:

    import numpy as np

    def sorting_based(img, nlabels):
        img_flat = img.ravel()
        # Count how many pixels carry each label (index 0 = background).
        label_counts = np.bincount(img_flat)
        # Sort the pixel indices by label, then drop the background block.
        lin_idx = np.argsort(img_flat)[label_counts[0]:]
        coor = np.column_stack(np.unravel_index(lin_idx, img.shape))
        # Split points between consecutive labels.
        ptr = np.cumsum(label_counts[1:-1])
        out = dict(enumerate(np.split(coor, ptr), start=1))
        return out
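A quick sanity check on the example array from the question (my addition; the coordinate order within each label is not guaranteed, since argsort is not stable by default):

    import numpy as np

    a = np.array([[0, 1, 0, 0, 0, 0],
                  [0, 1, 0, 0, 0, 0],
                  [0, 1, 0, 0, 0, 0],
                  [0, 0, 0, 0, 3, 0],
                  [2, 2, 0, 0, 0, 0],
                  [2, 2, 0, 0, 0, 0]])

    out = sorting_based(a, nlabels=3)
    # out[1] -> coordinates of label 1, e.g. [[0, 1], [1, 1], [2, 1]]
    # out[2] -> [[4, 0], [4, 1], [5, 0], [5, 1]]
    # out[3] -> [[3, 4]]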

As you have noticed, doing np.where(img == label) for each label has quadratic complexity O(m*n), with m = n_pixels and n = n_labels. The sorting-based method reduces this to O(m*log(m) + n).

You can perform this operation in linear time, but I don't think it is possible to vectorize it with NumPy. You could abuse scipy.sparse.csr_matrix, similar to this answer, but at that point you are probably better off writing code that really makes sense, in Numba, Cython, etc.
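For illustration, a hedged sketch of that csr_matrix trick (the function name csr_based and the +1 offset are my own choices; the offset avoids storing explicit zeros, which sparse conversions may drop):

    import numpy as np
    from scipy.sparse import csr_matrix

    def csr_based(img, nlabels):
        img_flat = img.ravel()
        n = img_flat.size
        # One stored entry per pixel: row = pixel, column = its label,
        # stored value = linear index + 1.
        M = csr_matrix((np.arange(1, n + 1), img_flat, np.arange(n + 1)),
                       shape=(n, nlabels + 1))
        # Converting to CSC groups the stored values by column, i.e. by label.
        M = M.tocsc()
        out = {}
        for label in range(1, nlabels + 1):
            linear = M.data[M.indptr[label]:M.indptr[label + 1]] - 1
            out[label] = np.column_stack(np.unravel_index(linear, img.shape))
        return out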

