Say I labeled an image with scipy.ndimage.measurements.label as follows:
[[0, 1, 0, 0, 0, 0],
 [0, 1, 0, 0, 0, 0],
 [0, 1, 0, 0, 0, 0],
 [0, 0, 0, 0, 3, 0],
 [2, 2, 0, 0, 0, 0],
 [2, 2, 0, 0, 0, 0]]
What is a quick way to collect the coordinates belonging to each label? That is, something like:
{1: [[0, 1], [1, 1], [2, 1]],
 2: [[4, 0], [4, 1], [5, 0], [5, 1]],
 3: [[3, 4]]}
I work with images of roughly 15,000 x 5,000 pixels, and about half of the pixels in each image are labeled (i.e., nonzero).
Instead of iterating over the whole image with nditer, would it be faster to do something like np.where(img == label) for each label?
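For concreteness, the np.where-per-label idea could look like this minimal sketch (the array and variable names are mine, just to illustrate the approach on the example above):

```python
import numpy as np

# The example labeled image from above.
img = np.array([
    [0, 1, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 3, 0],
    [2, 2, 0, 0, 0, 0],
    [2, 2, 0, 0, 0, 0],
])

# One full-image np.where pass per label:
# O(num_labels * image_size) work overall.
coords = {}
for label in range(1, img.max() + 1):
    rows, cols = np.where(img == label)
    coords[label] = np.column_stack((rows, cols))
```

Note that each np.where call scans the entire image, so the total cost grows with the number of labels.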
EDIT:
Which algorithm is fastest depends on the size of the labeled image relative to the number of labels it contains. The methods of Warren Weckesser and of Salvador Dali / BHAT IRSHAD (based on np.nonzero and np.where) seem to scale linearly with the number of labels, while iterating over each image element with nditer obviously scales linearly with the size of the labeled image.
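My reconstruction of the np.nonzero-based approach (a sketch, not the answerers' exact code): extract all nonzero coordinates in one pass, then build one boolean mask per label, which makes the total cost linear in the number of labels:

```python
import numpy as np

def coords_by_label(img, num_labels):
    # One pass to collect the coordinates of every labeled pixel.
    nz = np.nonzero(img)
    coords = np.column_stack(nz)   # shape (N, 2): one row per labeled pixel
    vals = img[nz]                 # label value at each of those pixels
    # One O(N) boolean mask per label -> linear in num_labels overall.
    return {k: coords[vals == k] for k in range(1, num_labels + 1)}

img = np.array([
    [0, 1, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 3, 0],
    [2, 2, 0, 0, 0, 0],
    [2, 2, 0, 0, 0, 0],
])
res = coords_by_label(img, 3)
```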
Results of a small test:
size: 1000 x 1000, num_labels: 10
    weckesser ... 0.214357852936s
    dali      ... 0.650229930878s
    nditer    ... 6.53645992279s
size: 1000 x 1000, num_labels: 100
    weckesser ... 0.936990022659s
    dali      ... 1.33582305908s
    nditer    ... 6.81486487389s
size: 1000 x 1000, num_labels: 1000
    weckesser ... 8.43906402588s
    dali      ... 9.81333303452s
    nditer    ... 7.47897100449s
size: 1000 x 1000, num_labels: 10000
    weckesser ... 100.405524015s
    dali      ... 118.17239809s
    nditer    ... 9.14583897591s
So the question becomes more specific:
For labeled images in which the number of labels is of order sqrt(size(image)), is there an algorithm for collecting the label coordinates that is faster than iterating over each image element (i.e., with nditer)?
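One direction that avoids a per-label scan entirely (a hypothetical sketch, not a benchmarked answer): sort the nonzero pixels by label once and split the sorted coordinates into runs, which costs O(N log N) independently of the number of labels:

```python
import numpy as np

def label_coords(img):
    flat = img.ravel()
    nz = np.flatnonzero(flat)                  # flat indices of labeled pixels
    labels = flat[nz]
    order = np.argsort(labels, kind="stable")  # one sort, independent of label count
    nz_sorted = nz[order]
    labels_sorted = labels[order]
    # First index of each run of equal labels in the sorted arrays.
    uniq, starts = np.unique(labels_sorted, return_index=True)
    coords = np.column_stack(np.unravel_index(nz_sorted, img.shape))
    groups = np.split(coords, starts[1:])
    return {int(k): g for k, g in zip(uniq, groups)}

img = np.array([
    [0, 1, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 3, 0],
    [2, 2, 0, 0, 0, 0],
    [2, 2, 0, 0, 0, 0],
])
res = label_coords(img)
```

The stable sort keeps each label's coordinates in row-major order, matching the dictionary in the example above.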