How to count consecutive numbers in NumPy

I have a one-dimensional NumPy array of 1s and 0s. For example,

a = np.array([0,1,1,1,0,0,0,0,0,0,0,1,0,1,1,0,0,0,1,1,0,0]) 

I want to count the lengths of the consecutive runs of 0s and 1s in the array and output something like this:

 [1,3,7,1,1,2,3,2,2] 

What I am doing at the moment is

 np.diff(np.where(np.abs(np.diff(a)) == 1)[0]) 

which displays

 array([3, 7, 1, 1, 2, 3, 2]) 

As you can see, the first count is missing (and so is the last one).

I also tried np.split and then took the size of each segment, but that does not look promising.

Is there a more elegant, "Pythonic" solution?

+3
python arrays numpy
2 answers

Here is one vectorized approach:

 np.diff(np.r_[0,np.flatnonzero(np.diff(a))+1,a.size]) 

Sample run:

 In [208]: a = np.array([0,1,1,1,0,0,0,0,0,0,0,1,0,1,1,0,0,0,1,1,0,0])
 In [209]: np.diff(np.r_[0,np.flatnonzero(np.diff(a))+1,a.size])
 Out[209]: array([1, 3, 7, 1, 1, 2, 3, 2, 2])
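To see why this recovers the first and last counts, the one-liner can be unpacked into steps. This is only a sketch of the same expression; the intermediate names `starts`, `boundaries` and `run_lengths` are mine, not part of the original answer.

```python
import numpy as np

a = np.array([0,1,1,1,0,0,0,0,0,0,0,1,0,1,1,0,0,0,1,1,0,0])

# np.diff(a) is nonzero at every index i where a[i] != a[i+1];
# adding 1 turns those indices into the start position of each new run.
starts = np.flatnonzero(np.diff(a)) + 1

# Prepend 0 (start of the first run) and append a.size (one past the last run).
# The attempt in the question lacks these two boundaries, which is why it
# loses the first and last counts.
boundaries = np.r_[0, starts, a.size]

# The gaps between consecutive boundaries are the run lengths.
run_lengths = np.diff(boundaries)
print(run_lengths)  # [1 3 7 1 1 2 3 2 2]
```

A quick sanity check is that the run lengths always sum to `a.size`.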

A faster version, using boolean concatenation:

 np.diff(np.flatnonzero(np.concatenate(([True], a[1:]!= a[:-1], [True] )))) 
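If you want to reuse the boolean-concatenation version, it could be wrapped roughly as below. The function name `run_lengths` and the empty-input guard are my own additions (assumptions, not part of the answer); the guard is there because the two sentinel `True` values alone would otherwise report one run of length 1 for an empty array.

```python
import numpy as np

def run_lengths(a):
    """Lengths of consecutive runs of equal values in a 1-D array (hypothetical helper)."""
    a = np.asarray(a)
    if a.size == 0:
        # Guard added for illustration: an empty input has no runs.
        return np.array([], dtype=np.intp)
    # True at index 0, True wherever a value differs from its predecessor,
    # and a final sentinel True one position past the end.
    change = np.concatenate(([True], a[1:] != a[:-1], [True]))
    # Distances between consecutive True positions are the run lengths.
    return np.diff(np.flatnonzero(change))

a = np.array([0,1,1,1,0,0,0,0,0,0,0,1,0,1,1,0,0,0,1,1,0,0])
print(run_lengths(a))  # [1 3 7 1 1 2 3 2 2]
```

Since it compares neighbours with `!=` instead of subtracting them, the same sketch should also work for arrays that hold values other than 0 and 1.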

Runtime test

For the setup, let me create a larger dataset with islands of 0s and 1s. For a fair comparison, the island lengths are kept similar to those in the sample; note that np.random.randint(1,7,(n)) draws each length from 1 to 6, because the upper bound is exclusive:

 In [257]: n = 100000 # creates 100000 islands in total (alternating 0s and 1s)
 In [258]: a = np.repeat(np.arange(n)%2, np.random.randint(1,7,(n)))
 # Approach #1 proposed in this post
 In [259]: %timeit np.diff(np.r_[0,np.flatnonzero(np.diff(a))+1,a.size])
 100 loops, best of 3: 2.13 ms per loop
 # Approach #2 proposed in this post
 In [260]: %timeit np.diff(np.flatnonzero(np.concatenate(([True], a[1:]!= a[:-1], [True] ))))
 1000 loops, best of 3: 1.21 ms per loop
 # @Vineet Jain soln
 In [261]: %timeit [ sum(1 for i in g) for k,g in groupby(a)]
 10 loops, best of 3: 61.3 ms per loop
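The timings above come from an IPython session. To repeat the comparison as a plain script, something like the following sketch might do. The function names, the seeded generator and the repeat counts are my own choices (assumptions), and the numbers you get will depend on your machine and NumPy version.

```python
import timeit
from itertools import groupby

import numpy as np

rng = np.random.default_rng(0)  # seeded only for repeatability (an assumption)
n = 100000
# n alternating islands of 0s and 1s, each of length 1..6 (upper bound exclusive)
a = np.repeat(np.arange(n) % 2, rng.integers(1, 7, size=n))

def approach1(a):
    return np.diff(np.r_[0, np.flatnonzero(np.diff(a)) + 1, a.size])

def approach2(a):
    return np.diff(np.flatnonzero(np.concatenate(([True], a[1:] != a[:-1], [True]))))

def approach_groupby(a):
    return [sum(1 for _ in g) for _, g in groupby(a)]

# Check that all three agree before comparing their speed.
assert np.array_equal(approach1(a), approach2(a))
assert approach1(a).tolist() == approach_groupby(a)

for f in (approach1, approach2, approach_groupby):
    # Best of 3 repeats, 3 calls per repeat, reported per call.
    t = min(timeit.repeat(lambda: f(a), number=3, repeat=3)) / 3
    print(f"{f.__name__}: {t * 1e3:.2f} ms per call")
```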
+5

Using groupby from itertools

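 from itertools import groupby
 import numpy as np
 a = np.array([0,1,1,1,0,0,0,0,0,0,0,1,0,1,1,0,0,0,1,1,0,0])
 # groupby yields one group per run of equal consecutive values; count each group's elements
 grouped_a = [ sum(1 for i in g) for k,g in groupby(a)]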
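As a small variation on the same idea (my own illustration, not part of the answer), the key that `groupby` yields for each group can be kept next to the count, so you can see which value each run belongs to:

```python
from itertools import groupby

import numpy as np

a = np.array([0,1,1,1,0,0,0,0,0,0,0,1,0,1,1,0,0,0,1,1,0,0])

# k is the value of the run, g iterates over its elements.
runs = [(int(k), len(list(g))) for k, g in groupby(a)]
print(runs)
# [(0, 1), (1, 3), (0, 7), (1, 1), (0, 1), (1, 2), (0, 3), (1, 2), (0, 2)]
```

This stays a pure-Python loop over the elements, so it is probably only a reasonable choice for small arrays.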
+4
