Count the number of clusters of non-zero values in Python?

Question

My data looks something like this:

a=[0,0,0,0,0,0,10,15,16,12,11,9,10,0,0,0,0,0,6,9,3,7,5,4,0,0,0,0,0,0,4,3,9,7,1]

Essentially, there are runs of zeros before each run of non-zero numbers, and I want to count the number of groups of non-zero numbers separated by zeros. In the example above there are 3 groups of non-zero data, so the code should return 3.

The number of zeros between non-zero groups varies.

Is there a good way to do this in Python? (I am also using pandas and NumPy to parse the data.)

+7

python numpy pandas

Timbo slice Dec 31 '16 at 21:56

5 answers

You can achieve this using itertools.groupby() with the following expression:

    >>> from itertools import groupby
    >>> len([is_true for is_true, _ in groupby(a, lambda x: x!=0) if is_true])
    3
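To see why this works, it can help to look at the keys that groupby produces for each run. The following is a minimal sketch; the shortened list `sample` and the helper name `count_nonzero_runs` are illustrative and not part of the answer above.

```python
from itertools import groupby

def count_nonzero_runs(values):
    """Count runs of consecutive non-zero values (hypothetical helper)."""
    # groupby yields one (key, group) pair per run of equal keys; the key is
    # True for a run of non-zero values and False for a run of zeros.
    return sum(1 for is_nonzero, _ in groupby(values, key=lambda x: x != 0) if is_nonzero)

sample = [0, 0, 10, 15, 0, 6, 9, 0, 0, 4]

# One key per run: zeros, non-zeros, zeros, non-zeros, zeros, non-zeros.
run_keys = [key for key, _ in groupby(sample, key=lambda x: x != 0)]
print(run_keys)                    # [False, True, False, True, False, True]
print(count_nonzero_runs(sample))  # 3
```

Using a generator with sum avoids building the intermediate list, but the result is the same as the len([...]) form.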

+4

Moinuddin quadri Dec 31 '16 at 10:06

A simple pure-Python solution: count the transitions from zero to non-zero while tracking the previous value (rising-edge detection):

    a=[0,0,0,0,0,0,10,15,16,12,11,9,10,0,0,0,0,0,6,9,3,7,5,4,0,0,0,0,0,0,4,3,9,7,1]
    previous = 0
    count = 0
    for c in a:
        if previous==0 and c!=0:
            count+=1
        previous = c
    print(count)  # 3

+2

Jean-François Fabre Dec 31 '16 at 10:06

Pad the array with a zero on both sides using np.concatenate
Find the zeros with a == 0
Find the boundaries between zero and non-zero runs using np.diff
Count the boundaries found with sum
Divide by two, because each cluster contributes two boundaries (a start and an end)

    import numpy as np

    def nonzero_clusters(a):
        return int(np.diff(np.concatenate([[0], a, [0]]) == 0).sum() / 2)
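The steps listed above can be followed one intermediate array at a time. This is a sketch on a short input; the variable names are illustrative. It relies on np.diff applying a logical "not equal" to boolean arrays, so each True marks a position where the zero/non-zero status flips.

```python
import numpy as np

a = np.array([0, 1, 2, 0, 1, 2])

padded = np.concatenate([[0], a, [0]])   # [0 0 1 2 0 1 2 0]
is_zero = padded == 0                    # True where the padded array is zero
boundaries = np.diff(is_zero)            # True wherever zero/non-zero status flips
n_clusters = int(boundaries.sum() // 2)  # each cluster has one start and one end

print(boundaries.astype(int))  # [0 1 0 1 1 0 1]
print(n_clusters)              # 2
```

Padding with zeros guarantees that a cluster touching either end of the array still has both a start and an end boundary, which is why dividing by two is safe.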

Demonstration

    nonzero_clusters(
        [0,0,0,0,0,0,10,15,16,12,11,9,10,0,0,0,0,0,6,9,3,7,5,4,0,0,0,0,0,0,4,3,9,7,1]
    )
    3

    nonzero_clusters([0, 1, 2, 0, 1, 2])
    2

    nonzero_clusters([0, 1, 2, 0, 1, 2, 0])
    2

    nonzero_clusters([1, 2, 0, 1, 2, 0, 1, 2])
    3

Timing

    a = np.random.choice((0, 1), 100000)

Code

    import numpy as np
    from itertools import groupby

    def div(a):
        m = a != 0
        return (m[1:] > m[:-1]).sum() + m[0]

    def pir(a):
        return int(np.diff(np.concatenate([[0], a, [0]]) == 0).sum() / 2)

    def jean(a):
        previous = 0
        count = 0
        for c in a:
            if previous==0 and c!=0:
                count+=1
            previous = c
        return count

    def moin(a):
        return len([is_true for is_true, _ in groupby(a, lambda x: x!=0) if is_true])

    def user(a):
        return sum([1 for n in range (len (a) - 1) if not a[n] and a[n + 1]])
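No timing figures are included here, so the following is only a sketch of how such a comparison could be run with the standard timeit module. It redefines two of the functions above to stay self-contained; the seed, the repeat count and the output format are assumptions, and no particular result is implied.

```python
import timeit
import numpy as np

def div(a):
    # Vectorized: count rising edges in the non-zero mask.
    m = a != 0
    return (m[1:] > m[:-1]).sum() + m[0]

def jean(a):
    # Pure-Python loop: count zero -> non-zero transitions.
    previous = 0
    count = 0
    for c in a:
        if previous == 0 and c != 0:
            count += 1
        previous = c
    return count

np.random.seed(0)  # assumption: fixed seed so runs are repeatable
a = np.random.choice((0, 1), 100000)

repeats = 3  # assumption: small repeat count to keep the sketch quick
for func in (div, jean):
    seconds = timeit.timeit(lambda: func(a), number=repeats)
    print(f"{func.__name__}: {seconds / repeats:.6f} s per call")
```

Before comparing speeds it is worth checking that the candidates return the same count on the test input.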

+2

piRSquared Dec 31 '16 at 23:45

 sum ([1 for n in range (len (a) - 1) if not a[n] and a[n + 1]])
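One thing to watch for: as written, this expression counts only zero to non-zero transitions, so a cluster that begins at index 0 is not counted. A possible adjustment is sketched below; the function name `count_clusters` is hypothetical and not from the answer.

```python
def count_clusters(a):
    """Count zero -> non-zero transitions, plus a cluster starting at index 0."""
    if len(a) == 0:
        return 0
    # Same transition count as the one-liner above.
    transitions = sum(1 for n in range(len(a) - 1) if not a[n] and a[n + 1])
    # A non-zero first element starts a cluster that has no preceding zero.
    return transitions + (1 if a[0] else 0)

print(count_clusters([0, 1, 2, 0, 1, 2]))        # 2
print(count_clusters([1, 2, 0, 1, 2, 0, 1, 2]))  # 3
```

For the second input the original one-liner returns 2, because the leading run `1, 2` has no zero in front of it.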

+1

user7342539 Dec 31 '16 at 22:14

Divakar · Accepted Answer · 2016-12-31T22:04:32+0000

With a as the input array, a vectorized solution is:

    m = a!=0
    out = (m[1:] > m[:-1]).sum() + m[0]

Alternatively, for better performance, we can use np.count_nonzero, which is very efficient for counting booleans, as here:

 out = np.count_nonzero(m[1:] > m[:-1]) + m[0]

Basically, we get a non-zeros mask and count the rising edges. To take into account the first element, which could also be non-zero, and would not have any rising edge, we need to check it and add to the total.

Also note that if input a is a list, we should use m = np.asarray(a)!=0 instead.
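Putting these notes together, a small wrapper might look like the sketch below. The function name `count_rising_edges` and the empty-input guard are assumptions added for illustration; the core expression is the one from this answer.

```python
import numpy as np

def count_rising_edges(values):
    """Count clusters of non-zero values in a list or array (hypothetical wrapper)."""
    # np.asarray lets the same code accept plain lists as well as arrays.
    m = np.asarray(values) != 0
    if m.size == 0:
        return 0  # assumption: an empty input has no clusters
    # m[1:] > m[:-1] is True exactly where False is followed by True,
    # i.e. at each rising edge; m[0] accounts for a cluster at the start.
    return int(np.count_nonzero(m[1:] > m[:-1]) + m[0])

print(count_rising_edges([0, 1, 2, 0, 1, 2]))        # 2
print(count_rising_edges([1, 2, 0, 1, 2, 0, 1, 2]))  # 3
```

Converting to int at the end returns a plain Python integer rather than a NumPy scalar, which is a stylistic choice.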

Sample runs for three cases:

    In [92]: a  # Case 1: given sample
    Out[92]:
    array([ 0,  0,  0,  0,  0,  0, 10, 15, 16, 12, 11,  9, 10,  0,  0,  0,  0,
            0,  6,  9,  3,  7,  5,  4,  0,  0,  0,  0,  0,  0,  4,  3,  9,  7,  1])

    In [93]: m = a!=0

    In [94]: (m[1:] > m[:-1]).sum() + m[0]
    Out[94]: 3

    In [95]: a[0] = 7  # Case 2: add a non-zero elem/group at the start

    In [96]: m = a!=0

    In [97]: (m[1:] > m[:-1]).sum() + m[0]
    Out[97]: 4

    In [99]: a[-2:] = [0,4]  # Case 3: add a non-zero group at the end

    In [100]: m = a!=0

    In [101]: (m[1:] > m[:-1]).sum() + m[0]
    Out[101]: 5
