Count the number of clusters of non-zero values in Python?

My data looks something like this:

a=[0,0,0,0,0,0,10,15,16,12,11,9,10,0,0,0,0,0,6,9,3,7,5,4,0,0,0,0,0,0,4,3,9,7,1] 

Essentially, there are lots of zeros before the non-zero numbers, and I want to count the groups of non-zero numbers separated by zeros. In the example above there are 3 groups of non-zero data, so the code should return 3.

  • The number of zeros between non-zero groups varies.

Are there good ways to do this in Python? (I'm also using pandas and NumPy to help parse the data.)

Tags: python, numpy, pandas
5 answers

With a as an input array, here is a vectorized solution:

    m = a != 0
    out = (m[1:] > m[:-1]).sum() + m[0]

For better performance, we can use np.count_nonzero, which is very efficient at counting booleans:

 out = np.count_nonzero(m[1:] > m[:-1]) + m[0] 

Basically, we build a mask of the non-zeros and count its rising edges (positions where False is followed by True). A group that starts at the first element has no rising edge, so we check the first element separately and add it to the total.

Also note that if input a is a list, we should use m = np.asarray(a)!=0 instead.
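To see the rising-edge idea on a small input, the sketch below prints the intermediate mask and edges, then wraps the whole computation in a function. The name `count_clusters` is hypothetical, and the empty-input guard is an addition that the answer above does not include.

```python
import numpy as np


def count_clusters(values):
    """Count runs of non-zero values (hypothetical helper name)."""
    m = np.asarray(values) != 0            # True where the value is non-zero
    if m.size == 0:                        # added guard: m[0] would fail on empty input
        return 0
    rising = m[1:] > m[:-1]                # True where a zero is followed by a non-zero
    # A run starting at index 0 has no rising edge, so add m[0] separately.
    return int(np.count_nonzero(rising)) + int(m[0])


small = [0, 3, 4, 0, 0, 5, 0, 7]
mask = np.asarray(small) != 0
print(mask.astype(int))                    # [0 1 1 0 0 1 0 1]
print((mask[1:] > mask[:-1]).astype(int))  # [1 0 0 0 1 0 1]
print(count_clusters(small))               # 3
```

Because np.asarray is applied inside the function, it accepts a plain list as well as an array.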

Sample runs for three cases:

    In [92]: a  # Case 1: given sample
    Out[92]: array([ 0, 0, 0, 0, 0, 0, 10, 15, 16, 12, 11, 9, 10, 0, 0, 0, 0, 0, 6, 9, 3, 7, 5, 4, 0, 0, 0, 0, 0, 0, 4, 3, 9, 7, 1])

    In [93]: m = a!=0

    In [94]: (m[1:] > m[:-1]).sum() + m[0]
    Out[94]: 3

    In [95]: a[0] = 7  # Case 2: add a non-zero elem/group at the start

    In [96]: m = a!=0

    In [97]: (m[1:] > m[:-1]).sum() + m[0]
    Out[97]: 4

    In [99]: a[-2:] = [0,4]  # Case 3: add a non-zero group at the end

    In [100]: m = a!=0

    In [101]: (m[1:] > m[:-1]).sum() + m[0]
    Out[101]: 5

You can achieve this using itertools.groupby() with the following expression:

    >>> from itertools import groupby
    >>> len([is_true for is_true, _ in groupby(a, lambda x: x!=0) if is_true])
    3
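groupby splits the sequence into maximal runs of consecutive elements that share the same key, so each non-zero cluster becomes exactly one run with key True. A variant sketch uses `key=bool` (0 is falsy, any other number is truthy) and counts with a generator instead of building a list; the function name is hypothetical.

```python
from itertools import groupby


def count_clusters_groupby(values):
    # groupby yields one (key, run) pair per maximal run of equal keys;
    # with key=bool, every non-zero cluster is a single run whose key is True.
    return sum(1 for is_nonzero, _ in groupby(values, key=bool) if is_nonzero)


a = [0, 0, 0, 0, 0, 0, 10, 15, 16, 12, 11, 9, 10, 0, 0, 0, 0, 0,
     6, 9, 3, 7, 5, 4, 0, 0, 0, 0, 0, 0, 4, 3, 9, 7, 1]
print(count_clusters_groupby(a))  # 3
```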

A simple pure-Python solution: count the transitions from 0 to non-zero by tracking the previous value (leading-edge detection):

    a = [0,0,0,0,0,0,10,15,16,12,11,9,10,0,0,0,0,0,6,9,3,7,5,4,0,0,0,0,0,0,4,3,9,7,1]
    previous = 0
    count = 0
    for c in a:
        if previous == 0 and c != 0:
            count += 1
        previous = c
    print(count)  # 3
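The same leading-edge check can be written without a mutable `previous` variable by pairing each element with its predecessor. Prepending a virtual 0 means a cluster that starts at index 0 is counted too. This is a minimal sketch assuming `values` is a finite sequence; the function name is hypothetical.

```python
def count_clusters_zip(values):
    # zip stops at the shorter input, so this yields (0, v0), (v0, v1), ...
    pairs = zip([0] + list(values), values)
    # A cluster starts wherever a zero is followed by a non-zero.
    return sum(1 for prev, cur in pairs if prev == 0 and cur != 0)


print(count_clusters_zip([5, 0, 0, 2, 2, 0, 1]))  # 3
```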
Another NumPy approach:

  • pad the array with a zero on both sides using np.concatenate
  • find the zeros with a == 0
  • find the boundaries between zero and non-zero runs using np.diff
  • add up the boundaries found with sum
  • divide by two, because each cluster has two boundaries (a start and an end)

    import numpy as np

    def nonzero_clusters(a):
        # Pad with zeros, mark zeros, count zero/non-zero switches, then halve.
        return int(np.diff(np.concatenate([[0], a, [0]]) == 0).sum() / 2)

Demonstration:

    >>> nonzero_clusters([0,0,0,0,0,0,10,15,16,12,11,9,10,0,0,0,0,0,6,9,3,7,5,4,0,0,0,0,0,0,4,3,9,7,1])
    3
    >>> nonzero_clusters([0, 1, 2, 0, 1, 2])
    2
    >>> nonzero_clusters([0, 1, 2, 0, 1, 2, 0])
    2
    >>> nonzero_clusters([1, 2, 0, 1, 2, 0, 1, 2])
    3
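To make the padding and halving steps concrete, this sketch prints each intermediate array for the last demonstration input. It relies on np.diff of a boolean array returning True wherever neighbours differ (NumPy uses not_equal rather than subtraction for booleans), and uses integer division in place of int(... / 2).

```python
import numpy as np

x = [1, 2, 0, 1, 2, 0, 1, 2]
padded = np.concatenate([[0], x, [0]])  # a 0 on both sides closes every cluster
is_zero = padded == 0                   # True at zeros, including the padding
boundaries = np.diff(is_zero)           # True wherever zero/non-zero switches
print(padded)                  # [0 1 2 0 1 2 0 1 2 0]
print(is_zero.astype(int))     # [1 0 0 1 0 0 1 0 0 1]
print(boundaries.astype(int))  # [1 0 1 1 0 1 1 0 1]
print(boundaries.sum() // 2)   # 3: each cluster contributes a start and an end
```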

Timing, with random 0/1 test data:

    a = np.random.choice((0, 1), 100000)

Code:

    from itertools import groupby

    import numpy as np

    def div(a):
        m = a != 0
        return (m[1:] > m[:-1]).sum() + m[0]

    def pir(a):
        return int(np.diff(np.concatenate([[0], a, [0]]) == 0).sum() / 2)

    def jean(a):
        previous = 0
        count = 0
        for c in a:
            if previous == 0 and c != 0:
                count += 1
            previous = c
        return count

    def moin(a):
        return len([is_true for is_true, _ in groupby(a, lambda x: x != 0) if is_true])

    def user(a):
        # Note: this misses a cluster that starts at index 0.
        return sum([1 for n in range(len(a) - 1) if not a[n] and a[n + 1]])
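One way to rerun a comparison like this outside IPython (where %timeit is not available) is the standard timeit module. This sketch times two of the functions above on the same kind of random data, first checking that they agree; it makes no claim about the resulting numbers, which depend on the machine.

```python
import timeit
from itertools import groupby

import numpy as np


def div(a):
    m = a != 0
    return (m[1:] > m[:-1]).sum() + m[0]


def moin(a):
    return len([is_true for is_true, _ in groupby(a, lambda x: x != 0) if is_true])


a = np.random.choice((0, 1), 100000)
# Make sure the candidates agree before comparing their speed.
assert div(a) == moin(a)

runs = 10
for func in (div, moin):
    seconds = timeit.timeit(lambda: func(a), number=runs)
    print(f"{func.__name__}: {seconds / runs * 1e3:.2f} ms per call")
```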


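A one-liner that counts every zero followed by a non-zero. The comprehension alone misses a cluster that starts at index 0, so the bool(...) term adds it:

    sum(1 for n in range(len(a) - 1) if not a[n] and a[n + 1]) + bool(len(a) and a[0])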
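Since the question mentions pandas, here is a sketch of the same leading-edge count on a pandas Series; it is not taken from any of the answers above. A value starts a cluster if it is non-zero and the previous value is zero, and shift(1, fill_value=0) supplies a virtual zero before the first element. The function name is hypothetical.

```python
import pandas as pd


def count_clusters_pandas(values):
    s = pd.Series(values)
    nonzero = s.ne(0)                           # True where the value is non-zero
    prev_zero = s.shift(1, fill_value=0).eq(0)  # previous value is 0, or we are at the start
    return int((nonzero & prev_zero).sum())     # each True marks the start of a cluster


a = [0, 0, 0, 0, 0, 0, 10, 15, 16, 12, 11, 9, 10, 0, 0, 0, 0, 0,
     6, 9, 3, 7, 5, 4, 0, 0, 0, 0, 0, 0, 4, 3, 9, 7, 1]
print(count_clusters_pandas(a))  # 3
```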