Running count of consecutive nonzero values

Question

I'm looking for a fast vectorized function that returns a running count of consecutive nonzero values. The count should reset to zero whenever a zero occurs. The result should have the same shape as the input array.

For such an array:

x = np.array([2.3, 1.2, 4.1 , 0.0, 0.0, 5.3, 0, 1.2, 3.1])

The function should return this:

 array([1, 2, 3, 0, 0, 1, 0, 1, 2])

+5

performance python arrays vectorization numpy

steve Apr 26 '15 at 2:27

2 answers

You can use itertools.groupby and np.hstack:

    >>> import numpy as np
    >>> x = np.array([2.3, 1.2, 4.1 , 0.0, 0.0, 5.3, 0, 1.2, 3.1])
    >>> from itertools import groupby
    >>> np.hstack([[i if j!=0 else j for i,j in enumerate(g,1)] for _,g in groupby(x,key=lambda x: x!=0)])
    array([ 1., 2., 3., 0., 0., 1., 0., 1., 2.])

We group the array elements by whether they are nonzero, then use a list comprehension with enumerate to replace each element of a nonzero group with its 1-based position within that group (zeros stay zero), and finally flatten the resulting list of lists with np.hstack.
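To see what the grouping step produces before the counts are filled in, here is a small sketch on the same array. The names groups and counts are illustrative only and do not appear in the answer.

```python
from itertools import groupby
import numpy as np

x = np.array([2.3, 1.2, 4.1, 0.0, 0.0, 5.3, 0, 1.2, 3.1])

# Consecutive elements that share the same key (nonzero or not) form one group
groups = [(bool(k), [float(v) for v in g])
          for k, g in groupby(x, key=lambda v: v != 0)]
for is_nonzero, values in groups:
    print(is_nonzero, values)
# True [2.3, 1.2, 4.1]
# False [0.0, 0.0]
# True [5.3]
# False [0.0]
# True [1.2, 3.1]

# Number the elements of each nonzero group from 1; zero groups stay 0
counts = [i if is_nonzero else 0
          for is_nonzero, values in groups
          for i, _ in enumerate(values, 1)]
print(counts)  # [1, 2, 3, 0, 0, 1, 0, 1, 2]
```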

+2

Kasramvd Apr 26 '15 at 2:55

Divakar · Accepted Answer · 2015-04-26T04:16:44+0000

This post presents a vectorized approach that consists mainly of two steps:

Initialize a vector of zeros with the same size as the input vector, x, and set ones at the places corresponding to the non-zeros of x.
Next, in that vector, put the negative of each island's run length right after the end/stop position of that "island". The intent is to apply cumsum later on, which yields sequential numbers within the "islands" and zeros elsewhere.
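As a rough illustration of why the two steps work, the sketch below applies them by hand to the array from the question. The island positions and lengths are hard-coded here purely for demonstration, and the names ones and marked are not from the answer.

```python
import numpy as np

x = np.array([2.3, 1.2, 4.1, 0.0, 0.0, 5.3, 0, 1.2, 3.1])

# Step 1: ones where x is nonzero, zeros elsewhere
ones = (x != 0).astype(int)
# ones   -> [1 1 1 0 0 1 0 1 1]

# Step 2: right after each island, put minus that island's length.
# Islands: indices 0-2 (length 3), index 5 (length 1), indices 7-8
# (length 2; it runs to the end of x, so there is nothing to place after it).
marked = ones.copy()
marked[3] = -3
marked[6] = -1
# marked -> [1 1 1 -3 0 1 -1 1 1]

# The running sum counts up inside each island and drops back to zero after it
out = marked.cumsum()
print(out)  # [1 2 3 0 0 1 0 1 2]
```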

Here's the implementation -

    import numpy as np

    # Append zeros at the start and end of input array, x
    xa = np.hstack([[0], x, [0]])

    # Get an array of ones and zeros, with ones for nonzeros of x and zeros elsewhere
    xa1 = (xa != 0) + 0

    # Find consecutive differences on xa1
    xadf = np.diff(xa1)

    # Find start and stop+1 indices and thus the lengths of "islands" of non-zeros
    starts = np.where(xadf == 1)[0]
    stops_p1 = np.where(xadf == -1)[0]
    lens = stops_p1 - starts

    # Mark indices where "minus lens" are to be put for applying cumsum
    # (an island that runs to the end of x has no position after it, so it is skipped)
    put_m1 = stops_p1[stops_p1 < x.size]

    # Setup vector with ones for nonzero x's, "minus lens" at stops+1 & zeros elsewhere
    vec = xa1[1:-1]  # Note: this is a view, so it changes xa1, which is okay as xa1 is not needed anymore
    vec[put_m1] = -lens[0:put_m1.size]

    # Perform cumsum to get the desired output
    out = vec.cumsum()

Sample run -

    In [116]: x
    Out[116]: array([ 0. ,  2.3,  1.2,  4.1,  0. ,  0. ,  5.3,  0. ,  1.2,  3.1,  0. ])

    In [117]: out
    Out[117]: array([0, 1, 2, 3, 0, 0, 1, 0, 1, 2, 0], dtype=int32)

Runtime Tests -

Here are some runtime tests comparing the proposed approach with the other itertools.groupby-based approach -

    In [21]: N = 1000000
        ...: x = np.random.rand(1,N)
        ...: x[x>0.5] = 0.0
        ...: x = x.ravel()
        ...:

    In [19]: %timeit sumrunlen_vectorized(x)
    10 loops, best of 3: 19.9 ms per loop

    In [20]: %timeit sumrunlen_loopy(x)
    1 loops, best of 3: 2.86 s per loop
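The two timed functions are not defined anywhere in the post. A plausible sketch, assuming sumrunlen_vectorized simply wraps the cumsum-based code above and sumrunlen_loopy wraps the itertools.groupby answer, might look like this (the function bodies are a reconstruction, not the author's exact code):

```python
from itertools import groupby
import numpy as np

def sumrunlen_vectorized(x):
    """Assumed wrapper around the cumsum-based approach."""
    x = np.asarray(x)
    # Pad with zeros so every island has a detectable start and stop
    xa1 = (np.hstack([[0], x, [0]]) != 0).astype(int)
    xadf = np.diff(xa1)
    starts = np.where(xadf == 1)[0]
    stops_p1 = np.where(xadf == -1)[0]
    lens = stops_p1 - starts
    # Skip an island that runs to the end of x: nothing to place after it
    put_m1 = stops_p1[stops_p1 < x.size]
    vec = xa1[1:-1]
    vec[put_m1] = -lens[:put_m1.size]
    return vec.cumsum()

def sumrunlen_loopy(x):
    """Assumed wrapper around the itertools.groupby approach."""
    return np.hstack([[i if j != 0 else j for i, j in enumerate(g, 1)]
                      for _, g in groupby(x, key=lambda v: v != 0)])

x = np.array([2.3, 1.2, 4.1, 0.0, 0.0, 5.3, 0, 1.2, 3.1])
print(sumrunlen_vectorized(x))  # [1 2 3 0 0 1 0 1 2]
```

Note that the groupby version returns a float array while the cumsum version returns integers, so comparing the two is best done by value, for example with np.array_equal.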
