I would like to slice a DataFrame so that it returns the rows where the element x = 0 appears consecutively at least n = 3 times, and then discard the first i = 2 instances in each such run.
Is there an efficient way to achieve this in pandas, and if not, using NumPy or SciPy?
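One possible pandas approach, sketched here under the assumption that the values live in column `A` and that `keep_runs` is just a hypothetical helper name, is to label every run of equal consecutive values (compare each value with the previous one and take a cumulative sum), then combine the run length (`transform('size')`) and the position within the run (`cumcount()`) into a single boolean mask:

```python
import numpy as np
import pandas as pd


def keep_runs(df, col, x, n, i):
    """Hypothetical helper: keep rows where `col` equals `x` inside a run of at
    least `n` consecutive `x` values, dropping the first `i` rows of each run."""
    s = df[col]
    # A new run starts wherever the value differs from the previous row.
    run_id = s.ne(s.shift()).cumsum().rename('run_id')
    grouped = s.groupby(run_id)
    run_len = grouped.transform('size')  # length of the run each row belongs to
    pos = grouped.cumcount()             # 0-based position of the row within its run
    mask = s.eq(x) & run_len.ge(n) & pos.ge(i)
    return df[mask]


df = pd.DataFrame({'A': [0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 1],
                   'B': np.random.randn(17)})
print(keep_runs(df, 'A', x=0, n=3, i=2))
```

For Example 1 this keeps the rows labelled 8, 9 and 14, and the original index is preserved. Every step is vectorized, so it should scale reasonably well; since `B` is random, the printed values will differ from run to run.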
import pandas as pd
import numpy as np
Example 1 (x = 0, n = 3, i = 2)
df = pd.DataFrame({'A': [0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 1],
                   'B': np.random.randn(17)})

    A         B
0   0  0.748958
1   1  0.254730
2   0  0.629609
3   0  0.272738
4   1 -1.885906
5   1  1.206371
6   0 -0.332471
7   0  0.217553
8   0  0.768986
9   0 -1.607236
10  1  1.613650
11  1 -1.096892
12  0 -0.435762
13  0  0.131284
14  0 -0.177188
15  1  1.393890
16  1  0.174803
Required output:
    A         B
8   0  0.768986
9   0 -1.607236
14  0 -0.177188
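Since the question also asks about NumPy, here is a sketch of the same idea in plain NumPy, checked against Example 1. It finds run boundaries with `np.diff` on a padded boolean mask; `keep_runs_np` is a hypothetical name, and it returns positional indices (for use with `df.iloc`) rather than index labels.

```python
import numpy as np


def keep_runs_np(a, x, n, i):
    """Hypothetical helper: positions of elements equal to `x` that lie in a run
    of at least `n` consecutive `x` values, skipping the first `i` of each run."""
    a = np.asarray(a)
    # Pad with False on both sides so runs touching the edges are detected.
    is_x = np.concatenate(([False], a == x, [False]))
    # +1 marks the start of a run of x, -1 marks one past its end.
    edges = np.diff(is_x.astype(int))
    starts = np.flatnonzero(edges == 1)
    ends = np.flatnonzero(edges == -1)
    long_run = (ends - starts) >= n
    # For each long run, keep positions [start + i, end).
    pieces = [np.arange(s + i, e) for s, e in zip(starts[long_run], ends[long_run])]
    return np.concatenate(pieces) if pieces else np.array([], dtype=int)


a = [0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 1]
print(keep_runs_np(a, x=0, n=3, i=2))  # expected positions: 8, 9, 14
```

With a DataFrame this would be used as `df.iloc[keep_runs_np(df['A'].to_numpy(), 0, 3, 2)]`. The Python-level loop runs once per long run rather than once per row, so it should stay cheap unless there are very many qualifying runs.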
Example 2
x = 0 (element of interest)
n = 5 (minimum sequence length)
i = 2 (discard first two in each sequence)
df2 = pd.DataFrame({'A': [0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0],
                    'B': np.random.randn(20)})

    A         B
0   0  0.703803
1   0 -0.144088
2   0  0.635577
3   0 -0.834611
4   0  1.472271
5   0 -0.554860
6   0 -0.167016
7   1  0.578847
8   1 -1.873663
9   0  0.197062
10  0  1.458845
11  0 -1.921660
12  0 -1.301481
13  0  0.240197
14  0 -1.425058
15  1 -2.801151
16  0  0.766757
17  0  1.249806
18  0  0.595366
19  0 -1.447632
Required output:
    A         B
2   0  0.635577
3   0 -0.834611
4   0  1.472271
5   0 -0.554860
6   0 -0.167016
11  0 -1.921660
12  0 -1.301481
13  0  0.240197
14  0 -1.425058
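A more step-by-step pandas variant, sketched here on Example 2, splits the work into two explicit stages: `groupby(...).filter` drops the short runs, then `cumcount()` drops the first i rows of each remaining run. This is an alternative formulation, not a confirmed best practice; it likely reads more clearly but is probably slower on large frames, because the lambda is called once per run.

```python
import numpy as np
import pandas as pd

df2 = pd.DataFrame({'A': [0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0],
                    'B': np.random.randn(20)})
x, n, i = 0, 5, 2

# Label runs of equal consecutive values in A (a new label whenever A changes).
run_id = df2['A'].ne(df2['A'].shift()).cumsum().rename('run_id')

# Stage 1: rows equal to x, restricted to runs with at least n rows.
only_x = df2[df2['A'].eq(x)]
long_runs = only_x.groupby(run_id.loc[only_x.index]).filter(lambda g: len(g) >= n)

# Stage 2: drop the first i rows of every remaining run.
result = long_runs[long_runs.groupby(run_id.loc[long_runs.index]).cumcount() >= i]
print(result)
```

For Example 2 this keeps the rows labelled 2 to 6 and 11 to 14; the run at 16 to 19 has only four zeros, so it is removed in stage 1.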