Finding data gaps (not filling them) in pandas?

I have time series data stored in a pandas DataFrame with a DatetimeIndex. Now I want to identify gaps in the time series in order to find the continuous segments, so that I can process them individually (and, in some cases, glue together segments with sufficiently short gaps between them).

There are two main ways I can do this. The first is to reindex, using one of several approaches, to obtain a regular time index and then look for the NA values filled in where the gaps are. In my case this produces a lot of extra rows (some of the gaps are long), and it still takes an extra step to identify the continuous segments.
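For illustration, here is a minimal sketch of the reindexing approach. The sample data, the daily frequency, and the names `regular` and `n_filled` are assumptions made up for this example, not part of the original data.

```python
import pandas as pd

# Made-up sample: daily data with a gap between 2015-01-02 and 2015-01-10
idx = pd.to_datetime(["2015-01-01", "2015-01-02", "2015-01-10", "2015-01-11"])
data = pd.DataFrame({"value": [1, 2, 3, 4]}, index=idx)

# Reindex onto a regular daily grid; dates missing from the data become NaN rows
regular = data.asfreq("D")

# Count the rows that exist only because of the gap
n_filled = int(regular["value"].isna().sum())
print(len(data), len(regular), n_filled)
```

Four original rows grow to eleven here, which shows how long gaps inflate the frame, and the NaN runs still have to be turned into segment boundaries afterwards.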

The other approach, which is what I'm currently using, is to take np.diff of the index and find the gaps with np.where. But is there a more natural pandas way to do this? It seems like a fairly common task. I have also noticed that np.diff has problems with pandas under some combinations of numpy and pandas versions, so a pure pandas solution would be preferable.
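Roughly, the np.diff version looks like the sketch below. The sample data, the one-day threshold `max_gap`, and the use of np.split to cut out the segments are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Made-up sample: daily data with a gap between 2015-01-02 and 2015-01-10
idx = pd.to_datetime(["2015-01-01", "2015-01-02", "2015-01-10", "2015-01-11"])
data = pd.DataFrame({"value": [1, 2, 3, 4]}, index=idx)

# Assumed threshold: any step longer than this counts as a gap
max_gap = np.timedelta64(1, "D")

# Differences between consecutive timestamps, as a timedelta64 array
deltas = np.diff(data.index.values)

# Positions where a new segment starts (the row just after each gap)
breaks = np.where(deltas > max_gap)[0] + 1

# Split the row positions at the breaks and pick out each segment
segments = [data.iloc[pos] for pos in np.split(np.arange(len(data)), breaks)]
for segment in segments:
    print(segment)
```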

What would be perfect is something like

    for segment in data.continuous_segments():
        # Process each segment

for a DataFrame data.

1 answer

This might work for you:

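    import numpy as np
    import pandas as pd

    # Sample data: the NaN timestamps mark the gap between two segments
    df = pd.DataFrame([["2015-01-01", 1], ["2015-01-02", 1], [np.nan, 1], [np.nan, 1],
                       ["2015-01-10", 1], ["2015-01-11", 1]],
                      columns=['timestamp', 'value'])

    # Drop the gap rows, then group by a counter that increases at every null timestamp
    continuous_segments = df[df.timestamp.notnull()].groupby(df.timestamp.isnull().cumsum())

    for segment in continuous_segments:
        print(segment[1])

Output:

        timestamp  value
    0  2015-01-01      1
    1  2015-01-02      1
        timestamp  value
    4  2015-01-10      1
    5  2015-01-11      1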
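If the timestamps are in a DatetimeIndex and the gaps are not marked by null rows, the same cumsum idea can be applied to the differences of the index instead. This is only a sketch: `continuous_segments` here is a hypothetical helper (pandas has no such method), and `max_gap` is an assumed threshold you would choose for your data.

```python
import pandas as pd

def continuous_segments(data, max_gap):
    """Hypothetical helper: yield runs of rows whose index steps are <= max_gap."""
    # Time elapsed since the previous row (NaT for the first row)
    step = data.index.to_series().diff()
    # The counter increases at every step longer than max_gap,
    # so each row gets the number of its segment
    segment_id = (step > max_gap).cumsum()
    for _, segment in data.groupby(segment_id.values):
        yield segment

# Made-up sample: daily data with a gap between 2015-01-02 and 2015-01-10
idx = pd.to_datetime(["2015-01-01", "2015-01-02", "2015-01-10", "2015-01-11"])
data = pd.DataFrame({"value": [1, 2, 3, 4]}, index=idx)

for segment in continuous_segments(data, pd.Timedelta("1D")):
    print(segment)
```

Raising `max_gap` glues together segments separated by short gaps: with `pd.Timedelta("10D")` the sample above comes back as a single segment.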