Finding gaps in data (not filling them) in pandas?

Question

I have time series data stored in a pandas DataFrame with a DatetimeIndex. I want to find the gaps in the time series so that I can identify continuous segments and process them individually (and, in some cases, join segments separated by sufficiently short gaps).

There are two main ways I can do this. The first is to reindex, using one of several approaches, onto a regular time grid and look for the NA values filled into the gap regions. In my case this produces a lot of extra rows (some of the gaps are long), and an extra step is still needed to identify the continuous segments.
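To illustrate the reindexing approach and its cost, here is a minimal sketch. The sample dates, the daily frequency and the names `data` and `regular` are assumptions made for the example, not taken from the real data.

```python
import pandas as pd

# Hypothetical daily series with a gap between Jan 2 and Jan 10.
idx = pd.to_datetime(["2015-01-01", "2015-01-02", "2015-01-10", "2015-01-11"])
data = pd.Series([1.0, 2.0, 3.0, 4.0], index=idx)

# Reindex onto a regular daily grid; days with no observation become NaN.
regular = data.asfreq("D")

# Count the rows that exist only because of the gap.
n_filled = int(regular.isna().sum())
print(len(data), len(regular), n_filled)
```

Every missing day turns into a row, so a long gap inflates the frame considerably, and the NaN runs still have to be turned into segment boundaries afterwards.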

The other approach, which I'm currently using, is to apply np.diff to the index and find the gaps with np.where. But is there a more natural pandas approach? This seems like a fairly common task. I've noticed that np.diff has problems with pandas objects under some combinations of numpy and pandas versions, so a pandas-only solution would be preferable.
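For reference, the np.diff / np.where approach might look roughly like the sketch below. The sample data and the one-day `max_gap` threshold are assumptions for illustration; this is not the exact code in use.

```python
import numpy as np
import pandas as pd

# Hypothetical daily series with a gap between Jan 2 and Jan 10.
idx = pd.to_datetime(["2015-01-01", "2015-01-02", "2015-01-10", "2015-01-11"])
data = pd.Series([1.0, 2.0, 3.0, 4.0], index=idx)

# Assumed threshold: any step longer than one day counts as a gap.
max_gap = np.timedelta64(1, "D")

# Differences between consecutive timestamps, taken on the raw
# datetime64 array so that np.diff never sees a pandas object.
deltas = np.diff(data.index.values)

# Positions at which a new segment starts.
breaks = np.where(deltas > max_gap)[0] + 1

# Split the row positions at the breaks and slice out each segment.
pieces = [data.iloc[pos] for pos in np.split(np.arange(len(data)), breaks)]
for piece in pieces:
    print(piece.index[0], piece.index[-1], len(piece))
```

Raising `max_gap` joins segments that are separated only by short gaps.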

What would be perfect is something like

    for segment in data.continuous_segments():
        # Process each segment

for the DataFrame data.

python numpy pandas

Bogdanovist May 20 '15 at 12:16

1 answer

maxymoo · Accepted Answer · 2015-05-20T00:41:42+0000

This might work for you:

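    import numpy as np
    import pandas as pd

    df = pd.DataFrame(
        [["2015-01-01", 1], ["2015-01-02", 1], [np.nan, 1],
         [np.nan, 1], ["2015-01-10", 1], ["2015-01-11", 1]],
        columns=["timestamp", "value"])

    # Rows with a missing timestamp mark the gaps. The running count of
    # missing timestamps gives each continuous run its own group key.
    continuous_segments = df[df.timestamp.notnull()].groupby(df.timestamp.isnull().cumsum())

    for segment in continuous_segments:
        print(segment[1])

Output:

        timestamp  value
    0  2015-01-01      1
    1  2015-01-02      1
        timestamp  value
    4  2015-01-10      1
    5  2015-01-11      1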
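The example above marks gaps with rows whose timestamp is NaN. If the data is indexed by a DatetimeIndex and the gap rows are simply absent, as in the question, the same cumsum idea could be applied to the differences between consecutive index values. The following is only a sketch under that assumption; the sample frame and the `max_gap` threshold are made up for illustration.

```python
import pandas as pd

# Hypothetical frame with a DatetimeIndex and a gap between Jan 2 and Jan 10.
idx = pd.to_datetime(["2015-01-01", "2015-01-02", "2015-01-10", "2015-01-11"])
data = pd.DataFrame({"value": [1, 1, 1, 1]}, index=idx)

# Assumed threshold: steps longer than this start a new segment.
max_gap = pd.Timedelta(days=1)

# Time elapsed since the previous row. The first entry is NaT,
# which compares as False against the threshold.
step = data.index.to_series().diff()

# Each step that exceeds the threshold bumps the segment id by one.
segment_id = (step > max_gap).cumsum()

for seg, segment in data.groupby(segment_id):
    # Process each continuous segment here.
    print(seg, segment.index[0].date(), segment.index[-1].date(), len(segment))
```

Choosing a larger `max_gap` keeps segments with sufficiently short gaps between them together, and no filler rows are created.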
