I have an irregularly indexed time series of data with a resolution of seconds, for example:
    import pandas as pd

    idx = ['2012-01-01 12:43:35', '2012-03-12 15:46:43',
           '2012-09-26 18:35:11', '2012-11-11 2:34:59']
    status = [1, 0, 1, 0]
    df = pd.DataFrame(status, index=idx, columns=['status'])
    # Convert the string labels to timestamps in place; reindexing against a
    # DatetimeIndex would not match the string labels in current pandas.
    df.index = pd.to_datetime(df.index)

    In [62]: df
    Out[62]:
                         status
    2012-01-01 12:43:35       1
    2012-03-12 15:46:43       0
    2012-09-26 18:35:11       1
    2012-11-11 02:34:59       0
and I'm interested in the fraction of the year when the status is 1. The way I do it now is to reindex df with every second of the year and forward-fill, as follows:
    full_idx = pd.date_range(start='1/1/2012', end='12/31/2012', freq='s')
    df1 = df.reindex(full_idx, method='ffill')
This returns a DataFrame containing every second of the year, and I can then take the mean to get the fraction of time spent in status 1, for example:
    In [66]: df1
    Out[66]:
    <class 'pandas.core.frame.DataFrame'>
    DatetimeIndex: 31536001 entries, 2012-01-01 00:00:00 to 2012-12-31 00:00:00
    Freq: S
    Data columns:
    status    31490186 non-null values
    dtypes: float64(1)

    In [67]: df1.status.mean()
    Out[67]: 0.31953371123308066
The problem is that I have to do this for a lot of data, and reindexing to every second of the year is by far the most expensive operation.
What are better ways to do this?
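For reference, here is a minimal sketch of one direction that avoids the per-second reindex. It weights each status by the time until the next observation instead of materializing every second. The helper name `fraction_in_status` and its parameters are hypothetical, not part of pandas. It assumes the index is sorted, that each status holds until the next observation, and that the last one holds until `end`. As with the per-second mean above, which skips the NaN rows, time before the first observation is ignored.

```python
import pandas as pd


def fraction_in_status(df, end, status_col="status", value=1):
    """Time-weighted fraction of [first observation, end] spent at `value`.

    Hypothetical helper: assumes a sorted DatetimeIndex and that each
    status persists until the next observation (forward-fill semantics).
    """
    end = pd.Timestamp(end)
    ts = df.index.to_series()
    # Timestamp at which each status stops applying; the last one runs to `end`.
    next_ts = ts.shift(-1).fillna(end)
    durations = (next_ts - ts).dt.total_seconds().to_numpy()
    mask = (df[status_col] == value).to_numpy()
    return durations[mask].sum() / durations.sum()


# Same sample data as in the question, with zero-padded timestamps.
idx = pd.to_datetime(['2012-01-01 12:43:35', '2012-03-12 15:46:43',
                      '2012-09-26 18:35:11', '2012-11-11 02:34:59'])
df = pd.DataFrame({'status': [1, 0, 1, 0]}, index=idx)

frac = fraction_in_status(df, end='2012-12-31')
print(frac)
```

On the sample data this should agree with the per-second mean to several decimal places, while only touching one row per status change.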