Pandas: efficient way to get the first filtered row after each entry of a DatetimeIndex

I have a DataFrame with the following structure:

    <class 'pandas.core.frame.DataFrame'>
    DatetimeIndex: 3333 entries, 2000-01-03 00:00:00+00:00 to 2012-11-21 00:00:00+00:00
    Data columns:
    open          3333  non-null values
    high          3333  non-null values
    low           3333  non-null values
    close         3333  non-null values
    volume        3333  non-null values
    amount        3333  non-null values
    pct_change    3332  non-null values
    dtypes: float64(7)

The pct_change column contains percent change data.

Given the filtered DatetimeIndex from the DataFrame above:

    <class 'pandas.tseries.index.DatetimeIndex'>
    [2000-03-01 00:00:00, ..., 2012-11-01 00:00:00]
    Length: 195, Freq: None, Timezone: UTC

Starting from each date in that index, I want to return the first row whose pct_change value is less than -0.015.

I came up with this solution, but it is very slow:

    stops = []
    # dates = the filtered DatetimeIndex
    for d in dates:
        # starting from the date of the signal, find the date of the
        # first row where pct_change drops below -0.015
        match = df[df["pct_change"] < -0.015].ix[d:][:1].index
        stops.append([df.ix[d]["close"], df.ix[match]["close"].values[0]])

Any suggestions on how I can improve this?

2 answers

How about this:

    # use df['pct_change'] rather than df.pct_change: attribute access
    # resolves to the DataFrame.pct_change method, not the column
    result = df[df['pct_change'] < -0.015].reindex(dates, method='bfill')

The only problem is that if an interval contains no value below -0.015, this will pick up a match from the next interval. If you add a column holding each row's own date, you can see which timestamp each matched row actually came from, and then set rows to NA wherever that timestamp falls past the next interval's edge.
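A rough sketch of that NA-masking step (assuming dates is the filtered DatetimeIndex from the question; the matched_at column name is just for illustration):

    import numpy as np
    import pandas as pd

    # keep each matching row's own timestamp so we can tell
    # which date a back-filled value really came from
    below = df[df['pct_change'] < -0.015].copy()
    below['matched_at'] = below.index

    result = below.reindex(dates, method='bfill')

    # each interval ends where the next signal date begins; the last
    # interval gets NaT, and comparisons against NaT are always False,
    # so the open-ended last interval is never blanked out
    next_edge = pd.Series(dates, index=dates).shift(-1)

    # blank out rows whose match spilled past the interval edge
    result[result['matched_at'] >= next_edge] = np.nan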


You could put the index into a column and then use apply and bfill.
Something like this:

    import numpy as np

    df['datetime'] = df.index
    # mark rows where the drop condition fires with their own timestamp
    df['stops'] = df.apply(
        lambda x: x['datetime'] if x['pct_change'] < -0.015 else np.nan,
        axis=1)
    # back-fill so every row carries the date of the next qualifying drop
    df['stops'] = df['stops'].bfill()
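To line this up with the signal dates from the question (again assuming dates is the filtered index), you could then pair each signal's close with the close at its back-filled stop date, mirroring the pairs built by the original loop:

    # stop date that applies to each signal; dropna skips signals
    # with no later drop below -0.015
    stop_dates = df['stops'].loc[dates].dropna()

    # pair each signal's close with the close at its stop date
    stops = [[df.loc[d, 'close'], df.loc[s, 'close']]
             for d, s in stop_dates.items()]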
