Pandas: efficient way to get the first filtered row after each entry of a DatetimeIndex

I have a DataFrame with the following structure:

    <class 'pandas.core.frame.DataFrame'>
    DatetimeIndex: 3333 entries, 2000-01-03 00:00:00+00:00 to 2012-11-21 00:00:00+00:00
    Data columns:
    open          3333  non-null values
    high          3333  non-null values
    low           3333  non-null values
    close         3333  non-null values
    volume        3333  non-null values
    amount        3333  non-null values
    pct_change    3332  non-null values
    dtypes: float64(7)

The pct_change column contains percent change data.

Given the filtered DatetimeIndex from the DataFrame above:

    <class 'pandas.tseries.index.DatetimeIndex'>
    [2000-03-01 00:00:00, ..., 2012-11-01 00:00:00]
    Length: 195, Freq: None, Timezone: UTC

Starting from each date in that index, I want to return the first row whose pct_change value is less than -0.015.

I came up with this solution, but it is very slow:

    stops = []
    # dates = the filtered DatetimeIndex
    for d in dates:
        # starting from the date of the signal, find the date of the
        # first row where pct_change drops below -0.015
        match = df[df["pct_change"] < -0.015].ix[d:][:1].index
        stops.append([df.ix[d]["close"], df.ix[match]["close"].values[0]])

Any suggestions on how I can improve this?

2 answers

How about this:

    # use df['pct_change'] rather than df.pct_change: attribute access
    # resolves to the DataFrame.pct_change method, not the column
    result = df[df['pct_change'] < -0.015].reindex(dates, method='bfill')

The only problem is that if an interval contains no value below -0.015, this will pick up a match from the next interval. If you add a column holding each row's own date, you can see which timestamp each matched row actually came from, and then set rows to NA wherever that timestamp falls past the next interval's edge.
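A rough sketch of that NA-masking step (assuming dates is the filtered DatetimeIndex from the question; the matched_at column name is just for illustration):

    import numpy as np
    import pandas as pd

    # keep each matching row's own timestamp so we can tell
    # which date a back-filled value really came from
    below = df[df['pct_change'] < -0.015].copy()
    below['matched_at'] = below.index

    result = below.reindex(dates, method='bfill')

    # each interval ends where the next signal date begins; the last
    # interval gets NaT, and comparisons against NaT are always False,
    # so the open-ended last interval is never blanked out
    next_edge = pd.Series(dates, index=dates).shift(-1)

    # blank out rows whose match spilled past the interval edge
    result[result['matched_at'] >= next_edge] = np.nan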


You could put the index into a column and then use apply and bfill.
Something like this:

    import numpy as np

    df['datetime'] = df.index
    # mark rows where the drop condition fires with their own timestamp
    df['stops'] = df.apply(
        lambda x: x['datetime'] if x['pct_change'] < -0.015 else np.nan,
        axis=1)
    # back-fill so every row carries the date of the next qualifying drop
    df['stops'] = df['stops'].bfill()
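To line this up with the signal dates from the question (again assuming dates is the filtered index), you could then pair each signal's close with the close at its back-filled stop date, mirroring the pairs built by the original loop:

    # stop date that applies to each signal; dropna skips signals
    # with no later drop below -0.015
    stop_dates = df['stops'].loc[dates].dropna()

    # pair each signal's close with the close at its stop date
    stops = [[df.loc[d, 'close'], df.loc[s, 'close']]
             for d, s in stop_dates.items()]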
