I have a large numeric Pandas dataframe df, and I want to select the rows whose value in a specific column lies in the range from min_value to max_value (inclusive).
I can do it like this:
filtered_df = df[(df[col_name].values >= min_value) & (df[col_name].values <= max_value)]
I'm looking for ways to speed this up. I tried sorting first and then using a binary search:
df.sort(col_name, inplace=True)
left_idx = np.searchsorted(df[col_name].values, min_value, side='left')
right_idx = np.searchsorted(df[col_name].values, max_value, side='right')
filtered_df = df[left_idx:right_idx]
But this does not help, because df.sort() itself takes more time than the original filter.
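For reference, here is a minimal self-contained sketch of both approaches on made-up data. The column names, sizes, and bounds are placeholders, and it assumes a newer pandas where sort_values and iloc are the spellings for what df.sort and plain slicing do above, so it may need adjusting for 0.11.

```python
import numpy as np
import pandas as pd

# Made-up data; col_name, min_value and max_value are placeholders.
rng = np.random.RandomState(0)
df = pd.DataFrame({"a": rng.randint(0, 100, size=1000), "b": rng.rand(1000)})
col_name, min_value, max_value = "a", 20, 40

# Approach 1: boolean mask over the unsorted column.
values = df[col_name].values
mask_df = df[(values >= min_value) & (values <= max_value)]

# Approach 2: sort once, then binary-search the two bounds.
# sort_values is assumed here as the newer equivalent of df.sort.
sorted_df = df.sort_values(col_name)
sorted_values = sorted_df[col_name].values
left_idx = np.searchsorted(sorted_values, min_value, side="left")
right_idx = np.searchsorted(sorted_values, max_value, side="right")
slice_df = sorted_df.iloc[left_idx:right_idx]

# Both approaches pick the same rows; only the row order differs.
same_rows = sorted(mask_df.index) == sorted(slice_df.index)
```

Both give the same rows, so the only thing I care about is which one is faster, including the cost of the sort.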
So, any tips for speeding up the selection?
(Pandas 0.11)