Here is another approach. It is cleaner, more efficient, and has the advantage that columns may be empty (in which case the entire data frame is returned).
```python
def filter(df, value, *columns):
    # list(columns) avoids the tuple being misread as a MultiIndex key
    return df.loc[df.loc[:, list(columns)].eq(value).all(axis=1)]
```
Explanation
- `values = df.loc[:, list(columns)]` selects only the columns of interest.
- `masks = values.eq(value)` produces a Boolean data frame marking, cell by cell, equality with the target value.
- `mask = masks.all(axis=1)` applies AND across the columns, yielding a row mask. Note that you can use `masks.any(axis=1)` for OR, as in the sketch below.
- `return df.loc[mask]` applies the row mask to the data frame.
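For example, the OR variant mentioned above could look like this (a minimal sketch; the name `filter_any` is mine, not from the original answer):

```python
def filter_any(df, value, *columns):
    # Keep rows where AT LEAST ONE of the given columns equals value.
    return df.loc[df.loc[:, list(columns)].eq(value).any(axis=1)]
```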
Demo
```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randint(0, 2, (100, 3)), columns=list('ABC'))
```
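With the `filter` function defined above, usage looks like this (illustrative; the exact rows returned depend on the random data):

```python
# Rows where both A and B equal 1:
filter(df, 1, 'A', 'B')

# With no columns given, the mask is vacuously True,
# so the entire data frame is returned:
filter(df, 1)
```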
Alternative
For a small number of columns (< 5), the following solution, based on Steven's answer, is more efficient than the one above, although less flexible. As written, it does not work for an empty set of columns and does not support different values for each column.
```python
from functools import reduce
from operator import and_

def filter(df, value, *columns):
    return df.loc[reduce(and_, (df[column] == value for column in columns))]
```
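As a quick sanity check (illustrative, not from the original answer), both versions should agree on non-empty column sets:

```python
expected = df.loc[(df['A'] == 1) & (df['B'] == 1)]
assert filter(df, 1, 'A', 'B').equals(expected)
```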
Getting a Series object by key (`df[column]`) is much faster than creating a DataFrame object around a subset of columns (`df.loc[:, columns]`):
```
In [4]: %timeit df['A'] == 1
100 loops, best of 3: 17.3 ms per loop

In [5]: %timeit df.loc[:, ['A']] == 1
10 loops, best of 3: 48.6 ms per loop
```
However, this speedup becomes negligible as the number of columns grows. The bottleneck becomes ANDing the masks together, and for that `reduce(and_, ...)` is much slower than the Pandas built-in `all(axis=1)`.
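To check this claim on your own machine, here is an illustrative benchmark sketch to run in an IPython session (the data frame shape and column count are my assumptions, not from the original answer; timings will vary by machine and pandas version):

```python
import numpy as np
import pandas as pd
from functools import reduce
from operator import and_

big = pd.DataFrame(np.random.randint(0, 2, (100_000, 50)))
cols = list(big.columns)

# Pairwise ANDs in a Python-level loop:
%timeit reduce(and_, (big[c] == 1 for c in cols))

# Single vectorized reduction:
%timeit big.loc[:, cols].eq(1).all(axis=1)
```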
Igor Raush