Select rows containing specific values from pandas dataframe

Question

I have a pandas dataframe whose entries are all strings:

            A       B      C
    1   apple  banana   pear
    2    pear    pear  apple
    3  banana    pear   pear
    4   apple   apple   pear

and so on. I want to select all the rows that contain a certain string, say, "banana". I don't know in advance which column it will appear in. Of course, I could write a for loop and iterate over all the rows. But is there an easier or faster way to do this?

+6

python pandas

mcglashan Jul 04 '16 at 13:14

3 answers

For a single search value:

    df[df.values == "banana"]

or

    df[df.isin(['banana'])]

For several search terms:

    df[(df.values == "banana") | (df.values == "apple")]

or

    df[df.isin(['banana', "apple"])]

    #         A       B      C
    # 1   apple  banana    NaN
    # 2     NaN     NaN  apple
    # 3  banana     NaN    NaN
    # 4   apple   apple    NaN

With Divakar's select_rows function (from the accepted answer), only the rows that contain both values are returned:

    select_rows(df, ['apple', 'banana'])

    #        A       B     C
    # 0  apple  banana  pear
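As the output above shows, df[df.isin([...])] keeps the shape of the frame and blanks the non-matching cells to NaN. If what you want is the matching rows with their original values, one option is to collapse the boolean frame to one flag per row with any(axis=1). A minimal sketch, assuming the frame from the question (the variable names are only illustrative):

```python
import pandas as pd

# Example frame from the question, indexed 1..4
df = pd.DataFrame(
    {
        "A": ["apple", "pear", "banana", "apple"],
        "B": ["banana", "pear", "pear", "apple"],
        "C": ["pear", "apple", "pear", "pear"],
    },
    index=[1, 2, 3, 4],
)

# isin() returns a boolean frame; any(axis=1) reduces it to one flag per row
has_banana = df.isin(["banana"]).any(axis=1)
banana_rows = df[has_banana]

# Rows that contain "banana" OR "apple" in any column
either_rows = df[df.isin(["banana", "apple"]).any(axis=1)]
```

Here banana_rows holds rows 1 and 3 unchanged, with no NaN cells, because the mask selects whole rows rather than individual cells.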

+4

Merlin Jul 04 '16 at 15:06

You can create a boolean mask by comparing the entire df against your string, then call dropna with how='all' to drop the rows where your string does not appear in any column:

    In [59]:
    df[df == 'banana'].dropna(how='all')

    Out[59]:
            A       B    C
    1     NaN  banana  NaN
    3  banana     NaN  NaN

To test for several values, you can build one mask per value:

    In [90]:
    banana = df[(df=='banana')].dropna(how='all')
    banana

    Out[90]:
            A       B    C
    1     NaN  banana  NaN
    3  banana     NaN  NaN

    In [91]:
    apple = df[(df=='apple')].dropna(how='all')
    apple

    Out[91]:
           A      B      C
    1  apple    NaN    NaN
    2    NaN    NaN  apple
    4  apple  apple    NaN

You can then use index.intersection to select only the index values the two results have in common:

    In [93]:
    df.loc[apple.index.intersection(banana.index)]

    Out[93]:
           A       B     C
    1  apple  banana  pear
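The same intersection can also be taken on boolean row masks directly, without building the intermediate banana and apple frames. A minimal sketch, assuming the frame from the question (has_apple, has_banana and both are illustrative names, not part of the answer above):

```python
import pandas as pd

# Example frame from the question, indexed 1..4
df = pd.DataFrame(
    {
        "A": ["apple", "pear", "banana", "apple"],
        "B": ["banana", "pear", "pear", "apple"],
        "C": ["pear", "apple", "pear", "pear"],
    },
    index=[1, 2, 3, 4],
)

# One boolean Series per search value: True where the row contains it anywhere
has_apple = (df == "apple").any(axis=1)
has_banana = (df == "banana").any(axis=1)

# & keeps only rows flagged by both masks (the intersection of the two row sets)
both = df[has_apple & has_banana]
```

Swapping & for | would give the union instead, i.e. rows that contain either value.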

+3

Edchum Jul 04 '16 at 13:15

Divakar · Accepted Answer · 2016-07-04T13:41:05+0000

With NumPy, this can be vectorized to look for as many strings as you wish, for example:

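    import numpy as np

    def select_rows(df, search_strings):
        # Encode each cell as the ID of its unique value, then look up the search strings' IDs
        unq, IDs = np.unique(df, return_inverse=True)
        unqIDs = np.searchsorted(unq, search_strings)
        # Keep rows where every search string matches in at least one column
        return df[((IDs.reshape(df.shape) == unqIDs[:,None,None]).any(-1)).all(0)]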

Sample run -

    In [393]: df
    Out[393]:
            A       B      C
    0   apple  banana   pear
    1    pear    pear  apple
    2  banana    pear   pear
    3   apple   apple   pear

    In [394]: select_rows(df,['apple','banana'])
    Out[394]:
           A       B     C
    0  apple  banana  pear

    In [395]: select_rows(df,['apple','pear'])
    Out[395]:
           A       B      C
    0  apple  banana   pear
    1   pear    pear  apple
    3  apple   apple   pear

    In [396]: select_rows(df,['apple','banana','pear'])
    Out[396]:
           A       B     C
    0  apple  banana  pear
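One thing to keep in mind with the np.searchsorted lookup: it returns an insertion position, so a search string that does not occur anywhere in the frame can end up mapped to a neighbouring value. If that matters for your data, a plainer variant is to compare against each string in turn. This is only a sketch, not the function above; select_rows_loop is a hypothetical name, and it trades some of the vectorization for a short Python loop over the search strings:

```python
import numpy as np
import pandas as pd

def select_rows_loop(df, search_strings):
    """Return the rows of df that contain every string in search_strings."""
    values = df.to_numpy()
    # Start with every row selected, then narrow down one search string at a time
    mask = np.ones(len(df), dtype=bool)
    for s in search_strings:
        # True for rows where s appears in at least one column
        mask &= (values == s).any(axis=1)
    return df[mask]

# Same frame as in the sample run, indexed 0..3
df = pd.DataFrame(
    {
        "A": ["apple", "pear", "banana", "apple"],
        "B": ["banana", "pear", "pear", "apple"],
        "C": ["pear", "apple", "pear", "pear"],
    }
)

both = select_rows_loop(df, ["apple", "banana"])
apple_pear = select_rows_loop(df, ["apple", "pear"])
missing = select_rows_loop(df, ["banan"])  # string that is not in the frame
```

On this frame, both and apple_pear pick the same rows as the sample run (row 0, and rows 0, 1 and 3), while missing comes back empty because "banan" never occurs exactly.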
