Select rows containing specific values from a pandas DataFrame

I have a pandas DataFrame whose entries are all strings:

            A       B      C
    1   apple  banana   pear
    2    pear    pear  apple
    3  banana    pear   pear
    4   apple   apple   pear

etc. I want to select all the rows that contain a specific string, say "banana". I don't know in advance which column it will appear in. Of course, I can write a for loop and iterate over all the rows, but is there an easier or faster way to do this?

3 answers

With NumPy, this can be vectorized to search for as many strings as you like, for example:

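    import numpy as np

    def select_rows(df, search_strings):
        # Encode every cell as an integer ID into the sorted unique values
        unq, IDs = np.unique(df, return_inverse=True)
        # IDs of the search strings (assumes each string occurs somewhere in df)
        unqIDs = np.searchsorted(unq, search_strings)
        # Keep rows in which every search string matches at least one column
        return df[((IDs.reshape(df.shape) == unqIDs[:,None,None]).any(-1)).all(0)]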

Sample run:

    In [393]: df
    Out[393]:
            A       B      C
    0   apple  banana   pear
    1    pear    pear  apple
    2  banana    pear   pear
    3   apple   apple   pear

    In [394]: select_rows(df,['apple','banana'])
    Out[394]:
           A       B     C
    0  apple  banana  pear

    In [395]: select_rows(df,['apple','pear'])
    Out[395]:
           A       B      C
    0  apple  banana   pear
    1   pear    pear  apple
    3  apple   apple   pear

    In [396]: select_rows(df,['apple','banana','pear'])
    Out[396]:
           A       B     C
    0  apple  banana  pear
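One caveat with select_rows: np.searchsorted returns an insertion position even for a string that is absent from the data, so an unknown search string can end up matching a different value. The sketch below is a hypothetical variant (select_rows_direct is not part of the original answer) that keeps the same broadcasting idea but compares the cells with the search strings directly, assuming the frame holds plain Python strings:

```python
import numpy as np
import pandas as pd

# Same sample data as in the run above
df = pd.DataFrame({
    "A": ["apple", "pear", "banana", "apple"],
    "B": ["banana", "pear", "pear", "apple"],
    "C": ["pear", "apple", "pear", "pear"],
})

def select_rows_direct(df, search_strings):
    """Return rows of df that contain every string in search_strings."""
    cells = df.to_numpy(dtype=object)                   # shape (rows, cols)
    wanted = np.asarray(search_strings, dtype=object)   # shape (k,)
    # hits[i, r, c] is True where cell (r, c) equals search string i
    hits = cells[None, :, :] == wanted[:, None, None]
    # any(-1): string i occurs somewhere in row r
    # all(0):  every search string occurs in row r
    return df[hits.any(-1).all(0)]

print(select_rows_direct(df, ["apple", "banana"]))
print(select_rows_direct(df, ["apple", "kiwi"]))  # "kiwi" is absent, so no rows
```

This trades the np.unique encoding for a direct object comparison, which is simpler but builds a (k, rows, cols) boolean array, so it may use more memory for many search strings.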

For a single search value:

 df[df.values == "banana"] 

or

  df[df.isin(['banana'])] 

For several search terms:

  df[(df.values == "banana")|(df.values == "apple" ) ] 

or

    df[df.isin(['banana', "apple"])]
    #         A       B      C
    # 1   apple  banana    NaN
    # 2     NaN     NaN  apple
    # 3  banana     NaN    NaN
    # 4   apple   apple    NaN

With select_rows from Divakar's answer, only the rows that contain both values are returned:

    select_rows(df,['apple','banana'])
    #        A       B     C
    # 0  apple  banana  pear
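Indexing with df.isin(...) keeps every row and replaces non-matching cells with NaN, as the output above shows. If the goal is to keep only the matching rows with all their columns intact, one option is to reduce the mask to a single boolean per row with any(axis=1). A minimal sketch, assuming the sample frame from the question with index 1 to 4:

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame(
    {
        "A": ["apple", "pear", "banana", "apple"],
        "B": ["banana", "pear", "pear", "apple"],
        "C": ["pear", "apple", "pear", "pear"],
    },
    index=[1, 2, 3, 4],
)

# Rows where "banana" appears in at least one column
banana_rows = df[df.isin(["banana"]).any(axis=1)]

# Rows where "banana" OR "apple" appears in at least one column
either_rows = df[df.isin(["banana", "apple"]).any(axis=1)]

# Rows where BOTH "banana" and "apple" appear (one row-level mask per value)
both_rows = df[df.eq("banana").any(axis=1) & df.eq("apple").any(axis=1)]

print(banana_rows)
print(both_rows)
```

Because each mask is one-dimensional, the selected rows come back unchanged rather than with NaN placeholders.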

You can create a boolean mask by comparing the whole df with your string, then call dropna with how='all' to drop the rows where your string does not appear in any column:

    In [59]: df[df == 'banana'].dropna(how='all')
    Out[59]:
            A       B    C
    1     NaN  banana  NaN
    3  banana     NaN  NaN

To check multiple values, you can use several masks:

    In [90]: banana = df[(df=='banana')].dropna(how='all')
        ...: banana
    Out[90]:
            A       B    C
    1     NaN  banana  NaN
    3  banana     NaN  NaN

    In [91]: apple = df[(df=='apple')].dropna(how='all')
        ...: apple
    Out[91]:
           A      B      C
    1  apple    NaN    NaN
    2    NaN    NaN  apple
    4  apple  apple    NaN

You can then use index.intersection to select only the index values common to both results:

    In [93]: df.loc[apple.index.intersection(banana.index)]
    Out[93]:
           A       B     C
    1  apple  banana  pear
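The mask, dropna, and intersection steps can be folded into one function that handles any number of values. This is only a sketch: rows_with_all is a hypothetical helper name, and it assumes the frame has no pre-existing NaN-only rows to confuse dropna and that at least one value is passed.

```python
from functools import reduce

import pandas as pd

# Sample data from the question
df = pd.DataFrame(
    {
        "A": ["apple", "pear", "banana", "apple"],
        "B": ["banana", "pear", "pear", "apple"],
        "C": ["pear", "apple", "pear", "pear"],
    },
    index=[1, 2, 3, 4],
)

def rows_with_all(frame, values):
    """Return the rows of frame in which every value appears at least once."""
    # One index per value: labels of the rows where that value occurs
    indexes = [frame[frame == v].dropna(how="all").index for v in values]
    # Keep only the labels shared by all of them
    common = reduce(lambda left, right: left.intersection(right), indexes)
    return frame.loc[common]

print(rows_with_all(df, ["apple", "banana"]))
```

Passing an empty list of values makes reduce raise a TypeError, so a caller would need to guard against that case.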