Pandas: compare two DataFrames and delete rows that match in one column

I have two separate pandas dataframes (df1 and df2) that have several columns, but only one column in common ("text").

I would like to find every row in df2 whose "text" value does not match any row of the "text" column in df1.

df1

    A   B  text
    45  2  score
    33  5  miss
    20  1  score

df2

    C   D  text
    .5  2  shot
    .3  2  shot
    .3  1  miss

Result df (the row containing "miss" is removed, because it occurs in df1)

    C   D  text
    .5  2  shot
    .3  2  shot

Is it possible to use the isin method in this scenario?

+8
python pandas
3 answers

As you requested, you can do this efficiently using isin (without resorting to expensive merges).

    >>> df2[~df2.text.isin(df1.text.values)]
         C  D  text
    0  0.5  2  shot
    1  0.3  2  shot
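For anyone who wants to reproduce this, here is a minimal self-contained sketch. The two frames are rebuilt from the sample data in the question; the names in_df1 and result are only illustrative and are not part of the original answer.

```python
import pandas as pd

# Sample frames rebuilt from the question.
df1 = pd.DataFrame({"A": [45, 33, 20], "B": [2, 5, 1],
                    "text": ["score", "miss", "score"]})
df2 = pd.DataFrame({"C": [0.5, 0.3, 0.3], "D": [2, 2, 1],
                    "text": ["shot", "shot", "miss"]})

# True for each df2 row whose text also appears somewhere in df1.text.
in_df1 = df2["text"].isin(df1["text"])

# Invert the mask to keep only the df2 rows with no match in df1.
result = df2[~in_df1]
print(result)
```

Note that result keeps df2's original index labels; add .reset_index(drop=True) if you want them renumbered.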
+8

EDIT:

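    import numpy as np
    import pandas as pd

    # Left-join df2 with df1 on the shared "text" column;
    # df2 rows with no match get NaN in df1's columns (A, B).
    mergeddf = pd.merge(df2, df1, how="left")
    # Keep only the unmatched rows, then restore df2's columns.
    result = mergeddf[np.isnan(mergeddf['A'])][['C', 'D', 'text']]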
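A variation on the merge approach, offered only as a sketch: pandas' merge accepts indicator=True, which adds a "_merge" column recording where each row came from, so you do not have to test one of df1's columns for NaN. The sample frames are rebuilt from the question, and the names keys, merged and result are my own.

```python
import pandas as pd

# Sample frames rebuilt from the question.
df1 = pd.DataFrame({"A": [45, 33, 20], "B": [2, 5, 1],
                    "text": ["score", "miss", "score"]})
df2 = pd.DataFrame({"C": [0.5, 0.3, 0.3], "D": [2, 2, 1],
                    "text": ["shot", "shot", "miss"]})

# Deduplicate df1's keys so the join can never repeat a df2 row.
keys = df1[["text"]].drop_duplicates()

# For a left join, "_merge" is "both" (matched) or "left_only" (unmatched).
merged = df2.merge(keys, on="text", how="left", indicator=True)

# Keep the rows found only in df2, then drop the helper column.
result = merged[merged["_merge"] == "left_only"].drop(columns="_merge")
print(result)
```

Joining against only the deduplicated key column also means df1's other columns never enter the result, so there is nothing to select away afterwards.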
+1

You can merge them and keep only the rows where df1's columns come back as NaN.

    merged = pd.merge(df1, df2, how='outer')
    merged[merged['A'].isnull()][['C', 'D', 'text']]

or you can use isin:

 df2[~df2.text.isin(df1.text)] 
+1