Pandas: compare two DataFrames and delete matches in one column

Question

I have two separate pandas DataFrames (df1 and df2) that have several columns, but only one column in common ("text").

I would like to find every row in df2 whose "text" value does not match the "text" value of any row in df1.

df1

  A  B  text
 45  2  score
 33  5  miss
 20  1  score

df2

   C  D  text
 0.5  2  shot
 0.3  2  shot
 0.3  1  miss

Result df (delete the row containing "miss", because that value also occurs in df1)

   C  D  text
 0.5  2  shot
 0.3  2  shot

Is it possible to use the isin method in this scenario?
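For anyone reproducing this, a minimal sketch that rebuilds the two sample frames from the tables above and shows the boolean mask that isin produces for df2's "text" column. The dtypes are an assumption (integers for A, B and D, floats for C):

```python
import pandas as pd

# Sample frames rebuilt from the tables above (dtypes assumed)
df1 = pd.DataFrame({"A": [45, 33, 20],
                    "B": [2, 5, 1],
                    "text": ["score", "miss", "score"]})
df2 = pd.DataFrame({"C": [0.5, 0.3, 0.3],
                    "D": [2, 2, 1],
                    "text": ["shot", "shot", "miss"]})

# True where a df2 "text" value also appears somewhere in df1["text"]
mask = df2["text"].isin(df1["text"])
print(mask.tolist())  # [False, False, True]
```

Negating this mask with ~ selects exactly the rows shown in the desired result.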


python pandas

GNMO11 Dec 22 '15 at 14:21


3 answers

EDIT:

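 import numpy as np
 import pandas as pd
 # left merge on "text": df2 rows with no match in df1 get NaN in df1's columns
 mergeddf = pd.merge(df2, df1, how="left")
 result = mergeddf[np.isnan(mergeddf['A'])][['C', 'D', 'text']]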
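A self-contained run of this left-merge approach on the question's sample data, as a sketch with the frames rebuilt by hand from the question's tables. It assumes df1's "A" column has no missing values of its own, since a NaN already present in "A" would be indistinguishable from "no match":

```python
import numpy as np
import pandas as pd

# Sample frames rebuilt from the question (assumed dtypes)
df1 = pd.DataFrame({"A": [45, 33, 20], "B": [2, 5, 1],
                    "text": ["score", "miss", "score"]})
df2 = pd.DataFrame({"C": [0.5, 0.3, 0.3], "D": [2, 2, 1],
                    "text": ["shot", "shot", "miss"]})

# Every df2 row is kept; "shot" has no partner in df1, so A and B become NaN
mergeddf = pd.merge(df2, df1, how="left")
print(mergeddf)

# Rows where "A" is NaN are the df2 rows with no match in df1
result = mergeddf[np.isnan(mergeddf["A"])][["C", "D", "text"]]
print(result)
```

Because the merge builds a new frame, result carries the merged frame's index rather than df2's; with this data the two happen to coincide.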


Shahram Dec 22 '15 at 14:42


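You can merge them and keep only the rows of df2 that have no partner in df1. Merging against the de-duplicated "text" values of df1 with indicator=True keeps the merged frame the same length and order as df2, so the mask lines up with df2's rows:

 merged = pd.merge(df2, df1[['text']].drop_duplicates(), how='left', indicator=True)
 df2[(merged['_merge'] == 'left_only').values]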
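To see what the merge contributes, a sketch that prints the intermediate labels produced by merge's indicator=True option, which adds a _merge column marking each row 'left_only', 'right_only' or 'both'. Dropping duplicate "text" values from df1 first (done here so each df2 row matches at most once) keeps the merged frame the same length and order as df2. The sample frames are rebuilt by hand from the question:

```python
import pandas as pd

# Sample frames rebuilt from the question (assumed dtypes)
df1 = pd.DataFrame({"A": [45, 33, 20], "B": [2, 5, 1],
                    "text": ["score", "miss", "score"]})
df2 = pd.DataFrame({"C": [0.5, 0.3, 0.3], "D": [2, 2, 1],
                    "text": ["shot", "shot", "miss"]})

# One row per distinct key in df1, so the left merge cannot duplicate df2 rows
keys = df1[["text"]].drop_duplicates()
merged = pd.merge(df2, keys, on="text", how="left", indicator=True)
print(merged["_merge"].tolist())  # ['left_only', 'left_only', 'both']

# Same length and order as df2, so the mask can be applied positionally
result = df2[(merged["_merge"] == "left_only").to_numpy()]
print(result)
```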

Or you can use isin:

 df2[~df2.text.isin(df1.text)]


Julien Spronck Dec 22 '15 at 14:45


Ami tavory · Accepted Answer · 2015-12-22T14:39:00+0000

As you requested, you can do this efficiently using isin (without resorting to expensive merges).

 >>> df2[~df2.text.isin(df1.text.values)]
      C  D  text
 0  0.5  2  shot
 1  0.3  2  shot
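A small sketch, on the question's sample data rebuilt by hand, checking that isin gives the same selection whether it is passed the Series itself, its .values array, or a Python set. Because no new frame is built, the result keeps df2's original index labels:

```python
import pandas as pd

# Sample frames rebuilt from the question (assumed dtypes)
df1 = pd.DataFrame({"A": [45, 33, 20], "B": [2, 5, 1],
                    "text": ["score", "miss", "score"]})
df2 = pd.DataFrame({"C": [0.5, 0.3, 0.3], "D": [2, 2, 1],
                    "text": ["shot", "shot", "miss"]})

# isin accepts any list-like of values: a Series, a NumPy array or a set
via_series = df2[~df2.text.isin(df1.text)]
via_values = df2[~df2.text.isin(df1.text.values)]
via_set = df2[~df2.text.isin(set(df1.text))]

# All three select the same rows and keep df2's own index labels
print(via_series.equals(via_values), via_series.equals(via_set))  # True True
print(via_series.index.tolist())  # [0, 1]
```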