Find the same data in two DataFrames of different shapes

Question

I have two pandas DataFrames that I would like to compare. For instance:

        a   b   c
    A  na  na  na
    B  na   1   1
    C  na   1  na

and

        a   b   c
    A   1  na   1
    B  na  na  na
    C  na   1  na
    D  na   1  na

I want to find the index-column coordinates for any common values, in this case

       b
    C  1

Is it possible?

+6

python pandas

gwg Nov 10 '15 at 10:01

2 answers

If you just need the positions where the two frames differ (both frames must share the same index and columns for != to work), you can do:

    different_indices = [(i, j) for i in range(len((df1 != df2).columns)) for j in range(len(df1 != df2)) if (df1 != df2).iloc[j, i]]

Or, more readably:

    m = (df1 != df2)
    different_indices = [(i, j) for i in range(len(m.columns)) for j in range(len(m)) if m.iloc[j, i]]
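The comprehension above compares the frames cell by cell, which pandas only allows when both frames carry identical labels. One way to get there for frames of different shapes is to align them first. The following is a minimal sketch, assuming that "na" in the question stands for NaN; the use of DataFrame.align and the names left, right and different are illustrative choices, not part of the answer.

```python
import numpy as np
import pandas as pd

na = np.nan  # assumption: "na" in the question means a missing value
df1 = pd.DataFrame(
    [[na, na, na], [na, 1, 1], [na, 1, na]],
    index=list("ABC"), columns=list("abc"),
)
df2 = pd.DataFrame(
    [[1, na, 1], [na, na, na], [na, 1, na], [na, 1, na]],
    index=list("ABCD"), columns=list("abc"),
)

# Reindex both frames onto the union of their row and column labels,
# so that they have the same shape and can be compared elementwise.
left, right = df1.align(df2, join="outer")

m = left != right
# Report label pairs (row label, column label) instead of integer positions.
different = [
    (m.index[j], m.columns[i])
    for i in range(len(m.columns))
    for j in range(len(m))
    if m.iloc[j, i]
]
print(different)
```

Note that NaN != NaN evaluates to True, so every cell that is missing in either frame is reported as different; only cells holding the same non-missing value in both frames are left out.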

0

rofls Nov 10 '15 at 10:23

maxymoo · Accepted Answer · 2015-11-10T22:22:04+0000

If you pass the keys parameter to concat, the columns of the resulting frame will form a MultiIndex that keeps track of which frame each column came from:

    In [1]: c = pd.concat([df, df2], axis=1, keys=['df1', 'df2'])
            c
    Out[1]:
       df1            df2
         a    b    c    a   b   c
    A   na   na   na    1  na   1
    B   na    1    1   na  na  na
    C   na    1   na   na   1  na
    D  NaN  NaN  NaN   na   1  na

Since the underlying arrays now have the same shape, you can use == to broadcast your comparison and use the result as a mask to return all the matching values:

    In [171]: m = c.df1[c.df1 == c.df2]; m
    Out[171]:
         a    b    c
    A  NaN  NaN  NaN
    B  NaN  NaN  NaN
    C  NaN    1  NaN
    D  NaN  NaN  NaN
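The two steps above can be reproduced end to end. This is a minimal sketch, assuming that "na" in the question stands for NaN; the names mask and coords are illustrative and do not appear in the answer. It also reads the index-column labels directly off the boolean mask, which is what the question asks for.

```python
import numpy as np
import pandas as pd

na = np.nan  # assumption: "na" in the question means a missing value
df1 = pd.DataFrame(
    [[na, na, na], [na, 1, 1], [na, 1, na]],
    index=list("ABC"), columns=list("abc"),
)
df2 = pd.DataFrame(
    [[1, na, 1], [na, na, na], [na, 1, na], [na, 1, na]],
    index=list("ABCD"), columns=list("abc"),
)

# Side-by-side concat; the keys become the top level of the column MultiIndex.
c = pd.concat([df1, df2], axis=1, keys=["df1", "df2"])

# Both halves now share the same index and columns, so == works elementwise.
# NaN == NaN is False, so missing cells never count as matches.
mask = c["df1"] == c["df2"]
m = c["df1"][mask]

# Translate the True positions of the mask back into (row label, column label).
rows, cols = np.nonzero(mask.to_numpy())
coords = [(mask.index[r], mask.columns[k]) for r, k in zip(rows, cols)]
print(coords)
```

Working from the mask rather than from m keeps the row and column labels available without any further reshaping.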

If your "na" value is actually zero, you can use a sparse matrix to reduce the result to the coordinates of the matching values (you will lose the index and column names):

    import scipy.sparse as sp
    print(sp.coo_matrix(m.where(m.notnull(), 0)))
      (2, 1)    1.0
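The labels can be recovered afterwards, because the row and col arrays of the COO matrix are integer positions that can be looked up in m.index and m.columns. A minimal sketch, assuming m is the masked frame from the previous step (rebuilt by hand here so that the block runs on its own); the name labelled is illustrative.

```python
import numpy as np
import pandas as pd
import scipy.sparse as sp

# Stand-in for the masked frame m: all NaN except the one matching cell.
m = pd.DataFrame(np.nan, index=list("ABCD"), columns=list("abc"))
m.loc["C", "b"] = 1.0

# Replace NaN with 0 so that only the matching values are stored as nonzeros.
coo = sp.coo_matrix(m.where(m.notnull(), 0).to_numpy())

# coo.row / coo.col hold positions; map them back to the frame's labels.
labelled = [
    (m.index[r], m.columns[k], v)
    for r, k, v in zip(coo.row, coo.col, coo.data)
]
print(labelled)
```

This only works when zero is not itself a meaningful value, since zeros are dropped from the sparse representation.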
