Find the same data in two DataFrames of different shapes

I have two pandas DataFrames that I would like to compare. For instance:

        a   b   c
    A  na  na  na
    B  na   1   1
    C  na   1  na

and

        a   b   c
    A   1  na   1
    B  na  na  na
    C  na   1  na
    D  na   1  na

I want to find the index-column coordinates for any common values, in this case

       b
    C  1

Is it possible?

2 answers

If you pass the keys parameter to concat, the columns of the resulting frame will be a MultiIndex that tracks which source frame each column came from:

    In [1]: c = pd.concat([df, df2], axis=1, keys=['df1', 'df2'])
            c
    Out[1]:
       df1            df2
         a    b    c    a   b   c
    A   na   na   na    1  na   1
    B   na    1    1   na  na  na
    C   na    1   na   na   1  na
    D  NaN  NaN  NaN   na   1  na

Since the underlying arrays now have the same shape, you can use == to do an element-wise comparison and use the result as a mask to return only the matching values:

    In [171]: m = c.df1[c.df1 == c.df2]; m
    Out[171]:
         a    b    c
    A  NaN  NaN  NaN
    B  NaN  NaN  NaN
    C  NaN    1  NaN
    D  NaN  NaN  NaN

If your "na" value is actually zero, you can use a sparse matrix to reduce the mask to the coordinates of the matching values (you will lose the index and column names):

    import scipy.sparse as sp
    print(sp.coo_matrix(m.where(m.notnull(), 0)))
      (2, 1)    1.0
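If you would rather keep the row and column labels than get integer positions, the same concat-and-compare idea can be combined with numpy.where. The following is only a minimal sketch: it assumes the "na" cells in the question are real NaN values, and the names mask and common are made up for the illustration.

```python
import numpy as np
import pandas as pd

# Sample frames from the question, assuming "na" means NaN.
df1 = pd.DataFrame(
    [[np.nan, np.nan, np.nan], [np.nan, 1, 1], [np.nan, 1, np.nan]],
    index=list("ABC"), columns=list("abc"),
)
df2 = pd.DataFrame(
    [[1, np.nan, 1], [np.nan, np.nan, np.nan],
     [np.nan, 1, np.nan], [np.nan, 1, np.nan]],
    index=list("ABCD"), columns=list("abc"),
)

# Put both frames side by side; rows missing from df1 become NaN.
c = pd.concat([df1, df2], axis=1, keys=["df1", "df2"])

# Element-wise equality; NaN never equals NaN, so empty cells drop out.
mask = c["df1"] == c["df2"]

# Translate the positions of True cells back into (row label, column label, value).
rows, cols = np.where(mask.to_numpy())
common = [
    (mask.index[r], mask.columns[k], c["df1"].iat[r, k])
    for r, k in zip(rows, cols)
]
print(common)
```

For the sample data, the only cell that holds the same value in both frames is row C, column b.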

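If you just need the positions of the cells that differ, you can do the following (this assumes df1 and df2 have identical row and column labels; otherwise != raises a ValueError):

    # (i, j) = (column position, row position) of every differing cell
    different_indices = [(i, j) for i in range(len((df1 != df2).columns)) for j in range(len(df1 != df2)) if (df1 != df2).iloc[j, i]]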

Or, a bit more readably:

    m = (df1 != df2)
    # use .iloc for positional access; m[i][j] would look up labels instead
    different_indices = [(i, j) for i in range(len(m.columns)) for j in range(len(m)) if m.iloc[j, i]]
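Because the two frames in the question have different shapes, a direct != comparison would fail on them. One possible workaround, sketched here under the assumption that "na" means NaN, is to align the frames first and to treat cells that are NaN on both sides as equal. The names a, b, both_nan, differs and different are made up for this example.

```python
import numpy as np
import pandas as pd

# Sample frames from the question, assuming "na" means NaN.
df1 = pd.DataFrame(
    [[np.nan, np.nan, np.nan], [np.nan, 1, 1], [np.nan, 1, np.nan]],
    index=list("ABC"), columns=list("abc"),
)
df2 = pd.DataFrame(
    [[1, np.nan, 1], [np.nan, np.nan, np.nan],
     [np.nan, 1, np.nan], [np.nan, 1, np.nan]],
    index=list("ABCD"), columns=list("abc"),
)

# Reindex both frames onto the union of their row and column labels.
a, b = df1.align(df2, join="outer")

# NaN != NaN is True, so exclude cells that are missing in both frames.
both_nan = a.isna() & b.isna()
differs = a.ne(b) & ~both_nan

# Report differing cells by (row label, column label).
rows, cols = np.where(differs.to_numpy())
different = [(differs.index[r], differs.columns[k]) for r, k in zip(rows, cols)]
print(different)
```

Whether a cell that is missing in one frame and present in the other should count as "different" is a design choice; this sketch counts it as different.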