Comparing two DataFrames on a shared column and returning three different outputs with pandas

I am just starting out with Python and coding. I need help comparing two DataFrames of different lengths whose column labels all differ except one. That one shared column is the column I want to use to compare the DataFrames. My data is as follows:

    df:
    'fruits'  'trees'       'sports'    'countries'
    bananas   mongolia      basketball  Spain
    grapes    Oak           rugby       Thailand
    oranges   Osage Orange  baseball    Egypt
    apples    Maple         golf        Chile

    df2:
    'cars'  'flowers'  'countries'  'vegetables'
    Audi    Rose       Spain        Carrots
    BMW     Tulip      Nigeria      Celery
    Honda   Dandelion  Egypt        Onion

I would like to compare these two DataFrames based on the countries column and create three separate outputs, each in its own DataFrame. I am using pandas and tried pd.concat to combine df and df2 into one. I would also like to keep the rest of the DataFrame rows even if they do not match.

Here are my desired outputs:

Output #1: values in df NOT in df2:

    df3:
    'fruits'  'trees'  'sports'  'countries'
    grapes    Oak      rugby     Thailand
    apples    Maple    golf      Chile

Output #2: values in df2 NOT in df:

    df4:
    'cars'  'flowers'  'countries'  'vegetables'
    BMW     Tulip      Nigeria      Celery

Output #3: values in both df and df2 (with the columns of both DataFrames combined):

    df5:
    'fruits'  'trees'       'sports'    'cars'  'flowers'  'countries'  'vegetables'
    bananas   mongolia      basketball  Audi    Rose       Spain        Carrots
    oranges   Osage Orange  baseball    Honda   Dandelion  Egypt        Onion

I hope this all makes sense. I have tried many different things (isin, DataFrame.diff and .difference, df - df2, NumPy arrays, etc.). I have looked everywhere and cannot find what I am looking for. Any help would be greatly appreciated! Thanks!

1 answer

Setup Reference

    from io import StringIO  # Python 3; the Python 2 `StringIO` module no longer exists

    import pandas as pd

    # Sample data from the question, one CSV row per line.
    txt1 = """fruits,trees,sports,countries
    bananas,mongolia,basketball,Spain
    grapes,Oak,rugby,Thailand
    oranges,Osage Orange,baseball,Egypt
    apples,Maple,golf,Chile"""

    txt2 = """cars,flowers,countries,vegetables
    Audi,Rose,Spain,Carrots
    BMW,Tulip,Nigeria,Celery
    Honda,Dandelion,Egypt,Onion"""

    df = pd.read_csv(StringIO(txt1))
    df2 = pd.read_csv(StringIO(txt2))
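Since the comparison hinges on the one label both frames share, a quick check of the column overlap can confirm that `countries` is the only common column. This is a minimal sketch; `cols1`, `cols2` and `shared` are illustrative names that mirror the question's column labels rather than anything from the original answer.

```python
import pandas as pd

# Column labels copied from the question's two frames.
cols1 = pd.Index(["fruits", "trees", "sports", "countries"])
cols2 = pd.Index(["cars", "flowers", "countries", "vegetables"])

# Index.intersection returns the labels present in both.
shared = cols1.intersection(cols2)
print(list(shared))
```

With real frames the same check would be `df.columns.intersection(df2.columns)`.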

Solution

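    def outer_parts(df1, df2):
        # Outer-merge on the column(s) both frames share; indicator=True adds
        # a '_merge' column marking rows as 'left_only', 'right_only' or 'both'.
        df3 = df1.merge(df2, indicator=True, how='outer')
        # Split by origin and drop the helper column from each part.
        return {n: g.drop(columns='_merge')
                for n, g in df3.groupby('_merge', observed=True)}

    dfs = outer_parts(df, df2)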
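To see what the `indicator=True` flag contributes, here is a small sketch on two stand-in frames. The frames `left` and `right` and the `origin` mapping are hypothetical, cut down from the question's data for illustration only.

```python
import pandas as pd

# Tiny stand-in frames (illustrative, not the full question data) sharing one column.
left = pd.DataFrame({"fruits": ["bananas", "grapes"],
                     "countries": ["Spain", "Thailand"]})
right = pd.DataFrame({"cars": ["Audi", "BMW"],
                      "countries": ["Spain", "Nigeria"]})

# With no `on=` argument, merge joins on the columns both frames share ("countries").
merged = left.merge(right, how="outer", indicator=True)

# `_merge` is a categorical column recording where each row came from.
origin = dict(zip(merged["countries"], merged["_merge"].astype(str)))
print(origin)
```

Each country maps to `both`, `left_only` or `right_only`, which is the label the grouping step splits on.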

Demonstration

 dfs['both'] 


 dfs['left_only'] 


 dfs['right_only'] 

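An outer merge keeps every column from both frames, so the `left_only` and `right_only` parts carry all-NaN columns from the other side. If the goal is the exact shapes requested in the question, one possible approach is to select each frame's own columns after filtering on `_merge`. This is a sketch under that assumption, not part of the original answer; the names `df3`, `df4` and `df5` follow the question's numbering, and `on="countries"` is spelled out for clarity.

```python
from io import StringIO

import pandas as pd

txt1 = """fruits,trees,sports,countries
bananas,mongolia,basketball,Spain
grapes,Oak,rugby,Thailand
oranges,Osage Orange,baseball,Egypt
apples,Maple,golf,Chile"""
txt2 = """cars,flowers,countries,vegetables
Audi,Rose,Spain,Carrots
BMW,Tulip,Nigeria,Celery
Honda,Dandelion,Egypt,Onion"""
df = pd.read_csv(StringIO(txt1))
df2 = pd.read_csv(StringIO(txt2))

merged = df.merge(df2, on="countries", how="outer", indicator=True)

# Rows only in df, restricted to df's own columns.
df3 = merged.loc[merged["_merge"] == "left_only", df.columns].reset_index(drop=True)
# Rows only in df2, restricted to df2's own columns.
df4 = merged.loc[merged["_merge"] == "right_only", df2.columns].reset_index(drop=True)
# Rows present in both, with the combined columns and the helper column removed.
df5 = merged.loc[merged["_merge"] == "both"].drop(columns="_merge").reset_index(drop=True)

print(df3, df4, df5, sep="\n\n")
```

Row order within each part can differ between pandas versions, and the combined frame lists `countries` right after df's other columns rather than between `flowers` and `vegetables`; reorder the columns explicitly if that position matters.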
