I'm just getting started with Python and coding. I need help comparing two DataFrames of different lengths whose column labels all differ except for one. That one shared column is the column I want to compare on. My data looks like this:
df:

    'fruits'   'trees'        'sports'     'countries'
    bananas    mongolia       basketball   Spain
    grapes     Oak            rugby        Thailand
    oranges    Osage Orange   baseball     Egypt
    apples     Maple          golf         Chile

df2:

    'cars'   'flowers'   'countries'   'vegetables'
    Audi     Rose        Spain         Carrots
    BMW      Tulip       Nigeria       Celery
    Honda    Dandelion   Egypt         Onion
I would like to compare these two DataFrames on the countries column and create three separate outputs, each in its own DataFrame. I am using pandas and tried pd.concat to combine df and df2 into one. I would also like to keep the rest of each row's columns, even though those columns do not match between the two DataFrames.
Here are my desired outputs:
Output #1: values in df NOT in df2:

df3:

    'fruits'   'trees'   'sports'   'countries'
    grapes     Oak       rugby      Thailand
    apples     Maple     golf       Chile
Output #2: values in df2 NOT in df:

df4:

    'cars'   'flowers'   'countries'   'vegetables'
    BMW      Tulip       Nigeria       Celery
Output #3: values that are in both df and df2 (with the columns from both DataFrames combined):

df5:

    'fruits'   'trees'        'sports'     'cars'   'flowers'   'countries'   'vegetables'
    bananas    mongolia       basketball   Audi     Rose        Spain         Carrots
    oranges    Osage Orange   baseball     Honda    Dandelion   Egypt         Onion
I hope this all makes sense. I have tried many different things (isin, DataFrame.diff and .difference, df - df2, NumPy arrays, etc.). I have searched extensively and cannot find what I am looking for. Any help would be greatly appreciated! Thanks!
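For reference, here is a minimal sketch of one way the three outputs might be produced. It is not a definitive solution. It assumes the countries values are unique within each DataFrame (duplicates would multiply the matching rows) and relies on the `indicator=True` option of `DataFrame.merge`, which adds a `_merge` column marking each row as `left_only`, `right_only`, or `both`. The names `merged` and `df5_columns` are made up for illustration. Row order after an outer merge can differ between pandas versions, so the sketch does not rely on it.

```python
import pandas as pd

# Sample data copied from the tables in the question.
df = pd.DataFrame({
    "fruits": ["bananas", "grapes", "oranges", "apples"],
    "trees": ["mongolia", "Oak", "Osage Orange", "Maple"],
    "sports": ["basketball", "rugby", "baseball", "golf"],
    "countries": ["Spain", "Thailand", "Egypt", "Chile"],
})
df2 = pd.DataFrame({
    "cars": ["Audi", "BMW", "Honda"],
    "flowers": ["Rose", "Tulip", "Dandelion"],
    "countries": ["Spain", "Nigeria", "Egypt"],
    "vegetables": ["Carrots", "Celery", "Onion"],
})

# Full outer join on the shared column. The "_merge" column records
# whether each country came from df only, df2 only, or both.
merged = df.merge(df2, on="countries", how="outer", indicator=True)

# Output 1: rows whose country appears only in df, with df's columns.
df3 = merged.loc[merged["_merge"] == "left_only", df.columns].reset_index(drop=True)

# Output 2: rows whose country appears only in df2, with df2's columns.
df4 = merged.loc[merged["_merge"] == "right_only", df2.columns].reset_index(drop=True)

# Output 3: rows whose country appears in both, with the columns arranged
# as in the desired output (df's other columns first, then all of df2's).
df5_columns = [c for c in df.columns if c != "countries"] + list(df2.columns)
df5 = merged.loc[merged["_merge"] == "both", df5_columns].reset_index(drop=True)

print(df3, df4, df5, sep="\n\n")
```

If only outputs 1 and 2 were needed, filtering with `~df["countries"].isin(df2["countries"])` (and the reverse) would likely be enough. The merge is used here because output 3 needs the columns of both DataFrames side by side.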