Pandas: replace column values based on matches from another column

I have a column in the first data frame, df1["ItemType"], as shown below:

Dataframe1

    ItemType1
    redTomato
    whitePotato
    yellowPotato
    greenCauliflower
    yellowCauliflower
    yelloSquash
    redOnions
    YellowOnions
    WhiteOnions
    yellowCabbage
    GreenCabbage

I need to replace this based on a dictionary created from another data frame.

Dataframe2

    ItemType2          newType
    whitePotato        Potato
    yellowPotato       Potato
    redTomato          Tomato
    yellowCabbage
    GreenCabbage
    yellowCauliflower  yellowCauliflower
    greenCauliflower   greenCauliflower
    YellowOnions       Onions
    WhiteOnions        Onions
    yelloSquash        Squash
    redOnions          Onions

Note that:

  • Some ItemType values in Dataframe2 are the same as ItemType values in Dataframe1.
  • Some newType values in Dataframe2 are null, for example for yellowCabbage.
  • The ItemType values in Dataframe2 are not in the same order as the ItemType values in Dataframe1.
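To make the answers reproducible, here is a sketch that rebuilds both frames from the tables above. It assumes the empty newType cells (yellowCabbage and GreenCabbage) are NaN, which is what pd.read_csv produces for empty fields by default; "yelloSquash" is copied verbatim from the data.

```python
import numpy as np
import pandas as pd

# Dataframe1 as listed in the question (values copied verbatim).
Dataframe1 = pd.DataFrame({
    "ItemType1": [
        "redTomato", "whitePotato", "yellowPotato", "greenCauliflower",
        "yellowCauliflower", "yelloSquash", "redOnions", "YellowOnions",
        "WhiteOnions", "yellowCabbage", "GreenCabbage",
    ]
})

# Dataframe2: the blank newType cells are assumed to be NaN.
Dataframe2 = pd.DataFrame({
    "ItemType2": [
        "whitePotato", "yellowPotato", "redTomato", "yellowCabbage",
        "GreenCabbage", "yellowCauliflower", "greenCauliflower",
        "YellowOnions", "WhiteOnions", "yelloSquash", "redOnions",
    ],
    "newType": [
        "Potato", "Potato", "Tomato", np.nan, np.nan, "yellowCauliflower",
        "greenCauliflower", "Onions", "Onions", "Squash", "Onions",
    ],
})

print(Dataframe1)
print(Dataframe2)
```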

I need to replace each value in the ItemType column of Dataframe1 with the corresponding newType from Dataframe2 whenever that value appears in Dataframe2's ItemType column, while respecting the exceptions listed above. If there is no match, the value should remain as it is (no change).

Here is what I have so far:

    import pandas as pd

    # read the second csv file
    df2 = pd.read_csv('mappings.csv', names=["ItemType", "newType"])
    # convert to dict
    df2 = df2.set_index('ItemType').T.to_dict('list')

The following attempts at match-based replacement do not work: they insert NaN values instead of the actual ones. They are based on a discussion here on SO.

 df1.loc[df1['ItemType'].isin(df2['ItemType'])]=df2[['NewType']] 

OR

 df1['ItemType']=df2['ItemType'].map(df2) 
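For comparison, here is a minimal sketch of the dictionary route the question starts from. Note that `.T.to_dict('list')` produces one-element lists as values (e.g. {'whitePotato': ['Potato']}), so this sketch builds a plain key-to-value dict instead. `mapping` and the value "purpleCarrot" (a made-up item with no match) are hypothetical; column names follow the EDIT (ItemType1/ItemType2). Series.map turns unmatched values into NaN, and fillna puts the original values back.

```python
import numpy as np
import pandas as pd

# Small hypothetical stand-ins for the question's frames.
df1 = pd.DataFrame({"ItemType1": ["redTomato", "yellowCabbage", "purpleCarrot"]})
df2 = pd.DataFrame({
    "ItemType2": ["redTomato", "yellowCabbage"],
    "newType": ["Tomato", np.nan],
})

# Plain {ItemType2: newType} dict, skipping rows whose newType is null.
mapping = df2.dropna(subset=["newType"]).set_index("ItemType2")["newType"].to_dict()

# map() gives NaN where a value has no key in the dict; fillna() restores the original.
df1["ItemType1"] = df1["ItemType1"].map(mapping).fillna(df1["ItemType1"])
print(df1["ItemType1"].tolist())  # ['Tomato', 'yellowCabbage', 'purpleCarrot']
```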

Thank you in advance

EDIT
The key columns in the two data frames have different names: the column in Dataframe1 is ItemType1, and the first column in Dataframe2 is ItemType2. I left this out of my original post.

3 answers

Use map

All the necessary logic:

    def update_type(t1, t2, dropna=False):
        return t1.map(t2).dropna() if dropna else t1.map(t2).fillna(t1)

Make 'ItemType2' the index of Dataframe2 and pass its newType column as the mapping:

    update_type(Dataframe1.ItemType1, Dataframe2.set_index('ItemType2').newType)

    0                Tomato
    1                Potato
    2                Potato
    3      greenCauliflower
    4     yellowCauliflower
    5                Squash
    6                Onions
    7                Onions
    8                Onions
    9         yellowCabbage
    10         GreenCabbage
    Name: ItemType1, dtype: object

    update_type(Dataframe1.ItemType1, Dataframe2.set_index('ItemType2').newType, dropna=True)

    0               Tomato
    1               Potato
    2               Potato
    3     greenCauliflower
    4    yellowCauliflower
    5               Squash
    6               Onions
    7               Onions
    8               Onions
    Name: ItemType1, dtype: object

Verify

    updated = update_type(Dataframe1.ItemType1, Dataframe2.set_index('ItemType2').newType)
    pd.concat([Dataframe1, updated], axis=1, keys=['old', 'new'])

Timing

    def root(Dataframe1, Dataframe2):
        return Dataframe1['ItemType1'].replace(Dataframe2.set_index('ItemType2')['newType'].dropna())

    def piRSquared(Dataframe1, Dataframe2):
        t1 = Dataframe1.ItemType1
        t2 = Dataframe2.set_index('ItemType2').newType
        return update_type(t1, t2)
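The timing results were posted as an image that did not survive. As a sketch of how such a comparison could be re-run, here is a timeit harness on synthetic data; the "item"/"type" labels, the sizes, and the every-tenth-null pattern are made-up assumptions, and no particular result is implied. update_type, root and piRSquared are repeated so the block runs on its own.

```python
import timeit

import numpy as np
import pandas as pd


def update_type(t1, t2, dropna=False):
    # Helper from the answer: map via t2, then drop or back-fill unmatched values.
    return t1.map(t2).dropna() if dropna else t1.map(t2).fillna(t1)


def root(Dataframe1, Dataframe2):
    return Dataframe1['ItemType1'].replace(Dataframe2.set_index('ItemType2')['newType'].dropna())


def piRSquared(Dataframe1, Dataframe2):
    t1 = Dataframe1.ItemType1
    t2 = Dataframe2.set_index('ItemType2').newType
    return update_type(t1, t2)


# Hypothetical synthetic data: 200 distinct keys, 20,000 rows to relabel.
keys = [f"item{i}" for i in range(200)]
Dataframe2 = pd.DataFrame({
    "ItemType2": keys,
    # Every tenth mapping is null, mimicking the yellowCabbage/GreenCabbage rows.
    "newType": [np.nan if i % 10 == 0 else f"type{i % 50}" for i in range(200)],
})
rng = np.random.default_rng(0)
Dataframe1 = pd.DataFrame({"ItemType1": rng.choice(keys, size=20_000).tolist()})

# Sanity check: both approaches must return the same labels before timing them.
assert root(Dataframe1, Dataframe2).tolist() == piRSquared(Dataframe1, Dataframe2).tolist()

for fn in (root, piRSquared):
    per_call = timeit.timeit(lambda: fn(Dataframe1, Dataframe2), number=5) / 5
    print(f"{fn.__name__}: {per_call * 1000:.1f} ms per call")
```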

You can convert df2 to a Series indexed by 'ItemType2' and then use replace on df1:

    # Make df2 a Series indexed by 'ItemType2', dropping null newType values.
    df2 = df2.set_index('ItemType2')['newType'].dropna()

    # Replace values in df1.
    df1['ItemType1'] = df1['ItemType1'].replace(df2)

Or, in one line, if you do not want to modify df2:

 df1['ItemType1'] = df1['ItemType1'].replace(df2.set_index('ItemType2')['newType'].dropna()) 
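The dropna() in both versions matters: replace uses every entry of the mapping, including null ones, so without it yellowCabbage would be overwritten with NaN. A minimal demonstration on two hypothetical rows:

```python
import numpy as np
import pandas as pd

# Two hypothetical rows and a mapping where yellowCabbage has a null newType.
s = pd.Series(["redTomato", "yellowCabbage"])
mapping = pd.Series(["Tomato", np.nan], index=["redTomato", "yellowCabbage"])

# Without dropna(), the null entry is used as a replacement value,
# so yellowCabbage becomes NaN.
without_dropna = s.replace(mapping)

# With dropna(), yellowCabbage has no entry left in the mapping and is kept as-is.
with_dropna = s.replace(mapping.dropna())

print(without_dropna.tolist())
print(with_dropna.tolist())
```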

This method requires renaming the key column in both data frames to "type"; then you can use merge together with np.where:

    import numpy as np
    # how='left' keeps the rows of df1 that have no match in df2
    df3 = df1.merge(df2, how='left', on='type')[['type', 'newType']]
    df3['newType'] = np.where(df3['newType'].isnull(), df3['type'], df3['newType'])
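Whether this approach keeps unmatched values depends on the join type. In a small sketch on hypothetical data, where "purpleCarrot" has no entry in df2, an inner merge drops that row, while a left merge keeps it with a NaN newType that np.where can then fall back from:

```python
import numpy as np
import pandas as pd

# Hypothetical data: "purpleCarrot" has no entry in df2, "yellowCabbage" maps to null.
df1 = pd.DataFrame({"type": ["redTomato", "yellowCabbage", "purpleCarrot"]})
df2 = pd.DataFrame({"type": ["redTomato", "yellowCabbage"], "newType": ["Tomato", np.nan]})

inner = df1.merge(df2, how="inner", on="type")
left = df1.merge(df2, how="left", on="type")
print(len(inner), len(left))  # 2 3

# Unmatched and null-mapped rows both have NaN in newType; fall back to "type".
left["newType"] = np.where(left["newType"].isnull(), left["type"], left["newType"])
print(left["newType"].tolist())  # ['Tomato', 'yellowCabbage', 'purpleCarrot']
```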