Pandas: replace column values based on matching from another column

Question

I have a column in the first data frame, df1["ItemType"], as shown below:

Dataframe1

    ItemType1
    redTomato
    whitePotato
    yellowPotato
    greenCauliflower
    yellowCauliflower
    yelloSquash
    redOnions
    YellowOnions
    WhiteOnions
    yellowCabbage
    GreenCabbage

I need to replace this based on a dictionary created from another data frame.

Dataframe2

    ItemType2          newType
    whitePotato        Potato
    yellowPotato       Potato
    redTomato          Tomato
    yellowCabbage
    GreenCabbage
    yellowCauliflower  yellowCauliflower
    greenCauliflower   greenCauliflower
    YellowOnions       Onions
    WhiteOnions        Onions
    yelloSquash        Squash
    redOnions          Onions

Note that:

- In dataframe2, some of the ItemType values are the same as the ItemType values in dataframe1.
- Some newType values in dataframe2 are null, such as the one for yellowCabbage.
- The ItemType values in dataframe2 are not in the same order as those in dataframe1.

I need to replace each value in dataframe1's ItemType column with the corresponding newType from dataframe2 wherever the value matches a dataframe2 ItemType, taking into account the exceptions listed in the bullet points above. If there is no match, the value should stay as it is (no change).

Here is what I have so far:

    import pandas as pd

    # read the second CSV file
    df2 = pd.read_csv('mappings.csv', names=["ItemType", "newType"])

    # convert to dict
    df2 = df2.set_index('ItemType').T.to_dict('list')

Neither of the following replacements works: they insert NaN values instead of the actual values. They are based on a discussion here on SO.

 df1.loc[df1['ItemType'].isin(df2['ItemType'])]=df2[['NewType']]

OR

 df1['ItemType']=df2['ItemType'].map(df2)

Thank you in advance

EDIT
The column headings in the two data frames have different names: the column in the first data frame is ItemType1, and the first column in the second data frame is ItemType2. I missed this in my first edit.

+5

python python-2.7 pandas dataframe

Anil_m Jul 19 '16 at 19:11

3 answers

You can convert df2 to a Series indexed by 'ItemType2', and then use replace on df1:

    # Make df2 a Series indexed by 'ItemType2'.
    df2 = df2.set_index('ItemType2')['newType'].dropna()

    # Replace values in df1.
    df1['ItemType1'] = df1['ItemType1'].replace(df2)

Or in one line if you do not want to change df2 :

 df1['ItemType1'] = df1['ItemType1'].replace(df2.set_index('ItemType2')['newType'].dropna())
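As a rough illustration, here is a minimal self-contained sketch of the same idea. The two frames are rebuilt by hand from a few of the question's rows (an assumption, not the asker's actual files), None stands in for a null newType, and 'purpleCarrot' is a hypothetical label added to show a value with no match at all.

```python
import pandas as pd

# Hand-built sample based on the question's tables (an assumption).
# 'purpleCarrot' is hypothetical: it has no entry in df2.
df1 = pd.DataFrame(
    {"ItemType1": ["redTomato", "whitePotato", "yellowCabbage", "purpleCarrot"]}
)
df2 = pd.DataFrame({
    "ItemType2": ["whitePotato", "redTomato", "yellowCabbage"],
    "newType": ["Potato", "Tomato", None],  # None plays the role of a null newType
})

# Series mapping old label -> new label; dropna() discards null targets.
mapping = df2.set_index("ItemType2")["newType"].dropna()

# replace() only changes values found in the mapping's index.
result = df1["ItemType1"].replace(mapping)
print(result.tolist())
```

Values with a null newType and values missing from df2 both come through unchanged, because neither remains in the mapping after dropna().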

+4

root Jul 19 '16 at 19:28

This method requires the key column to be named "type" in both data frames; you can then combine merge and np.where:

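    import numpy as np

    # how='left' keeps the rows of df1 that have no match in df2
    df3 = df1.merge(df2, how='left', on='type')[['type', 'newType']]
    # fall back to the original value where newType is null
    df3['newType'] = np.where(df3['newType'].isnull(), df3['type'], df3['newType'])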
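A small self-contained sketch of the merge approach, for illustration only. The frames are built by hand and both key columns are assumed to be renamed to "type" already. 'purpleCarrot' is a hypothetical label with no match, and the sketch uses a left join so that such rows of df1 are kept.

```python
import numpy as np
import pandas as pd

# Hand-built sample (an assumption); both key columns are already named "type".
df1 = pd.DataFrame({"type": ["redTomato", "yellowCabbage", "purpleCarrot"]})
df2 = pd.DataFrame({
    "type": ["redTomato", "yellowCabbage"],
    "newType": ["Tomato", None],  # None plays the role of a null newType
})

# A left join keeps every row of df1, matched or not.
merged = df1.merge(df2, how="left", on="type")

# Where newType is null (no match, or a null mapping), keep the original value.
merged["newType"] = np.where(
    merged["newType"].isnull(), merged["type"], merged["newType"]
)
print(merged)
```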

+3

draco_alpine Jul 19 '16 at 19:17

piRSquared · Accepted Answer · 2016-07-19T19:42:25+0000

Use map

All the necessary logic:

    def update_type(t1, t2, dropna=False):
        return t1.map(t2).dropna() if dropna else t1.map(t2).fillna(t1)

Set 'ItemType2' as the index of Dataframe2:

    update_type(Dataframe1.ItemType1,
                Dataframe2.set_index('ItemType2').newType)

    0                Tomato
    1                Potato
    2                Potato
    3      greenCauliflower
    4     yellowCauliflower
    5                Squash
    6                Onions
    7                Onions
    8                Onions
    9         yellowCabbage
    10         GreenCabbage
    Name: ItemType1, dtype: object

    update_type(Dataframe1.ItemType1,
                Dataframe2.set_index('ItemType2').newType,
                dropna=True)

    0               Tomato
    1               Potato
    2               Potato
    3     greenCauliflower
    4    yellowCauliflower
    5               Squash
    6               Onions
    7               Onions
    8               Onions
    Name: ItemType1, dtype: object

Verify

    updated = update_type(Dataframe1.ItemType1,
                          Dataframe2.set_index('ItemType2').newType)

    pd.concat([Dataframe1, updated], axis=1, keys=['old', 'new'])
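To try the map approach and the side-by-side check end to end, a minimal sketch follows. The sample frames are hand-built from a few of the question's rows (an assumption), 'purpleCarrot' is a hypothetical unmatched label, and the check concatenates two Series rather than the whole frame so that the resulting columns are simply 'old' and 'new'.

```python
import pandas as pd

def update_type(t1, t2, dropna=False):
    # Look up each value of t1 in the index of t2; misses and null targets give NaN.
    mapped = t1.map(t2)
    # Either drop the NaN rows or fall back to the original values.
    return mapped.dropna() if dropna else mapped.fillna(t1)

# Hand-built sample (an assumption); 'purpleCarrot' is hypothetical.
Dataframe1 = pd.DataFrame(
    {"ItemType1": ["redTomato", "yellowCabbage", "purpleCarrot"]}
)
Dataframe2 = pd.DataFrame({
    "ItemType2": ["redTomato", "yellowCabbage"],
    "newType": ["Tomato", None],  # None plays the role of a null newType
})

lookup = Dataframe2.set_index("ItemType2").newType
updated = update_type(Dataframe1.ItemType1, lookup)
only_mapped = update_type(Dataframe1.ItemType1, lookup, dropna=True)

# Old and new values side by side.
check = pd.concat([Dataframe1.ItemType1, updated], axis=1, keys=["old", "new"])
print(check)
```

Note that map needs the lookup index (here 'ItemType2') to be unique.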

Timing

    def root(Dataframe1, Dataframe2):
        return Dataframe1['ItemType1'].replace(
            Dataframe2.set_index('ItemType2')['newType'].dropna())

    def piRSquared(Dataframe1, Dataframe2):
        t1 = Dataframe1.ItemType1
        t2 = Dataframe2.set_index('ItemType2').newType
        return update_type(t1, t2)
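One possible way to time the two functions is with the standard-library timeit module, sketched below. The sample frame, its size, and the repeat count are arbitrary assumptions, and no particular timings are claimed; results will vary with the data and the pandas version.

```python
import timeit

import pandas as pd

def update_type(t1, t2, dropna=False):
    mapped = t1.map(t2)
    return mapped.dropna() if dropna else mapped.fillna(t1)

def root(Dataframe1, Dataframe2):
    return Dataframe1['ItemType1'].replace(
        Dataframe2.set_index('ItemType2')['newType'].dropna())

def piRSquared(Dataframe1, Dataframe2):
    t1 = Dataframe1.ItemType1
    t2 = Dataframe2.set_index('ItemType2').newType
    return update_type(t1, t2)

# Hypothetical sample: a few labels from the question, repeated for a larger frame.
labels = ['redTomato', 'whitePotato', 'yellowCabbage', 'redOnions']
Dataframe1 = pd.DataFrame({'ItemType1': labels * 250})
Dataframe2 = pd.DataFrame({
    'ItemType2': labels,
    'newType': ['Tomato', 'Potato', None, 'Onions'],  # None acts as a null newType
})

runs = 10  # arbitrary repeat count
for func in (root, piRSquared):
    seconds = timeit.timeit(lambda: func(Dataframe1, Dataframe2), number=runs)
    print(f"{func.__name__}: {seconds / runs:.6f} s per call")
```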
