Python Pandas map Vlookup columns based on header values

Question

I have the following dataframe df:

Customer_ID | 2015 | 2016 | 2017 | Year_joined_mailing
ABC         |    5 |    6 |   10 | 2015
BCD         |    6 |    7 |    3 | 2016
DEF         |   10 |    4 |    5 | 2017
GHI         |    8 |    7 |   10 | 2016

For each customer, I would like to look up the value in the column for the year they joined the mailing list and save it in a new column.

Desired output:

Customer_ID | 2015 | 2016 | 2017 | Year_joined_mailing | Purchases_1st_year
ABC         |    5 |    6 |   10 | 2015                | 5
BCD         |    6 |    7 |    3 | 2016                | 7
DEF         |   10 |    4 |    5 | 2017                | 5
GHI         |    8 |    7 |   10 | 2016                | 7

I found several solutions for VLOOKUP-style matching in Python, but none of them use the column headers themselves as the lookup key.

python pandas match lookup

jeangelj Jul 19 '17 at 17:44

3 answers

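You can use apply with axis=1 to run a function on each row, using that row's Year_joined_mailing value as the column label:

 df['Purchases_1st_year'] = df.apply(lambda x: x[x['Year_joined_mailing']], axis=1)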
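A minimal, self-contained sketch of the apply approach on the question's data, assuming Customer_ID is the index and the year headers are ints (the same type as the Year_joined_mailing values). It uses row.loc so the lookup inside each row is explicitly by label.

```python
import pandas as pd

# The question's sample data. Assumption: Customer_ID is the index and the
# year headers are ints, the same type as the Year_joined_mailing values.
df = pd.DataFrame(
    {
        2015: [5, 6, 10, 8],
        2016: [6, 7, 4, 7],
        2017: [10, 3, 5, 10],
        "Year_joined_mailing": [2015, 2016, 2017, 2016],
    },
    index=pd.Index(["ABC", "BCD", "DEF", "GHI"], name="Customer_ID"),
)

# With axis=1 each row arrives as a Series indexed by the column labels,
# so the row's own join year can be used as a label into that same row.
df["Purchases_1st_year"] = df.apply(
    lambda row: row.loc[row["Year_joined_mailing"]], axis=1
)

print(df["Purchases_1st_year"].tolist())  # [5, 7, 5, 7]
```

apply calls a Python function once per row, so it is easy to read but slow on large frames.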

galaxyan Jul 19 '17 at 17:52

I would do it this way, assuming that the column headers and the Year_joined_mailing values have the same data type and that every Year_joined_mailing value is a valid column name. If the data types do not match, you can convert them with str() or int() where necessary.

 df['Purchases_1st_year'] = [df[df['Year_joined_mailing'][i]][i] for i in df.index]

Here we iterate over the index of the dataframe. For each index label we read the 'Year_joined_mailing' value, use it to select the matching column, and then select that same index label from the column. The results are collected in a list, which is assigned to the new column 'Purchases_1st_year'.
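The loop described above can be spelled out step by step with DataFrame.at, which reads a single cell by (row label, column label). This is a sketch under the same assumptions as the answer: Customer_ID is the index and the year headers have the same type as the Year_joined_mailing values.

```python
import pandas as pd

# Sample data from the question (assumption: Customer_ID is the index and
# the year headers are ints, matching the Year_joined_mailing values).
df = pd.DataFrame(
    {
        2015: [5, 6, 10, 8],
        2016: [6, 7, 4, 7],
        2017: [10, 3, 5, 10],
        "Year_joined_mailing": [2015, 2016, 2017, 2016],
    },
    index=pd.Index(["ABC", "BCD", "DEF", "GHI"], name="Customer_ID"),
)

purchases = []
for customer in df.index:
    # 1. Read the join year for this row.
    year = df.at[customer, "Year_joined_mailing"]
    # 2. Use that year as the column label and read the same row from it.
    purchases.append(df.at[customer, year])

# 3. Assign the collected values to the new column.
df["Purchases_1st_year"] = purchases

print(df["Purchases_1st_year"].tolist())  # [5, 7, 5, 7]
```

Going through df.at keeps both lookups label-based, whereas chained indexing like df[col][i] first builds a column Series and then indexes it.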

If the values in 'Year_joined_mailing' are not always valid column names, try:

 from numpy import nan

 new_col = []
 for i in df.index:
     try:
         new_col.append(df[df['Year_joined_mailing'][i]][i])
     except KeyError:  # the year is not a column of df
         new_col.append(nan)  # or whatever null value you want here
 df['Purchases_1st_year'] = new_col

This longer piece of code does the same thing, but instead of raising an error when a 'Year_joined_mailing' value is not in df.columns, it stores NaN for that row.
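A variant of the same idea that checks membership in df.columns up front instead of catching KeyError (a swapped-in "look before you leap" style, not the answer's exact code). To exercise the NaN path, the sample data is altered so that GHI joined in 2018, a year with no column; that change is purely hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical variant of the sample data: GHI's join year (2018) has no
# matching column, so its lookup should produce NaN.
df = pd.DataFrame(
    {
        2015: [5, 6, 10, 8],
        2016: [6, 7, 4, 7],
        2017: [10, 3, 5, 10],
        "Year_joined_mailing": [2015, 2016, 2017, 2018],
    },
    index=pd.Index(["ABC", "BCD", "DEF", "GHI"], name="Customer_ID"),
)

# Check that the year is a column before reading it; fall back to NaN.
df["Purchases_1st_year"] = [
    df.at[customer, year] if year in df.columns else np.nan
    for customer, year in zip(df.index, df["Year_joined_mailing"])
]

print(df["Purchases_1st_year"].tolist())  # [5.0, 7.0, 5.0, nan]
```

Because the column now contains NaN, pandas stores it as float64, so the found values come back as 5.0, 7.0 and so on.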

Jeremy barnes Jul 19 '17 at 17:57

piRSquared · Accepted Answer · 2017-07-19T17:49:11+0000

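Use pd.DataFrame.lookup. (Note that DataFrame.lookup was deprecated in pandas 1.2.0 and removed in pandas 2.0, so this answer only runs on older versions.)
Keep in mind that I am assuming Customer_ID is the index.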

 df.lookup(df.index, df.Year_joined_mailing)

 array([5, 7, 5, 7])

 df.assign(
     Purchases_1st_year=df.lookup(df.index, df.Year_joined_mailing)
 )

              2015  2016  2017  Year_joined_mailing  Purchases_1st_year
 Customer_ID
 ABC             5     6    10                 2015                   5
 BCD             6     7     3                 2016                   7
 DEF            10     4     5                 2017                   5
 GHI             8     7    10                 2016                   7
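Since DataFrame.lookup no longer exists in pandas 2.x, here is a sketch of the replacement pattern shown in the pandas indexing user guide (pd.factorize, reindex, then NumPy fancy indexing), applied to the question's data. As in the answer, it assumes Customer_ID is the index and the headers are ints.

```python
import numpy as np
import pandas as pd

# Question's sample data; Customer_ID is the index (as the answer assumes).
df = pd.DataFrame(
    {
        2015: [5, 6, 10, 8],
        2016: [6, 7, 4, 7],
        2017: [10, 3, 5, 10],
        "Year_joined_mailing": [2015, 2016, 2017, 2016],
    },
    index=pd.Index(["ABC", "BCD", "DEF", "GHI"], name="Customer_ID"),
)

# Encode each row's join year as an integer code; `years` holds the
# distinct years in order of first appearance.
codes, years = pd.factorize(df["Year_joined_mailing"])

# Reorder (and subset) the columns so that column position i matches years[i].
values = df.reindex(columns=years).to_numpy()

# Fancy indexing picks one cell per row: (row 0, codes[0]), (row 1, codes[1]), ...
df = df.assign(Purchases_1st_year=values[np.arange(len(df)), codes])

print(df["Purchases_1st_year"].tolist())  # [5, 7, 5, 7]
```

Unlike the apply and loop versions, this selects all values in one vectorized NumPy operation. A year with no matching column would become an all-NaN column after reindex, yielding NaN for that row.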

However, be careful: the column names may be strings while the values in the Year_joined_mailing column are integers (or vice versa), and then the labels will not match.

The nuclear option, which guarantees the types match, is to convert both sides to strings:

 df.assign(
     Purchases_1st_year=df.rename(columns=str).lookup(
         df.index, df.Year_joined_mailing.astype(str)
     )
 )

              2015  2016  2017  Year_joined_mailing  Purchases_1st_year
 Customer_ID
 ABC             5     6    10                 2015                   5
 BCD             6     7     3                 2016                   7
 DEF            10     4     5                 2017                   5
 GHI             8     7    10                 2016                   7
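To see why the conversion matters, here is a sketch with string year headers (an assumed scenario, e.g. after reading a CSV; not from the question) and int values in Year_joined_mailing. The same rename/astype(str) conversion makes the labels line up. DataFrame.at replaces lookup for the per-row read, so the sketch also runs on pandas 2.x.

```python
import pandas as pd

# Assumed scenario: year headers are strings while the join years are ints.
df = pd.DataFrame(
    {
        "2015": [5, 6, 10, 8],
        "2016": [6, 7, 4, 7],
        "2017": [10, 3, 5, 10],
        "Year_joined_mailing": [2015, 2016, 2017, 2016],
    },
    index=pd.Index(["ABC", "BCD", "DEF", "GHI"], name="Customer_ID"),
)

# The int 2015 does not match the string label "2015".
print(2015 in df.columns)  # False

# Convert both sides to str, then read one cell per row by label.
str_df = df.rename(columns=str)
years = df["Year_joined_mailing"].astype(str)
df["Purchases_1st_year"] = [
    str_df.at[customer, year] for customer, year in zip(df.index, years)
]

print(df["Purchases_1st_year"].tolist())  # [5, 7, 5, 7]
```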
