I am very new to Pandas and Python, so forgive me if this is a basic question. My problem is: download a few csv files, find the product identifiers that go missing in the subsequent files, and calculate the sold date based on that. As part of solving it I made some changes to the way I clean these files. I have the following columns in a DataFrame loaded from multiple csv files.
store_id  stock_number  merchandise_id  date_acquired  color  price  MSRP   csv_date
12973     7382          UISN78008       04/11/2017     Red    $3200  $3650  01/31/2017
45973     9889          YHAN79807       08/09/2017     White  $3600  $3650  01/31/2017
...
45973     9889          YHAN79807       08/09/2017     White  $3600  $3650  03/31/2017
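For context, the combined frame is built roughly along these lines (paths and file layout simplified):

import glob
import pandas as pd

# Read every downloaded csv and stack them into one DataFrame;
# csv_date records which export each row came from.
frames = [pd.read_csv(path) for path in glob.glob('downloads/*.csv')]
df1 = pd.concat(frames, ignore_index=True)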
The csv_date column in that last row marks the last occurrence of the item 'YHAN79807'. I managed to find that last occurrence by following How to detect the first occurrence of duplicate rows in the Python Pandas Dataframe and changing it a bit. I used
df1['dup_index'] = df1.index.map(lambda ind: g.indices[ind][len(g.indices[ind])-1])
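where g is the groupby object from the linked approach, set up roughly like this (simplified from my actual code; the frame is indexed by merchandise_id so that g.indices can be looked up by ID):

# Index the frame by merchandise_id and group on that index, so that
# g.indices maps each ID to the integer positions of all rows carrying
# it; the last element of that array is the last occurrence.
df1 = df1.set_index('merchandise_id', drop=False)
g = df1.groupby(level=0)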
However, I want to set this value in the dup_index column only on the last occurrence of a product identifier such as YHAN79807. I don't want the rest of the duplicate rows for 'YHAN79807' to carry the value; they must stay empty. Only the last occurrence of each identifier should have it. I have not been able to do this yet.
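To make the goal concrete, here is the result I am after, written out by hand for a toy version of the rows above (the dup_index values are just the row positions of the last occurrences):

import pandas as pd

# Hand-built picture of the desired outcome: only the last occurrence
# of each merchandise_id carries a dup_index value, earlier duplicates
# stay empty.
expected = pd.DataFrame({
    'merchandise_id': ['UISN78008', 'YHAN79807', 'YHAN79807'],
    'csv_date':       ['01/31/2017', '01/31/2017', '03/31/2017'],
    'dup_index':      [0, '', 2],
})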
I tried several things; one of them was:

group = df1.groupby(['merchandise_id'])
df1_index = df1.set_index(['merchandise_id'])
df1[((len(group.indices[ind]) - 1) == group.indices[df1.merchandise_id])]['dup_index'] = 'succeed'
I tried assigning 'succeed' as a first step, just to see whether comparing the columns would produce a result, but this gave me the following warning and error:
FutureWarning: elementwise comparison failed; returning scalar instead, but in the future will perform elementwise comparison
result = getattr(x, name)(y) ... raise TypeError('Cannot compare %s with Series' %
That is where I am stuck. What am I missing? Any pointers are appreciated.
Best,
Alice