Load several csv files, find product identifiers missing from later files, and calculate the date sold from that

I am relatively new to Pandas and Python, so forgive me if this is a basic question. I looked around but could not find a solution.

I have several csv files, one for each month, which contain, among other things:

    inventory_01312017.csv

    store_id  stock_number  merchandise_id  date_acquired  color  price  MSRP   photo url
    12973     7382          UISN78008       04/11/2017     Red    $3200  $3650  ...
    45973     9889          YHAN79807       08/09/2017     White  $3600  $3650  ...

    inventory_02282017.csv

    store_id  stock_number  merchandise_id  date_acquired  color  price  MSRP   photo url
    45973     9889          YHAN79807       08/09/2017     White  $3600  $3650  ...

I need to load these files, which I can do without problems using

    import pandas as pd
    ...
    prep_data1 = pd.read_csv("../data/inventory_0131170401.csv")
    prep_data2 = pd.read_csv("../data/inventory_0201170456.csv")
    prep_data = pd.concat([prep_data1, prep_data2], ignore_index=True)
    ...
    # Adding a new column to get integer value for age
    prep_data['age_months'] = age_count(prep_data['date_acquired'])
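With many monthly files, reading them one by one does not scale, so a loop over the folder may help. The following is only a rough sketch. It assumes the file names follow the `inventory_MMDDYYYY.csv` pattern shown in the listing (the names in the snippet above look different and would need another pattern), and `snapshot_date`, `load_snapshots` and the `snapshot_date` column are hypothetical names introduced here.

```python
import re
from pathlib import Path

import pandas as pd

# Assumed file name pattern: inventory_MMDDYYYY.csv
NAME_PATTERN = re.compile(r"inventory_(\d{8})\.csv$")


def snapshot_date(path):
    """Return the date encoded in the file name as a pandas Timestamp."""
    match = NAME_PATTERN.search(Path(path).name)
    if match is None:
        raise ValueError(f"unexpected file name: {path}")
    return pd.to_datetime(match.group(1), format="%m%d%Y")


def load_snapshots(folder):
    """Read every inventory_*.csv in `folder` and tag each row with its file's date."""
    frames = []
    for path in sorted(Path(folder).glob("inventory_*.csv")):
        frame = pd.read_csv(path)
        frame["snapshot_date"] = snapshot_date(path)  # hypothetical helper column
        frames.append(frame)
    return pd.concat(frames, ignore_index=True)
```

Calling `load_snapshots("../data")` would then give one data frame in which every row records which monthly file it came from.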

Now I need to scan these files and find out when a given item was sold, using its merchandise_id, then create a new column in the data frame for the sale date and write it to the last csv file in which the item appears. If an item has been sold, it is no longer in stock and will not be listed in the following month's inventory csv.

For example, if an item was sold in January 2017, it will not appear in the February 2017 inventory file, so I have to add the sale date to the January 2017 data frame or csv. In the example above, I should add a column to the first csv file, something like this:

    inventory_01312017.csv

    store_id  stock_number  merchandise_id  date_acquired  color  price  MSRP   date_sold
    12973     7382          UISN78008       04/11/2017     Red    $3200  $3650  01/31/2017
    45973     9889          YHAN79807       08/09/2017     White  $3600  $3650
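One possible way to get such a column is to treat the date of the last file in which a merchandise_id appears as its sale date, and to leave it empty for items still listed in the most recent file. This is a minimal sketch, not a definitive solution. It assumes all monthly files have already been combined into one data frame with a hypothetical `snapshot_date` column holding the date from each file name, that merchandise_id identifies an item uniquely, and that a sold item never reappears. The function name `add_date_sold` is also made up for illustration.

```python
import pandas as pd


def add_date_sold(snapshots):
    """Add date_sold: the last snapshot date an item appears in,
    or NaT if the item is still present in the most recent snapshot."""
    result = snapshots.copy()
    # Last snapshot date in which each merchandise_id was seen
    last_seen = result.groupby("merchandise_id")["snapshot_date"].transform("max")
    latest = result["snapshot_date"].max()
    # Items seen in the latest snapshot are treated as unsold (NaT)
    result["date_sold"] = last_seen.where(last_seen < latest)
    return result


# Small stand-in for the two files in the example (only the relevant columns)
snapshots = pd.DataFrame({
    "merchandise_id": ["UISN78008", "YHAN79807", "YHAN79807"],
    "snapshot_date": pd.to_datetime(["2017-01-31", "2017-01-31", "2017-02-28"]),
})

with_sold = add_date_sold(snapshots)

# Keep one row per item, taken from the last snapshot that contains it
last_rows = (
    with_sold.sort_values("snapshot_date")
    .drop_duplicates("merchandise_id", keep="last")
)
print(last_rows)
```

Because the files are monthly, the date obtained this way is only an upper bound on how precisely the sale can be dated: the item was sold at some point between two consecutive snapshots.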

I need this data to find out how long a product was in stock before it was sold (the difference between date_acquired and date_sold) and how this affects its price. I considered doing it manually, but that would take me several weeks for so many files, and this will be an ongoing effort.
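Once a date_sold column exists, the time in stock is a plain date subtraction. The sketch below assumes both date columns are strings in MM/DD/YYYY format and that price is a string with a leading dollar sign. The rows are hypothetical values chosen for illustration, not taken from the files above, and `add_days_in_stock`, `days_in_stock` and `price_value` are made-up names.

```python
import pandas as pd


def add_days_in_stock(items):
    """Add days_in_stock (date_sold - date_acquired) and a numeric price column."""
    result = items.copy()
    # Assumed date format: MM/DD/YYYY
    acquired = pd.to_datetime(result["date_acquired"], format="%m/%d/%Y")
    sold = pd.to_datetime(result["date_sold"], format="%m/%d/%Y")
    # Unsold items have no date_sold, so their days_in_stock stays missing
    result["days_in_stock"] = (sold - acquired).dt.days
    # "$3200" -> 3200.0
    result["price_value"] = (
        result["price"].str.replace("$", "", regex=False).astype(float)
    )
    return result


# Hypothetical rows for illustration only
items = pd.DataFrame({
    "date_acquired": ["11/04/2016", "08/09/2016"],
    "date_sold": ["01/31/2017", None],
    "price": ["$3200", "$3600"],
})

out = add_days_in_stock(items)
print(out[["days_in_stock", "price_value"]])
```

With both columns numeric, something like `out["days_in_stock"].corr(out["price_value"])` could serve as a first look at how time in stock relates to price.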

If I missed any information needed for a solution, let me know and I can update.

Update: I updated some data and code examples. I hope my question is clearer now. Any pointers are welcome.

Alice