I ran into this problem while parsing dates with the parse_dates argument of pandas.read_csv(). In the snippet below, the dates are in dd/mm/yy format, and some of them are converted incorrectly: the day is treated as the month and vice versa.
Put simply, in some cases dd/mm/yy is converted to yyyy-dd-mm instead of yyyy-mm-dd.
Case 1:
04/10/96 is parsed as 1996-04-10, which is wrong.
Case 2:
15/07/97 is parsed as 1997-07-15, which is correct.
Case 3:
10/12/97 is parsed as 1997-10-12, which is wrong.
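The three cases are consistent with a parser that reads each value month-first and only falls back to day-first when the first number cannot be a month. The sketch below reproduces that pattern with the standard library only. The helper name `parse_month_first_with_fallback` is hypothetical, and this is an illustration of the observed behavior, not pandas's actual implementation.

```python
from datetime import datetime


def parse_month_first_with_fallback(text):
    """Hypothetical helper: try mm/dd/yy first, then fall back to dd/mm/yy."""
    try:
        return datetime.strptime(text, "%m/%d/%y")
    except ValueError:
        # The first number is not a valid month (e.g. 15), so read it as the day.
        return datetime.strptime(text, "%d/%m/%y")


for raw in ["04/10/96", "15/07/97", "10/12/97"]:
    guessed = parse_month_first_with_fallback(raw).date()
    intended = datetime.strptime(raw, "%d/%m/%y").date()
    print(raw, "guessed:", guessed, "intended:", intended)
```

Only 15/07/97 comes out as intended, because 15 cannot be a month; the other two values are valid under both readings, so the month-first guess wins.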
Code example
import pandas as pd

# Read the raw CSV; start_date stays a plain string (object) column
df = pd.read_csv('date_time.csv')
print('Data in csv:')
print(df)
print(df['start_date'].dtypes)
print('----------------------------------------------')

# Read again and let pandas infer the date format for start_date
df = pd.read_csv('date_time.csv', parse_dates=['start_date'])
print('Data after parsing:')
print(df)
print(df['start_date'].dtypes)
Current output
----------------------
Data in csv:
----------------------
  start_date
0   04/10/96
1   15/07/97
2   10/12/97
3   06/03/99
4
Expected Result
----------------------
Data in csv:
----------------------
  start_date
0   04/10/96
1   15/07/97
2   10/12/97
3   06/03/99
4
Other comments:
I could use date_parser or pandas.to_datetime() to specify the correct date format. But in my case, several date fields contain incomplete values, for example ['//1997', '/02/1967'], which I need converted to ['01/01/1997', '01/02/1967']. parse_dates converts these kinds of date fields to the expected format without forcing me to write extra code.
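For reference, here is a minimal sketch of the two explicit approaches, assuming the column holds complete dd/mm/yy values. The in-memory `csv_text` is a stand-in for date_time.csv built from the four complete rows shown above. `dayfirst` is an existing keyword of pandas.read_csv(), and the format string is passed to pandas.to_datetime(). Neither option is claimed to handle the incomplete values such as '//1997'.

```python
import io

import pandas as pd

# Stand-in for date_time.csv, using the four complete rows from the question.
csv_text = "start_date\n04/10/96\n15/07/97\n10/12/97\n06/03/99\n"

# Option 1: keep parse_dates, but tell read_csv that ambiguous dates are day-first.
df_dayfirst = pd.read_csv(
    io.StringIO(csv_text), parse_dates=["start_date"], dayfirst=True
)

# Option 2: read the column as strings, then convert with an explicit format.
df_format = pd.read_csv(io.StringIO(csv_text))
df_format["start_date"] = pd.to_datetime(df_format["start_date"], format="%d/%m/%y")

print(df_dayfirst["start_date"].tolist())
print(df_format["start_date"].tolist())
```

Option 1 stays closest to the original one-line call; option 2 is stricter, since a value that does not match the format raises an error instead of being silently reinterpreted.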
Is there any solution for this?
Link to the GitHub issue: https://github.com/pydata/pandas/issues/13063