Can pandas automatically recognize dates?

Question

Today I was pleasantly surprised to find that, when reading data from a data file, pandas can recognize the types of the values:

df = pandas.read_csv('test.dat', delimiter=r"\s+", names=['col1','col2','col3'])

For example, this can be verified as follows:

    for i, r in df.iterrows():
        print(type(r['col1']), type(r['col2']), type(r['col3']))

In particular, integers, floats and strings were recognized correctly. However, I have a column with dates in the following format: 2013-6-4. These dates were recognized as strings (and not as Python date objects). Is there a way to "teach" pandas to recognize dates?

+120

python date types pandas dataframe

Roman Jul 04 '13 at 8:08

9 answers

Perhaps the pandas interface has changed since @Rutger answered, but in the version I use (0.15.2), the date_parser function gets a list of dates instead of a single value. In this case, its code should be updated as follows:

    dateparse = lambda dates: [pd.datetime.strptime(d, '%Y-%m-%d %H:%M:%S') for d in dates]
    df = pd.read_csv(infile, parse_dates=['datetime'], date_parser=dateparse)

+16

Sean Mar 11 '15 at 16:03

The pandas read_csv method is great for parsing dates. Full documentation at http://pandas.pydata.org/pandas-docs/stable/generated/pandas.io.parsers.read_csv.html

You can even have the different parts of the date in different columns and pass the parameter:

    parse_dates : boolean, list of ints or names, list of lists, or dict
        If True -> try parsing the index.
        If [1, 2, 3] -> try parsing columns 1, 2, 3 each as a separate date column.
        If [[1, 3]] -> combine columns 1 and 3 and parse as a single date column.
        {'foo' : [1, 3]} -> parse columns 1, 3 as date and call result 'foo'

The default date detection works well, but it seems to be biased towards North American date formats. If you live elsewhere, you may occasionally be caught out by the results. As far as I remember, 1/6/2000 means 6 January in the USA, as opposed to 1 June where I live. It is smart enough to swap day and month around when dates like 23/6/2000 are used. It is probably safer to stick with YYYYMMDD variations of the date, though. Apologies to the pandas developers here, but I have not tested it with local dates recently.
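A minimal sketch of this ambiguity, using pd.to_datetime on a single string. The variable names are illustrative only; read_csv accepts a dayfirst argument as well.

```python
import pandas as pd

# Ambiguous string: the first number could be the month or the day.
ambiguous = "1/6/2000"

# Default behaviour: month first (US style), i.e. 6 January 2000.
month_first = pd.to_datetime(ambiguous)

# dayfirst=True asks pandas to read the string as day/month, i.e. 1 June 2000.
day_first = pd.to_datetime(ambiguous, dayfirst=True)

print(month_first.date(), day_first.date())
```

When the source of the file is known, stating the convention explicitly avoids relying on the default guess.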

You can use the date_parser parameter to pass a function to convert your format:

    date_parser : function
        Function to use for converting a sequence of string columns to an array of datetime instances. The default uses dateutil.parser.parser to do the conversion.

+11

Joop Jul 04 '13 at 10:38

When combining two columns into a single datetime column, the accepted answer raises an error (pandas version 0.20.3), because the columns are passed to the date_parser function separately.

The following works:

    def dateparse(d, t):
        dt = d + " " + t
        return pd.datetime.strptime(dt, '%d/%m/%Y %H:%M:%S')

    df = pd.read_csv(infile, parse_dates={'datetime': ['date', 'time']}, date_parser=dateparse)
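An alternative sketch that avoids date_parser altogether: read both columns as plain strings, join them, and convert with pd.to_datetime. The sample data and the "value" column are made up for illustration; the format string is the one used above.

```python
import io

import pandas as pd

# Hypothetical sample data with separate "date" and "time" columns.
csv_text = "date,time,value\n25/10/2017,08:54:00,1\n26/10/2017,09:15:30,2\n"

# Keep both columns as strings so they can be concatenated safely.
df = pd.read_csv(io.StringIO(csv_text), dtype={"date": str, "time": str})

# Join the two string columns, then parse them with an explicit day-first format.
df["datetime"] = pd.to_datetime(df["date"] + " " + df["time"],
                                format="%d/%m/%Y %H:%M:%S")
df = df.drop(columns=["date", "time"])

print(df)
```

Because the conversion happens after loading, it does not depend on how a particular pandas version calls the parser function.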

+8

IamTheWalrus Oct 25 '17 at 8:54

Yes, according to the pandas.read_csv documentation:

Note: A fast-path exists for iso8601-formatted dates.

Therefore, if your CSV has a column named datetime and the dates look like 2013-01-01T01:01, for example, pandas (I am on v0.19.2) will pick up the date and time automatically:

df = pd.read_csv('test.csv', parse_dates=['datetime'])

Note that you need to pass parse_dates explicitly; it does not work without it.

Check with:

df.dtypes

You should see that the column data type is datetime64[ns]
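A self-contained sketch of the same check, using an in-memory CSV instead of a file on disk. The sample rows and the "value" column are made up for illustration.

```python
import io

import pandas as pd

# Hypothetical CSV content with ISO 8601 timestamps in a "datetime" column.
csv_text = "datetime,value\n2013-01-01T01:01,10\n2013-01-02T02:02,20\n"

# Without parse_dates the column is read as plain strings.
plain = pd.read_csv(io.StringIO(csv_text))

# With parse_dates the column is converted to a datetime dtype.
parsed = pd.read_csv(io.StringIO(csv_text), parse_dates=["datetime"])

print(plain.dtypes)
print(parsed.dtypes)
```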

+7

Gaurav Apr 10 '17 at 2:46

You can use pandas.to_datetime(), as recommended in the documentation for pandas.read_csv():

If a column or index contains an unparseable date, the entire column or index will be returned unaltered as an object data type. For non-standard datetime parsing, use pd.to_datetime after pd.read_csv.
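A minimal sketch of that read-then-convert pattern, applied to a whitespace-delimited file like the one in the question. The sample rows are made up, and the assumption that col3 holds the dates is purely illustrative.

```python
import io

import pandas as pd

# Hypothetical whitespace-delimited data: an int, a float and a date per row.
data = io.StringIO("1 1.5 2013-6-4\n2 2.5 2013-12-25\n")

# Read the file first; the date column comes back as strings.
df = pd.read_csv(data, sep=r"\s+", names=["col1", "col2", "col3"])

# Convert the string column afterwards with an explicit format.
df["col3"] = pd.to_datetime(df["col3"], format="%Y-%m-%d")

print(df.dtypes)
```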

Demonstration:

    >>> D = {'date': '2013-6-4'}
    >>> df = pd.DataFrame(D, index=[0])
    >>> df
           date
    0  2013-6-4
    >>> df.dtypes
    date    object
    dtype: object
    >>> df['date'] = pd.to_datetime(df.date, format='%Y-%m-%d')
    >>> df
            date
    0 2013-06-04
    >>> df.dtypes
    date    datetime64[ns]
    dtype: object

+7

Eugene Yarmash Sep 24 '17 at 12:52

If performance matters to you, make sure you time it:

    import sys
    import timeit

    import pandas as pd

    print('Python %s on %s' % (sys.version, sys.platform))
    print('Pandas version %s' % pd.__version__)

    repeat = 3
    numbers = 100

    def time(statement, _setup=None):
        print(min(
            timeit.Timer(statement, setup=_setup or setup).repeat(
                repeat, numbers)))

    print("Format %m/%d/%y")
    setup = """import pandas as pd
    import io

    data = io.StringIO('''\
    ProductCode,Date
    ''' + '''\
    x1,07/29/15
    x2,07/29/15
    x3,07/29/15
    x4,07/30/15
    x5,07/29/15
    x6,07/29/15
    x7,07/29/15
    y7,08/05/15
    x8,08/05/15
    z3,08/05/15
    ''' * 100)"""

    time('pd.read_csv(data); data.seek(0)')
    time('pd.read_csv(data, parse_dates=["Date"]); data.seek(0)')
    time('pd.read_csv(data, parse_dates=["Date"],'
         'infer_datetime_format=True); data.seek(0)')
    time('pd.read_csv(data, parse_dates=["Date"],'
         'date_parser=lambda x: pd.datetime.strptime(x, "%m/%d/%y")); data.seek(0)')

    print("Format %Y-%m-%d %H:%M:%S")
    setup = """import pandas as pd
    import io

    data = io.StringIO('''\
    ProductCode,Date
    ''' + '''\
    x1,2016-10-15 00:00:43
    x2,2016-10-15 00:00:56
    x3,2016-10-15 00:00:56
    x4,2016-10-15 00:00:12
    x5,2016-10-15 00:00:34
    x6,2016-10-15 00:00:55
    x7,2016-10-15 00:00:06
    y7,2016-10-15 00:00:01
    x8,2016-10-15 00:00:00
    z3,2016-10-15 00:00:02
    ''' * 1000)"""

    time('pd.read_csv(data); data.seek(0)')
    time('pd.read_csv(data, parse_dates=["Date"]); data.seek(0)')
    time('pd.read_csv(data, parse_dates=["Date"],'
         'infer_datetime_format=True); data.seek(0)')
    time('pd.read_csv(data, parse_dates=["Date"],'
         'date_parser=lambda x: pd.datetime.strptime(x, "%Y-%m-%d %H:%M:%S")); data.seek(0)')

prints:

    Python 3.7.1 (v3.7.1:260ec2c36a, Oct 20 2018, 03:13:28)
    [Clang 6.0 (clang-600.0.57)] on darwin
    Pandas version 0.23.4
    Format %m/%d/%y
    0.19123052499999993
    8.20691274
    8.143124389
    1.2384357139999977
    Format %Y-%m-%d %H:%M:%S
    0.5238807110000039
    0.9202787830000005
    0.9832778819999959
    12.002349824999996

So with an iso8601-formatted date (%Y-%m-%d %H:%M:%S is apparently an iso8601-formatted date; I guess the T can be dropped and replaced by a space), you should not specify infer_datetime_format (which apparently makes no difference with the more common formats either), and passing your own parser in just cripples performance. On the other hand, date_parser does make a difference with less standard date formats. Be sure to time before you optimize, as usual.

+1

Mr_and_Mrs_D Dec 04 '18 at 22:32

df = pd.read_csv("/home/manoj/Desktop/train_aWnotuB.csv", parse_dates=['DateTime'])

Functions = list(map(lambda x: [x.hour, x.day, x.weekday(), x.month, x.year], df['DateTime']))

0

Manoj Kumar Singh Nov 19 '17 at 20:02

When loading a CSV file that contains a date column, there are two ways to make pandas recognize it as dates:

1. Explicitly, by passing the format through the date_parser=mydateparser argument
2. Implicitly, by letting pandas infer the format with infer_datetime_format=True

Sample data from the date column:

01/01/18

02/01/18

Here we cannot tell whether the first two digits are the month or the day, so in this case we should use method 1 and pass the format explicitly:

    mydateparser = lambda x: pd.datetime.strptime(x, "%m/%d/%y")
    df = pd.read_csv(file_name, parse_dates=['date_col_name'], date_parser=mydateparser)

Method 2, implicit (automatic) format recognition:

    df = pd.read_csv(file_name, parse_dates=['date_col_name'], infer_datetime_format=True)
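To see why the explicit format matters for data like this, here is a small sketch that parses the same two sample strings under both readings. It uses pd.to_datetime with a format string rather than date_parser, and the variable names are illustrative.

```python
import pandas as pd

# The two ambiguous sample values from the date column.
raw = pd.Series(["01/01/18", "02/01/18"])

# Reading the first field as the month: the second value is 1 February 2018.
month_first = pd.to_datetime(raw, format="%m/%d/%y")

# Reading the first field as the day: the second value is 2 January 2018.
day_first = pd.to_datetime(raw, format="%d/%m/%y")

print(month_first.iloc[1].date(), day_first.iloc[1].date())
```

Both calls succeed without any error, which is exactly why the format has to come from knowledge of the data rather than from inference.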

0

kamran kausar Sep 20 '19 at 19:30

Rutger Kassies · Accepted Answer · 2013-07-04 10:32

You should add parse_dates=True or parse_dates=['column name'] when reading, which is usually enough to magically parse it. But there are always strange formats that need to be defined manually. In this case, you can also add a date parser function, which is the most flexible way.

Suppose you have a column 'datetime' containing your string; then:

    dateparse = lambda x: pd.datetime.strptime(x, '%Y-%m-%d %H:%M:%S')
    df = pd.read_csv(infile, parse_dates=['datetime'], date_parser=dateparse)

This way you can even combine multiple columns into a single datetime column. This merges a 'date' and a 'time' column into a single 'datetime' column:

    dateparse = lambda x: pd.datetime.strptime(x, '%Y-%m-%d %H:%M:%S')
    df = pd.read_csv(infile, parse_dates={'datetime': ['date', 'time']}, date_parser=dateparse)

You can find the directives (i.e. the letters to be used for different formats) for strptime and strftime in the Python documentation for the datetime module.
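A short sketch of how those directives are used, with only the standard library. The input string and the output format are illustrative.

```python
from datetime import datetime

# %Y = 4-digit year, %m = month, %d = day, %H:%M:%S = time of day.
parsed = datetime.strptime("2013-6-4 08:08:00", "%Y-%m-%d %H:%M:%S")

# strftime uses the same directives to turn a datetime back into text.
formatted = parsed.strftime("%d/%m/%Y %H:%M")

print(parsed)     # 2013-06-04 08:08:00
print(formatted)  # 04/06/2013 08:08
```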
