Python datetime.strptime() using a lot of CPU time

Question


I have some parsing code that needs to turn a timestamp into a datetime object. I am using datetime.strptime, but according to cProfile this function is using a lot of CPU time. Timestamps are in the format 01/Nov/2010:07:49:33.

Current function:

    new_entry['time'] = datetime.strptime(
        parsed_line['day'] + parsed_line['month'] + parsed_line['year'] +
        parsed_line['hour'] + parsed_line['minute'] + parsed_line['second'],
        "%d%b%Y%H%M%S"
    )

Does anyone know how I can optimize this?

+7

optimization python datetime

Kyle Brandt Nov 01 '10 at 16:28


4 answers

It appears that strptime() on a Windows platform uses the pure-Python implementation (_strptime.py in the Lib directory), not C. It might be faster to process the string yourself.

    from datetime import datetime
    import timeit

    def f():
        datetime.strptime("2010-11-01", "%Y-%m-%d")

    n = 100000
    # Average seconds per call
    print("%.6f" % (timeit.timeit(f, number=n) / n))

prints 0.000049 on my system, whereas

    from datetime import date
    import timeit

    def f():
        parts = [int(x) for x in "2010-11-01".split("-")]
        return date(parts[0], parts[1], parts[2])

    n = 100000
    # Average seconds per call
    print("%.6f" % (timeit.timeit(f, number=n) / n))

prints 0.000009.

+3

Andrew Miller Oct 11 '11 at 13:53


Latest answer: if switching to a single direct strptime() call did not improve the running time, then my suspicion is that there is no real problem here. You have simply written a program whose main purpose in life is to call strptime() a great many times, and you have written it well enough, with so little other work going on, that the calls to strptime() quite properly dominate the runtime. I think you could count this as a success rather than a failure, unless you find that (a) some Unicode or LANG setting is making strptime() do extra work, or (b) you are calling it more often than you need to. Do try, of course, to call it only once for each date parsed. :-)
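As an illustration of point (b), here is a minimal sketch that caches results so each distinct timestamp string is parsed only once. It assumes Python 3 (functools.lru_cache does not exist in the Python 2 used elsewhere in this thread) and assumes the same timestamp recurs often, for example when many lines fall within the same second. The name parse_timestamp and the cache size are made up for the example.

```python
from datetime import datetime
from functools import lru_cache

TIMESTAMP_FORMAT = "%d/%b/%Y:%H:%M:%S"

@lru_cache(maxsize=1024)  # assumed cache size; tune it for your data
def parse_timestamp(text):
    """Parse one timestamp; repeated inputs are served from the cache."""
    return datetime.strptime(text, TIMESTAMP_FORMAT)

# Hypothetical sample: three timestamps, two of them identical.
samples = [
    "01/Nov/2010:07:49:33",
    "01/Nov/2010:07:49:33",
    "01/Nov/2010:07:49:34",
]
parsed = [parse_timestamp(s) for s in samples]

info = parse_timestamp.cache_info()
print(info.hits, info.misses)  # prints: 1 2
```

Whether this helps depends entirely on how often timestamps repeat; if nearly every string is unique, the cache only adds overhead.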

Follow-up answer, after looking at the example date string: Wait! Hold on! Why are you splitting the string up yourself instead of just passing strptime() a format string that matches it, for example:

 "%d/%b/%Y:%H:%M:%S"

Original off-the-cuff answer: if the month were an integer, you could do something like this:

    new_entry['time'] = datetime.datetime(
        int(parsed_line['year']),
        int(parsed_line['month']),
        int(parsed_line['day']),
        int(parsed_line['hour']),
        int(parsed_line['minute']),
        int(parsed_line['second'])
    )

and avoid building a big string only for strptime() to split it apart again. I wonder whether there is a way to get at the month-name logic directly, to do that one textual conversion?
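One possible way to do that month-name conversion, sketched here rather than taken from the thread, is to build a lookup table from the standard library's calendar.month_abbr. The parsed_line dict below is a hypothetical stand-in shaped like the one in the question, and the abbreviations follow the current locale, so 'Nov' assumes an English or C locale.

```python
import calendar
from datetime import datetime

# calendar.month_abbr[0] is an empty string, so it is skipped.
# The names are locale-dependent; 'Jan'..'Dec' assumes an English/C locale.
MONTH_NUMBERS = {
    name: number
    for number, name in enumerate(calendar.month_abbr)
    if name
}

# Hypothetical parsed_line, shaped like the dict in the question.
parsed_line = {
    "day": "01", "month": "Nov", "year": "2010",
    "hour": "07", "minute": "49", "second": "33",
}

timestamp = datetime(
    int(parsed_line["year"]),
    MONTH_NUMBERS[parsed_line["month"]],
    int(parsed_line["day"]),
    int(parsed_line["hour"]),
    int(parsed_line["minute"]),
    int(parsed_line["second"]),
)
print(timestamp)  # 2010-11-01 07:49:33
```

Building the table once at import time keeps the per-line work down to one dictionary lookup and six int() calls.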

+2

Brandon Rhodes Nov 01 '10 at 16:33


What is a lot of time? strptime takes about 30 microseconds here:

    from datetime import datetime
    import timeit

    def f():
        datetime.strptime("01/Nov/2010:07:49:33", "%d/%b/%Y:%H:%M:%S")

    n = 100000
    # Average seconds per call
    print("%.6f" % (timeit.timeit(f, number=n) / n))

prints 0.000031.

+2

Glenn Maynard Nov 01 '10 at 17:18


Mark Ransom · Accepted Answer · 2010-11-02T16:48:27+0000

If the timestamps are in a fixed-width format, then there is no need to parse the string at all: you can use slicing and a dictionary lookup to get the fields directly.

    import datetime

    # Map each three-letter month abbreviation to its month number.
    month_abbreviations = {'Jan': 1, 'Feb': 2, 'Mar': 3, 'Apr': 4,
                           'May': 5, 'Jun': 6, 'Jul': 7, 'Aug': 8,
                           'Sep': 9, 'Oct': 10, 'Nov': 11, 'Dec': 12}

    # Assumes line starts with the timestamp, e.g. 01/Nov/2010:07:49:33
    year = int(line[7:11])
    month = month_abbreviations[line[3:6]]
    day = int(line[0:2])
    hour = int(line[12:14])
    minute = int(line[15:17])
    second = int(line[18:20])
    new_entry['time'] = datetime.datetime(year, month, day, hour, minute, second)

Timing it with Glenn Maynard's method shows that it is about 3 times faster.
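For anyone who wants to repeat that comparison, here is a rough, self-contained sketch written for Python 3. The function names parse_with_strptime and parse_with_slicing are invented for this example, and the numbers will differ by machine and Python version, so treat the "3 times" figure as something to measure rather than assume.

```python
import timeit
from datetime import datetime

MONTHS = {'Jan': 1, 'Feb': 2, 'Mar': 3, 'Apr': 4, 'May': 5, 'Jun': 6,
          'Jul': 7, 'Aug': 8, 'Sep': 9, 'Oct': 10, 'Nov': 11, 'Dec': 12}

def parse_with_strptime(text):
    return datetime.strptime(text, "%d/%b/%Y:%H:%M:%S")

def parse_with_slicing(text):
    # Fixed-width layout assumed: DD/Mon/YYYY:HH:MM:SS
    return datetime(int(text[7:11]), MONTHS[text[3:6]], int(text[0:2]),
                    int(text[12:14]), int(text[15:17]), int(text[18:20]))

sample = "01/Nov/2010:07:49:33"

# Both approaches must agree before the timings mean anything.
assert parse_with_slicing(sample) == parse_with_strptime(sample)

n = 100000
for func in (parse_with_strptime, parse_with_slicing):
    per_call = timeit.timeit(lambda: func(sample), number=n) / n
    print("%-20s %.6f s per call" % (func.__name__, per_call))
```

Note that the slicing version does no validation: a malformed line raises KeyError or ValueError, or silently yields a wrong date if the fields are shifted, so it only suits input whose layout you trust.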
