Python datetime.strptime() using a lot of CPU time

I have some password analysis code that needs to turn a timestamp into a datetime object. I use datetime.strptime, but according to cProfile this function uses a lot of CPU time. Timestamps are in the format 01/Nov/2010:07:49:33.

Current function:

    new_entry['time'] = datetime.strptime(
        parsed_line['day'] + parsed_line['month'] + parsed_line['year'] +
        parsed_line['hour'] + parsed_line['minute'] + parsed_line['second'],
        "%d%b%Y%H%M%S"
    )

Does anyone know how I can optimize this?

+7
optimization python datetime
4 answers

If the timestamps are fixed-width, there is no need to parse the string at all: you can use slicing and a dictionary lookup to get the fields directly.

    import datetime

    # Map the three-letter month abbreviation to its month number.
    month_abbreviations = {'Jan': 1, 'Feb': 2, 'Mar': 3, 'Apr': 4,
                           'May': 5, 'Jun': 6, 'Jul': 7, 'Aug': 8,
                           'Sep': 9, 'Oct': 10, 'Nov': 11, 'Dec': 12}

    # line looks like "01/Nov/2010:07:49:33"
    year = int(line[7:11])
    month = month_abbreviations[line[3:6]]
    day = int(line[0:2])
    hour = int(line[12:14])
    minute = int(line[15:17])
    second = int(line[18:20])
    new_entry['time'] = datetime.datetime(year, month, day, hour, minute, second)

Timing this with Glenn Maynard's method shows that it is about 3 times faster.
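The slicing approach above can be wrapped in a function and checked against strptime(). This is only a minimal sketch: the names parse_timestamp and MONTHS are made up for illustration, and it assumes every timestamp has exactly the fixed-width layout 01/Nov/2010:07:49:33 with English month abbreviations. The timings it prints will vary by machine and Python version.

```python
from datetime import datetime
import timeit

# Hypothetical lookup table: English three-letter month abbreviations.
MONTHS = {'Jan': 1, 'Feb': 2, 'Mar': 3, 'Apr': 4, 'May': 5, 'Jun': 6,
          'Jul': 7, 'Aug': 8, 'Sep': 9, 'Oct': 10, 'Nov': 11, 'Dec': 12}


def parse_timestamp(ts):
    """Parse a fixed-width 'DD/Mon/YYYY:HH:MM:SS' string by slicing."""
    return datetime(int(ts[7:11]), MONTHS[ts[3:6]], int(ts[0:2]),
                    int(ts[12:14]), int(ts[15:17]), int(ts[18:20]))


sample = "01/Nov/2010:07:49:33"

# Sanity check: both approaches should produce the same datetime.
assert parse_timestamp(sample) == datetime.strptime(sample, "%d/%b/%Y:%H:%M:%S")

# Rough per-call timings in seconds; results depend on the system.
n = 10000
slicing = timeit.timeit(lambda: parse_timestamp(sample), number=n) / n
strptime = timeit.timeit(
    lambda: datetime.strptime(sample, "%d/%b/%Y:%H:%M:%S"), number=n) / n
print("slicing:  %.6f" % slicing)
print("strptime: %.6f" % strptime)
```

Note that, unlike strptime(), this sketch does no format validation beyond what int(), the dictionary lookup, and the datetime constructor happen to catch.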

+13

It appears that strptime() on a Windows platform uses the pure-Python implementation (_strptime.py in the Lib directory), not C. It might be faster to process the string yourself.

    from datetime import datetime
    import timeit

    def f():
        datetime.strptime("2010-11-01", "%Y-%m-%d")

    n = 100000
    # Average time per call, in seconds.
    print("%.6f" % (timeit.timeit(f, number=n) / n))

prints 0.000049 on my system, whereas

    from datetime import date
    import timeit

    def f():
        parts = [int(x) for x in "2010-11-01".split("-")]
        return date(parts[0], parts[1], parts[2])

    n = 100000
    # Average time per call, in seconds.
    print("%.6f" % (timeit.timeit(f, number=n) / n))

prints 0.000009.

+3

Most recent answer: if switching to a plain strptime() call did not improve the running time, then my suspicion is that there is actually no problem here. You have simply written a program, one of whose main purposes in life is to call strptime() a great many times, and you have written it well enough (it does so little else) that the strptime() calls quite properly dominate the runtime. I think you can count this as a success rather than a failure, unless you find that (a) some Unicode or LANG setting is making strptime() do extra work, or (b) you are calling it more often than you need to. Try, of course, to call it only once for each date to be parsed. :-)
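On point (b), one possible way to avoid repeated work is to memoize the parse. This is only a sketch under assumptions not stated in the question: that many input lines share the same timestamp string, and that Python 3.2 or later is available for functools.lru_cache. The name parse_cached and the sample list are hypothetical.

```python
from datetime import datetime
from functools import lru_cache


@lru_cache(maxsize=1024)
def parse_cached(ts):
    """Parse a timestamp, reusing the result for strings seen recently."""
    return datetime.strptime(ts, "%d/%b/%Y:%H:%M:%S")


# Hypothetical input: three lines share one timestamp, one line differs.
stamps = ["01/Nov/2010:07:49:33"] * 3 + ["01/Nov/2010:07:49:34"]
parsed = [parse_cached(s) for s in stamps]

# Only the two distinct strings actually reach strptime().
print(parse_cached.cache_info())
```

If nearly every timestamp is unique, the cache only adds overhead, so it is worth profiling before and after.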

Follow-up answer after looking at the example date string: Wait! Hold on! Why are you splitting the line into fields instead of just passing strptime() a format string such as:

 "%d/%b/%Y:%H:%M:%S" 
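As a minimal sketch of that suggestion, assuming the raw timestamp is available as a single string (the variable name timestamp is made up here) and that the locale uses English month abbreviations for %b:

```python
from datetime import datetime

# Hypothetical sample value in the format given in the question.
timestamp = "01/Nov/2010:07:49:33"

# The separators are part of the format string, so no pre-splitting
# or string concatenation is needed.
parsed = datetime.strptime(timestamp, "%d/%b/%Y:%H:%M:%S")
print(parsed)
```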

Original off-the-cuff answer: if the month were an integer, you could do something like this:

    new_entry['time'] = datetime.datetime(
        int(parsed_line['year']),
        int(parsed_line['month']),
        int(parsed_line['day']),
        int(parsed_line['hour']),
        int(parsed_line['minute']),
        int(parsed_line['second'])
    )

and avoid building a big string only to make strptime() split it apart again. I wonder whether there is a way to access the month-name logic directly to do that one textual conversion?
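One possible answer to that last question is the standard library's calendar.month_abbr, which can be turned into a name-to-number table. This is a sketch, not necessarily what the answerer had in mind: the names MONTH_NUMBER and new_time and the sample parsed_line values are made up for illustration, and calendar.month_abbr follows the current locale, so the 'Nov' lookup assumes English abbreviations.

```python
import calendar
from datetime import datetime

# calendar.month_abbr has an empty string at index 0 and the abbreviated
# month names at indexes 1..12 (locale-dependent).
MONTH_NUMBER = {name: number
                for number, name in enumerate(calendar.month_abbr)
                if name}

# Hypothetical parsed fields, all still strings as in the question.
parsed_line = {'day': '01', 'month': 'Nov', 'year': '2010',
               'hour': '07', 'minute': '49', 'second': '33'}

# Build the datetime directly, with one lookup for the month name.
new_time = datetime(
    int(parsed_line['year']),
    MONTH_NUMBER[parsed_line['month']],
    int(parsed_line['day']),
    int(parsed_line['hour']),
    int(parsed_line['minute']),
    int(parsed_line['second']),
)
print(new_time)
```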

+2

How much is "a lot of time"? strptime takes about 30 microseconds here:

    from datetime import datetime
    import timeit

    def f():
        datetime.strptime("01/Nov/2010:07:49:33", "%d/%b/%Y:%H:%M:%S")

    n = 100000
    # Average time per call, in seconds.
    print("%.6f" % (timeit.timeit(f, number=n) / n))

prints 0.000031.

+2