Which one is pythonic? and pythonic vs. speed

Question

I am new to Python and just wrote this module-level function:

    def _interval(patt):
        """
        Converts a string pattern of the form '1y 42d 14h56m' to a timedelta object.
        y - years (365 days), M - months (30 days), w - weeks, d - days,
        h - hours, m - minutes, s - seconds
        """
        m = _re.findall(r'([+-]?\d*(?:\.\d+)?)([yMwdhms])', patt)

        args = {'weeks': 0.0,
                'days': 0.0,
                'hours': 0.0,
                'minutes': 0.0,
                'seconds': 0.0}

        for (n,q) in m:
            if q=='y':
                args['days'] += float(n)*365
            elif q=='M':
                args['days'] += float(n)*30
            elif q=='w':
                args['weeks'] += float(n)
            elif q=='d':
                args['days'] += float(n)
            elif q=='h':
                args['hours'] += float(n)
            elif q=='m':
                args['minutes'] += float(n)
            elif q=='s':
                args['seconds'] += float(n)

        return _dt.timedelta(**args)

My problem is with the for loop, specifically the long if/elif block. I wondered whether there is a more pythonic way to do this, so I rewrote the function as follows:

    def _interval2(patt):
        m = _re.findall(r'([+-]?\d*(?:\.\d+)?)([yMwdhms])', patt)

        args = {'weeks': 0.0,
                'days': 0.0,
                'hours': 0.0,
                'minutes': 0.0,
                'seconds': 0.0}

        argsmap = {'y': ('days', lambda x: float(x)*365),
                   'M': ('days', lambda x: float(x)*30),
                   'w': ('weeks', lambda x: float(x)),
                   'd': ('days', lambda x: float(x)),
                   'h': ('hours', lambda x: float(x)),
                   'm': ('minutes', lambda x: float(x)),
                   's': ('seconds', lambda x: float(x))}

        for (n,q) in m:
            args[argsmap[q][0]] += argsmap[q][1](n)

        return _dt.timedelta(**args)

I timed both versions with the timeit module and found that the second one takes about 5-6 seconds longer in total (for the default number of executions).
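For readers who want to reproduce this kind of comparison, here is a minimal sketch. It assumes the measurement used `timeit.timeit`, whose default is 1,000,000 executions; under that assumption a total difference of 5-6 seconds works out to roughly 5-6 microseconds per call. The functions `factor_if_chain`, `factor_dict` and `time_call` are hypothetical stand-ins, not the code from the question.

```python
import timeit

def factor_if_chain(unit):
    # Hypothetical stand-in for the if/elif version.
    if unit == 'y':
        return 365
    elif unit == 'M':
        return 30
    return 1

_FACTORS = {'y': 365, 'M': 30}

def factor_dict(unit):
    # Hypothetical stand-in for the dictionary version.
    return _FACTORS.get(unit, 1)

def time_call(func, arg, number=100000):
    """Return the total seconds taken by `number` calls of func(arg)."""
    return timeit.timeit(lambda: func(arg), number=number)

number = 100000
for func in (factor_if_chain, factor_dict):
    total = time_call(func, 's', number=number)
    # Report the per-call cost as well, which is easier to judge than the total.
    print("%s: %.3f s total, %.3f us per call"
          % (func.__name__, total, total / number * 1e6))
```

Looking at the per-call figure rather than the total makes it easier to decide whether the difference matters for a given workload.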

So my questions are:
1. Which version is considered more pythonic?
2. Is there an even more pythonic way to write this function?
3. What about the trade-offs between pythonic style and other aspects of programming (for example, speed in this case)?

P.S. I am a bit obsessive about elegant code.

EDIT: I changed _interval2 after looking at this answer:

    argsmap = {'y': ('days', 365),
               'M': ('days', 30),
               'w': ('weeks', 1),
               'd': ('days', 1),
               'h': ('hours', 1),
               'm': ('minutes', 1),
               's': ('seconds', 1)}

    for (n,q) in m:
        args[argsmap[q][0]] += float(n)*argsmap[q][1]
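Put together, the edited approach might look like the sketch below. The names `_UNITS`, `_TOKEN` and `interval` are hypothetical, the table is moved to module level so it is built only once, and the regular expression is the one from the question. Like the original, it raises ValueError if a unit letter appears without a number.

```python
import datetime
import re

# Unit letter -> (timedelta keyword, multiplier). Years and months use the
# question's fixed approximations of 365 and 30 days.
_UNITS = {
    'y': ('days', 365),
    'M': ('days', 30),
    'w': ('weeks', 1),
    'd': ('days', 1),
    'h': ('hours', 1),
    'm': ('minutes', 1),
    's': ('seconds', 1),
}

_TOKEN = re.compile(r'([+-]?\d*(?:\.\d+)?)([yMwdhms])')

def interval(patt):
    """Convert a pattern such as '1y 42d 14h56m' to a timedelta."""
    args = {}
    for number, unit in _TOKEN.findall(patt):
        key, factor = _UNITS[unit]
        # Accumulate, since several units can map to the same keyword.
        args[key] = args.get(key, 0.0) + float(number) * factor
    return datetime.timedelta(**args)

print(interval('1y 42d 14h56m'))
```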

python datetime timedelta

Kashyap nadig Jan 17 '11 at 14:23

3 answers

9000 · Answer 1 · 2011-01-17T14:39:45+0000

It seems you create a lot of lambdas every time you parse. You don't really need a lambda, just a multiplier. Try the following:

    def _factor_for(what):
        if what == 'y':
            return 365
        elif what == 'M':
            return 30
        elif what in ('w', 'd', 'h', 's', 'm'):
            return 1
        else:
            raise ValueError("Invalid specifier %r" % what)

    for (n,q) in m:
        # n is a string captured by the regex, so convert it before multiplying
        args[argsmap[q][0]] += _factor_for(q) * float(n)

To keep things fast, do not make _factor_for a local (nested) function or a method; define it at module level.

Shawn chin · Answer 2 · 2011-01-17T15:11:27+0000

(I have not timed it, but) if you are going to use this function often, it may be worthwhile to precompile the regular expression.

Here is my take on your function:

    import datetime
    import re

    re_timestr = re.compile(r"""
        ((?P<years>\d+)y)?\s*
        ((?P<months>\d+)M)?\s*
        ((?P<weeks>\d+)w)?\s*
        ((?P<days>\d+)d)?\s*
        ((?P<hours>\d+)h)?\s*
        ((?P<minutes>\d+)m)?\s*
        ((?P<seconds>\d+)s)?
    """, re.VERBOSE)

    def interval3(patt):
        p = {}
        match = re_timestr.match(patt)
        if not match:
            raise ValueError("invalid pattern : %s" % (patt))

        # groups that did not match default to "0"
        for k, v in match.groupdict("0").items():
            p[k] = int(v)  # cast string to int

        p["days"] += p.pop("years") * 365   # convert years to days
        p["days"] += p.pop("months") * 30   # convert months to days
        return datetime.timedelta(**p)

Update

Judging from this question, precompiling regex patterns does not bring a noticeable performance improvement, since Python caches and reuses them. You only save the time needed to check the cache, which is negligible unless you repeat the call a very large number of times.
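As a small illustration of the two call styles being compared, the sketch below uses the pattern from the question. The names `PATTERN`, `COMPILED`, `parse_with_module_function` and `parse_with_compiled` are hypothetical. Both functions return the same result; the module-level `re.findall` looks the pattern up in the `re` module's internal cache on each call, while the precompiled object skips that lookup.

```python
import re

PATTERN = r'([+-]?\d*(?:\.\d+)?)([yMwdhms])'
COMPILED = re.compile(PATTERN)

def parse_with_module_function(text):
    # The re module compiles the pattern on first use and caches it afterwards.
    return re.findall(PATTERN, text)

def parse_with_compiled(text):
    # Uses the pattern object directly, with no cache lookup.
    return COMPILED.findall(text)

print(parse_with_module_function('1w 2d 3h4m'))
print(parse_with_compiled('1w 2d 3h4m'))
```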

Update2

As you rightly pointed out, this solution does not support interval3("1h 30s" + "2h 10m"). However, timedelta supports arithmetic, which means you can still express it as interval3("1h 30s") + interval3("2h 10m").

In addition, as mentioned in some comments on the question, you may want to avoid supporting "years" and "months" in the input. There is a reason why timedelta does not support these arguments: they cannot be handled correctly (and incorrect code is almost never elegant).
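A short sketch of why fixed-length years and months go wrong, using dates picked purely for illustration: adding 365 days across a leap year lands one day short of the same calendar date, and adding 30 days to the end of January skips past February entirely.

```python
import datetime

# 2020 is a leap year (366 days), so "one year" of 365 days falls a day short.
start = datetime.date(2020, 1, 1)
one_year_later = start + datetime.timedelta(days=365)
print(one_year_later)  # 2020-12-31

# "One month" of 30 days from 31 January 2021 ends up in March.
month_start = datetime.date(2021, 1, 31)
one_month_later = month_start + datetime.timedelta(days=30)
print(one_month_later)  # 2021-03-02
```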

Here is another version, this time with support for floats, negative values and some error checking.

    re_timestr = re.compile(r"""
        ^\s*
        ((?P<weeks>[+-]?\d+(\.\d*)?)w)?\s*
        ((?P<days>[+-]?\d+(\.\d*)?)d)?\s*
        ((?P<hours>[+-]?\d+(\.\d*)?)h)?\s*
        ((?P<minutes>[+-]?\d+(\.\d*)?)m)?\s*
        ((?P<seconds>[+-]?\d+(\.\d*)?)s)?\s*
        $
    """, re.VERBOSE)

    def interval4(patt):
        p = {}
        match = re_timestr.match(patt)
        if not match:
            raise ValueError("invalid pattern : %s" % (patt))

        # groups that did not match default to "0"
        for k, v in match.groupdict("0").items():
            p[k] = float(v)  # cast string to float
        return datetime.timedelta(**p)

Example usage (Python 2 session):

    >>> print interval4("1w 2d 3h4m")  # basic use
    9 days, 3:04:00
    >>> print interval4("1w") - interval4("2d 3h 4m")  # timedelta arithmetic
    4 days, 20:56:00
    >>> print interval4("0.3w -2.d +1.01h")  # +ve and -ve floats
    3:24:36
    >>> print interval4("0.3x")  # reject invalid input
    Traceback (most recent call last):
      File "date.py", line 19, in interval4
        raise ValueError("invalid pattern : %s" % (patt))
    ValueError: invalid pattern : 0.3x
    >>> print interval4("1h 2w")  # order matters
    Traceback (most recent call last):
      File "date.py", line 19, in interval4
        raise ValueError("invalid pattern : %s" % (patt))
    ValueError: invalid pattern : 1h 2w

ulidtko · Answer 3 · 2011-01-17T14:26:01+0000

Yes, there is. Use time.strptime:

Parse a string representing a time according to a format. The return value is a struct_time as returned by gmtime() or localtime().
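A minimal sketch of what this looks like, with a format string chosen here for illustration. Note that time.strptime parses a clock time or date, so the fields of the resulting struct_time would still have to be converted to a timedelta by hand, and the format directives do not cover patterns such as '1y 42d'.

```python
import datetime
import time

# Parse hours and minutes; unspecified fields take default values.
parsed = time.strptime("14:56", "%H:%M")
print(parsed.tm_hour, parsed.tm_min)

# Converting the parsed fields to a duration is a separate, manual step.
delta = datetime.timedelta(hours=parsed.tm_hour, minutes=parsed.tm_min)
print(delta)
```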
