Why is max(iterable) running much slower than an equivalent loop?

I noticed a strange performance hit after a minor refactoring that replaced a loop with a call to the built-in max inside a recursive function.

Here is the simplest reproduction that I could produce:

    import time

    def f1(n):
        if n <= 1:
            return 1
        best = 0
        for k in (1, 2):
            current = f1(n - k) * n
            if current > best:
                best = current
        return best

    def f2(n):
        if n <= 1:
            return 1
        return max(f2(n - k) * n for k in (1, 2))

    t = time.perf_counter()
    result1 = f1(30)
    print('loop:', time.perf_counter() - t)  # 0.45 sec

    t = time.perf_counter()
    result2 = f2(30)
    print('max:', time.perf_counter() - t)   # 1.02 sec

    assert result1 == result2

Both f1 and f2 compute the factorial by ordinary recursion, with an unnecessary maximization thrown in (it's just an excuse to use max inside the recursion while keeping the recursion simple):

    # pseudocode
    factorial(0) = 1
    factorial(1) = 1
    factorial(n) = max(factorial(n-1)*n, factorial(n-2)*n)

It is implemented without memoization, so the number of calls grows exponentially.
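To make the exponential blow-up concrete, here is a sketch that counts the calls (f2_counted and call_count are illustrative names, not from the original code). The count follows the Fibonacci-style recurrence C(n) = 1 + C(n-1) + C(n-2):

```python
call_count = 0

def f2_counted(n):
    """Same recursion as f2, instrumented to count invocations."""
    global call_count
    call_count += 1
    if n <= 1:
        return 1
    return max(f2_counted(n - k) * n for k in (1, 2))

f2_counted(20)
print(call_count)  # 21891 calls for n=20, per C(n) = 1 + C(n-1) + C(n-2)
```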

The implementation with max(iterable) is more than twice as slow as the one with a loop.

Oddly enough, a direct comparison of max vs. the loop did not show the effect (edit: it doesn't matter, see @TimPeters' answer). Also, if I use max(a, b) instead of max(iterable), the performance mismatch disappears.

2 answers

This is really unfair to the max function because of the generator expression you feed it.

For each call to f2, a new closure over n has to be created, and a new function object has to be built and called (this is how both generator expressions and, in Python 3, list comprehensions are implemented; see the 'Details' section of PEP 289). That function wraps the code object representing the gen-exp, and calling it produces the generator that max then iterates over.

A small piece of byte code:

     14 LOAD_CLOSURE             0 (n)
     16 BUILD_TUPLE              1
     18 LOAD_CONST               2 (<code object <genexpr> at 0x7f1b667e1f60, file "", line 16>)
     20 LOAD_CONST               3 ('f2.<locals>.<genexpr>')
     22 MAKE_FUNCTION            8
     24 LOAD_CONST               5 ((1, 2))
     26 GET_ITER
     28 CALL_FUNCTION            1

Of course, you do not see any instructions like these in the case of f1, since it just makes plain function calls.
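You can verify this yourself with the dis module (exact opcodes vary across CPython versions, e.g. CALL_FUNCTION became CALL in 3.11, but MAKE_FUNCTION shows up for the gen-exp either way):

```python
import dis

def f2(n):
    if n <= 1:
        return 1
    return max(f2(n - k) * n for k in (1, 2))

def f1(n):
    if n <= 1:
        return 1
    best = 0
    for k in (1, 2):
        current = f1(n - k) * n
        if current > best:
            best = current
    return best

# Opcode names used in each function's top-level bytecode.
ops_f2 = {ins.opname for ins in dis.get_instructions(f2)}
ops_f1 = {ins.opname for ins in dis.get_instructions(f1)}

print('MAKE_FUNCTION' in ops_f2)  # True: f2 builds a function object per call
print('MAKE_FUNCTION' in ops_f1)  # False: f1 just makes plain calls
```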

When you then call your max-using function, f2, a significant number of times, as you do when recursively computing the factorial of 30, the overhead simply accumulates.

A version using a list comprehension suffers from much the same problem. It is slightly faster only because list comprehensions are faster than generator expressions.
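For reference, a sketch of that list-comprehension variant (f2_listcomp is my name for it; note that CPython 3.12+ inlines list comprehensions per PEP 709, so the function-object overhead disappears there):

```python
def f2_listcomp(n):
    if n <= 1:
        return 1
    # max() consumes a fully built list instead of a generator.
    return max([f2_listcomp(n - k) * n for k in (1, 2)])

print(f2_listcomp(10))  # 3628800, i.e. 10!
```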

If I use max(a, b) instead of max(iterable), the performance mismatch disappears.

Exactly: no function objects are created on each call in that case, so you do not see that overhead accumulate. You are just passing the two arguments directly.
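A sketch of that two-argument form (f2_args is an illustrative name): the two recursive calls are spelled out explicitly, so no closure, function object, or iterator is created per call.

```python
def f2_args(n):
    if n <= 1:
        return 1
    # Two plain positional arguments: no closure, no genexp, no GET_ITER.
    return max(f2_args(n - 1) * n, f2_args(n - 2) * n)

print(f2_args(10))  # 3628800, i.e. 10!
```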


Posting this as an "answer" because useful formatting is not possible in comments:

    $ python -m timeit "max(1, 2)"                   # straight
    10000000 loops, best of 3: 0.148 usec per loop
    $ python -m timeit "max([i for i in (1, 2)])"    # list comp
    1000000 loops, best of 3: 0.328 usec per loop
    $ python -m timeit "max(i for i in (1, 2))"      # genexp
    1000000 loops, best of 3: 0.402 usec per loop

Which shows that the recursion is a red herring. This is generally true: as these results show, the genexp is slower than the listcomp, which in turn is slower than using neither. Since your code does more than just call max, the time differences are not as extreme; but since it does only a little more than call max, the speed of the max part is still very significant.
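The same comparison can be scripted with the timeit module (absolute numbers will differ per machine and Python version; only the ordering matters):

```python
import timeit

n = 1_000_000
t_args = timeit.timeit('max(1, 2)', number=n)                # straight
t_list = timeit.timeit('max([i for i in (1, 2)])', number=n) # list comp
t_gen = timeit.timeit('max(i for i in (1, 2))', number=n)    # genexp

print(f'straight:  {t_args:.3f}s')
print(f'list comp: {t_list:.3f}s')
print(f'genexp:    {t_gen:.3f}s')
```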

