Nested functions faster than global functions in Python?

I prefer to use nested functions instead of methods or global functions in Python whenever possible. Therefore, I decided to check their performance, because this means that when defining a function in another function, there will be an overhead in defining the inner function in each call to the outer function.

At best, I was hoping the global function would be a little faster, but surprisingly the nested function was faster. Does anyone know why?

This is my code:

from time import clock def a(n): return n + 1 def b1(loopcount): return sum([a(n) for n in range(loopcount)]) def b2(loopcount): def a(n): return n + 1 return sum([a(n) for n in range(loopcount)]) powers = [5, 6, 7] b1times = [] b2times = [] print " ", "".join(["{:^10d}".format(n) for n in powers]) for i in range(5): for power in powers: t = clock() b1(10**power) b1times.append(clock() - t) for power in powers: t = clock() b2(10**power) b2times.append(clock() - t) print "b1:", "".join(["{:^10.5f}".format(n) for n in b1times]) print "b2:", "".join(["{:^10.5f}".format(n) for n in b2times]) print "" b1times = [] b2times = [] 

And this is the result on my computer:

  5 6 7 b1: 0.08200 0.82773 8.47946 b2: 0.06914 0.79637 8.18571 b1: 0.07332 0.82139 8.68262 b2: 0.06547 0.82088 8.19606 b1: 0.07963 0.82625 9.65037 b2: 0.06617 0.82027 8.21412 b1: 0.07630 0.82112 8.49082 b2: 0.06541 0.80686 8.20532 b1: 0.12328 0.87034 8.42964 b2: 0.07059 0.79717 8.24620 

UPDATE: using the comment by @Janne Karila

Now when I call b1 and b2 more, b1 gets faster. As @Kos and @Pavel Anossov said in their answers, some factors affect the speed here, and you cannot make a general statement.
Thanks everyone!

 from time import * def a1(n): return n + 1 def b1(n): return a1(n) def b2(n): def a2(): return n + 1 return a2() powers = [4, 5, 6] b1times = [] b2times = [] print " ", "".join(["{:^10d}".format(n) for n in powers]) for i in range(5): for power in powers: t = clock() sum([b1(n) for n in range(10**power)]) b1times.append(clock() - t) for power in powers: t = clock() sum([b2(n) for n in range(10**power)]) b2times.append(clock() - t) print "b1:", "".join(["{:^10.5f}".format(n) for n in b1times]) print "b2:", "".join(["{:^10.5f}".format(n) for n in b2times]) print "" b1times = [] b2times = [] 
+4
source share
4 answers

There are several parts here:

1) Time for defining a function (creating a function object),
2) Search time of a function object by name,
3) The time the function was actually called.

An example of your global function is faster: 1) (there is no need to override a in every call to b1 ). However, it is slower than in 2) since the search for global variables is slower than the local search.

Why don't we have both?

I expanded your test with a solution using a global function, but avoids a global search using a local variable. This seems to be the fastest of the three on my machine:

  5 6 7 b1: 0.04147 0.44421 4.46508 b2: 0.03399 0.43321 4.41121 b3: 0.03258 0.41821 4.25542 b1: 0.03240 0.42998 4.39774 b2: 0.03320 0.43465 4.42229 b3: 0.03155 0.42109 4.23669 b1: 0.03273 0.43321 4.37266 b2: 0.03326 0.43551 4.42208 b3: 0.03137 0.42356 4.25341 b1: 0.03253 0.43104 4.40466 b2: 0.03401 0.43719 4.42996 b3: 0.03155 0.41681 4.24132 b1: 0.03244 0.42965 4.37192 b2: 0.03310 0.43629 4.42727 b3: 0.03117 0.41701 4.23932 
+7
source

LOAD_GLOBAL is much slower than LOAD_FAST.

b1

  5 0 LOAD_GLOBAL 0 (sum) 3 BUILD_LIST 0 6 LOAD_GLOBAL 1 (range) 9 LOAD_FAST 0 (loopcount) 12 CALL_FUNCTION 1 15 GET_ITER >> 16 FOR_ITER 18 (to 37) 19 STORE_FAST 1 (n) 22 LOAD_GLOBAL 2 (a) 25 LOAD_FAST 1 (n) 28 CALL_FUNCTION 1 31 LIST_APPEND 2 34 JUMP_ABSOLUTE 16 >> 37 CALL_FUNCTION 1 40 RETURN_VALUE 

b2

  8 0 LOAD_CONST 1 (<code object a at 01E57A40, ...>) 3 MAKE_FUNCTION 0 6 STORE_FAST 1 (a) 10 9 LOAD_GLOBAL 0 (sum) 12 BUILD_LIST 0 15 LOAD_GLOBAL 1 (range) 18 LOAD_FAST 0 (loopcount) 21 CALL_FUNCTION 1 24 GET_ITER >> 25 FOR_ITER 18 (to 46) 28 STORE_FAST 2 (n) 31 LOAD_FAST 1 (a) 34 LOAD_FAST 2 (n) 37 CALL_FUNCTION 1 40 LIST_APPEND 2 43 JUMP_ABSOLUTE 25 >> 46 CALL_FUNCTION 1 49 RETURN_VALUE 

Whether the difference is to create a function each time depends on the function (and how many times you use LOAD_GLOBAL).

Creating a local reference to the global function used in the inner loop is a relatively general optimization:

 def a(n): return n + 1 def b1(loopcount): local_a = a return sum([local_a(n) for n in range(loopcount)]) 

This should be faster than finding a global and more supported than nested function.

+7
source

In the general case, no, because every time an external function is executed, the internal function must be redefined.

This suggests that performance really doesn't matter if you can't show it (for example, a particular loop is too slow and causes performance problems) - so I would recommend using what is most readable - premature optimization is a bad thing.

However, I would say that global functions are probably the best solution, since they are easier to repeat according to your code, and I would say that it is more readable. After all, an apartment is better than an enclosed one.

0
source

I think this is caused by the order in which the object definition is searched in python.

Whenever the interpreter encounters the name of an object, it will first search for a definition of the definition of a local object that records the mapping between the name of the object and the object itself. If it is not found in the local dict, then global and then inline.

And all things in python are objects, including a function.

0
source

All Articles