Optimizing module imports in Python

Question

I am reading a Python book by David Beazley, and he states:

For example, if you were performing a lot of square root operations, it is faster to use 'from math import sqrt' and 'sqrt(x)' rather than typing 'math.sqrt(x)'.

and

For calculations involving heavy use of methods or module lookups, it is almost always better to eliminate the attribute lookup by first putting the operation you want to perform into a local variable.

I decided to try:

first()

    def first():
        from collections import defaultdict
        x = defaultdict(list)

second()

    def second():
        import collections
        x = collections.defaultdict(list)

Results:

    2.15461492538
    1.39850616455

Optimizations like these probably don't matter to me. But I'm curious why the opposite of what Beazley wrote turns out to be true here. Note that the difference is about three quarters of a second, which is significant given how trivial the task is.

Why is this happening?

UPDATE:

I obtain the timings like this:

    print timeit('first()', 'from __main__ import first')
    print timeit('second()', 'from __main__ import second')

+4

python premature-optimization

user225312 May 08 '11 at 7:10

6 answers

I also get the same ratio between first() and second(); the only difference is that my timings are at the microsecond level.

I don't think your timings measure anything useful. Try to find better test cases!

Update:
FWIW, here are some tests to support David Beazley's point.

    import math
    from math import sqrt

    def first(n=1000):
        for k in xrange(n):
            x = math.sqrt(9)

    def second(n=1000):
        for k in xrange(n):
            x = sqrt(9)

    In []: %timeit first()
    1000 loops, best of 3: 266 us per loop

    In []: %timeit second()
    1000 loops, best of 3: 221 us per loop

    In []: 266./221
    Out[]: 1.2036199095022624

So first(), which looks up math.sqrt on every iteration, is about 20% slower than second().
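Beazley's second quote, about putting the operation into a local variable, can be sketched the same way. This is only an illustration written for Python 3 (range instead of xrange); the function names attribute_lookup and local_binding are mine, and the actual numbers will depend on your interpreter and machine.

```python
import math
import timeit


def attribute_lookup(n=1000):
    # Looks up the global name `math` and then its `sqrt` attribute
    # on every iteration of the loop.
    total = 0.0
    for k in range(n):
        total += math.sqrt(k)
    return total


def local_binding(n=1000):
    # Does the attribute lookup once, then calls through a local variable.
    sqrt = math.sqrt
    total = 0.0
    for k in range(n):
        total += sqrt(k)
    return total


# Both functions compute the same value; only the name lookup differs.
for func in (attribute_lookup, local_binding):
    elapsed = timeit.timeit(func, number=500)
    print("%-17s %.4f s" % (func.__name__, elapsed))
```

Here the import happens once at module level, so the loop body is the only thing being measured.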

+4

eat May 08 '11 at 7:53

first() doesn't save anything, because the module still has to be accessed in order to import the name.

In addition, you don't describe your timing methodology, but given the function names it looks as if first() performs the initial import, which always takes longer than subsequent imports because the module has to be compiled and executed.
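To see the difference between an initial and a subsequent import, something like the following can be used. It is a rough sketch for Python 3: colorsys is just an arbitrary small standard-library module I picked, and removing it from sys.modules is only there to force a cold import. If a cached .pyc file exists, the cold import skips compilation but still has to execute the module.

```python
import importlib
import sys
import time

MODULE = "colorsys"  # arbitrary small stdlib module, chosen for illustration

# Forget any earlier import so that the next one has to load the module.
sys.modules.pop(MODULE, None)

start = time.perf_counter()
cold = importlib.import_module(MODULE)  # loads and executes the module
cold_time = time.perf_counter() - start

start = time.perf_counter()
warm = importlib.import_module(MODULE)  # found in the sys.modules cache
warm_time = time.perf_counter() - start

print("cold import: %.6f s" % cold_time)
print("warm import: %.6f s" % warm_time)
print("same module object:", cold is warm)
```

The second import returns the object already stored in sys.modules, which is why repeated imports are cheap but not free.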

+1

Ignacio Vazquez-Abrams May 08 '11 at 7:14

My guess is that your test is biased: the second implementation benefits from the first one having already loaded the module, or simply from the module having been loaded recently.

How many times did you try this? Did you switch the order, etc.?
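One way to check for this kind of bias is to time both functions several times, in both orders, and keep the best run of each. A minimal sketch for Python 3 follows; the helper name best_time and the repeat counts are my own choices, not anything from the question.

```python
import timeit


def first():
    from collections import defaultdict
    x = defaultdict(list)


def second():
    import collections
    x = collections.defaultdict(list)


def best_time(func, number=20000, repeat=3):
    """Return the fastest of `repeat` runs, each making `number` calls."""
    return min(timeit.repeat(func, number=number, repeat=repeat))


# Run the pair in both orders so that neither function always goes second.
results = {}
for order in ((first, second), (second, first)):
    for func in order:
        results.setdefault(func.__name__, []).append(best_time(func))

for name, times in sorted(results.items()):
    print(name, ["%.4f" % t for t in times])
```

If the two timings for each function agree regardless of order, the ordering is not what causes the difference.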

+1

Mantas vidutis May 08 '11 at 7:14

There is also the matter of how efficiently the source code can be read and understood. Here is a real-life example (code from a Stack Overflow question):

Original:

    import math

    def midpoint(p1, p2):
        lat1, lat2 = math.radians(p1[0]), math.radians(p2[0])
        lon1, lon2 = math.radians(p1[1]), math.radians(p2[1])
        dlon = lon2 - lon1
        dx = math.cos(lat2) * math.cos(dlon)
        dy = math.cos(lat2) * math.sin(dlon)
        lat3 = math.atan2(math.sin(lat1) + math.sin(lat2), math.sqrt((math.cos(lat1) + dx) * (math.cos(lat1) + dx) + dy * dy))
        lon3 = lon1 + math.atan2(dy, math.cos(lat1) + dx)
        return(math.degrees(lat3), math.degrees(lon3))

Alternative:

    from math import radians, degrees, sin, cos, atan2, sqrt

    def midpoint(p1, p2):
        lat1, lat2 = radians(p1[0]), radians(p2[0])
        lon1, lon2 = radians(p1[1]), radians(p2[1])
        dlon = lon2 - lon1
        dx = cos(lat2) * cos(dlon)
        dy = cos(lat2) * sin(dlon)
        lat3 = atan2(sin(lat1) + sin(lat2), sqrt((cos(lat1) + dx) * (cos(lat1) + dx) + dy * dy))
        lon3 = lon1 + atan2(dy, cos(lat1) + dx)
        return(degrees(lat3), degrees(lon3))

+1

John machin May 09 '11 at 12:04

Write your code as usual, importing a module and referencing its functions and constants as module.attribute. Then either prefix your functions with a decorator for binding constants, or bind all the modules in your program using the bind_all_modules function below:

    from types import FunctionType, ClassType  # Python 2 names

    # _make_constants is assumed to come from the constant-binding decorator
    # mentioned above; it is not defined in this snippet.

    def bind_all_modules():
        from sys import modules
        from types import ModuleType
        for name, module in modules.iteritems():
            if isinstance(module, ModuleType):
                bind_all(module)

    def bind_all(mc, builtin_only=False, stoplist=[], verbose=False):
        """Recursively apply constant binding to functions in a module or class.

        Use as the last line of the module (after everything is defined, but
        before test code).  In modules that need modifiable globals, set
        builtin_only to True.
        """
        try:
            d = vars(mc)
        except TypeError:
            return
        for k, v in d.items():
            if type(v) is FunctionType:
                newv = _make_constants(v, builtin_only, stoplist, verbose)
                try:
                    setattr(mc, k, newv)
                except AttributeError:
                    pass
            elif type(v) in (type, ClassType):
                bind_all(v, builtin_only, stoplist, verbose)

0

tzot May 08 '11 at 23:25

Douglas leeder · Accepted Answer · 2011-05-08T07:14:14+0000

The statements from collections import defaultdict and import collections should be outside the repeated timing loops, since you won't be repeating them in real code.

I'm guessing that the from syntax has to do more work than the import syntax.
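One way to look into that guess is to compare the bytecode of the two functions. This is a sketch for CPython 3; the exact instruction names vary between versions, and the helper opnames is my own. It shows which instructions each form executes, not where the time actually goes.

```python
import dis


def first():
    from collections import defaultdict
    x = defaultdict(list)


def second():
    import collections
    x = collections.defaultdict(list)


def opnames(func):
    """Return the names of the bytecode instructions in func."""
    return [ins.opname for ins in dis.get_instructions(func)]


first_ops = opnames(first)
second_ops = opnames(second)

print("first :", first_ops)
print("second:", second_ops)
```

On CPython both functions contain an IMPORT_NAME instruction, and first() additionally contains IMPORT_FROM to pull defaultdict out of the module, while second() does an ordinary attribute load instead.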

Using this test code:

    #!/usr/bin/env python
    import timeit
    from collections import defaultdict
    import collections

    def first():
        from collections import defaultdict
        x = defaultdict(list)

    def firstwithout():
        x = defaultdict(list)

    def second():
        import collections
        x = collections.defaultdict(list)

    def secondwithout():
        x = collections.defaultdict(list)

    print "first with import", timeit.timeit('first()', 'from __main__ import first')
    print "second with import", timeit.timeit('second()', 'from __main__ import second')
    print "first without import", timeit.timeit('firstwithout()', 'from __main__ import firstwithout')
    print "second without import", timeit.timeit('secondwithout()', 'from __main__ import secondwithout')

I get the results:

    first with import 1.61359190941
    second with import 1.02904295921
    first without import 0.344709157944
    second without import 0.449721097946

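This shows how much the repeated import costs: once the import is moved out of the timed function, the from version is the faster of the two, as Beazley says.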
