How to speed up Python startup and/or reduce file searching when loading libraries?

I have a setup consisting of various tools written in Python in a multi-user environment.

When you log in fresh and run one command, it takes a full 6 seconds just to print a few lines of help. If I issue the same command again right away, it takes 0.1 s. After a couple of minutes it is back to 6 seconds. (This points to a short-lived cache.)

The system sits on GPFS, so raw disk bandwidth should be fine, although lookups may be slow because of the sheer number of files in the system.

strace -e open python tool 2>&1 | wc -l

shows 2154 open() calls being made while the tool starts up.

 strace -e open python tool 2>&1 | grep ENOENT | wc -l

shows 1945 missing files. (A very bad hit/miss ratio, if you ask me :-)

I suspect the excessive time it takes to load the tool is spent querying GPFS for all these files, and that the results are cached for the next call (at the OS or GPFS level), although I don't know how to verify or prove this. I have no root access to the system and can only write to GPFS and /tmp.
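
A minimal sketch of one way to measure this (the module names below are placeholders for whatever the tool actually imports): time each heavy import in a fresh interpreter and see where the seconds go.

    import time

    for name in ("numpy", "scipy"):  # placeholders: substitute the modules your tool imports
        start = time.time()
        __import__(name)
        print("%-10s %.2f s" % (name, time.time() - start))

If the first run is slow and an immediate second run in a new interpreter is fast, the time is going to filesystem lookups rather than to Python itself.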

Can this Python hunt for missing files be improved?

Any idea how to test this in a simple way? (Reinstalling everything into /tmp isn't easy, since there are many packages, and virtualenv won't help me either (I think), since it just symlinks to the files on the GPFS system.)

Of course, one option would be a daemon that forks, but that is far from "simple" and would be a last resort.

Thanks for reading.

2 answers

How about using the imp module? In particular, it has the function imp.find_module(name, path), documented here: http://docs.python.org/2.7/library/imp.html

At least the example below reduces the number of open() syscalls compared to a plain 'import numpy, scipy'. (Update: it doesn't look like you can achieve a significant reduction in syscalls this way after all...)

    import imp

    def loadm(name, path):
        fp, pathname, description = imp.find_module(name, [path])
        try:
            _module = imp.load_module(name, fp, pathname, description)
            return _module
        finally:
            # Since we may exit via an exception, close fp explicitly.
            if fp:
                fp.close()

    numpy = loadm("numpy", "/home/username/py-virtual27/lib/python2.7/site-packages/")
    scipy = loadm("scipy", "/home/username/py-virtual27/lib/python2.7/site-packages/")

I think you should also check that your PYTHONPATH is empty or short, because a long one also increases load times.
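
For example (a quick sketch, nothing specific to your setup), you can print how many directories Python scans on every import; each extra sys.path entry multiplies the number of failed open() calls for every module lookup:

    import sys

    print("%d entries on sys.path" % len(sys.path))
    for p in sys.path:
        print(p)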


Python 2 looks modules up relative to the current package first. If your library code imports a lot of top-level modules, these are all first looked up as relative imports. So if the package foo.bar does an import os, Python first looks for foo/bar/os.py. That miss is then also cached by Python itself.

Python 3 switched to absolute imports by default; from Python 2.5 onwards you can opt in to absolute imports on a per-module basis with:

 from __future__ import absolute_import 
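
For illustration (using the hypothetical foo.bar package from above), a module that opts in looks like this:

    # foo/bar/__init__.py  (hypothetical package from the text above)
    from __future__ import absolute_import  # must come before any other import

    import os  # resolved directly as the top-level os module; no foo/bar/os.py probe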

Another source of file-lookup misses is the loading of .pyc bytecode cache files; if these are missing for some reason (e.g. the filesystem is not writable for the current Python process), then Python will keep looking for them on every run. You can create these caches with the compileall module:

 python -m compileall /path/to/directory/with/pythoncode 

if you run it with the correct write permissions.
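
If you want to check whether the caches are actually in place, here is a rough sketch (it assumes the usual Python 2 layout where foo.pyc sits next to foo.py; adjust the root path to your own tree):

    import os

    root = "/path/to/directory/with/pythoncode"  # same tree as the compileall call above
    missing = 0
    for dirpath, dirnames, filenames in os.walk(root):
        names = set(filenames)
        for fn in filenames:
            if fn.endswith(".py") and fn + "c" not in names:
                missing += 1
    print("%d .py files without a .pyc next to them" % missing)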

