Python thread for pre-importing modules

I am writing a Python application for scientific computing. Currently, when a user working in the GUI launches a new physical simulation, the interpreter immediately imports several modules that the simulation needs, such as Traits and Mayavi. These modules are heavy and slow to import, so the user has to wait about 10 seconds before continuing, which is bad.

I was thinking of something that could fix this. I will describe it; if someone has already implemented it, please give me a link. If not, I can write it myself.

I want a separate thread that imports modules asynchronously. It would probably be a subclass of threading.Thread.

Here is a usage example:

    importer_thread = ImporterThread()
    importer_thread.start()

    # ...

    # import_() is a thread-safe method that puts the module name into a
    # queue, which the thread reads in an infinite loop. (`import` itself
    # is a reserved word and cannot be used as a method name.)
    importer_thread.import_('Mayavi')
    importer_thread.import_('Traits')

    # ...

    # When the user actually needs the modules:
    import Mayavi, Traits
    # If they were already loaded by importer_thread, we're good.
    # If not, we'll just have to wait as usual.

Do you know of anything like this? If not, do you have any suggestions regarding the design?

+4
4 answers

The problem is that the imports must complete before the modules are used. Depending on when they are first used, the application may still have to block for 10 seconds before it can proceed. It would be much more productive to profile the modules and find out why they take so long to import.
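As a rough starting point for that kind of profiling, a small helper can time each import in isolation. This is only a sketch: `time_import` is a hypothetical name, and the measurement is meaningful only for modules that have not been imported yet in the current process. On Python 3.7 and later, running the interpreter with `-X importtime` gives a finer per-module breakdown.

```python
import importlib
import time


def time_import(module_name):
    """Return the seconds spent importing module_name.

    The result is close to zero if the module is already in sys.modules,
    so call this in a fresh interpreter for a realistic number.
    """
    start = time.perf_counter()
    importlib.import_module(module_name)
    return time.perf_counter() - start
```

Calling it once per heavy dependency in a fresh process shows which import dominates the delay.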

+2

Why not just do it when you start the application?

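    import threading

    def background_imports():
        # Importing here fills sys.modules, so later imports are fast.
        import Traits
        import Mayavi

    thread = threading.Thread(target=background_imports)
    thread.daemon = True  # do not block interpreter exit
    thread.start()
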
+2

The general idea is good, but the Python/GUI session may not be very responsive while the background thread is importing; unfortunately, import inherently and inevitably “locks up” Python substantially (it is not just the GIL, there is a specific additional lock for imports).

It is still worth a try, as it may make things a little better. It is also very simple, since a Queue is inherently thread-safe and, apart from a Queue's put and get, all you need is basically __import__. However, do not be surprised if this does not help enough and you still need something more.
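A minimal sketch of that queue-based design might look like the following. The class name comes from the question; the method names `import_` and `wait` and the `errors` attribute are assumptions made for illustration (`import` is a reserved word, so it cannot be a method name), and `importlib.import_module` is used in place of a bare `__import__` call.

```python
import importlib
import queue
import threading


class ImporterThread(threading.Thread):
    """Import modules named on a queue, one at a time, in the background."""

    def __init__(self):
        # Daemon thread: it must not keep the process alive at exit.
        super().__init__(daemon=True)
        self._pending = queue.Queue()
        self.errors = {}  # module name -> exception raised while importing

    def import_(self, module_name):
        """Thread-safe: ask the background thread to import module_name."""
        self._pending.put(module_name)

    def wait(self):
        """Block until every requested import has been attempted."""
        self._pending.join()

    def run(self):
        while True:
            name = self._pending.get()
            try:
                importlib.import_module(name)
            except Exception as exc:
                # A failed import is recorded, not allowed to kill the thread.
                self.errors[name] = exc
            finally:
                self._pending.task_done()
```

After `importer_thread.start()` and a few `import_()` calls, a later ordinary `import` statement in the main thread finds the module in `sys.modules` if the background import has finished, and otherwise waits as usual.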

If you have a drive that is inherently very fast but has limited space, such as a "RAM disk" or a particularly fast solid-state drive, it may be worth keeping the necessary packages in a .tar.bz2 (or other archive format) and unpacking it onto the fast drive when the program starts. That is essentially just I/O, so it will not lock things up much (I/O operations quickly release the GIL), and it is also especially easy to delegate to a subprocess running tar xjf or the like.

If some of the import slowness is due to a huge number of .py/.pyc/.pyo files, you could try keeping them (as .pyc only, not as .py) in a zipfile and importing from there. That only helps with I/O overhead, depending on your OS, file system and drive; it does not help with delays caused by loading huge DLLs or by executing initialization code in the packages at load time, which I suspect are the likelier culprits.

You could also consider splitting the application up with multiprocessing, again using Queues (but of the multiprocessing kind) for communication, so that both the imports and some heavy computations are delegated to a few auxiliary processes and thus made asynchronous (this can also help make full use of several cores at once). I suspect that, unfortunately, this may be hard to arrange properly for visualization tasks (such as those you are presumably doing with Mayavi), but it can help if you also have some "pure heavy computation" packages and tasks.
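The zipfile idea can be sketched with the standard library alone. The helper names `build_pyc_archive` and `import_from_archive` are hypothetical, and the example handles a single pure-Python source file; a real package tree, or anything with compiled extension modules, would need more care.

```python
import importlib
import sys
import zipfile


def build_pyc_archive(source_file, archive_path):
    """Compile source_file and store only its .pyc in a zip archive."""
    with zipfile.PyZipFile(str(archive_path), "w") as archive:
        # writepy() byte-compiles the .py file and adds the .pyc, not the .py.
        archive.writepy(str(source_file))
    return archive_path


def import_from_archive(archive_path, module_name):
    """Put the archive on sys.path and import module_name from it."""
    archive = str(archive_path)
    if archive not in sys.path:
        sys.path.insert(0, archive)
    importlib.invalidate_caches()
    return importlib.import_module(module_name)
```

Because the archive holds bytecode only, it has to be rebuilt for each Python version that will load it.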

+1

"the user works with a graphical interface and launches a new physical simulation"

Not quite clear. "Works with a GUI" means double-clicking? Double-clicking what? Some wxWidgets GUI application? Or IDLE?

If so, what does “launches a new physical simulation” mean? Clicking a button somewhere else? A GUI button that brings up a panel where they write code? Or do they import a script that they wrote offline?

Why does import take place before the start of the simulation? How long does the simulation take? What does the GUI show?

I suspect there is a way to be much, much lazier about doing the big imports. But from the description it is hard to tell whether there is a point in time when the import delay would not matter much to the user.

Threads don't really help. What helps is rethinking the user experience.
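One way to make an import lazy, shown here only as a sketch, follows the recipe in the `importlib` documentation: the module object is created immediately, but its code runs only when an attribute is first accessed. The function name `lazy_import` is an assumption, and whether this suits packages as large as Mayavi has not been checked here.

```python
import importlib.util
import sys


def lazy_import(module_name):
    """Return a module whose code runs only on first attribute access."""
    if module_name in sys.modules:
        return sys.modules[module_name]  # already loaded, nothing to defer
    spec = importlib.util.find_spec(module_name)
    if spec is None:
        raise ModuleNotFoundError(f"No module named {module_name!r}")
    loader = importlib.util.LazyLoader(spec.loader)
    spec.loader = loader
    module = importlib.util.module_from_spec(spec)
    sys.modules[module_name] = module
    loader.exec_module(module)  # sets up the deferral, does not run the module
    return module
```

With this approach the cost is paid at the first real use of the module, so it moves the delay to a later moment and does not remove it.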

-1
