Matplotlib: faster PDF generation?

I would like to use matplotlib to create multiple PDF files. My main problem is that matplotlib is slow, on the order of 0.5 seconds per file.

I tried to understand why it took so long, and I wrote the following test program, which simply draws a very simple curve as a PDF file:

    import matplotlib
    matplotlib.use('Agg')
    import matplotlib.pyplot as plt

    X = range(10)
    Y = [ x**2 for x in X ]

    for n in range(100):
        fig = plt.figure(figsize=(6,6))
        ax = fig.add_subplot(111)
        ax.plot(X, Y)
        fig.savefig("test.pdf")

But even something this simple takes a long time: 15-20 seconds for a total of 100 PDF files (on modern Intel platforms; I tried both Mac OS X and Linux systems).

Are there any tricks and methods that I can use to speed up PDF generation in matplotlib? Obviously, I can use multiple parallel threads on multi-core platforms, but is there anything else I can do?

+6
4 answers

If it is practical for you, you can use multiprocessing to do this (assuming there are several cores on your computer):

NOTE: The following code will create 40 PDF files in the current directory on your computer.

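    import matplotlib.pyplot as plt
    import multiprocessing

    def do_plot(y_pos):
        # Draw one horizontal line and save it as <y_pos>.pdf
        fig = plt.figure()
        ax = plt.axes()
        ax.axhline(y_pos)
        fig.savefig('%s.pdf' % y_pos)

    # The guard keeps worker processes from re-running the pool setup
    # on platforms that start workers by importing this module.
    if __name__ == '__main__':
        pool = multiprocessing.Pool()
        for i in range(40):  # range instead of the Python 2-only xrange
            pool.apply_async(do_plot, [i])
        pool.close()
        pool.join()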

It doesn't scale perfectly, but I get a significant boost by doing this on my 4 cores (dual core with hyperthreading):

    $> time python multi_pool_1.py
    done

    real 0m5.218s
    user 0m4.901s
    sys  0m0.205s

    $> time python multi_pool_n.py
    done

    real 0m2.935s
    user 0m9.022s
    sys  0m0.420s

I am sure that there are many opportunities to improve performance in the mpl PDF backend, but that is not going to happen within the time frame you need.

HTH

+3

Matplotlib has a lot of overhead for creating the figure, etc., even before saving it to PDF. Therefore, if your plots are similar, you can save a lot of setup work by reusing elements, just as you will find in the animation examples for matplotlib.

You can reuse the figure and axes in this example:

    import matplotlib
    matplotlib.use('Agg')
    import matplotlib.pyplot as plt

    X = range(10)
    Y = [ x**2 for x in X ]

    fig = plt.figure(figsize=(6,6))
    ax = fig.add_subplot(111)

    for n in range(100):
        ax.clear()  # or, even better, just line.remove()
                    # (but that may interfere with autoscaling; see below)
        line = ax.plot(X, Y)[0]
        fig.savefig("test.pdf")

Please note that this does not help much. You can save a little more by reusing the lines:

    line = ax.plot(X, Y)[0]

    for n in range(100):
        # Now instead of plotting, we update the current line:
        line.set_xdata(X)
        line.set_ydata(Y)
        # If autoscaling is necessary:
        ax.relim()
        ax.autoscale()

        fig.savefig("test.pdf")

This is almost twice as fast as the original example for me. It is only an option if you make similar plots, but if they are very similar, it can speed things up a lot. The matplotlib animation examples may give you ideas for this kind of optimization.
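To check how much reuse helps on your own machine, a rough timing sketch along these lines may be useful. The function names (`render_fresh`, `render_reused`, `time_it`) are made up for illustration, and the PDFs are written to in-memory buffers rather than to "test.pdf" so that disk speed does not affect the numbers. Treat the results as indicative only; they will vary with the matplotlib version and the plot content.

```python
import io
import time

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, as in the examples above
import matplotlib.pyplot as plt

X = list(range(10))
Y = [x**2 for x in X]


def render_fresh(n_files):
    """Create a new figure for every PDF; return the last PDF as bytes."""
    data = b""
    for _ in range(n_files):
        fig = plt.figure(figsize=(6, 6))
        ax = fig.add_subplot(111)
        ax.plot(X, Y)
        buf = io.BytesIO()
        fig.savefig(buf, format="pdf")
        plt.close(fig)  # release the figure so figures do not pile up
        data = buf.getvalue()
    return data


def render_reused(n_files):
    """Reuse one figure, axes and line; return the last PDF as bytes."""
    fig = plt.figure(figsize=(6, 6))
    ax = fig.add_subplot(111)
    line = ax.plot(X, Y)[0]
    data = b""
    for _ in range(n_files):
        # Update the existing line instead of plotting again
        line.set_xdata(X)
        line.set_ydata(Y)
        ax.relim()
        ax.autoscale()
        buf = io.BytesIO()
        fig.savefig(buf, format="pdf")
        data = buf.getvalue()
    plt.close(fig)
    return data


def time_it(func, n_files):
    """Return the wall-clock seconds func(n_files) takes."""
    start = time.perf_counter()
    func(n_files)
    return time.perf_counter() - start


if __name__ == "__main__":
    for func in (render_fresh, render_reused):
        print(func.__name__, "%.2f s" % time_it(func, 20))
```

Both variants call `savefig` once per file, so any difference between the two timings comes from the figure setup rather than from the PDF backend itself.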

+3

You can use ReportLab. The open source version should be sufficient for what you are trying to do. It should be much faster than using matplotlib to create PDF files.

0

I assume that changing the library (matplotlib) is not an option for you, because you really like what matplotlib produces :-). I also assume, as some people here have already commented, that other matplotlib backends are not much faster. These days, with many cores per machine and operating systems with good task schedulers, it is quite normal to run tasks like yours in parallel to optimize throughput, that is, the rate at which PDF files are created. I think you can create many files per second with reasonable processing power. That is the way to go, so I honestly think your question is very interesting but not very relevant in practice.
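As a sketch of what running in parallel for throughput could look like when combined with the figure reuse suggested in another answer: each worker process gets a batch of plots and reuses one figure for the whole batch. The names `render_batch` and `render_all`, the batching scheme and the horizontal-line plot are assumptions made for illustration, not something taken from the answers here.

```python
import multiprocessing
import os

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, safe in worker processes
import matplotlib.pyplot as plt


def render_batch(args):
    """Render one PDF per y position, reusing a single figure for the batch."""
    y_positions, out_dir = args
    fig = plt.figure()
    ax = fig.add_subplot(111)
    line = ax.axhline(0)  # placeholder line, moved for every file
    paths = []
    for y in y_positions:
        line.set_ydata([y, y])
        ax.set_ylim(y - 1, y + 1)  # keep the line inside the view
        path = os.path.join(out_dir, "%s.pdf" % y)
        fig.savefig(path)
        paths.append(path)
    plt.close(fig)
    return paths


def render_all(y_positions, out_dir, n_workers=None):
    """Split the work into one batch per worker and render in parallel."""
    y_positions = list(y_positions)
    n_workers = n_workers or multiprocessing.cpu_count()
    # Round-robin split, dropping empty batches when there is little work
    batches = [y_positions[i::n_workers] for i in range(n_workers)]
    batches = [b for b in batches if b]
    with multiprocessing.Pool(n_workers) as pool:
        results = pool.map(render_batch, [(b, out_dir) for b in batches])
    return [p for paths in results for p in paths]
```

To try it, call something like `render_all(range(40), ".")` from a script, inside an `if __name__ == "__main__":` guard so that worker processes do not start pools of their own. Whether batching beats one task per file depends on how similar the plots are.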

0

Source: https://habr.com/ru/post/923231/

