Matplotlib is slow with large data sets. How can I enable decimation (thinning)?

I use matplotlib in a signal-processing application, and I have noticed that it chokes on large data sets. This is something I really need to improve to make the application usable.

I am looking for a way to have matplotlib decimate my data. Is there a setting, property, or other simple way to enable this? Any suggestion on how to implement it is welcome.

Some code:

    import numpy as np
    import matplotlib.pyplot as plt

    n = 100000  # more than 100000 points makes it unusably slow
    plt.plot(np.random.random_sample(n))
    plt.show()

Some background information

I worked on a large C++ application where we needed to plot large data arrays, and we solved the problem with the following approach:

In most cases, if we need a line graph, the data is ordered and often even equidistant. If it is equidistant, the start and end indices into the data array can be computed directly from the zoom rectangle and the inverse axis transformation. If it is ordered but not equidistant, a binary search can be used.
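A minimal NumPy sketch of that index lookup (the function name and arguments are illustrative, not taken from the original C++ code):

    import numpy as np

    def visible_range(x, x_min, x_max, equidistant=False):
        """Indices (start, stop) of the points that fall inside [x_min, x_max]."""
        if equidistant:
            # Constant spacing: the indices follow directly from the axis limits.
            dx = x[1] - x[0]
            start = int(np.floor((x_min - x[0]) / dx))
            stop = int(np.ceil((x_max - x[0]) / dx)) + 1
        else:
            # Ordered but irregular spacing: binary search.
            start = int(np.searchsorted(x, x_min, side="left"))
            stop = int(np.searchsorted(x, x_max, side="right"))
        return max(start, 0), min(stop, len(x))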

The zoomed-in slice is then decimated, and since the data is ordered, we can simply iterate over the block of points that falls within one pixel. For each block, the mean, maximum, and minimum are computed. Instead of a single pixel, we draw a vertical stroke on the graph.

For example: if the x axis is ordered, a vertical line is drawn for each block, possibly with the mean marked in a different color.

To avoid aliasing, the plot is oversampled by a factor of two.
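A rough NumPy sketch of that per-pixel min/max reduction (this is not the original C++ code; the function name and the default pixel count are assumptions):

    import numpy as np

    def minmax_decimate(y, n_pixels=800):
        """Reduce y to one (min, max) pair per horizontal pixel column."""
        block = max(len(y) // n_pixels, 1)   # points that share one pixel column
        usable = (len(y) // block) * block   # drop the ragged tail for simplicity
        blocks = y[:usable].reshape(-1, block)
        # Interleave min and max so that an ordinary line plot draws a vertical
        # stroke covering the full value range of each block.
        out = np.empty(2 * blocks.shape[0])
        out[0::2] = blocks.min(axis=1)
        out[1::2] = blocks.max(axis=1)
        return out

With n_pixels set to roughly twice the number of horizontal screen pixels (the factor-of-two oversampling mentioned above), plt.plot(minmax_decimate(y)) stays fast no matter how many samples y contains.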

If it is a scatter plot, the data can be ordered by sorting it first, because the plotting order does not matter.

The nice thing about this simple recipe is that the more you zoom in, the faster it gets. In my experience, as long as the data fits in memory, the plots stay very responsive. For example, 20 plots of historical data with 10 million points each should not be a problem.

performance python matplotlib signal-processing
2 answers

It seems you just need to decimate the data before you plot it.

    import numpy as np
    import matplotlib.pyplot as plt

    n = 100000  # more than 100000 points makes it unusably slow
    X = np.random.random_sample(n)
    i = 10 * np.arange(n // 10)  # keep every tenth sample
    plt.plot(X[i])
    plt.show()
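The same decimation can also be written with a plain slice, plt.plot(X[::10]), which keeps every tenth sample without building an index array.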

Plain decimation is not good enough: if you decimate sparse data, for example, it can all show up as zeros.

The decimation should be smart enough that each horizontal screen pixel is plotted with the minimum and maximum of the data between the decimation points. Then, when you zoom in, you see more detail.

Because this depends on the zoom level, it cannot easily be done outside of matplotlib and is better handled internally.
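As a sketch of how this could be approximated from user code today (this is not a built-in matplotlib feature; the point budget and the simple stride-based decimation are assumptions, and the per-pixel min/max reduction from the first answer could be dropped in instead), one can re-decimate whenever the x-limits change:

    import numpy as np
    import matplotlib.pyplot as plt

    n = 1_000_000
    x = np.arange(n)
    y = np.random.random_sample(n)

    fig, ax = plt.subplots()
    line, = ax.plot(x[::1000], y[::1000])  # coarse initial view

    def redecimate(axes):
        lo, hi = axes.get_xlim()
        start, stop = np.searchsorted(x, [lo, hi])
        step = max((stop - start) // 2000, 1)  # ~2000 visible points (assumed budget)
        # A smarter version would plot per-pixel min/max here instead of a stride.
        line.set_data(x[start:stop:step], y[start:stop:step])
        axes.figure.canvas.draw_idle()

    ax.callbacks.connect('xlim_changed', redecimate)
    plt.show()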

