I use matplotlib for a signal processing application, and I noticed that it chokes on large data sets. I really need to fix this to make the application usable.
I am looking for a way to have matplotlib decimate my data. Is there a setting, property, or other simple way to enable this? Any suggestions on how to implement this are welcome.
Some code:
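import numpy as np
import matplotlib.pyplot as plt

n = 100000  # number of points to plot
# Plot n random values to reproduce the slowdown.
plt.plot(np.random.randint(0, n, n))
plt.show()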
Some background information
I used to work on a large C++ application where we needed to plot large data sets. To solve this problem, we took advantage of the structure of the data as follows:
In most cases, if we need a line plot, the data is ordered and often even equidistant. If it is equidistant, the start and end indices in the data array can be calculated directly from the zoom rectangle and the inverse axis transformation. If it is ordered but not equidistant, a binary search can be used.
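To make the index lookup concrete, here is a minimal sketch in NumPy. The function names and the parameters `x0` (first sample position) and `dx` (sample spacing) are hypothetical, and `xmin`/`xmax` are assumed to be the visible limits already in data coordinates (in matplotlib they could come from `ax.get_xlim()`).

```python
import numpy as np

def visible_slice_equidistant(x0, dx, n, xmin, xmax):
    """Index range [start, stop) of samples x0 + i*dx that lie in [xmin, xmax]."""
    start = int(np.ceil((xmin - x0) / dx))
    stop = int(np.floor((xmax - x0) / dx)) + 1
    # Clamp to the valid range of an array with n samples.
    start = min(max(start, 0), n)
    stop = min(max(stop, start), n)
    return start, stop

def visible_slice_sorted(x, xmin, xmax):
    """Same range for sorted but unevenly spaced x, found by binary search."""
    start = int(np.searchsorted(x, xmin, side="left"))
    stop = int(np.searchsorted(x, xmax, side="right"))
    return start, stop
```

The equidistant version costs O(1) and the sorted version O(log n), so neither depends noticeably on the size of the data set.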
Next, the zoomed slice is decimated. Because the data is ordered, we can simply iterate over the blocks of points that fall inside one pixel. For each block, the mean, maximum and minimum are calculated. Instead of a single pixel, we then draw a bar in the plot.
For example: if the x axis is ordered, a vertical line is drawn for each block, possibly with the mean in a different color.
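The per-block reduction could look roughly like the sketch below. It is an illustration in NumPy rather than the original C++ code; `decimate_minmax` and `n_blocks` are hypothetical names, and the blocks are assumed to be roughly equal runs of consecutive samples, which matches pixel columns only for equidistant data.

```python
import numpy as np

def decimate_minmax(y, n_blocks):
    """Reduce non-empty y to per-block (min, mean, max) over n_blocks blocks."""
    y = np.asarray(y, dtype=float)
    n_blocks = max(1, min(n_blocks, len(y)))
    # Integer block boundaries; every block holds at least one sample.
    edges = (np.arange(n_blocks + 1) * len(y)) // n_blocks
    starts = edges[:-1]
    counts = np.diff(edges)
    # reduceat applies the reduction to y[starts[i]:starts[i+1]] for each block.
    lo = np.minimum.reduceat(y, starts)
    hi = np.maximum.reduceat(y, starts)
    mean = np.add.reduceat(y, starts) / counts
    return lo, mean, hi
```

With one x position per block, the result can be drawn with `ax.vlines(xc, lo, hi)` for the bars and `ax.plot(xc, mean)` for the mean, where `xc` stands for the block centers. Because the minimum and maximum are kept, a single-sample spike stays visible after decimation.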
To avoid aliasing, the plot is oversampled by a factor of two.
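As a sketch of what the factor of two means in practice: the visible index range is split into two blocks per horizontal pixel instead of one. The helper `block_edges` and its parameters are hypothetical; in matplotlib the pixel width of the axes could be taken from `ax.get_window_extent().width`, which is an assumption about how this would be wired up, not part of the original recipe.

```python
import numpy as np

def block_edges(start, stop, pixel_width, oversample=2):
    """Integer index edges splitting [start, stop) into `oversample` blocks per pixel."""
    n_samples = stop - start
    # Never use more blocks than there are samples in the visible slice.
    n_blocks = max(1, min(int(pixel_width) * oversample, n_samples))
    return start + (np.arange(n_blocks + 1) * n_samples) // n_blocks
```

For a plot 100 pixels wide showing 1000 samples, this yields 200 blocks of 5 samples each. Once the zoomed slice holds fewer samples than blocks, each sample becomes its own block and no information is dropped.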
If it is a scatter plot, the data can be made ordered by sorting, because the plotting sequence does not matter.
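For scatter data, the sorting step might be as simple as the following sketch (`sort_scatter` is a hypothetical helper). It reorders both coordinates by x so that the same index lookup and block reduction as for line plots can be applied afterwards.

```python
import numpy as np

def sort_scatter(x, y):
    """Return x and y reordered so that x is ascending."""
    x = np.asarray(x)
    y = np.asarray(y)
    # A stable sort keeps points with equal x in their original relative order.
    order = np.argsort(x, kind="stable")
    return x[order], y[order]
```

The sort is a one-time O(n log n) cost when the data is loaded; zooming and panning afterwards only need the cheap lookups.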
The good thing about this simple recipe is that the more you zoom in, the faster it gets. In my experience, as long as the data fits into memory, the plots remain very responsive. For example, 20 plots of time-history data with 10 million points should not be a problem.