Reduce the size of a large dataset by sampling / interpolating to improve chart performance

I have a large set (> 2000 points) of time series data that I would like to display using d3 in a browser. D3 works well for displaying a subset of the data (~100 points) to the user, but I also need a “contextual” view (like this one) to show the entire data set and allow users to select a subregion for more detailed viewing.

However, the performance is terrible when trying to display this many points in d3. I believe a good solution would be to select a sample of the data and then use some kind of interpolation (spline, polynomial, etc.; this is the part I know how to do) to draw a curve that closely resembles the actual data.

However, it is not clear to me how I should choose the subset. The data (shown below) has fairly flat regions, where fewer samples are needed for a decent interpolation, and other regions where the absolute derivative is quite high and more frequent sampling is required.

To complicate matters further, the data has gaps (the sensor that generates it occasionally crashes or goes out of range), and I would like to keep these gaps in the chart rather than interpolate across them. Finding the gaps is simple enough, and cutting them out after interpolating the entire dataset seems like a reasonable solution.

I am doing this in JavaScript, but a solution in any language, or a mathematical answer to the problem, would do.

the data in question

+4
3 answers

You can use the d3fc-sample module, which provides a number of different algorithms for sampling data. Here's what the API looks like:

// Create the sampler
var sampler = fc_sample.largestTriangleThreeBucket();

// Configure the x / y value accessors
sampler.x(function (d) { return d.x; })
    .y(function (d) { return d.y; });

// Configure the size of the buckets used to downsample the data.
sampler.bucketSize(10);

// Run the sampler
var sampledData = sampler(data);

You can see a working example on the website:

http://d3fc.imtqy.com/d3fc-sample/
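To give a feel for what a largest-triangle-three-buckets sampler does, here is a minimal standalone sketch of the general technique. It is not the d3fc-sample source. The function name `lttb`, the `{x, y}` point shape and the `threshold` parameter (a target number of output points, rather than the bucket size used in the API above) are assumptions made for illustration. The first and last points are always kept. For every bucket in between, the sketch keeps the point that forms the largest triangle with the previously kept point and the average of the next bucket, so sharp peaks tend to survive while flat runs are thinned out.

```javascript
// Largest-Triangle-Three-Buckets downsampling (illustrative sketch).
// `points`: array of {x, y} sorted by x (assumed shape).
// `threshold`: desired number of output points (hypothetical parameter).
function lttb(points, threshold) {
  const n = points.length;
  // Nothing to do if the data is already small enough.
  if (threshold >= n || threshold < 3) return points.slice();

  const sampled = [points[0]];               // always keep the first point
  const every = (n - 2) / (threshold - 2);   // bucket width, in points
  let a = 0;                                 // index of the last kept point

  for (let i = 0; i < threshold - 2; i++) {
    // Average of the next bucket: the third corner of the triangle.
    const nextStart = Math.floor((i + 1) * every) + 1;
    const nextEnd = Math.min(Math.floor((i + 2) * every) + 1, n);
    let avgX = 0;
    let avgY = 0;
    for (let j = nextStart; j < nextEnd; j++) {
      avgX += points[j].x;
      avgY += points[j].y;
    }
    avgX /= nextEnd - nextStart;
    avgY /= nextEnd - nextStart;

    // In the current bucket, keep the point with the largest triangle area.
    const start = Math.floor(i * every) + 1;
    const end = Math.floor((i + 1) * every) + 1;
    let maxArea = -1;
    let best = start;
    for (let j = start; j < end; j++) {
      const area = Math.abs(
        (points[a].x - avgX) * (points[j].y - points[a].y) -
        (points[a].x - points[j].x) * (avgY - points[a].y)
      ) / 2;
      if (area > maxArea) {
        maxArea = area;
        best = j;
      }
    }
    sampled.push(points[best]);
    a = best;
  }

  sampled.push(points[n - 1]);               // always keep the last point
  return sampled;
}
```

Because the sampler only ever returns original data points, it can be run separately on each gap-free stretch of the series, which leaves the gaps themselves untouched.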


+4


As for the time intervals to sample at, you can try (1) fixed intervals such as 1 second, 15 seconds, 1 minute, 15 minutes, hours, days, or whatever else the user can readily understand, or (2) choose the interval so that the entire time range is divided into a fixed number of units; e.g. if you decide to display 7 hours of data in 100 units, then each unit = 252 seconds.
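Option (2) could be sketched roughly as follows. The function name `bucketize`, the `{t, y}` record shape (with `t` in seconds) and the choice of averaging each bucket are assumptions for illustration; a per-bucket minimum and maximum might suit spiky data better. Buckets that receive no samples are reported as `null`, so gaps in the sensor data remain visible instead of being interpolated away.

```javascript
// Aggregate a time series into a fixed number of equal-width time buckets.
// `data`: array of {t, y}, with t in seconds (hypothetical field names).
// Empty buckets yield y === null so that gaps stay visible.
function bucketize(data, tStart, tEnd, bucketCount) {
  const width = (tEnd - tStart) / bucketCount; // e.g. 25200 s / 100 = 252 s
  const sums = new Array(bucketCount).fill(0);
  const counts = new Array(bucketCount).fill(0);

  for (const d of data) {
    if (d.t < tStart || d.t > tEnd) continue;  // ignore out-of-range samples
    // Clamp so that a sample at exactly tEnd lands in the last bucket.
    const i = Math.min(Math.floor((d.t - tStart) / width), bucketCount - 1);
    sums[i] += d.y;
    counts[i] += 1;
  }

  return sums.map((sum, i) => ({
    t: tStart + (i + 0.5) * width,             // bucket midpoint
    y: counts[i] > 0 ? sum / counts[i] : null  // null marks a gap
  }));
}

// The example from the text: 7 hours of data shown in 100 units.
const SEVEN_HOURS = 7 * 60 * 60;               // 25200 seconds
const bucketWidth = SEVEN_HOURS / 100;         // 252 seconds per unit
```

When drawing the result with d3, the line generator's `defined` accessor (for example `.defined(function (d) { return d.y !== null; })`) should let the `null` buckets appear as breaks in the line.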

+1