MATLAB: using interpolation to replace missing values (NaN)

I have a cell array in which each cell contains a sequence of values as a row vector. The sequences contain some missing values, represented by NaN.

I would like to replace all NaNs using some kind of interpolation method. How can I do this in MATLAB? I am also open to other suggestions on how to deal with these missing values.

Consider this sample data to illustrate the problem:

    seq = {randn(1,10); randn(1,7); randn(1,8)};
    for i=1:numel(seq)
        %# simulate some missing values
        ind = rand( size(seq{i}) ) < 0.2;
        seq{i}(ind) = nan;
    end

The resulting sequences:

    seq{1}
    ans =
        -0.50782  -0.32058  NaN  -3.0292  -0.45701  1.2424  NaN  0.93373  NaN  -0.029006
    seq{2}
    ans =
        0.18245  -1.5651  -0.084539  1.6039  0.098348  0.041374  -0.73417
    seq{3}
    ans =
        NaN  NaN  0.42639  -0.37281  -0.23645  2.0237  -2.2584  2.2294

Edit:

Based on the answers, I think there was some confusion: obviously I am not working with random data; the code above is just an example of how the data is structured.

The actual data is some form of processed signal. The problem is that my analysis fails if the sequences contain missing values, hence the need for filtering or interpolation. (I have already considered using the mean of each sequence to fill in the gaps, but I am hoping for something more powerful.)

+7
matlab nan interpolation missing-data
6 answers

Well, if you are working with time-series data, you can use MATLAB's built-in interpolation function, interp1.

Something like this should work for your situation, but you will need to adapt it a bit: for example, if your samples are not evenly spaced, you need to change the times line.

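    nseq = cell(size(seq));
    for i = 1:numel(seq)
        times = 1:length(seq{i});    % sample positions; use real timestamps if spacing is uneven
        mask = ~isnan(seq{i});       % true where a value is present
        nseq{i} = seq{i};
        % linear by default; NaNs before the first or after the last known sample stay NaN
        nseq{i}(~mask) = interp1(times(mask), seq{i}(mask), times(~mask));
    end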

You will need to play with the interp1 options to find out which ones work best for your situation.
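As a rough sketch of what trying different interp1 options might look like, the snippet below fills one toy vector twice: once with linear interpolation plus explicit extrapolation, and once with 'pchip'. The vector y and all variable names are made up for illustration. With the default linear method, interp1 returns NaN for query points outside the range of the known samples, which is why leading or trailing gaps need 'extrap' or some other treatment.

```matlab
% Toy row vector with leading, interior and trailing gaps (illustrative data)
y = [NaN NaN 1 2 NaN 4 NaN];
t = 1:numel(y);              % assumed uniform sample positions
known = ~isnan(y);

% Linear interpolation, extrapolating linearly past the first/last known sample
yLinear = y;
yLinear(~known) = interp1(t(known), y(known), t(~known), 'linear', 'extrap');

% Shape-preserving piecewise cubic (pchip), also extrapolating
yPchip = y;
yPchip(~known) = interp1(t(known), y(known), t(~known), 'pchip', 'extrap');

disp(yLinear)   % -1 0 1 2 3 4 5 for this toy vector
disp(yPchip)
```

Extrapolated values can drift far from the data when the gap at either end is long, so it is worth inspecting them before trusting the result.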

+8

I would use inpaint_nans, a tool from the MATLAB File Exchange designed to replace NaN elements in 1-D or 2-D arrays by interpolation.

    seq{1} = [-0.50782 -0.32058 NaN -3.0292 -0.45701 1.2424 NaN 0.93373 NaN -0.029006];
    seq{2} = [0.18245 -1.5651 -0.084539 1.6039 0.098348 0.041374 -0.73417];
    seq{3} = [NaN NaN 0.42639 -0.37281 -0.23645 2.0237];

    for i = 1:3
        seq{i} = inpaint_nans(seq{i});
    end

    seq{:}
    ans =
        -0.50782  -0.32058  -2.0724  -3.0292  -0.45701  1.2424  1.4528  0.93373  0.44482  -0.029006
    ans =
        0.18245  -1.5651  -0.084539  1.6039  0.098348  0.041374  -0.73417
    ans =
        2.0248  1.2256  0.42639  -0.37281  -0.23645  2.0237
+6

If you have access to the System Identification Toolbox, you can use the misdata function to estimate the missing values. According to the documentation:

This command linearly interpolates the missing values to estimate a first model. It then uses this model to estimate the missing data as parameters by minimizing the output prediction errors obtained from the reconstructed data.

Basically, the algorithm alternates between estimating the missing data and estimating the model, similar to the Expectation-Maximization (EM) algorithm.

The estimated model can be any of the linear idmodel models (AR / ARX / ...); if none is specified, a default state-space model is used.

Here's how to apply it to your data:

    for i=1:numel(seq)
        dat = misdata( iddata(seq{i}(:)) );
        seq{i} = dat.OutputData;
    end
+1

As JudoWill says, you need to assume some kind of relationship between your data points.

One simple option is to compute the mean of the whole series and use it in place of the missing values. Another trivial option would be to take the average of the n previous and n next values.

But be very careful: if you do not have enough data, you are better off handling the missing values explicitly than making up fake data that can ruin your analysis.
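The two options above could be sketched roughly as follows. The data vector x, the window half-width n, and the variable names are all illustrative assumptions, and the loop assumes x is a row vector, as in the question.

```matlab
x = [1 NaN 3 4 NaN NaN 7];     % illustrative row vector
missing = isnan(x);

% Option 1: replace every NaN with the mean of the observed values
xMean = x;
xMean(missing) = mean(x(~missing));

% Option 2: replace each NaN with the mean of the observed values among
% the n previous and n next samples (n = 2 is an arbitrary choice)
n = 2;
xLocal = x;
for k = find(missing)
    lo = max(1, k - n);
    hi = min(numel(x), k + n);
    window = x(lo:hi);                 % always taken from the original data
    window = window(~isnan(window));
    if ~isempty(window)
        xLocal(k) = mean(window);      % stays NaN if the window holds no data
    end
end

disp(xMean)
disp(xLocal)
```

The global mean ignores any trend in the signal, while the local average follows it but depends on the choice of n; a gap longer than 2n samples leaves some values unfilled.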

0

Consider the following example.

X = some N-by-1 array, Y = F(X) with some NaN in it

then use

X1 = X(~isnan(Y)); Y1 = Y(~isnan(Y));

Now interpolate over X1 and Y1 to compute the values at all points of X.
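A minimal sketch of that last step, assuming interp1 with its default linear method is an acceptable interpolant. The sample points and the stand-in function X.^2 are invented for illustration.

```matlab
X = (0:0.5:3)';                 % illustrative N-by-1 sample points
Y = X.^2;                       % stand-in for F(X)
Y([3 5]) = NaN;                 % simulate missing values

valid = ~isnan(Y);              % logical mask, no find() needed
X1 = X(valid);
Y1 = Y(valid);

% Evaluate the interpolant at every X, so the gaps get filled
% and the known points are reproduced
Yfilled = interp1(X1, Y1, X);   % default method is 'linear'

disp([X Y Yfilled])
```

Any NaN that lies before the first or after the last valid sample would remain NaN with this call, since linear interp1 does not extrapolate unless asked to.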

0

Use griddedInterpolant

There are also other functions, such as interp1. For smooth curves, spline interpolation is often a good way to fill in missing data.
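One way this might look for a single sequence is sketched below, using griddedInterpolant with the 'spline' method. The toy signal and variable names are assumptions for illustration, and the spline method needs at least four known samples.

```matlab
x = 1:8;
y = x.^2;                 % illustrative smooth signal
y([3 6]) = NaN;           % simulate missing values

known = ~isnan(y);

% Build a 1-D spline interpolant from the known samples only
F = griddedInterpolant(x(known), y(known), 'spline');

yFilled = y;
yFilled(~known) = F(x(~known));   % evaluate at the missing positions

disp(yFilled)
```

Because the toy signal is a simple quadratic, the spline recovers the missing samples almost exactly; real, noisy data will not behave this neatly, and splines can overshoot around sharp changes.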

0