Find the longest sequence of nonannes, but allow a threshold

Question

Find the longest sequence of nonannes, but allow a threshold

Is it possible to find nonzero values of a vector, but also allow n the number of nans? For example, if I have the following data:

X = [18 3 nan nan 8 10 11 nan 9 14 6 1 4 23 24]; %// input array thres = 1; % this is the number of nans to allow

and I would like to keep only the longest sequence of values with non nans, but allow nn the number of nans in the data. So, say I'm ready to save 1 nand, I will have a way out

 X_out = [8 10 11 nan 9 14 6 1 4 23 24]; %// output array

Thats is, the two nans at the beginning were deleted due to the fact that they exceed the values in 'thres' above, but the third nan by itself, so it can be stored in the data. I would like to develop a method where thres can be defined as any value.

I can find non nan values with

 Y = ~isnan(X); %// convert to zeros and ones

Any ideas?

+8

matlab

Emma tebbs Sep 08 '15 at 11:40

source share

4 answers

Approach 1: Convolution

One possible approach is the convolution Y = double(~isnan(X)); with window n , where n decreases until an acceptable subsequence is found. “Acceptable” means that the subsequence contains at least n-thres units, i.e. Convolution gives at least n-thres .

 Y = double(~isnan(X)); for n = numel(Y):-1:1 %// try all possible sequence lengths w = find(conv(Y,ones(1,n),'valid')>=n-thres); %// is there any acceptable subsequence? if ~isempty(w) break end end result = X(w:w+n-1);

Aproach 2: Total

Escorting Y with window n (as in approach 1) is equivalent to calculating the total sum of Y , and then taking into account the difference n . This is more efficient in terms of the number of operations.

 Y = double(~isnan(X)); Z = cumsum(Y); for n = numel(Y):-1:1 w = find([Z(n) Z(n+1:end)-Z(1:end-n)]>=n-thres); if ~isempty(w) break end end result = X(w:w+n-1);

Approach 3: 2D convolution

This essentially computes all iterations of the loop in approach 1 at once.

 Y = double(~isnan(X)); z = conv2(Y, tril(ones(numel(Y)))); [nn, ww] = find(bsxfun(@ge, z, (1:numel(Y)).'-thres)); %' [n, ind] = max(nn); w = ww(ind)-n+1; result = X(w:w+n-1);

+5

Luis mendo Sep 08 '15 at 13:53

source share

Try my favorite tool: RLE. Matlab does not have a direct function, so use my "seqle" to exchange centerpieces. By default, Seqle returns the reverse encoding. So:

 >> foo = [ nan 1 2 3 nan nan 4 5 6 nan 5 5 5 ]; >> seqle(isnan(foo)) ans = run: [1 3 2 3 1 3] val: [1 0 1 0 1 0]

"Run" indicates the length of the current run; "val" indicates the value. In this case, val==1 indicates that nan and val==0 indicate numeric values. You can see that it will be quite easy to extract the longest sequence of "run" values satisfying the condition val==0 | run < 2 val==0 | run < 2 to get no more than one nan per line. Then just take the cumulative indices of this run and that subset of foo you want.

EDIT: Unfortunately, what is trivial to find by eye may not be so easy to extract from the code. I suspect that there is a much faster way to use indexes identified by longrun to get the desired subsequence.

 >> foo = [ nan 1 2 3 nan nan 4 5 6 nan nan 5 5 nan 5 nan 4 7 4 nan ]; >> sfoo= seqle(isnan(foo)) sfoo = run: [1 3 2 3 2 2 1 1 1 3 1] val: [1 0 1 0 1 0 1 0 1 0 1] >> longrun = sfoo.run<2 |sfoo.val==0 longlong = run: [2 1 1 1 6] val: [1 0 1 0 1] % longrun identifies which indices might be part of a run % longlong identifies the longest sequence of valid run >> longlong = seqle(longrun) >> lfoo = find(sfoo.run<2 |sfoo.val==0); >> sbar = seqle(lfoo,1); >> maxind=find(sbar.run==max(sbar.run),1,'first'); >> getlfoo = lfoo( sum(sbar.run(1:(maxind-1)))+1 ); % first value in longrun , which is part of max run % getbar finds end of run indices >> getbar = getlfoo:(getlfoo+sbar.run(maxind)-1); >> getsbar = sfoo.run(getbar); % retrieve indices of input vector >> startit = sum(sfoo.run(1:(getbar(1)-1))) +1; >> endit = startit+ ((sum(sfoo.run(getbar(1):getbar(end ) ) ) ) )-1; >> therun = foo( startit:endit ) therun = 5 5 NaN 5 NaN 4 7 4 NaN

+1

Carl Witthoft Sep 08 '15 at 15:41

source share

Hmmm, who doesn't like problems, my solution is not as good as mss, but it is an alternative.

 X = [18 3 nan nan 8 10 11 nan 9 14 6 1 4 23 24]; %// input array thresh =1; X(isnan(X))= 0 ; for i = 1:thresh Y(i,:) = circshift(X',-i); %//circular shift end

For some reason, the Matlab "'" conversion makes formatting weird.

 D = X + sum(Y,1); Discard = find(D==0)+thresh; %//give you the index of the part that needs to be discarded chunk = find(X==0); %//Segment the Vector into segments delimited by NaNs seriesOfZero = circshift(chunk',-1)' - chunk; bigchunk =[1 chunk( find(seriesOfZero ~= 1)) size(X,2)]; %//Convert series of NaNs into 1 chunk [values,DiscardChunk] = intersect(bigchunk,Discard); DiscardChunk = sort(DiscardChunk,'descend') for t = 1:size(DiscardChunk,2) X(bigchunk(DiscardChunk(t)-1):bigchunk(DiscardChunk(t))) = []; %//Discard the data end X(X == 0) = NaN %//End of Code

8 10 11 NaN 9 14 6 1 4 23 24

When: X = [18 3 nan nan nan 8 10 11 nan nan 9 14 6 1 nan nan nan 4 23 24]; % // input array thresh = 2;

8 10 11 NaN 4 23 24

0

Gameofffrows Sep 08 '15 at 13:31

source share

ms · Accepted Answer · 2015-09-08T12:42:24+0000

To find the longest sequence containing no more than threshold times NaN , we must find the beginning and end of the specified sequence.

To create all possible starting points, we can use hankel :

 H = hankel(X) H = 18 3 NaN NaN 8 10 11 NaN 9 14 6 1 4 23 24 3 NaN NaN 8 10 11 NaN 9 14 6 1 4 23 24 0 NaN NaN 8 10 11 NaN 9 14 6 1 4 23 24 0 0 NaN 8 10 11 NaN 9 14 6 1 4 23 24 0 0 0 8 10 11 NaN 9 14 6 1 4 23 24 0 0 0 0 10 11 NaN 9 14 6 1 4 23 24 0 0 0 0 0 11 NaN 9 14 6 1 4 23 24 0 0 0 0 0 0 NaN 9 14 6 1 4 23 24 0 0 0 0 0 0 0 9 14 6 1 4 23 24 0 0 0 0 0 0 0 0 14 6 1 4 23 24 0 0 0 0 0 0 0 0 0 6 1 4 23 24 0 0 0 0 0 0 0 0 0 0 1 4 23 24 0 0 0 0 0 0 0 0 0 0 0 4 23 24 0 0 0 0 0 0 0 0 0 0 0 0 23 24 0 0 0 0 0 0 0 0 0 0 0 0 0 24 0 0 0 0 0 0 0 0 0 0 0 0 0 0

Now we need to find the last valid element in each row. For this we can use cumsum :

 C = cumsum(isnan(H),2) C = 0 0 1 2 2 2 2 3 3 3 3 3 3 3 3 0 1 2 2 2 2 3 3 3 3 3 3 3 3 3 1 2 2 2 2 3 3 3 3 3 3 3 3 3 3 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

The endpoint for each line is the one where the corresponding element in C at most threshold :

 threshold = 1; T = C<=threshold T = 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1

The last valid item was found with:

 [~,idx]=sort(T,2); lastone=idx(:,end) lastone = 3 2 1 4 15 15 15 15 15 15 15 15 15 15 15

We need to make sure that the actual length of each line is respected:

 lengths = length(X):-1:1; real_length = min(lastone,lengths); [max_length,max_idx] = max(real_length) max_length = 11 max_idx = 5

If there are more sequences of equal maximum length, we just take the first and display it:

 selected_max_idx = max_idx(1); H(selected_max_idx, 1:max_length) ans = 8 10 11 NaN 9 14 6 1 4 23 24

full script

 X = [18 3 nan nan 8 10 11 nan 9 14 6 1 4 23 24]; H = hankel(X); C = cumsum(isnan(H),2); threshold = 1; T = C<=threshold; [~,idx]=sort(T,2); lastone=idx(:,end)'; lengths = length(X):-1:1; real_length = min(lastone,lengths); [max_length,max_idx] = max(real_length); selected_max_idx = max_idx(1); H(selected_max_idx, 1:max_length)

Find the longest sequence of nonannes, but allow a threshold

Approach 1: Convolution

Aproach 2: Total

Approach 3: 2D convolution

More articles: