I have a vector that contains a sequence of 1 and 0. Suppose it has a length of 166 and it
y <- c(1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,0,1,1,1,1,1,1,1,1, 1,1,1,1,1,0,1,1,0,1,0,1,0,0,0,0,0,1,0,0,0,1,1,0,1,0,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1, 1,1,1,1,1,1, 1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,0,0,1,1,1,1,1,1,1,1,1, 1,1,1,1,1,1,1,1,0,1,1,0,1,1,1,0,0,0,0,0,1,1,1,1)
Now I want to extract the LONGEST POSSIBLE sub vector from the above vector so that it satisfies two properties
(1) The sub-vector must start at 1 and end at 1.
(2) It can contain up to 5% of zeros of the total length of a subvector.
I started with the rle function. It calculates 1 and 0 at each step. So it will be like
z <- rle(y) d <- data.frame(z$values, z$lengths) colnames(d) <- c("value", "length")
It gives me
> d value length 1 1 22 2 0 1 3 1 13 4 0 1 5 1 2 6 0 1 7 1 1 8 0 1 9 1 1 10 0 5 11 1 1 12 0 3 13 1 2 14 0 1 15 1 1 16 0 1 17 1 74 18 0 2 19 1 17 20 0 1 21 1 2 22 0 1 23 1 3 24 0 5 25 1 4
In this case, 74 + 2 + 17 + 1 + 2 + 3 = 99 is a necessary subsequence, since it contains 2 + 1 + 1 = 4 zeros, which is less than 5% of 99. If we move forward and the sequence is 99 + 5 + 4 = 108, and the zeros will be 4 + 5 = 9, which will be more than 5% of 108.