Find the first larger element with a higher index

I have two vectors, A and B. For each element in A, I want to find the index of the first element in B that is larger and has a higher index. A and B have the same length.

So for vectors:

 A <- c(10, 5, 3, 4, 7)
 B <- c(4, 8, 11, 1, 5)

I want a result vector:

 R <- c(3, 3, 5, 5, NA) 

Of course, I can do this with two loops, but it is very slow, and I don't know how to use apply() in this situation, where the values matter. My dataset has 20,000 vectors, so speed is really important here.
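For reference, here is a minimal sketch of the straightforward double loop the question mentions. The asker's actual loop is not shown, so this is an assumed baseline, and `firstHigherNaive` is a hypothetical name. It scans B from position j + 1 onwards and stops at the first strictly larger value.

```r
# Hypothetical baseline: the plain double loop described in the question.
# For each position j, scan B and keep the first later, strictly larger value.
firstHigherNaive <- function(A, B) {
  n <- length(A)
  R <- rep(NA_integer_, n)          # NA means "no larger element later in B"
  for (j in seq_len(n)) {
    for (i in seq_len(n)) {
      if (i > j && B[i] > A[j]) {   # higher index and strictly larger value
        R[j] <- i
        break                       # only the first match is wanted
      }
    }
  }
  R
}

A <- c(10, 5, 3, 4, 7)
B <- c(4, 8, 11, 1, 5)
firstHigherNaive(A, B)
# [1]  3  3  5  5 NA
```

In the worst case this does on the order of n^2 comparisons per vector, which is why it gets slow at the sizes mentioned.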

A few bonus questions:

  • What if I have a sequence of numbers (e.g. seq = 2:10) and I want to find the first number in B that is greater than a + s, for every a in A and every s in seq?

  • As with the first bonus question, but I want to find both the first higher and the first lower value, and build a matrix that stores whichever came first. For example, for A and s = 10 from seq, I want to find the first value of B that is greater than a + 10 or lower than a - 10, and store its index and value.
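The second bonus question can be sketched as a single scan per element that stops as soon as B leaves the band [a - s, a + s]. This is an illustration under assumptions: `firstOutsideBand` and its output columns are hypothetical names, and "greater than" / "lower than" are taken as strict, so a value exactly equal to a ± s stays inside the band.

```r
# Hypothetical helper: for each a = A[j], scan B at higher indices and stop at
# the first value that is greater than a + s or lower than a - s.
# Returns the index, the value, and which side of the band was hit.
firstOutsideBand <- function(A, B, s) {
  n    <- length(A)
  idx  <- rep(NA_integer_, n)
  val  <- rep(NA_real_, n)
  side <- rep(NA_character_, n)
  for (j in seq_len(n)) {
    i <- j + 1L
    # advance while B[i] stays inside the closed band [A[j] - s, A[j] + s]
    while (i <= n && B[i] <= A[j] + s && B[i] >= A[j] - s) i <- i + 1L
    if (i <= n) {
      idx[j]  <- i
      val[j]  <- B[i]
      side[j] <- if (B[i] > A[j] + s) "higher" else "lower"
    }
  }
  data.frame(A = A, indxB = idx, valueB = val, side = side)
}

A <- c(10, 5, 3, 4, 7)
B <- c(4, 8, 11, 1, 5)
firstOutsideBand(A, B, s = 2)
# rows 1-2: index 4 (value 1, lower) and index 3 (value 11, higher); rows 3-5: NA
```

For the first bonus question, the same idea over a whole sequence could be collected into a matrix, e.g. `sapply(2:10, function(s) firstOutsideBand(A, B, s)$indxB)`, with one column per s.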

2 answers
 sapply(sapply(seq_along(A), function(x) which(B[-seq(x)] > A[x]) + x), "[", 1)
 # [1] 3 3 5 5 NA
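The nested one-liner can be unpacked into two explicit steps. This sketch uses `lapply` for the inner step to make the intermediate list explicit; `candidates` is just an illustrative variable name.

```r
A <- c(10, 5, 3, 4, 7)
B <- c(4, 8, 11, 1, 5)

# Step 1: for each position x, the indices (in the original B) of all later
# elements that are larger than A[x]. B[-seq(x)] drops B[1..x], so adding x
# maps the positions back. The lengths differ, so the result is a list.
candidates <- lapply(seq_along(A), function(x) which(B[-seq(x)] > A[x]) + x)

# Step 2: keep only the first candidate of each; indexing integer(0) with 1 gives NA.
R <- sapply(candidates, "[", 1)
R
# [1]  3  3  5  5 NA
```

Note that step 1 builds every candidate index before step 2 throws all but one away, which is part of why this is slower than stopping at the first hit.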

This is a good example of a case where sapply is less efficient than loops. sapply makes the code tidier, but you pay for that tidiness in run time.

Instead, you can wrap a while loop inside a for loop, inside a nice, neat function.

Here are benchmarks comparing the nested sapply with a nested for-while loop (and a mixed sapply-while loop, for good measure). Update: added the vapply/match approach suggested in the comments. It is faster than the nested sapply, but still slower than the for-while loop.
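The comment's exact vapply/match code is not reproduced in the answer, so the following is only a plausible reconstruction; `firstHigherMatch` is a hypothetical name. `match(TRUE, v)` returns the position of the first TRUE in `v`, or NA if there is none, which avoids building the full list of candidate indices.

```r
# Hypothetical vapply()/match() variant (assumed, not the commenter's code).
firstHigherMatch <- function(A, B) {
  vapply(seq_along(A), function(x) {
    later <- B[-seq_len(x)]          # elements of B after position x
    match(TRUE, later > A[x]) + x    # first larger one, shifted back; NA + x stays NA
  }, integer(1))
}

firstHigherMatch(c(10, 5, 3, 4, 7), c(4, 8, 11, 1, 5))
```

It still compares every later element before `match` looks at the result, so it does more work per element than a loop that stops at the first hit.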

Benchmark:

            test elapsed relative
 1     for.while   0.069    1.000
 2  sapply.while   0.080    1.159
 3  vapply.match   0.101    1.464
 4 nested.sapply   0.104    1.507

Note that you save about a third of the time; the savings are likely to be larger once you start adding sequences to A.



For the second part of your question:

If you have it all wrapped in a nice function, it's easy to add a sequence to A:

 # Sample data
 A <- c(10, 5, 3, 4, 7, 100, 2)
 B <- c(4, 8, 11, 1, 5, 18, 20)
 # Sample sequence
 S <- seq(1, 12, 3)
 # matrix with all index values (with names cleaned up)
 indexesOfB <- t(sapply(S, function(s) findIndx.gt(A+s, B)))
 dimnames(indexesOfB) <- list(S, A)

Finally, if you instead want to find values of B that are less than A, just flip the comparison in the function.
(You could put an if clause in a single function instead; I find it more efficient to have two separate functions.)

 findIndx.gt(A, B)
 # [1] 3 3 5 5 6 NA 8 NA NA
 findIndx.lt(A, B)
 # [1] 2 4 4 NA 8 7 NA NA NA
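For comparison, the single-function alternative mentioned above might look like the sketch below, which passes the comparison in as an argument instead of using an if clause. `findIndx.cmp` is a hypothetical name. Unlike findIndx.gt and findIndx.lt, this sketch stops only when `cmp(B[i], A[j])` holds, so with `>` or `<` a value equal to A[j] is not a match; pass `>=` or `<=` to count ties.

```r
# Hypothetical single-function variant: cmp is the comparison, e.g. `>` or `<`.
findIndx.cmp <- function(A, B, cmp = `>`) {
  lng <- length(A)
  ret <- rep(NA_integer_, lng)                 # preallocate; NA = no match found
  for (j in seq_len(max(lng - 1L, 0L))) {
    i <- j + 1L
    # advance until B[i] is on the wanted side of A[j]
    while (i <= lng && !cmp(B[[i]], A[[j]])) i <- i + 1L
    if (i <= lng) ret[j] <- i
  }
  ret
}

A <- c(10, 5, 3, 4, 7)
B <- c(4, 8, 11, 1, 5)
findIndx.cmp(A, B, `>`)
findIndx.cmp(A, B, `<`)
```

Calling a function object on every comparison adds some overhead, which is one reason to prefer two hard-coded functions when speed matters.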

Then you can wrap it all up in one neat package:

 rangeFindIndx(A, B, S)
 #     A S indxB.gt indxB.lt
 #    10 1        3        2
 #     5 1        3        4
 #     3 1        5        4
 #     4 1        5       NA
 #     7 1        6       NA
 #   100 1       NA       NA
 #     2 1       NA       NA
 #    10 4        6        4
 #     5 4        3        4
 # ...



Functions

(Note that they depend on reshape2.)

 rangeFindIndx <- function(A, B, S) {
   # For each s in S and each a in A, find the first value of B (at a higher
   # index) that is higher than a+s, or lower than a-s
   require(reshape2)
   # Create gt & lt matrices; add dimnames for the melt function
   indexesOfB.gt <- sapply(S, function(s) findIndx.gt(A+s, B))
   indexesOfB.lt <- sapply(S, function(s) findIndx.lt(A-s, B))
   dimnames(indexesOfB.gt) <- dimnames(indexesOfB.lt) <- list(A, S)
   # melt the matrices and combine them into one
   gtltMatrix <- cbind(melt(indexesOfB.gt), melt(indexesOfB.lt)$value)
   # clean up their names
   names(gtltMatrix) <- c("A", "S", "indxB.gt", "indxB.lt")
   return(gtltMatrix)
 }

 findIndx.gt <- function(A, B) {
   lng <- length(A)
   ret <- integer(0)
   for (j in seq_len(lng-1)) {
     i <- j + 1
     # skip values of B below A[j]; note that B[i] == A[j] counts as a match
     while (i <= lng && B[[i]] < A[[j]]) {
       i <- i + 1
     }
     ret <- c(ret, if (i <= lng) i else NA)
   }
   c(ret, NA)  # the last element has no later elements
 }

 findIndx.lt <- function(A, B) {
   lng <- length(A)
   ret <- integer(0)
   for (j in seq_len(lng-1)) {
     i <- j + 1
     while (i <= lng && B[[i]] > A[[j]]) {  # the only difference from findIndx.gt
       i <- i + 1
     }
     ret <- c(ret, if (i <= lng) i else NA)
   }
   c(ret, NA)
 }
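If you would rather not depend on reshape2, the long table can also be built in base R, because flattening a matrix with `as.vector` goes column by column (A varies fastest, S slowest). The sketch below is self-contained and uses hypothetical names (`firstIdx`, `rangeFindIndxBase`); it treats ties the way findIndx.gt and findIndx.lt do, i.e. a B value equal to a + s or a - s counts as a match.

```r
# firstIdx(): first later index i where cmp(B[i], A[j]) holds, else NA.
firstIdx <- function(A, B, cmp) {
  vapply(seq_along(A),
         function(j) match(TRUE, cmp(B[-seq_len(j)], A[j])) + j,
         integer(1))
}

# Base-R long table: one row per (a, s) pair, A varying fastest.
rangeFindIndxBase <- function(A, B, S) {
  gt <- vapply(S, function(s) firstIdx(A + s, B, `>=`), integer(length(A)))
  lt <- vapply(S, function(s) firstIdx(A - s, B, `<=`), integer(length(A)))
  data.frame(A = rep(A, times = length(S)),
             S = rep(S, each = length(A)),
             indxB.gt = as.vector(gt),
             indxB.lt = as.vector(lt))
}

A <- c(10, 5, 3, 4, 7, 100, 2)
B <- c(4, 8, 11, 1, 5, 18, 20)
S <- seq(1, 12, 3)
head(rangeFindIndxBase(A, B, S))
```

Using `vapply` with a fixed result length also keeps the result a matrix-shaped vector even when S has a single element, where `sapply` would simplify differently.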
