I have a large time series, full, in one data frame and a list of timestamps in another data frame, test. I need to subset full to the data points surrounding the timestamps in test. My first instinct (as an R noob) was to write the code below, which was wrong:
subs <- subset(full,(full$dt>test$dt-i) & (full$dt<test$dt+i))
Looking at the result, I realized that R steps through both vectors in parallel, recycling the shorter one, which gives the wrong result. One option is to write a loop like the one below:
subs <- data.frame()
for (j in test$dt) subs <- rbind(subs, subset(full, full$dt > (j - i) & full$dt < (j + i)))
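The parallel, element-by-element comparison described above can be seen on toy data. This is only a minimal sketch: the small `full`, `test` and `i` values are made up for illustration and are not the real data.

```r
# Toy data: the names mirror the question, the values are illustrative only.
full <- data.frame(dt = 1:10, val = 101:110)
test <- data.frame(dt = c(3, 8))
i <- 2

# test$dt (length 2) is recycled to the length of full$dt: 3, 8, 3, 8, ...
# so row k of full is compared against a single timestamp, not all of them.
recycled <- rep(test$dt, length.out = nrow(full))
wrong <- (full$dt > test$dt - i) & (full$dt < test$dt + i)
same  <- (full$dt > recycled - i) & (full$dt < recycled + i)
identical(wrong, same)

# What was intended: keep a row if it lies within i of ANY timestamp.
intended <- Reduce(`|`, lapply(test$dt, function(j) full$dt > j - i & full$dt < j + i))

full$dt[wrong]     # rows kept by the first attempt
full$dt[intended]  # rows that should have been kept
```

On this toy input the first attempt keeps only the rows where the recycled timestamp happens to line up with the row, while the intended mask keeps the full window around each timestamp.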
I feel there may be a better way than a loop, and this article implores us to avoid R loops as much as possible. The other reason is that I may run into performance issues, since this will be at the heart of an optimization algorithm. Any suggestions from the gurus would be greatly appreciated.
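One loop-free direction, sketched here under the assumption that `full$dt` is sorted in ascending order, is to locate the first and last row of each window with `findInterval` and index those row ranges directly. The toy values below are made up for illustration, and the helper names `lo`, `hi` and `idx` are hypothetical.

```r
# Toy data: the names mirror the question, the values are illustrative only.
full <- data.frame(dt = 1:10, val = 101:110)
test <- data.frame(dt = c(3, 8))
i <- 2

# ASSUMPTION: full$dt is sorted ascending (findInterval requires this).
# lo: index of the first row with dt >  j - i
# hi: index of the last  row with dt <  j + i
lo <- findInterval(test$dt - i, full$dt) + 1
hi <- findInterval(test$dt + i, full$dt, left.open = TRUE)

# Expand each [lo, hi] pair into row indices; empty windows contribute nothing.
idx <- unlist(Map(function(a, b) if (a <= b) a:b else integer(0), lo, hi))
subs <- full[idx, ]
subs$dt
```

Like the `rbind` loop, this keeps a row once per window it falls into, so overlapping windows yield duplicated rows; wrapping `idx` in `unique()` would remove them. It also does not rely on the timestamps being integers.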
EDIT:
Here is some reproducible code that shows the wrong approach, as well as an approach that works, but could be better.
# create a time series
full <- data.frame(seq(1:200), rnorm(200, 0, 1))
colnames(full) <- c("dt", "val")
EDIT: I updated the values to better reflect my use case, and I see that @mrdwab's solution unexpectedly pulls ahead by a wide margin.
I am using the benchmark code from @mrdwab, and the initialization is as follows:
set.seed(1)
full <- data.frame(
  dt = 1:15000000,
  val = floor(rnorm(15000000, 0, 1))
)
test <- data.frame(dt = floor(runif(24, 1, 15000000)))
i <- 500
Benchmarks:
       test replications elapsed relative
2    mrdwab            2    1.31  1.00000
3 spacedman            2   69.06 52.71756
1    andrie            2   93.68 71.51145
4  original            2  114.24 87.20611
Totally unexpected. Mind = blown. Can someone please shed some light on this dark corner and explain what is happening?
Important: As @mrdwab notes below, his solution only works if the vectors are integers. If not, @spacedman has the right solution.
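To illustrate why an approach can be limited to integers, here is a minimal sketch of one integer-only technique: enumerate every integer inside each window and match with `%in%`. This is an assumption about the general idea, not necessarily @mrdwab's actual code, which is not reproduced in this question. The toy values and the names `window_vals`, `subs_int` and `full_frac` are hypothetical.

```r
# Toy data: the names mirror the question, the values are illustrative only.
full <- data.frame(dt = 1:10, val = 101:110)
test <- data.frame(dt = c(3, 8))
i <- 2

# Enumerate every integer strictly inside (j - i, j + i) for each timestamp j.
# ASSUMPTION: timestamps and i are whole numbers, with i >= 1.
window_vals <- unlist(lapply(test$dt, function(j) seq(j - i + 1, j + i - 1)))

# A single membership test replaces one pair of comparisons per timestamp.
subs_int <- full[full$dt %in% window_vals, ]
subs_int$dt

# With fractional timestamps the enumerated integers never match,
# even though 1.5, 2.5, 3.5 and 4.5 all lie inside the window (1, 5).
full_frac <- data.frame(dt = (1:10) + 0.5, val = 101:110)
any(full_frac$dt %in% window_vals)
```

Unlike the `rbind` loop, `%in%` returns each matching row only once, even where windows overlap.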