data.table: find the last value before a given date for each group

How can I find the last value before test.day for each (loc.x, loc.y) pair?

    library(data.table)

    dt <- data.table(
      loc.x = as.integer(c(1, 1, 3, 1, 3, 1)),
      loc.y = as.integer(c(1, 2, 1, 2, 1, 2)),
      time  = as.IDate(c("2015-03-11", "2015-05-10", "2015-09-27",
                         "2015-11-25", "2014-09-13", "2015-08-19")),
      value = letters[1:6]
    )
    setkey(dt, loc.x, loc.y, time)
    test.day <- as.IDate("2015-10-01")

Desired output:

       loc.x loc.y value
    1:     1     1     a
    2:     1     2     f
    3:     3     1     c
3 answers

Another option is to use the last function:

 dt[, last(value[time < test.day]), by = .(loc.x, loc.y)] 

which gives:

       loc.x loc.y V1
    1:     1     1  a
    2:     1     2  f
    3:     3     1  c
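The result column above is called V1, whereas the desired output names it value. A minimal sketch of the same approach with a named result column, rebuilding the question's data so it runs on its own (the name res is only illustrative):

```r
library(data.table)

# Same data as in the question.
dt <- data.table(
  loc.x = as.integer(c(1, 1, 3, 1, 3, 1)),
  loc.y = as.integer(c(1, 2, 1, 2, 1, 2)),
  time  = as.IDate(c("2015-03-11", "2015-05-10", "2015-09-27",
                     "2015-11-25", "2014-09-13", "2015-08-19")),
  value = letters[1:6]
)
setkey(dt, loc.x, loc.y, time)
test.day <- as.IDate("2015-10-01")

# Wrapping the expression in .() and naming it gives a column called
# "value" instead of the default "V1".
res <- dt[, .(value = last(value[time < test.day])), by = .(loc.x, loc.y)]
print(res)
```

Only the column name changes; the selection logic is the one from the answer.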

You can first subset the rows where time < test.day (which should be efficient enough because the filter is not evaluated per group), and then select the last value for each group. For this you can use tail(value, 1L) or, as suggested by Floo0, value[.N], resulting in:

    dt[time < test.day, tail(value, 1L), by = .(loc.x, loc.y)]
    #    loc.x loc.y V1
    # 1:     1     1  a
    # 2:     1     2  f
    # 3:     3     1  c

or

 dt[time < test.day, value[.N], by = .(loc.x, loc.y)] 

Note that this works only because the data is sorted by time within each group, which setkey(dt, loc.x, loc.y, time) guarantees.
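If the table is not keyed or otherwise sorted by time, the last row of a group is not necessarily the latest one. A hedged sketch of an order-independent variant, which picks the value at the latest time explicitly with which.max; this is a swapped-in alternative rather than part of the answer above, and the names dt.unsorted and res are only illustrative:

```r
library(data.table)

# Same data as in the question, deliberately left unkeyed and unsorted.
dt.unsorted <- data.table(
  loc.x = as.integer(c(1, 1, 3, 1, 3, 1)),
  loc.y = as.integer(c(1, 2, 1, 2, 1, 2)),
  time  = as.IDate(c("2015-03-11", "2015-05-10", "2015-09-27",
                     "2015-11-25", "2014-09-13", "2015-08-19")),
  value = letters[1:6]
)
test.day <- as.IDate("2015-10-01")

# Filter once, then take the value at the latest remaining time per group.
# which.max() does not depend on the row order within the group.
res <- dt.unsorted[time < test.day,
                   .(value = value[which.max(time)]),
                   by = .(loc.x, loc.y)]
print(res)
```

On the question's data this selects a, f and c, the same values as the keyed version. If several rows in a group share the latest time, which.max returns the first of them.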


Here's another option using a rolling join after creating a lookup table:

    indx <- data.table(unique(dt[, .(loc.x, loc.y)]), time = test.day)
    dt[indx, roll = TRUE, on = names(indx)]
    #    loc.x loc.y       time value
    # 1:     1     1 2015-10-01     a
    # 2:     1     2 2015-10-01     f
    # 3:     3     1 2015-10-01     c

Or a very similar option suggested by @eddi:

    dt[dt[, .(time = test.day), by = .(loc.x, loc.y)],
       roll = TRUE, on = c("loc.x", "loc.y", "time")]

Or a one-liner, which will be less efficient because it calls [.data.table once per group:

    dt[, .SD[data.table(test.day), value, roll = TRUE, on = c(time = "test.day")],
       by = .(loc.x, loc.y)]
    #    loc.x loc.y V1
    # 1:     1     1  a
    # 2:     1     2  f
    # 3:     3     1  c
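One caveat about the rolling-join variants: roll = TRUE matches an exact hit on the lookup time before it rolls the previous observation forward, so it returns the last value at or before test.day, not strictly before it. The question's data has no observation on test.day itself, so the results agree there. A small sketch of the difference on made-up data (dt2, lookup and the values "x" and "y" are hypothetical, chosen only to hit the boundary):

```r
library(data.table)

# Hypothetical data with one observation exactly on test.day.
dt2 <- data.table(
  loc.x = 1L,
  loc.y = 1L,
  time  = as.IDate(c("2015-09-30", "2015-10-01")),
  value = c("x", "y")
)
test.day <- as.IDate("2015-10-01")
join.cols <- c("loc.x", "loc.y", "time")

# Rolling join on test.day: the exact match on 2015-10-01 wins.
lookup <- data.table(loc.x = 1L, loc.y = 1L, time = test.day)
inclusive <- dt2[lookup, roll = TRUE, on = join.cols]$value

# Strict filter: only rows before test.day are considered.
strict <- dt2[time < test.day, value[.N], by = .(loc.x, loc.y)]$V1

# Looking up the previous day makes the rolling join strict for
# day-resolution data such as IDate.
lookup[, time := time - 1L]
shifted <- dt2[lookup, roll = TRUE, on = join.cols]$value

print(c(inclusive = inclusive, strict = strict, shifted = shifted))
```

Shifting the lookup date by one day relies on the time column having day resolution; with finer-grained timestamps the filter-based answers are the safer way to get a strict "before".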