Use a value from the previous row in an R data.table calculation

I want to create a new column in a data.table, calculated from the current value of one column and the previous value of another. Is it possible to access previous rows?

For example:

    > DT <- data.table(A=1:5, B=1:5*10, C=1:5*100)
    > DT
       A  B   C
    1: 1 10 100
    2: 2 20 200
    3: 3 30 300
    4: 4 40 400
    5: 5 50 500
    > DT[, D := C + BPreviousRow] # What is the correct code here?

The correct answer should be

    > DT
       A  B   C   D
    1: 1 10 100  NA
    2: 2 20 200 210
    3: 3 30 300 320
    4: 4 40 400 430
    5: 5 50 500 540
+68
r data.table
04 Feb '13 at 14:59
8 answers

With shift(), implemented in data.table v1.9.6, this is pretty straightforward.

    DT[ , D := C + shift(B, 1L, type="lag")]
    # or equivalently, in this case,
    DT[ , D := C + shift(B)]



From NEWS:

  1. The new shift() function implements fast lead/lag of vectors, lists, data.frames or data.tables. It takes a type argument, which can be either "lag" (the default) or "lead". This makes it very convenient to use along with := or set(). For example: DT[, (cols) := shift(.SD, 1L), by=id]. Please see ?shift for more information.
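
As a small sketch of the grouped usage mentioned in NEWS: the table DT2, its id column, and the B_prev / B_next column names are made up for illustration and are not part of the question.

```r
library(data.table)

# Hypothetical data: two groups identified by `id` (not from the original question)
DT2 <- data.table(id = c("a", "a", "a", "b", "b"),
                  B  = c(10, 20, 30, 40, 50),
                  C  = c(100, 200, 300, 400, 500))

# Lag B within each group: the first row of every group gets NA
DT2[, B_prev := shift(B, 1L, type = "lag"), by = id]

# Lead B within each group: the last row of every group gets NA
DT2[, B_next := shift(B, 1L, type = "lead"), by = id]

# Combine the current C with the previous B, as in the question
DT2[, D := C + B_prev]
```

Because of by = id, the lag never reaches across a group boundary, so row 4 (the first row of group "b") gets NA rather than the last B of group "a".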



See the history of previous answers.

+86
Feb 04 '13 at 15:02

Using dplyr, you can do:

 mutate(DT, D = lag(B) + C) 

Which gives:

    #    A  B   C   D
    # 1: 1 10 100  NA
    # 2: 2 20 200 210
    # 3: 3 30 300 320
    # 4: 4 40 400 430
    # 5: 5 50 500 540
+20
Apr 27 '15 at 1:52

Several people have answered the specific question. Below is a general-purpose function that I use in situations like this and that may be useful. Instead of just getting the previous row, you can go as many rows into the "past" or "future" as you want.

    rowShift <- function(x, shiftLen = 1L) {
      r <- (1L + shiftLen):(length(x) + shiftLen)
      r[r < 1] <- NA
      return(x[r])
    }

    # Create column D by adding column C and the value from the previous row of column B:
    DT[, D := C + rowShift(B, -1)]

    # Get the Old Faithful eruption length from two events ago, and three events in the future:
    as.data.table(faithful)[1:5, list(eruptLengthCurrent = eruptions,
                                      eruptLengthTwoPrior = rowShift(eruptions, -2),
                                      eruptLengthThreeFuture = rowShift(eruptions, 3))]
    ##    eruptLengthCurrent eruptLengthTwoPrior eruptLengthThreeFuture
    ## 1:              3.600                  NA                  2.283
    ## 2:              1.800                  NA                  4.533
    ## 3:              3.333               3.600                     NA
    ## 4:              2.283               1.800                     NA
    ## 5:              4.533               3.333                     NA
+19
Aug 01 '14 at 16:24

Based on @Steve Lianoglou's comment above, why not just:

    DT[, D := C + c(NA, B[.I - 1])]
    #    A  B   C   D
    # 1: 1 10 100  NA
    # 2: 2 20 200 210
    # 3: 3 30 300 320
    # 4: 4 40 400 430
    # 5: 5 50 500 540

This avoids seq_len, head, or any other helper function.
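
To see why this works, here is a base R sketch of the same indexing. It assumes an ungrouped call, where .I is simply 1, 2, ..., N; the names idx and lagged are only for illustration.

```r
B <- c(10, 20, 30, 40, 50)

# Same values as .I - 1 in an ungrouped data.table call: 0 1 2 3 4
idx <- seq_along(B) - 1

# R silently drops a 0 index, so this returns the first four elements
B[idx]

# Prepending NA restores the original length and yields the lagged vector
lagged <- c(NA, B[idx])
lagged
# NA 10 20 30 40
```

With by = ..., .I holds row numbers of the whole table rather than positions within the group, so this trick should not be expected to carry over to grouped calls as-is.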

+12
May 04 '14 at 4:25

Following Arun’s solution, similar results can be obtained without referring to .N:

    > DT[, D := C + c(NA, head(B, -1))][]
       A  B   C   D
    1: 1 10 100  NA
    2: 2 20 200 210
    3: 3 30 300 320
    4: 4 40 400 430
    5: 5 50 500 540
+9
Feb 04 '13 at 15:53

I added an additional argument, changed some names, and called it shift: https://github.com/geneorama/geneorama/blob/master/R/shift.R

+1
Nov 03 '14 at 22:03

Here is my intuitive solution:

    # create the data frame
    df <- data.frame(A=1:5, B=seq(10,50,10), C=seq(100,500,100))

    # subtract the shift from the number of rows
    shift <- 1                     # in this case the shift is 1
    invshift <- nrow(df) - shift

    # now create the new column (one leading NA for a shift of 1)
    df$D <- c(NA, head(df$B, invshift) + tail(df$C, invshift))

Here invshift, the number of rows minus 1, is 4. nrow(df) gives you the number of rows in the data frame. Similarly, if you want to use even earlier values, subtract 2, 3, etc. from nrow(df), and put the same number of NA values at the beginning.
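
As a sketch of that generalization for a shift of 2: the names df2, k, keep and D2 are made up here, and rep(NA, k) supplies the leading NA values.

```r
df2 <- data.frame(A = 1:5, B = seq(10, 50, 10), C = seq(100, 500, 100))

k <- 2                    # how many rows back to look
keep <- nrow(df2) - k     # number of rows that have a value k rows earlier

# B from k rows earlier plus the current C, padded with k leading NAs
df2$D2 <- c(rep(NA, k), head(df2$B, keep) + tail(df2$C, keep))

df2$D2
# NA NA 310 420 530
```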

+1
Jul 05 '18 at 10:51

Is it possible to apply the shift function described above to individual rows of a df column (as opposed to the whole column) based on an if/else condition? I tried the above code in a loop (for (i in 1:nrow(df)) ...), but it just returns NA, so I assume it doesn't like the row condition [i]. Thanks.
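
One possible reading of this, sketched under assumptions: the NA probably comes from shifting a single element, since a length-one vector has no previous value to pull in. Shifting the whole column once and then choosing per row with a vectorized ifelse() avoids the loop. The condition used here (A is even) and the fallback C + B are purely hypothetical.

```r
library(data.table)

DT <- data.table(A = 1:5, B = 1:5 * 10, C = 1:5 * 100)

# Shifting a single value has nothing to lag from, so the result is NA
shift(DT$B[2])

# Hypothetical rule: use the previous row's B where A is even,
# otherwise use the current row's B
DT[, D := ifelse(A %% 2 == 0, C + shift(B), C + B)]

DT$D
# 110 210 330 430 550
```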

0
Jan 21 '19 at 16:10


