When using :=, why is with = TRUE the default?

In data.table, with = TRUE by default, and j is evaluated within the frame of x, which lets you use column names as variables. When with = FALSE, j is instead a vector of column names or positions to select.
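A minimal sketch of the two modes, assuming a small made-up table (the columns x and v and their values are invented for illustration):

```r
library(data.table)

# Toy table, invented for illustration
DT <- data.table(x = c(1, 2, 3), v = c(10, 20, 30))

# with = TRUE (the default): j is an expression evaluated inside DT,
# so `v` refers to the column and the plain vector is returned
a <- DT[, v]

# with = FALSE: j is a character vector of column names to select,
# and the result is a data.table
b <- DT[, "v", with = FALSE]

# with = FALSE also accepts column positions
cc <- DT[, c(1, 2), with = FALSE]
```

In the default mode the result is whatever j evaluates to (here a numeric vector), while both with = FALSE calls return a data.table.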

I found the following example that uses with = FALSE.

    library(data.table)
    set.seed(1234)
    DT <- data.table(x=rep(c(1,2,3),each=4), y=c("A","B"), v=sample(1:100,12))
    ## The asker's solution
    # first step is to create cumsum columns
    colNames <- c("x","v"); newColNames <- paste0("SUM.",colNames)
    DT[, newColNames := lapply(.SD,cumsum), by=y, .SDcols = colNames, with=FALSE]
    test <- DT[, newColNames:=lapply(.SD,cumsum), by=y, .SDcols=colNames, with=TRUE]

We can check DT:

    > DT  # setting `with=FALSE` - what I require
        x y  v SUM.x SUM.v
     1: 1 A 12     1    12
     2: 1 B 62     1    62
     3: 1 A 60     2    72
     4: 1 B 61     2   123
     5: 2 A 83     4   155
     6: 2 B 97     4   220
     7: 2 A  1     6   156
     8: 2 B 22     6   242
     9: 3 A 99     9   255
    10: 3 B 47     9   289
    11: 3 A 63    12   318
    12: 3 B 49    12   338

and test:

    > test  # this is when setting "with = TRUE"
        x y  v newColNames
     1: 1 A 12           1
     2: 1 B 62           1
     3: 1 A 60           2
     4: 1 B 61           2
     5: 2 A 83           4
     6: 2 B 97           4
     7: 2 A  1           6
     8: 2 B 22           6
     9: 3 A 99           9
    10: 3 B 47           9
    11: 3 A 63          12
    12: 3 B 49          12

I don't understand why this is the result when with = TRUE is set. So my question is basically: when is with = TRUE useful?

I also don't understand why the default is with = TRUE, although I assume there is a good reason for it.

Thank you very much!

1 answer

I see your point. We've moved away from using with=TRUE|FALSE in combination with :=, because it is ambiguous whether with=TRUE refers to the left-hand side or the right-hand side of :=. Instead, wrapping the LHS of := in parentheses is now preferred.

    DT[, x.sum:=cumsum(x)]     # assign cumsum(x) to the column called "x.sum"
    DT[, (target):=cumsum(x)]  # assign to the column named by target's value
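A small sketch of both forms, assuming a toy table with one column x; the variable name target comes from the answer, while the column name "running.total" that it holds is hypothetical:

```r
library(data.table)

# Toy table, invented for illustration
DT <- data.table(x = c(1, 2, 3, 4))

# Column name known up front: the LHS symbol is taken literally
DT[, x.sum := cumsum(x)]

# Column name held in a variable: the parentheses make data.table evaluate
# the LHS, so the column named by target's value is created
target <- "running.total"   # hypothetical column name
DT[, (target) := cumsum(x)]
```

Both calls modify DT by reference, leaving it with the columns x, x.sum and running.total.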

As Justin mentioned, most of the time we assign to a new or existing column whose name we know in advance. In other words, the column being assigned to is usually not held in a variable. We do that a lot, so it needs to be convenient. That said, data.table is flexible and also lets you supply the target column name programmatically.

I suppose a case could be made that it should be:

    DT[, "x.sum":=cumsum(x)]   # assign cumsum(x) to the column called "x.sum"
    DT[, x.sum:=cumsum(x)]     # assign to the name held in x.sum's contents

However, since := is an assignment operator and j is evaluated within the scope of DT, it would be confusing to me if DT[, x.sum:=cumsum(x)] did not assign to the x.sum column.
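To make the scoping point concrete, here is a sketch in which a variable happens to share the column's name (the variable x.sum and the value "other.col" it holds are hypothetical, chosen only for illustration):

```r
library(data.table)

# Toy table, invented for illustration
DT <- data.table(x = c(1, 2, 3))

# A variable that happens to have the same name as the intended column
x.sum <- "other.col"   # hypothetical

# Unparenthesized LHS: the symbol itself is the column name,
# so the variable x.sum is ignored and column "x.sum" is created
DT[, x.sum := cumsum(x)]

# Parenthesized LHS: the variable is evaluated,
# so a column named "other.col" is created
DT[, (x.sum) := cumsum(x)]
```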

Explicit parentheses, i.e. (target):=, imply some kind of evaluation, so that syntax is clearer. In my mind, anyway. Of course, you can also call paste0 etc. directly on the left side of := without needing with=FALSE; e.g.,

    DT[, paste0("SUM.",colNames) := lapply(.SD, ...), by=...]
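A runnable version of that call, assuming cumsum and by = y from the question in place of the elided parts (one way to fill them in, not necessarily what the answer intended):

```r
library(data.table)

# Same setup as in the question
set.seed(1234)
DT <- data.table(x = rep(c(1, 2, 3), each = 4), y = c("A", "B"),
                 v = sample(1:100, 12))
colNames <- c("x", "v")

# The LHS is a call, so data.table evaluates it to c("SUM.x", "SUM.v");
# no with = FALSE is needed. cumsum and by = y are assumed from the question.
DT[, paste0("SUM.", colNames) := lapply(.SD, cumsum), by = y, .SDcols = colNames]
```

This adds SUM.x and SUM.v columns of the same form as the question's desired output. The exact v values depend on the R version's sampling algorithm, but SUM.x does not.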

In short, I never use with when using :=.

