I noticed something peculiar when converting dates to character for large data sets. As an example, I created a data table laid out as follows:
library(data.table)
DT = data.table(x=rep("2007-1-1", 1e9), y = rep(1,1e9))
DT[, x := as.Date(x)]
Now I would like to convert column x from Date to character:
DT[,x.character:= as.character(x)]
This takes a while for large data sets, and I noticed that the time required for the conversion decreases dramatically if I do the following instead:
DT[,x.character:= as.character(x+y-y)]
All I did here was add y and then subtract it again, so I get exactly the same result. Logically, this should make the computer do more work, not less. Is there a reason why this approach runs faster than the direct conversion?
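To back up the claim that both expressions give the same result, here is a minimal sketch that compares them on plain vectors, without data.table. It assumes base R only; the names x2, same_values and same_strings are made up for this illustration, and I use 1e5 rows so it runs quickly.

```r
# Small stand-in vectors, built directly instead of as data.table columns
x <- rep(as.Date("2007-1-1"), 1e5)
y <- rep(1, 1e5)

# The "add y, subtract y" version of the same dates
x2 <- x + y - y

# Do the two inputs hold the same dates?
same_values <- all(x == x2)

# Do they convert to the same strings?
same_strings <- identical(as.character(x), as.character(x2))

# Inspect how each input is stored; a difference in type or attributes
# here would be one place to start looking for the timing gap
print(c(x = typeof(x), x2 = typeof(x2)))
print(list(x = attributes(x), x2 = attributes(x2)))
print(c(same_values = same_values, same_strings = same_strings))
```

If the stored type or attributes differ between x and x2, that would at least show that the two calls are not being handed identical input.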
Here are some system.time() results on a smaller data set:
DT = data.table(x=rep(as.Date("2007-1-1"), 1e5), y = rep(1,1e5))
system.time(DT[,x.character:= as.character(x)])
   user  system elapsed
   1.89    0.12    2.03
system.time(DT[,x.character:= as.character(x+y-y)])
   user  system elapsed
  0.635   0.008   0.643
system.time(DT[,x.character.sub:= as.character(x+y-y+y-y)])
   user  system elapsed
  0.347   0.004   0.351
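Since each of the timings above is a single run, and the first call also creates the new column, a repeated measurement on plain vectors may be more reliable. This is only a sketch under my own assumptions: median_elapsed is a hypothetical helper I wrote for this post, and the direct conversion is timed twice to see whether run order matters.

```r
x <- rep(as.Date("2007-1-1"), 1e5)
y <- rep(1, 1e5)

# Hypothetical helper: median elapsed seconds over `reps` calls of f()
median_elapsed <- function(f, reps = 3) {
  median(replicate(reps, system.time(f())[["elapsed"]]))
}

timings <- c(
  direct       = median_elapsed(function() as.character(x)),
  plus_minus   = median_elapsed(function() as.character(x + y - y)),
  direct_again = median_elapsed(function() as.character(x))
)
print(timings)
```

If direct and direct_again differ noticeably, part of the gap comes from run order or one-time costs rather than from the y-y arithmetic itself.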
So, judging by these timings, the more times I add and subtract y-y, the faster the conversion gets. Why is that?