This may already have been answered and I missed it, but it is hard to search for.
A very simple question: why is dt[, x] consistently a tiny bit faster than dt$x?
Example:
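library(data.table)
library(microbenchmark)
dt <- data.table(id = 1:1e7, var = rnorm(1e6))
test <- microbenchmark(times = 100L,
                       dt[sample(1e7, size = 200000), var],
                       dt[sample(1e7, size = 200000), ]$var)
# relabel the expressions; the levels follow the order in which they were passed
levels(test$expr) <- c("in j", "$")
test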
Unit: milliseconds
 expr      min       lq     mean   median       uq      max neval
    $ 14.28863 15.88779 18.84229 17.23109 18.41577 53.63473   100
 in j 14.35916 15.97063 18.87265 17.99266 18.37939 54.19944   100
I may not have picked the best example, so feel free to suggest something more pointed.
In any case, evaluating in j is faster at least 75% of the time (though it seems to have a fat upper tail, since its mean is higher; side note: it would be nice if microbenchmark could spit out some histograms for me).
Why is this so?
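On the histogram side note, here is a minimal sketch of how the timing distributions could be inspected by hand. It assumes the microbenchmark result is a data.frame with an `expr` factor and a `time` column in nanoseconds. The smaller table, the fixed `idx` vector (so that both expressions subset the same rows), and the names `ms` and `op` are my own choices for illustration, not part of the benchmark above.

```r
library(data.table)
library(microbenchmark)

set.seed(1)
# Smaller table than in the question, just to keep the sketch quick (assumption)
dt <- data.table(id = 1:1e6, var = rnorm(1e6))
# Draw the row index once so sampling cost and row choice do not differ between expressions
idx <- sample(1e6, size = 20000)

test <- microbenchmark(
  "in j" = dt[idx, var],
  "$"    = dt[idx, ]$var,
  times = 100L
)

# Convert the raw timings from nanoseconds to milliseconds and group them by expression
ms <- split(test$time / 1e6, test$expr)

# One base-graphics histogram per expression, side by side
op <- par(mfrow = c(1, 2))
for (nm in names(ms)) {
  hist(ms[[nm]], breaks = 30, main = nm, xlab = "time (ms)")
}
par(op)
```

Naming the expressions in the call avoids relabelling afterwards, and plotting the raw `time` column should make a fat upper tail visible directly.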