Consider the following script, which we will call Foo.r:
    set.seed(1)
    x = matrix(rnorm(1000*1000), ncol=1000)
    x = data.frame(x)
    dummy = sapply(1:1000, function(i) sum(x[i,]))
    #dummy = sapply(1:1000, function(i) sum(x[,i]))
When the first dummy line is commented out and the second is active, we sum the columns, and the code takes less than a second to run on my machine.
    $ time Rscript Foo.r
    real    0m0.766s
    user    0m0.536s
    sys     0m0.080s
When the second dummy line is commented out and the first is active (as in the listing above), we sum the rows, and the runtime approaches 30 seconds.
    $ time Rscript Foo.r
    real    0m30.589s
    user    0m30.248s
    sys     0m0.104s
Note that I am aware of the standard summation functions rowSums and colSums; I use sum only as an example of this strange asymmetric performance behavior.
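For anyone who wants to reproduce the comparison in a single R session, here is a minimal sketch that times both variants with system.time instead of the shell's time. The smaller size n = 200 and the variable names (row_time, col_time, row_sums, col_sums) are my own assumptions to keep the run short; they are not part of Foo.r, and the absolute timings will differ from the ones above.

```r
# Sketch: time row-wise vs column-wise sums on the same data frame.
set.seed(1)
n <- 200  # assumption: smaller than the 1000 used in Foo.r, to keep the run short
x <- data.frame(matrix(rnorm(n * n), ncol = n))

# Row-wise variant: x[i, ] is a one-row data frame
row_time <- system.time(
  row_sums <- sapply(seq_len(n), function(i) sum(x[i, ]))
)

# Column-wise variant: x[, i] is a plain numeric vector
col_time <- system.time(
  col_sums <- sapply(seq_len(n), function(i) sum(x[, i]))
)

# Elapsed wall-clock seconds for each variant
print(row_time["elapsed"])
print(col_time["elapsed"])

# Sanity check: both variants agree with the built-in functions
stopifnot(isTRUE(all.equal(row_sums, unname(rowSums(x)))))
stopifnot(isTRUE(all.equal(col_sums, unname(colSums(x)))))
```

Timing both variants in the same session rules out Rscript startup cost as the source of the difference, and the final checks confirm that the two loops really compute the row and column sums.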
r
merlin2011