replicate() vs for loop?

Question


Does anyone know how the replicate() function works in R, and how efficient it is compared with using a for loop?

For example, is there a difference in efficiency between ...

means <- replicate(100000, mean(rnorm(50)))

and...

    means <- c()
    for (i in 1:100000) {
      means <- c(means, mean(rnorm(50)))
    }

(I may have mistyped something above, but you get the idea.)


performance for-loop r

Rickyb Nov 16 '12 at 7:37


4 answers

replicate is a wrapper for sapply, which is itself a wrapper for lapply. lapply ultimately calls an .Internal function written in C, which runs the loop in compiled code rather than through the interpreter. Its main advantage is efficient memory management, especially compared with the highly inefficient vector-growing method shown above.
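A quick way to see the wrapper relationship is to compare replicate() with an equivalent sapply() call under the same random seed. This is a sketch: the seed and sizes are arbitrary choices, and you can print body(replicate) to inspect the actual definition in your R version.

```r
# replicate(n, expr) evaluates expr n times; sapply() over a dummy index
# that the function ignores does the same work.
set.seed(42)
a <- replicate(5, mean(rnorm(50)))

set.seed(42)
b <- sapply(seq_len(5), function(i) mean(rnorm(50)))

# Same seed, same calls in the same order, so the results should match exactly.
identical(a, b)
```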


James Nov 16 '12 at 7:46


My experience with replicate is completely different, and it bothers me too. R often crashes and my laptop freezes when I use replicate instead of for, which surprises me because, for the reasons mentioned above, I also expected the C-written function to outperform the for loop. For example, if you run the code below, you will see that the for loop is faster than replicate:

    system.time(for (i in 1:10) runif(1e7))
    #   user  system elapsed
    #  3.340   0.218   3.558

    system.time(replicate(10, runif(1e7)))
    #   user  system elapsed
    #  4.622   0.484   5.109

So with 10 replications the for loop is clearly faster, and repeating the test with 100 replications gives similar results. I wonder whether anyone can come up with an example that shows a practical advantage of replicate over for.

PS: I also wrapped runif(1e7) in a function, and that made no difference to the comparison. In short, I could not come up with any example that shows an advantage for replicate.
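One caveat about the comparison above: the for loop discards each runif() result, while replicate() keeps all of them and simplifies them into a matrix (10 vectors of 1e7 doubles is roughly 800 MB), which may explain the freezes. A like-for-like loop has to store its results too. The sketch below uses a smaller n (an assumption, so it runs quickly) and a preallocated matrix for the loop.

```r
n_rep <- 10
n <- 1e5  # much smaller than 1e7 so the sketch runs quickly (assumption)

# replicate() keeps every result and simplifies them into an n x n_rep matrix
res_repl <- replicate(n_rep, runif(n))

# A fair loop must keep every result as well, here in a preallocated matrix
res_loop <- matrix(NA_real_, nrow = n, ncol = n_rep)
for (i in seq_len(n_rep)) res_loop[, i] <- runif(n)

dim(res_repl)
dim(res_loop)

# Memory held by the replicate() result (about 8 bytes per double)
print(object.size(res_repl), units = "MB")
```

Timing both versions with system.time() gives a fairer picture than timing a loop that throws its work away.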


Ef haghish Apr 25 '16 at 17:33


Vectorization is the key difference between the two. Let me try to explain. R is a high-level, interpreted computer language. It takes care of many basic computer tasks for you. When you write

 x <- 2.0

you do not need to tell the computer that

"2.0" is a floating-point number;
"x" must store data of numeric type;
it must find a place in memory to put "2.0";
it must register "x" as a pointer to a specific place in memory.

R figures these things out by itself.
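A minimal illustration of that run-time bookkeeping: nothing is declared up front, and R works out the storage type of a value when the code runs. The variable names here are only for illustration.

```r
# No declaration is needed; R infers the type of 2.0 at run time.
x <- 2.0
type_before <- typeof(x)   # "double": 2.0 is stored as a floating-point number

# The same name can later be bound to a value of a different type.
x <- "two"
type_after <- typeof(x)    # "character"

c(type_before, type_after)
```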

But this convenience comes at a price: R is slower than low-level languages.

In C or FORTRAN, most of these "if tests" are performed at compile time rather than at run time. Programs are translated into binary machine code after they are written but before they are run. This allows the compiler to arrange the machine code optimally for the computer to execute.

What does this have to do with vectorization in R? Many R functions are actually written in a compiled language such as C, C++ or FORTRAN and have only a thin R wrapper. That is the difference between your two approaches: a for loop adds extra "if test" operations that the interpreter must perform on every iteration, which makes it slower.
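To make the vectorization point concrete, here is a sketch that swaps in a technique not used in the question: draw all samples at once into a matrix and compute every mean with a single colMeans() call, which does its looping in compiled code. The sample layout (one column per sample of 50) and n_rep are assumptions for illustration.

```r
set.seed(1)
n_rep <- 1000

# Each column holds one sample of 50 draws (assumed layout)
x <- matrix(rnorm(50 * n_rep), nrow = 50)

# Vectorized: one call, the loop over columns runs in compiled code
means_vec <- colMeans(x)

# Interpreted: one mean() call per column, dispatched by the R interpreter
means_loop <- numeric(n_rep)
for (j in seq_len(n_rep)) means_loop[j] <- mean(x[, j])

# mean() and colMeans() accumulate slightly differently, so compare with a tolerance
all.equal(means_vec, means_loop)
```

Wrapping each version in system.time() with a larger n_rep shows how the gap grows on your own machine.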


Worice Apr 26 '16 at 12:07


Paul Hiemstra · Accepted Answer · 2012-11-16T08:34:27+0000

You can simply benchmark the code and get your answer empirically. Note that I also added a second flavor of the for loop, which circumvents the growing-vector problem by preallocating the vector.

    library(rbenchmark)  # provides benchmark()

    repl_function = function(no_rep) means <- replicate(no_rep, mean(rnorm(50)))

    for_loop = function(no_rep) {
      means <- c()
      for (i in 1:no_rep) {
        means <- c(means, mean(rnorm(50)))
      }
      means
    }

    for_loop_prealloc = function(no_rep) {
      means <- vector(mode = "numeric", length = no_rep)
      for (i in 1:no_rep) {
        means[i] <- mean(rnorm(50))
      }
      means
    }

    no_loops = 50e3
    benchmark(repl_function(no_loops),
              for_loop(no_loops),
              for_loop_prealloc(no_loops),
              replications = 3)

                             test replications elapsed relative user.self sys.self user.child sys.child
    2          for_loop(no_loops)            3  18.886    6.274    17.803    0.894          0         0
    3 for_loop_prealloc(no_loops)            3   3.209    1.066     3.189    0.000          0         0
    1     repl_function(no_loops)            3   3.010    1.000     2.997    0.000          0         0

Looking at the relative column, the non-preallocated for loop is about 6.3 times slower. The preallocated for loop, however, is about as fast as replicate.
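The same comparison can be sketched with base R's system.time() instead of the rbenchmark package. The function names grow, prealloc and repl and the smaller n_rep are hypothetical choices for illustration; timings depend on the machine, so none are shown. Under a fixed seed all three versions draw the same random numbers in the same order, which confirms they do identical work and differ only in speed.

```r
n_rep <- 5000  # smaller than 50e3 to keep the sketch quick (assumption)

# Growing vector: c() copies the whole vector on every iteration
grow <- function(n) {
  means <- c()
  for (i in seq_len(n)) means <- c(means, mean(rnorm(50)))
  means
}

# Preallocated vector: each iteration writes into an existing slot
prealloc <- function(n) {
  means <- numeric(n)
  for (i in seq_len(n)) means[i] <- mean(rnorm(50))
  means
}

repl <- function(n) replicate(n, mean(rnorm(50)))

# Same seed, same draws: the results should be identical
set.seed(123); r1 <- grow(n_rep)
set.seed(123); r2 <- prealloc(n_rep)
set.seed(123); r3 <- repl(n_rep)
identical(r1, r2) && identical(r2, r3)

system.time(grow(n_rep))
system.time(prealloc(n_rep))
system.time(repl(n_rep))
```

seq_len(n) is used instead of 1:n so that n = 0 gives an empty loop rather than iterating over c(1, 0).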

