Rewrite the loop using apply

Newbie question: this double loop over a data frame of about 50 thousand elements evaluates very slowly, taking more than 30 seconds. I read on the Internet that I should use some form of the apply function to fix this, but I still cannot get the code to work. Starting with a first data.frame that holds the results, the goal is to get a second data.frame where only the values that exceed the target are kept, and everything else is set to 0.

This code works:

 ExcessGain = function(Value, Target){ max(0, Value - Target) }

 Pcnt_O_O_x = data.frame()
 for (j in 1:ncol(Pcnt_O_O)){
   for (i in 1:nrow(Pcnt_O_O)){
     Pcnt_O_O_x[i,j] = ExcessGain(Pcnt_O_O[i,j], GainTargetPcnt)
   }
 }

Can I speed this up by using the apply function instead of the inner loop?

1 answer

Your function looks like it just subtracts the target value from each cell of your data, and any negative values are then replaced with 0. In that case you do not need loops at all: you can use R's built-in vectorised operations to do this:

 set.seed(123)

 # If you have a data.frame of all numeric elements, turn it into a matrix first
 df <- as.matrix( data.frame( matrix( runif(25) , nrow = 5 ) ) )
 target <- 0.5

 df
 #          X1        X2        X3         X4        X5
 #1 0.2875775 0.0455565 0.9568333 0.89982497 0.8895393
 #2 0.7883051 0.5281055 0.4533342 0.24608773 0.6928034
 #3 0.4089769 0.8924190 0.6775706 0.04205953 0.6405068
 #4 0.8830174 0.5514350 0.5726334 0.32792072 0.9942698
 #5 0.9404673 0.4566147 0.1029247 0.95450365 0.6557058

 df2 <- df - target
 df2
 #           X1          X2          X3         X4        X5
 #1 -0.21242248 -0.45444350  0.45683335  0.3998250 0.3895393
 #2  0.28830514  0.02810549 -0.04666584 -0.2539123 0.1928034
 #3 -0.09102308  0.39241904  0.17757064 -0.4579405 0.1405068
 #4  0.38301740  0.05143501  0.07263340 -0.1720793 0.4942698
 #5  0.44046728 -0.04338526 -0.39707532  0.4545036 0.1557058

 df2[ df2 < 0 ] <- 0
 df2
 #         X1         X2        X3        X4        X5
 #1 0.0000000 0.00000000 0.4568333 0.3998250 0.3895393
 #2 0.2883051 0.02810549 0.0000000 0.0000000 0.1928034
 #3 0.0000000 0.39241904 0.1775706 0.0000000 0.1405068
 #4 0.3830174 0.05143501 0.0726334 0.0000000 0.4942698
 #5 0.4404673 0.00000000 0.0000000 0.4545036 0.1557058
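As a side note (not in the original answer), the subtraction and the zero-clamping can also be combined into a single step with R's vectorised pmax(), which takes element-wise maxima:

 # One-step equivalent: pmax() takes the element-wise maximum of each
 # difference and 0, clamping negative values in a single vectorised call
 df2 <- pmax( df - target , 0 )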

Here are a few benchmarks showing the difference in speed when working on a matrix as opposed to a data.frame. f.df( df ) and fm( m ) are two functions that carry out this operation on a data.frame and on a matrix respectively, each with 1 million elements:
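The definitions of f.df and fm are not shown in the answer; a minimal sketch of what they could plausibly look like, assuming each one simply wraps the loop-free subtract-and-clamp from the example above:

 # Hypothetical reconstructions of the two benchmarked functions: the same
 # operation applied to a data.frame and to a matrix of 1 million elements
 f.df <- function( df ){ df2 <- df - target ; df2[ df2 < 0 ] <- 0 ; df2 }
 fm   <- function( m ){ m2 <- m - target ; m2[ m2 < 0 ] <- 0 ; m2 }
 df   <- data.frame( matrix( runif( 1e6 ) , nrow = 1000 ) )
 m    <- as.matrix( df )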

 require( microbenchmark )
 microbenchmark( f.df( df ) , fm( m ) , times = 10L )
 #Unit: milliseconds
 #     expr        min         lq     median         uq       max neval
 # f.df(df) 6944.09808 9009.39684 9233.18528 9533.75089 10036.5963    10
 #    fm(m)   37.26433   39.00189   40.46229   41.15626   130.6983    10

Working on the matrix is two orders of magnitude faster when the matrix is large.

If you really do need to use the apply function, you can apply it over each matrix cell as follows:

 m <- matrix( runif(25) , nrow = 5 )
 target <- 0.5
 apply( m , 1:2 , function(x) max( x - target , 0 ) )
 #          [,1]      [,2]       [,3]      [,4]      [,5]
 #[1,] 0.4575807 0.0000000 0.15935928 0.0000000 0.1948637
 #[2,] 0.0000000 0.0000000 0.00000000 0.0000000 0.0000000
 #[3,] 0.0000000 0.0000000 0.00000000 0.0000000 0.0000000
 #[4,] 0.3912719 0.0000000 0.06155316 0.1533290 0.0000000
 #[5,] 0.3228921 0.4697041 0.23554353 0.1352888 0.0000000
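Tying this back to the objects named in the question (Pcnt_O_O and GainTargetPcnt), the whole double loop could be replaced by a single vectorised line along these lines:

 # Convert to a matrix, subtract the target, clamp negatives to 0 with
 # pmax(), then convert back to a data.frame
 Pcnt_O_O_x <- as.data.frame( pmax( as.matrix( Pcnt_O_O ) - GainTargetPcnt , 0 ) )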
