Faster proportion tables in R

I am creating proportion tables from an xts object. This is part of a large program that (unfortunately) needs about 10^6 iterations, so it has become quite a bottleneck and I would like to speed it up.

This is an example of where I started:

    library(quantmod)
    test.xts <- xts(sample(seq(1, 5, by = .5), 50, replace = TRUE), as.Date(1:50))
    system.time(for (i in 1:10000) {
      prop.table(table(test.xts))
    })
    #   user  system elapsed
    #  19.86    0.00   18.58

I have already converted the xts object to a matrix, which gave a significant speedup. I mention that the data starts out as xts only in case I am missing something xts-specific that would be even faster than converting to a matrix.

    test.mat <- as.matrix(test.xts)
    system.time(for (i in 1:10000) {
      prop.table(table(test.mat))
    })
    #   user  system elapsed
    #   2.78    0.00    2.90

But I would really like it to be as fast as possible, so I am hoping others have suggestions for further improvements. Perhaps there is an obvious approach I have overlooked.

One more piece of information: the output from these tables is ultimately merged with similar output from another time period, so the dimensions need to stay named. (I.e., I need to be able to match the proportion for the value "10" at time 1 with the proportion for "10" at time 2.)

Any help is greatly appreciated.

2 answers

table() implicitly creates a factor, which is expensive. In your case you can save a lot (more than 10x) by using tabulate(), since you already have integers:

    a <- tabulate(test.mat)
    names(a) <- seq_along(a)
    a / sum(a)
    #    1    2    3    4    5    6    7    8    9   10
    # 0.16 0.14 0.08 0.14 0.08 0.16 0.02 0.06 0.10 0.06

Timings:

    system.time(for (i in 1:10000) {
      a <- tabulate(test.mat)
      names(a) <- seq_along(a)
      a / sum(a)
    })
    #   user  system elapsed
    #  0.208   0.002   0.210

Your version, for comparison:

    system.time(for (i in 1:10000) prop.table(table(test.mat)))
    #   user  system elapsed
    #  3.373   0.028   3.402
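To see where the extra cost comes from, here is a minimal sketch on made-up data (the vector `x` is illustrative and not from the question). It mimics in outline what table() does, building a factor and then counting its codes, and compares that with calling tabulate() on the integers directly.

```r
# Small positive-integer sample (illustrative data, not the asker's)
x <- c(3L, 1L, 2L, 3L, 3L, 1L)

# Roughly the route table() takes: build a factor, then count its codes
f <- factor(x)
via_factor <- tabulate(f, nbins = nlevels(f))
names(via_factor) <- levels(f)

# The shortcut: count the integers directly, no factor needed
direct <- tabulate(x)
names(direct) <- seq_along(direct)

# Both routes give the same named counts as table() for this input
print(via_factor)
print(direct)
print(table(x))
```

The shortcut only matches table() when the data are positive integers with no gaps you care about; otherwise the names and bins can differ.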

Building on joran's comment: using tabulate() directly may be faster. It differs from table() in three ways:

  • It works only on integers and truncates decimal values.
  • It silently ignores all negative values and zeros.
  • It creates a bin for every value in 1:n, even for values that never occur.

See ?tabulate for more details.
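The three differences can be seen on a small made-up vector (the input `x` below is purely illustrative): it contains a decimal, a zero, a negative value, and a gap at 2 and 3.

```r
# Illustrative input: a decimal, a zero, a negative value, and no 2s or 3s
x <- c(1.5, 1, 0, -2, 4, 4)

# tabulate(): 1.5 is truncated to 1, 0 and -2 are dropped,
# and bins 2 and 3 appear with a count of zero
counts <- tabulate(x)
print(counts)

# table(): every distinct value gets its own named cell, nothing is dropped
tab <- table(x)
print(names(tab))
```

If the data can contain any of these cases, the tabulate() result will not line up with table() without extra handling.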

With those caveats in mind, here is a function that gives a ~9x speedup:

    prop2 <- function(x) {
      x <- tabulate(x)
      out <- x / sum(x)
      names(out) <- seq_along(out)
      return(out)
    }

Testing Speed:

    library(rbenchmark)
    test.mat <- as.matrix(test.xts)
    f1 <- function() prop.table(table(test.mat))
    benchmark(f1(), prop2(test.mat),
              replications = 1000,
              columns = c("test", "relative", "elapsed"),
              order = "relative")
    #------
    #              test relative elapsed
    # 2 prop2(test.mat)      1.0    0.10
    # 1            f1()      9.1    0.91

Confirming the output:

    > prop.table(table(test.mat))
    test.mat
       1    2    3    4    5    6    7    8    9   10
    0.04 0.02 0.20 0.12 0.08 0.10 0.06 0.14 0.12 0.12
    > prop2(test.mat)
       1    2    3    4    5    6    7    8    9   10
    0.04 0.02 0.20 0.12 0.08 0.10 0.06 0.14 0.12 0.12
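If the values are not plain positive integers (the question's sample uses half steps from 1 to 5), or if results from two periods must line up by name, one possible variant is to count against a fixed set of known values. The sketch below is only an assumption about how that could be done: `prop_fixed`, `values`, `period1` and `period2` are hypothetical names, and it assumes the full set of possible values is known in advance.

```r
# Hypothetical helper: proportions over a fixed, known set of values
prop_fixed <- function(x, values) {
  # match() maps each observation to its position in `values`, so
  # tabulate() receives positive integers even for decimal data
  counts <- tabulate(match(x, values), nbins = length(values))
  out <- counts / sum(counts)
  names(out) <- values
  out
}

values <- seq(1, 5, by = 0.5)   # the half-step grid used in the question
set.seed(1)                     # arbitrary seed, for reproducibility only
period1 <- sample(values, 50, replace = TRUE)
period2 <- sample(values, 50, replace = TRUE)

# Same names in the same order, so the two periods line up column by column
both <- rbind(time1 = prop_fixed(period1, values),
              time2 = prop_fixed(period2, values))
print(both)
```

Because every call uses the same `values`, each result has the same length and names, and a value that is absent in one period simply gets a proportion of 0 there.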

Source: https://habr.com/ru/post/1413214/

