The fastest way to convert numeric to character in R

I need to convert a numeric vector to character in R. As far as I know, there are several ways to do this (see below).

It seems that the fastest ways are sprintf and gettextf.

set.seed(1)
a <- round(runif(100000), 2)
system.time(b1 <- as.character(a))
   user  system elapsed
  0.108   0.000   0.105
system.time(b2 <- formatC(a))
   user  system elapsed
  0.052   0.000   0.052
system.time(b3 <- sprintf('%.2f', a))
   user  system elapsed
  0.044   0.000   0.046
system.time(b4 <- gettextf('%.2f', a))
   user  system elapsed
  0.048   0.000   0.046
system.time(b5 <- paste0('', a))
   user  system elapsed
  0.124   0.000   0.129

Are there other methods for converting numeric to character in R? Thanks for any suggestions.

3 answers

Since you have rounded a to finite precision, convert the unique values once and then look the results up:

 f0 = formatC
 f1 = function(x) {
     ux = unique(x)
     formatC(ux)[match(x, ux)]
 }

It gives identical results.

 > identical(f0(a), f1(a))
 [1] TRUE

and is faster, at least for the sample dataset.

 > microbenchmark(f0(a), f1(a))
 Unit: milliseconds
   expr      min       lq     mean   median       uq      max neval
  f0(a) 46.05171 46.89991 47.33683 47.42225 47.58196 52.43244   100
  f1(a) 10.97090 11.39974 11.48993 11.52598 11.58505 11.90506   100

(although is this efficiency really relevant in R?)
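
The same lookup idea should work with any vectorised formatter, not just formatC. Here is a minimal sketch, assuming a hypothetical helper named format_via_unique (the name and the default sprintf formatter are my own choices for illustration, not part of the answer above). It only pays off when the vector has few distinct values relative to its length, as with the rounded sample data.

```r
# Hypothetical helper: format each distinct value once, then map the
# formatted strings back to the original positions with match().
format_via_unique <- function(x, fmt = function(v) sprintf("%.2f", v)) {
  ux <- unique(x)          # distinct values only
  fmt(ux)[match(x, ux)]    # look up each element's formatted string
}

set.seed(1)
a <- round(runif(100000), 2)

direct <- sprintf("%.2f", a)     # format every element
lookup <- format_via_unique(a)   # format the distinct values, then index
```

With the sample data there are at most 101 distinct values, so the formatter is called on a tiny vector and most of the remaining cost is the match() lookup.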


It actually seems that formatC comes out faster:

 library(microbenchmark)
 a <- round(runif(100000), 2)
 microbenchmark(
   as.character(a),
   formatC(a),
   format(a),
   sprintf('%.2f', a),
   gettextf('%.2f', a),
   paste0('', a)
 )

Output:

 Unit: milliseconds
                 expr      min       lq     mean   median       uq       max neval
      as.character(a) 69.58868 70.74803 71.98464 71.41442 72.92168  82.21936   100
           formatC(a) 33.35502 36.29623 38.83611 37.60454 39.27079  72.92176   100
            format(a) 55.98344 56.78744 58.00442 57.64804 58.83614  66.15601   100
   sprintf("%.2f", a) 46.54285 47.40126 48.53067 48.10791 49.12717  65.26819   100
  gettextf("%.2f", a) 46.74888 47.81214 49.23166 48.60025 49.16692  84.90208   100
        paste0("", a) 86.62459 88.67753 90.80720 89.86829 91.33774 125.51421   100

My sessionInfo():

 R version 3.1.0 (2014-04-10)
 Platform: x86_64-apple-darwin13.1.0 (64-bit)
 locale:
 [1] en_AU.UTF-8/en_AU.UTF-8/en_AU.UTF-8/C/en_AU.UTF-8/en_AU.UTF-8
 attached base packages:
 [1] stats     graphics  grDevices utils     datasets  methods   base
 other attached packages:
 [1] microbenchmark_1.4-2
 loaded via a namespace (and not attached):
  [1] colorspace_1.2-4 digest_0.6.4     ggplot2_1.0.0    grid_3.1.0       gtable_0.1.2     MASS_7.3-35
  [7] munsell_0.4.2    plyr_1.8.1       proto_0.3-10     Rcpp_0.11.3      reshape2_1.4     scales_0.2.4
 [13] stringr_0.6.2    tools_3.1.0
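
One thing worth checking before picking a winner on speed alone: the functions being timed do not all return the same strings. A small sketch on a made-up three-element vector (the vector x is my own example, not from the benchmark) shows the difference between the shortest representation and a fixed number of decimals.

```r
x <- c(0.1, 0.25, 1)

short <- as.character(x)      # "0.1"  "0.25" "1"
fixed <- sprintf("%.2f", x)   # "0.10" "0.25" "1.00"

# formatC() and format() have their own defaults; print them to compare.
print(formatC(x))
print(format(x))
```

So if a fixed number of decimals is required, only some of the candidates are actually interchangeable.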

Here are three other methods I can think of, none of which is as fast as gettextf:

 storage.mode(a) <- "character"
 mode(a) <- "character"
 as.vector(a, "character")

The last one is essentially as.character.default, bypassing method dispatch. The timings for all of them are about the same as for paste(a).
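
Note that the first two forms modify a in place, while as.vector returns a new vector. A minimal sketch, assuming a plain numeric vector with no attributes and working on copies (b1, b2, b3 are names I made up) so the original stays numeric:

```r
set.seed(1)
a <- round(runif(1000), 2)

b1 <- a
storage.mode(b1) <- "character"   # replacement form, converts b1 in place

b2 <- a
mode(b2) <- "character"           # replacement form, converts b2 in place

b3 <- as.vector(a, "character")   # returns a new character vector

# All three should agree with as.character() for a plain numeric vector.
same <- identical(b1, as.character(a)) &&
  identical(b2, as.character(a)) &&
  identical(b3, as.character(a))
```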


Source: https://habr.com/ru/post/1213105/

