Another solution comes from a separate question: how to effectively use library(profr) in R:
For example:
install.packages("profr") devtools::install_github("alexwhitworth/imputation") x <- matrix(rnorm(1000), 100) x[x>1] <- NA library(imputation) library(profr) a <- profr(kNN_impute(x, k=5, q=2), interval= 0.005)
The plots don't seem (to me at least) very helpful here (e.g. plot(a)). But the data structure itself seems to suggest a solution:
R> head(a, 10) level g_id t_id f start end n leaf time source 9 1 1 1 kNN_impute 0.005 0.190 1 FALSE 0.185 imputation 10 2 1 1 var_tests 0.005 0.010 1 FALSE 0.005 <NA> 11 2 2 1 apply 0.010 0.190 1 FALSE 0.180 base 12 3 1 1 var.test 0.005 0.010 1 FALSE 0.005 stats 13 3 2 1 FUN 0.010 0.110 1 FALSE 0.100 <NA> 14 3 2 2 FUN 0.115 0.190 1 FALSE 0.075 <NA> 15 4 1 1 var.test.default 0.005 0.010 1 FALSE 0.005 <NA> 16 4 2 1 sapply 0.010 0.040 1 FALSE 0.030 base 17 4 3 1 dist_q.matrix 0.040 0.045 1 FALSE 0.005 imputation 18 4 4 1 sapply 0.045 0.075 1 FALSE 0.030 base
Single-run solution:
That is, the data structure suggests using tapply to summarize the data. This can be done quite simply for a single run of profr::profr:
t <- tapply(a$time, paste(a$source, a$f, sep= "::"), sum) t[order(t)]
This shows that the biggest time consumers are kernlab::kernelMatrix and the overhead from R for S4 classes and generics.
Preferred approach:
Note that, given the stochastic nature of the sampling process, I prefer to use averages to get a more robust picture of the time profile:
prof_list <- replicate(100, profr(kNN_impute(x, k=5, q=2), interval= 0.005), simplify = FALSE) fun_timing <- vector("list", length= 100) for (i in 1:100) { fun_timing[[i]] <- tapply(prof_list[[i]]$time, paste(prof_list[[i]]$source, prof_list[[i]]$f, sep= "::"), sum) }
Removing the abnormal replications and converting to data.frames:
fun_timing <- fun_timing[-c(15,83)] fun_timing2 <- lapply(fun_timing, function(x) { ret <- data.frame(fun= names(x), time= x) dimnames(ret)[[1]] <- 1:nrow(ret) return(ret) })
Merge the replications (almost certainly this could be faster) and examine the results:
# function for merging the data.frames in a list
merge_recursive <- function(list, ...) {
  n <- length(list)
  df <- data.frame(list[[1]])
  for (i in 2:n) {
    df <- merge(df, list[[i]], ...)
  }
  return(df)
}
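The merging and averaging step itself is then straightforward. A minimal sketch of how the function above might be applied (object and column names here are my own choices for illustration, not necessarily those used for the original results):

# merge the per-replication timings on the function name, keeping only
# functions observed in every replication, then average across replications
fun_time <- merge_recursive(fun_timing2, by= "fun", all= FALSE)
fun_time2 <- data.frame(fun= fun_time[, 1],
                        avg_time= apply(fun_time[, -1], 1, mean, na.rm= TRUE))
fun_time2 <- fun_time2[order(fun_time2$avg_time, decreasing= TRUE), ]
head(fun_time2)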
Results:
From these, a similar but more robust picture emerges than from the single run. Namely, there is a lot of overhead from R, and also that library(kernlab) slows me down. Of note, since kernlab is implemented in S4, the R overhead is related: S4 classes are substantially slower than S3 classes.
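To illustrate that last point, here is a minimal, self-contained sketch (not part of the profiling above; the class and generic names are made up) comparing S3 and S4 dispatch overhead:

library(methods)

# a trivial S4 class and generic (hypothetical names, for illustration only)
setClass("s4obj", representation(x= "numeric"))
setGeneric("get_x4", function(obj) standardGeneric("get_x4"))
setMethod("get_x4", "s4obj", function(obj) obj@x)

# the equivalent S3 class and generic
s3obj <- structure(list(x= 1), class= "s3obj")
get_x3 <- function(obj) UseMethod("get_x3")
get_x3.s3obj <- function(obj) obj$x

o4 <- new("s4obj", x= 1)
system.time(for (i in 1:1e5) get_x4(o4))     # S4 dispatch
system.time(for (i in 1:1e5) get_x3(s3obj))  # S3 dispatch, typically faster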
I would also note that, in my personal opinion, a cleaned up version of this could make a useful pull request as a summary method for profr. Although I'd be interested to see others' suggestions!