As a toy example, suppose we have a function called "my_func" (code below) that takes two parameters, "n" and "p". Our function "my_func" will generate a random matrix "x" with "n" rows and "p" columns and do something expensive in both runtime and memory, for example, computing the sum of the singular values of "x". (Of course, the function could be written as a one-liner, but I am going for readability here.)
    my_func <- function(n, p) {
      x <- replicate(p, rnorm(n))
      sum(svd(x)$d)
    }
If we want to compute "my_func" for several values of "n", and for each value of "n" we have several values of "p", then vectorizing the function and applying it to every combination of "n" and "p" is straightforward:
    n <- 10 * seq_len(5)
    p <- 100 * seq_len(10)
    grid <- expand.grid(n = n, p = p)
    my_func <- Vectorize(my_func)
    set.seed(42)
    do.call(my_func, grid)
     [1]   98.61785  195.50822  292.21575  376.79186  468.13570  145.18359
     [7]  280.67456  421.03196  557.87138  687.75040  168.42994  340.42452
    [13]  509.65528  683.69883  851.29063  199.08474  400.25584  595.18311
    [19]  784.21508  982.34591  220.73215  448.23698  669.02622  895.34184
    [25] 1105.48817  242.52422  487.56694  735.67588  976.93840 1203.25949
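For reference, as far as I understand it, "Vectorize" is a thin wrapper around "mapply", so the call above should be equivalent to calling "mapply" directly on the columns of the grid. Here is a minimal sketch of that equivalence; the names "my_func_scalar" and "grid_small" are made up for this illustration, and the grid is smaller than the one above so that it runs quickly.

```r
# Hypothetical name for the original, un-vectorized function
my_func_scalar <- function(n, p) {
  x <- replicate(p, rnorm(n))  # n x p matrix of standard normal draws
  sum(svd(x)$d)                # sum of the singular values of x
}

# Smaller grid than the one above (an assumption, chosen only for speed)
grid_small <- expand.grid(n = 10 * seq_len(3), p = 20 * seq_len(2))

# Route 1: vectorize the function, then call it with the grid columns
set.seed(42)
via_vectorize <- do.call(Vectorize(my_func_scalar), grid_small)

# Route 2: call mapply directly, one (n, p) pair per row of the grid
set.seed(42)
via_mapply <- mapply(my_func_scalar, n = grid_small$n, p = grid_small$p)

# Compare the two routes; with the same seed they should agree
same_result <- isTRUE(all.equal(via_vectorize, via_mapply))
```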
Please note that each call to the function "my_func" can be very slow for large "n" and "p" (try n = 1000 and p = 2000 for starters).
Now, in my actual application with a similarly constructed function, the number of rows in "grid" is much larger than shown here. So I am trying to understand vectorization in R a little better.
First question: In the above example, are the calls to "my_func" executed sequentially, so that the memory used in one call is garbage collected before the next call? I use vectorization a lot, but I never stopped to ask this question.
Second question: (This question may depend on the first.) Assuming the number of calls is large enough and that "my_func" is slow enough, is there anything to be gained from parallelization here? I presume yes. My real question is this: is parallelization still worthwhile if, instead, "my_func" had the same large matrix passed to it on every call? For the sake of argument, suppose the matrix is called "y", has 1000 rows and 5000 columns, and is computed on the fly. Of course, transferring the matrix "y" to each of the parallel nodes will incur some overhead.
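To make the second scenario concrete, here is a minimal sketch of the kind of setup I have in mind, using the base "parallel" package. Everything specific in it is an assumption made for illustration: the function "my_func_y" and its body, the two-worker cluster, and the scaled-down 50 x 100 matrix standing in for the 1000 x 5000 one. The matrix is exported to each worker once with "clusterExport" rather than shipped with every call.

```r
library(parallel)

# Scaled-down stand-in for the large shared matrix (assumption: 50 x 100)
set.seed(42)
y <- matrix(rnorm(50 * 100), nrow = 50)

# Hypothetical function that receives the same matrix on every call;
# the body is invented purely to have something to compute
my_func_y <- function(n, p, y) {
  sum(svd(y[seq_len(n), seq_len(p), drop = FALSE])$d)
}

grid_y <- expand.grid(n = 10 * seq_len(5), p = 20 * seq_len(5))

# Sequential baseline: y is passed as a fixed extra argument
res_seq <- mapply(my_func_y, grid_y$n, grid_y$p, MoreArgs = list(y = y))

# Parallel version: start two workers and copy y to each of them once
cl <- makeCluster(2)
clusterExport(cl, c("y", "my_func_y"), envir = environment())
res_par <- clusterMap(cl, function(n, p) my_func_y(n, p, y),
                      grid_y$n, grid_y$p, SIMPLIFY = TRUE)
stopCluster(cl)

# Both versions do the same deterministic work, so they should agree
results_match <- isTRUE(all.equal(res_seq, res_par))
```

Whether something like this pays off presumably depends on how the one-time cost of exporting "y" compares with the work done per call.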
I understand that the answer to the second question may be "It depends on ..." If so, let me know and I will try to give more detailed information.
In addition, I would appreciate any advice, feedback, or "OMFG WTF N00B, HAVEN'T YOU SEEN THIS OTHER RELATED DISCUSSION?!!!111oneone1".