I'm looking for a fast(er) way to subset / index a matrix many times:
for (i in 1:99000) { subset.data <- data[index[, i], ] }
Background:
I am implementing a sequential testing procedure involving the bootstrap in R. While trying to replicate some simulation results, I ran into this bottleneck, where a lot of indexing has to be done. To implement the block bootstrap, I created an index matrix with which I subset the original data matrix to draw resamples of the data.
The sequential testing procedure takes about 10 seconds. Using it in a simulation with 2500 replications and several parameter constellations takes about 40 days. With parallel processing and more CPU power it can be done faster, but that is still not very satisfying :/
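To make the setup concrete, here is a minimal sketch of what such an index matrix and the subsetting loop could look like. The dimensions, the block length l, and the moving-block scheme are assumptions for illustration only; the question does not state them, and B is kept far smaller than the 99000 columns used above.

```r
set.seed(1)

n <- 200   # rows of the data matrix (assumed)
p <- 5     # columns of the data matrix (assumed)
B <- 1000  # bootstrap replications (99000 in the question)
l <- 10    # block length (assumed); n is a multiple of l here

data <- matrix(rnorm(n * p), nrow = n, ncol = p)

# Moving-block bootstrap: draw n / l block start positions per replication
# and expand every start position into l consecutive row numbers.
n.blocks <- n / l
starts <- matrix(sample.int(n - l + 1, n.blocks * B, replace = TRUE),
                 nrow = n.blocks, ncol = B)
index <- apply(starts, 2, function(s) as.vector(outer(0:(l - 1), s, `+`)))

# index is n x B; column i holds the row numbers of the i-th resample.
for (i in seq_len(B)) {
  subset.data <- data[index[, i], ]
  # ... the test statistic would be computed from subset.data here ...
}
```

Each column of index is a full resample of row numbers, so the loop body is exactly the subsetting step whose cost is in question.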
- Is there a better way to resample the data / get rid of the loop?
- Could apply, vectorization, replicate, etc. help anywhere?
- Does it make sense to implement the subsetting in C (e.g. by manipulating some pointers)?
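As an illustration of what "getting rid of the loop" could mean, one conceivable formulation is to subset once with the flattened index matrix and reshape the result into a three-dimensional array. This is only a sketch with made-up dimensions and a plain (non-block) index matrix; it has not been benchmarked, and whether it is actually faster than the loop is an open question. It also holds all resamples in memory at once (n * B * p values), which may be prohibitive for 99000 replications.

```r
set.seed(1)

n <- 50   # rows (assumed)
p <- 3    # columns (assumed)
B <- 20   # replications (assumed, tiny for illustration)

data  <- matrix(rnorm(n * p), nrow = n, ncol = p)
index <- matrix(sample.int(n, n * B, replace = TRUE), nrow = n, ncol = B)

# One subsetting call for all replications: the rows of the resamples
# are stacked on top of each other, replication by replication.
stacked <- data[as.vector(index), , drop = FALSE]  # (n * B) x p

# Reshape so that boot[, i, ] is the i-th resample (an n x p matrix).
boot <- array(stacked, dim = c(n, B, p))

# Check against the loop version for every replication.
same <- vapply(seq_len(B), function(i) {
  isTRUE(all.equal(boot[, i, ], data[index[, i], ]))
}, logical(1))
```

The reshape works because R stores matrices column by column, so within each data column the stacked rows already appear in replication order.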
Even though every single step is already incredibly fast in R, it is simply not fast enough overall.
I would be very grateful for any answer / hint / advice!
Related questions:
- Fast matrix subsetting via '[': by rows, by columns, or doesn't it matter?
- Fast function for generating bootstrap samples in matrix form in R
- random sampling - matrix
The suggestions from there,
mapply(function(row) return(sample.data[row,]), row = boot.index)
replicate(B, apply(sample.data, 2, sample, replace = TRUE))
didn't really do it for me.
r simulation matrix-indexing statistics-bootstrap
Niels