R bootstrap statistics by group for big data

I want to bootstrap a dataset that contains groups. A simple scenario would be bootstrapping simple means:

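    library(boot)
    library(data.table)

    data <- as.data.table(list(x1 = runif(200), x2 = runif(200), group = runif(200) > 0.5))
    # statistic: per-group means of x1 and x2 on the resampled rows i
    stat <- function(x, i) {x[i, c(m1 = mean(x1), m2 = mean(x2)), by = "group"]}
    boot(data, stat, R = 10)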

This gives me the error "incorrect number of subscripts on matrix", caused by the by = "group". I managed to work around it by subsetting, but I don't like this solution. Is there an easier way to get this to work?

In particular, I would like to add an extra argument to the statistic function, for example stat(x, i, groupvar), and pass it through boot, for example boot(data, stat(groupvar = group), R = 100).
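For the extra-argument part of the question, boot() forwards any additional named arguments to the statistic through its `...` parameter. Here is a minimal sketch of that mechanism, not a definitive solution. It assumes a plain data.frame to keep the example dependency-free, and it uses tapply() for the per-group means (my choice for illustration, not something from the question). The statistic has to return a flat numeric vector of fixed length.

```r
library(boot)

set.seed(1)
data <- data.frame(x1 = runif(200), x2 = runif(200), group = runif(200) > 0.5)

# groupvar is an extra argument: the name of the grouping column.
# The result is a flat numeric vector (one mean per group and column),
# which is the shape boot() expects from a statistic.
# Assumption: every resample contains rows from every group, which is
# practically certain here with 200 rows and two balanced groups.
stat <- function(x, i, groupvar) {
  d <- x[i, ]
  c(m1 = tapply(d$x1, d[[groupvar]], mean),
    m2 = tapply(d$x2, d[[groupvar]], mean))
}

# Extra named arguments after R are passed on to stat via `...`
b <- boot(data, stat, R = 10, groupvar = "group")
b$t0
```

Note that this resamples rows across the whole dataset, so group sizes vary between replicates. If each group should be resampled separately, boot() also has a strata argument.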

+4
3 answers

This should do it:

 data[, list(list(boot(.SD, stat, R = 10))), by = group]$V1 
+2

Using

    boot       * 1.3-18  2016-02-23  CRAN (R 3.2.3)
    data.table * 1.9.7   2015-10-05  Github (Rdatatable/data.table@d607425)

I got an error using the OP code with the answer provided by @eddi:

    data <- as.data.table(list(x1 = runif(200), x2 = runif(200), group = runif(200) > 0.5))
    stat <- function(x, i) {x[i, c(m1 = mean(x1), m2 = mean(x2)), by = "group"]}
    data[, list(list(boot(.SD, stat, R = 10))), by = group]$V1

It produces an error message:

 Error in eval(expr, envir, enclos) : object 'group' not found 
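The likely cause is that, by default, .SD holds every column except the ones named in by, so the group column is not visible inside stat when it receives .SD. A small sketch to check this on a toy table (the table dt and the column name cols are made up for illustration):

```r
library(data.table)

dt <- data.table(x1 = 1:4, x2 = 5:8, group = c(TRUE, TRUE, FALSE, FALSE))

# List the column names that .SD exposes within each group
sd_cols <- dt[, .(cols = paste(names(.SD), collapse = ",")), by = group]
sd_cols
```

Each group reports only x1 and x2, which is why a by = "group" inside the statistic cannot find the column.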

The error is fixed by removing by = "group" from the stat function:

    set.seed(1000)
    data <- as.data.table(list(x1 = runif(200), x2 = runif(200), group = runif(200) > 0.5))
    stat <- function(x, i) {x[i, c(m1 = mean(x1), m2 = mean(x2))]}
    data[, list(list(boot(.SD, stat, R = 10))), by = group]$V1

This produces the following bootstrap statistics:

    [[1]]

    ORDINARY NONPARAMETRIC BOOTSTRAP

    Call:
    boot(data = .SD, statistic = stat, R = 10)

    Bootstrap Statistics :
         original        bias    std. error
    t1* 0.5158232  0.004930451   0.01576641
    t2* 0.5240713 -0.001851889   0.02851483

    [[2]]

    ORDINARY NONPARAMETRIC BOOTSTRAP

    Call:
    boot(data = .SD, statistic = stat, R = 10)

    Bootstrap Statistics :
         original         bias    std. error
    t1* 0.5142383 -0.0072475030   0.02568692
    t2* 0.5291694 -0.0001509404   0.02378447

Below, I modify the sample dataset to make it clear which bootstrap statistics belong to which combination of group and column.

Consider group 1, with a mean of about 10 for x1 and about 10,000 for x2, and group 2, with a mean of about 2,000 for x1 and about 8,000 for x2:

    data2 <- as.data.table(list(x1 = c(runif(100, 9, 11), runif(100, 1999, 2001)),
                                x2 = c(runif(100, 9999, 10001), runif(100, 7999, 8001)),
                                group = rep(c(1, 2), each = 100)))
    stat <- function(x, i) {x[i, c(m1 = mean(x1), m2 = mean(x2))]}
    data2[, list(list(boot(.SD, stat, R = 10))), by = group]$V1

This gives:

    [[1]]

    ORDINARY NONPARAMETRIC BOOTSTRAP

    Call:
    boot(data = .SD, statistic = stat, R = 10)

    Bootstrap Statistics :
          original        bias    std. error
    t1*   10.00907  0.007115938   0.04349184
    t2* 9999.90176 -0.019569568   0.06160653

    [[2]]

    ORDINARY NONPARAMETRIC BOOTSTRAP

    Call:
    boot(data = .SD, statistic = stat, R = 10)

    Bootstrap Statistics :
        original        bias    std. error
    t1* 1999.965  0.031694179   0.06561209
    t2* 8000.110 -0.006569872   0.03992401
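If a flat table is more convenient than a list of boot objects, the same per-group call can return the pieces of each result as columns. This is only a sketch built on the data2 example. The output column names (statistic, original, std_error) are my own, and std_error is computed as the standard deviation of the replicates in the t component of the boot object.

```r
library(boot)
library(data.table)

set.seed(1000)
data2 <- data.table(
  x1 = c(runif(100, 9, 11), runif(100, 1999, 2001)),
  x2 = c(runif(100, 9999, 10001), runif(100, 7999, 8001)),
  group = rep(c(1, 2), each = 100)
)
stat <- function(x, i) x[i, c(m1 = mean(x1), m2 = mean(x2))]

# One boot() call per group; t0 holds the original estimates and
# t holds the R bootstrap replicates (one column per statistic).
res <- data2[, {
  b <- boot(.SD, stat, R = 10)
  list(statistic = c("m1", "m2"),
       original  = as.numeric(b$t0),
       std_error = apply(b$t, 2, sd))
}, by = group]
res
```

The result has one row per group and statistic, so it can be filtered or joined like any other data.table.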
+3

There are several problems with the code before you even get to the grouping.

Did you mean something like this?

    data <- as.data.frame(list(x1 = runif(200), x2 = runif(200), group = factor(sample(letters[1:2]))))
    # statistic: means of x1 and x2 over the resampled rows i
    stat <- function(x, i) c(m1 = mean(x$x1[i]), m2 = mean(x$x2[i]))
    > stat(data, 1:10)
           m1        m2
    0.4465738 0.5522221

From there you can worry about doing it by group, however you choose.

For instance:

    library(plyr)
    dlply(data, .(group), function(dat) boot(dat, stat, R = 10))

For large datasets, try data.table:

 by( seq(nrow(data)), data$group, function(idx) myboot(data[idx,])) 

I went with by() rather than data.table's by= argument because you want the result to be a list. There may be some data.table feature for this that I don't know about, but I could not find one (see the edit history for the problem it caused).

The subsetting is still done by data.table's [ method, so it should be pretty fast.
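The snippet above calls myboot without defining it. Below is a self-contained sketch of the same idea, with two assumptions stated plainly: myboot is taken to be a thin wrapper around boot() (a hypothetical definition, since the original is not shown), and the row indices are grouped with split() and lapply() instead of by(). Either way the result is a list with one boot object per group.

```r
library(boot)
library(data.table)

set.seed(1)
data <- data.table(
  x1 = runif(200),
  x2 = runif(200),
  group = sample(c("a", "b"), 200, replace = TRUE)
)

stat <- function(x, i) c(m1 = mean(x$x1[i]), m2 = mean(x$x2[i]))

# Hypothetical wrapper: the original myboot is not shown in the answer
myboot <- function(d) boot(d, stat, R = 10)

# Group the row numbers, then bootstrap each subset separately.
# data[idx, ] uses data.table's own row subsetting.
idx_by_group <- split(seq_len(nrow(data)), data$group)
res <- lapply(idx_by_group, function(idx) myboot(data[idx, ]))

res$a$t0
```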

+1