R data.table: compute a function on a subset vector for each group member

Question


I have a data table that is pretty similar to this:

library(data.table)
set.seed(1)

dt <- data.table(med  = sample(letters, 50, TRUE),
                 diag = sample(LETTERS[1:7], 50, TRUE),
                 val  = sample(1:100, 50, FALSE))

For each row, I want to calculate the probability that a val from the same diag group would be greater than that row's val, and store it in a new column of the table, say prob. (I know the values are not necessarily normally distributed; I'm fine with that.)
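Written out, for row i with group g = diag_i, and with \bar{v}_g and s_g the mean and sample standard deviation (what mean() and sd() return) of val over the rows of group g, the quantity that pnorm(..., lower.tail = FALSE) gives in the loop below is the upper-tail probability of a normal distribution fitted to that group:

```latex
\mathrm{prob}_i = P(X > \mathrm{val}_i)
  = 1 - \Phi\!\left(\frac{\mathrm{val}_i - \bar{v}_g}{s_g}\right),
\qquad X \sim \mathcal{N}\!\left(\bar{v}_g,\, s_g^2\right)
```

where \Phi is the standard normal CDF.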

I can do this with a for loop:

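dt[, prob := NA_real_]                # preallocate the result column
for (i in seq_len(nrow(dt))) {
    dg  <- dt[i, diag]                # group of the current row
    vl  <- dt[i, val]                 # value of the current row
    grp <- dt[diag == dg, val]        # all values in the same diag group
    set(dt, i, "prob",
        pnorm(vl, mean(grp), sd(grp), lower.tail = FALSE))
}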

but my data is fairly large (dt has about 800k rows, with some 2k levels of diag), so I would like to vectorize this instead of looping.

I tried

dt[,
   .(lapply(.SD,function(x) 
                pnorm(x[1],
                mean(x),
                sd(x),
                lower.tail = F))),
   by=diag,
   .SDcols="val"]

which, of course, returns only one probability per diag group (the one for the first val in each group), so it is of little use. I also tried

dt[,
   .(lapply(.SD,function(x) 
                pnorm(x[1],
                mean(x),
                sd(x),
                lower.tail = F))),
   by=.EACHI,
   .SDcols="val"]

but it causes an error:

Error in `[.data.table`(dt, , .(lapply(.SD, function(x) pnorm(x[1], mean(x),  : 
  logicial error. i is not data.table, but mult='all' and 'by'=.EACHI
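For context on this error: by = .EACHI is meant for joins, where i is itself a data.table and j is evaluated once for each row of i. Here i is missing, which is why data.table stops with the message above. If a join-based formulation is wanted, one option (an alternative sketch, not something proposed in the thread) is to compute the mean and sd of each diag once and then update-join them back onto dt. The names stats, m, s and prob3 are hypothetical, and the on= argument assumes data.table 1.9.6 or newer.

```r
library(data.table)

set.seed(1)
dt <- data.table(med  = sample(letters, 50, TRUE),
                 diag = sample(LETTERS[1:7], 50, TRUE),
                 val  = sample(1:100, 50, FALSE))

# One row per diag with that group's mean (m) and sample sd (s)
stats <- dt[, .(m = mean(val), s = sd(val)), by = diag]

# Update join: each row of dt is matched to its diag in stats; i.m and i.s
# refer to columns of the i table (stats), and prob3 is added by reference
dt[stats, on = "diag", prob3 := pnorm(val, i.m, i.s, lower.tail = FALSE)]
```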

Any ideas? I'd prefer data.table solutions, but I'm also open to other packages (plyr, dplyr, ...).


PavoDive, Jun 26 '15 at 3:26


A dplyr solution:

library(dplyr)

dt %>% group_by(diag) %>% 
       mutate(prob = pnorm(val, mean(val), sd(val), lower.tail = FALSE))


jeremycg, Jun 26 '15 at 4:03
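For completeness, the same per-row, per-group calculation can also be sketched in base R without any extra package, using ave(), which applies FUN to val within each level of diag and puts the results back in the original row order. This is an illustration not taken from the thread; df and prob_base are hypothetical names, and the example data is rebuilt as a plain data.frame.

```r
set.seed(1)
df <- data.frame(med  = sample(letters, 50, TRUE),
                 diag = sample(LETTERS[1:7], 50, TRUE),
                 val  = sample(1:100, 50, FALSE))

# ave() splits val by diag, calls FUN on each group's vector, and
# reassembles the results aligned with the original rows
df$prob_base <- ave(df$val, df$diag,
                    FUN = function(v) pnorm(v, mean(v), sd(v),
                                            lower.tail = FALSE))
```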

thelatemail · Accepted Answer · 2015-06-26T04:07:26+0000

In data.table, just add the column by reference with := and group with by:

dt[, prob2 := pnorm(val, mean(val), sd(val), lower.tail=FALSE), by=diag]

This gives the same result as the loop in the question (column prob):

head(dt)
#   med diag val       prob      prob2
#1:   p    E  91 0.04713131 0.04713131
#2:   f    E   3 0.92991675 0.92991675
#3:   o    B  26 0.83792988 0.83792988
#4:   t    C  38 0.70877125 0.70877125
#5:   g    E  71 0.16909178 0.16909178
#6:   i    E  25 0.75428819 0.75428819
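Compared with the first attempt in the question, the essential change is passing the whole group vector val to pnorm() instead of x[1]: pnorm() is vectorized over its first argument, so each row gets its own probability, and := writes the results back to the original rows. The self-contained sketch below reruns the loop and the grouped := on the example data and compares them with all.equal(). On the full 800k-row table the grouped version should scale much better, since it processes each diag once instead of re-subsetting dt for every row (no timings are reported in the thread). Note that a diag with a single row has sd() of NA, so its prob is NA under both versions.

```r
library(data.table)

set.seed(1)
dt <- data.table(med  = sample(letters, 50, TRUE),
                 diag = sample(LETTERS[1:7], 50, TRUE),
                 val  = sample(1:100, 50, FALSE))

# Loop version: re-subsets dt once per row
dt[, prob := NA_real_]
for (i in seq_len(nrow(dt))) {
    grp <- dt[diag == dt$diag[i], val]
    set(dt, i, "prob",
        pnorm(dt$val[i], mean(grp), sd(grp), lower.tail = FALSE))
}

# Grouped version from the accepted answer: each diag is processed once
dt[, prob2 := pnorm(val, mean(val), sd(val), lower.tail = FALSE), by = diag]

all.equal(dt$prob, dt$prob2)  # TRUE
```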
