I have a data.table that is pretty similar to this:
library(data.table)
set.seed(1)
dt <- data.table(med  = sample(letters, 50, TRUE),
                 diag = sample(LETTERS[1:7], 50, TRUE),
                 val  = sample(1:100, 50, FALSE))
For each row, I want to calculate the probability that any val within the same diag will be greater than that row's val, and store it in a new column of the table, say prob. (I know the distribution is not necessarily normal; I'm fine with that.)
I can do this with a for loop:
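for (i in 1:50) {
  dg <- dt[i, diag]  # diag level of row i
  vl <- dt[i, val]   # val of row i
  # upper-tail normal probability, using the mean and sd of row i's diag group
  dt$prob[i] <- pnorm(vl,
                      mean(dt[diag == dg, val]),
                      sd(dt[diag == dg, val]),
                      lower.tail = FALSE)
}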
but my real data is fairly large (dt has about 800k rows, with roughly 2k levels of diag), so I would like a vectorized approach instead of a loop.
I tried:
dt[,
   .(lapply(.SD, function(x)
       pnorm(x[1],
             mean(x),
             sd(x),
             lower.tail = FALSE))),
   by = diag,
   .SDcols = "val"]
which, of course, groups by diag and so gives only one probability per group, which is of little use. I also tried:
dt[,
   .(lapply(.SD, function(x)
       pnorm(x[1],
             mean(x),
             sd(x),
             lower.tail = FALSE))),
   by = .EACHI,
   .SDcols = "val"]
but it causes an error:
Error in `[.data.table`(dt, , .(lapply(.SD, function(x) pnorm(x[1], mean(x), :
logicial error. i is not data.table, but mult='all' and 'by'=.EACHI
Any ideas? I would prefer data.table solutions, but others (plyr, dplyr ..) are welcome too.
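For reference, one direction that might fit is a grouped `:=` assignment. This is a minimal sketch on the toy data from above, not a confirmed solution for the full 800k-row table. It relies on `pnorm()` being vectorized over its first argument, so that inside each diag group the whole val vector is evaluated against that group's mean and sd in one call.

```r
library(data.table)

# Same toy data as in the question
set.seed(1)
dt <- data.table(med  = sample(letters, 50, TRUE),
                 diag = sample(LETTERS[1:7], 50, TRUE),
                 val  = sample(1:100, 50, FALSE))

# For each diag group, mean(val) and sd(val) are computed once and
# pnorm() returns one upper-tail probability per row of that group.
# `:=` adds the result as a new column by reference, keeping every row.
dt[, prob := pnorm(val, mean(val), sd(val), lower.tail = FALSE), by = diag]
```

A diag level with a single row would get NA here, because `sd()` of one value is NA; the loop version behaves the same way.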