I have a data.table with many columns. I need to loop over them and create new columns based on some condition. Currently I write a separate line of code for each column. Let me explain with an example. Consider data like this -
library(data.table)
set.seed(71)
DT <- data.table(town = rep(c('A','B'), each=10),
                 tc = rep(c('C','D'), 10),
                 one = rnorm(20,1,1),
                 two = rnorm(20,2,1),
                 three = rnorm(20,3,1),
                 four = rnorm(20,4,1),
                 five = rnorm(20,5,2),
                 six = rnorm(20,6,2),
                 seven = rnorm(20,7,2),
                 total = rnorm(20,28,3))
For each column from one to total, I need to create 4 new columns, i.e. mean, sd, uplimit and lowlimit, for the 2-sigma outlier calculation. I am doing it like this -
DTnew <- DT[, as.list(unlist(lapply(.SD, function(x) list(mean = mean(x),
                                                          sd = sd(x),
                                                          uplimit = mean(x)+1.96*sd(x),
                                                          lowlimit = mean(x)-1.96*sd(x))))),
            by = .(town,tc)]
This gives the DTnew data.table. Then I merge it with my DT -
DTmerge <- merge(DT, DTnew, by= c('town','tc'))
Now, to flag the outliers, I write a separate line of code for each variable -
DTAoutlier <- DTmerge[ ,one.Aoutlier := ifelse (one >= one.lowlimit & one <= one.uplimit,0,1)]
DTAoutlier <- DTmerge[ ,two.Aoutlier := ifelse (two >= two.lowlimit & two <= two.uplimit,0,1)]
DTAoutlier <- DTmerge[ ,three.Aoutlier := ifelse (three >= three.lowlimit & three <= three.uplimit,0,1)]
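Can anyone help simplify this code so that I do not have to write a separate line for each outlier column? This example has only 8 variables, but with 100 variables, would I have to write 100 lines? Can this be done with a for loop? How?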
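One possible way to avoid the per-column lines, as a minimal sketch rather than a definitive answer: build the vector of output names once and let `lapply(.SD, ...)` with `:=` fill them within each group. The small stand-in table, the helper name `flag_outlier` and the variable names `vars`, `out_cols` and `DTout` are assumptions made for illustration. The sketch computes the limits inside the helper, so the intermediate merge is not needed.

```r
library(data.table)

# Stand-in table with the same shape as DT in the question (only two value columns)
set.seed(71)
DT <- data.table(town = rep(c('A','B'), each = 10),
                 tc   = rep(c('C','D'), 10),
                 one  = rnorm(20, 1, 1),
                 two  = rnorm(20, 2, 1))

# Columns to check: everything except the grouping columns
vars     <- setdiff(names(DT), c('town', 'tc'))
out_cols <- paste0(vars, '.Aoutlier')

# Hypothetical helper: 1 if x lies outside mean(x) +/- 1.96 * sd(x), else 0
flag_outlier <- function(x) as.integer(abs(x - mean(x)) > 1.96 * sd(x))

# Work on a copy so DT itself is not modified by reference
DTout <- copy(DT)

# One new <name>.Aoutlier column per variable, computed within each town/tc group
DTout[, (out_cols) := lapply(.SD, flag_outlier), by = .(town, tc), .SDcols = vars]
```

The same call works unchanged for 100 columns, because only `vars` grows. The parentheses around `out_cols` tell data.table to use the names stored in the variable rather than create a column literally called `out_cols`.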
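Also, in general with data.table, how can I add new columns while keeping the original ones? For example, below I take the log of columns 3 to 10. This gives a new table DTlog rather than adding to DT. How can I keep the original columns in DT and have the new columns in DT as well?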
DTlog <- DT[,(lapply(.SD,log)),by = .(town,tc),.SDcols=3:10]
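For the second part, a sketch of one approach, under the assumption that the new columns should sit next to the originals in the same table: assign to a vector of new names with `:=`. The stand-in data and the `log.` prefix are illustrative choices, not something taken from the question.

```r
library(data.table)

# Stand-in table (hypothetical values, all positive so log() is defined)
DT <- data.table(town = c('A', 'A', 'B', 'B'),
                 tc   = c('C', 'D', 'C', 'D'),
                 one  = c(1.2, 0.8, 2.5, 1.9),
                 two  = c(2.1, 3.4, 1.7, 2.8))

# Columns to transform; .SDcols also accepts positions such as 3:10
vars     <- c('one', 'two')
log_cols <- paste0('log.', vars)   # names for the new columns (naming is an assumption)

# := adds the new columns to DT by reference; the original columns stay as they are
DT[, (log_cols) := lapply(.SD, log), .SDcols = vars]
```

Since `log` works element by element, no `by` is needed here. A grouped transformation would take the same form with `by = .(town, tc)` added.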