I am not a statistician, so I don't know whether there is a general method for this problem. In my view, it becomes easier if you reshape your data into long format.
library(reshape2)
dat.m <- melt(dat)
dat.m$value <- as.numeric(dat.m$value)
head(dat.m)
            ID variable value
1 ILMN_1762337  sample1  7.86
2 ILMN_2055271  sample1  5.72
3 ILMN_1736007  sample1  3.82
4 ILMN_2383229  sample1  6.34
5 ILMN_1806310  sample1  6.15
6 ILMN_1653355  sample1  7.01
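If reshape2 is not installed, the same three columns (ID, variable, value) can be built with base R. This is an alternative to melt(), not what this answer uses, sketched here on a hypothetical two-gene, two-sample subset of the data.

```r
# Hypothetical subset of the example data (two genes, two samples)
dat <- data.frame(ID      = c("ILMN_1762337", "ILMN_2055271"),
                  sample1 = c(7.86, 5.72),
                  sample2 = c(5.05, 4.29))

# Base-R alternative to melt(): one row per (gene, sample) pair.
# unlist() concatenates the sample columns one after another,
# so the IDs are repeated once per sample column.
long <- data.frame(ID       = rep(dat$ID, times = ncol(dat) - 1),
                   variable = rep(names(dat)[-1], each = nrow(dat)),
                   value    = unlist(dat[-1], use.names = FALSE))
long
```

Unlike melt(), this keeps variable as a plain character column rather than a factor.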
Then, for each variable (i.e. each sample), do the following:
- calculate the limits using quantile;
- remove the genes that do not satisfy the condition.
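The two steps can be sketched on a single sample, here the sample1 column of the example data. Keeping values strictly between the two limits is an assumption; the exact condition is not spelled out in this answer.

```r
# Values of one sample (the sample1 column of the example data)
z <- c(7.86, 5.72, 3.82, 6.34, 6.15, 7.01, 6.09, 5.77, 5.97, 5.97, 5.97)

# Step 1: lower and upper limits from the 20% and 80% quantiles
qq <- quantile(z, probs = c(0.2, 0.8))

# Step 2 (assumed condition): keep only values strictly between the limits
kept <- z[z > qq[1] & z < qq[2]]
kept
```

With these values the limits are 5.77 and 6.34, so five values remain.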
You can do this, for example, with ddply from the plyr package:
library(plyr)
res <- ddply(dat.m, .(variable), function(x) {
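A complete, runnable sketch of this per-sample filtering is given below. It assumes a gene is kept when its value lies strictly between the sample's own 20% and 80% quantiles, and it uses base R's split()/lapply() in place of ddply so it runs without plyr. The small dat.m and the helper name filter_sample are hypothetical.

```r
# Hypothetical stand-in for dat.m: one row per (gene, sample) pair
dat.m <- data.frame(
  ID       = rep(c("g1", "g2", "g3", "g4", "g5", "g6"), times = 2),
  variable = rep(c("sample1", "sample2"), each = 6),
  value    = c(1, 2, 3, 4, 5, 6,   10, 20, 30, 40, 50, 60)
)

# Filter one sample: limits come from that sample's own quantiles.
# The strict inequalities are an assumption.
filter_sample <- function(x) {
  z  <- x$value
  qq <- quantile(z, probs = c(0.2, 0.8))
  x[z > qq[1] & z < qq[2], ]
}

# Base-R stand-in for ddply(dat.m, .(variable), filter_sample)
res <- do.call(rbind, lapply(split(dat.m, dat.m$variable), filter_sample))
rownames(res) <- NULL
res
```

Each sample here gets its own limits (2 and 5 for sample1, 20 and 50 for sample2), so genes g3 and g4 survive in both.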
EDIT after the OP's clarification: if you want the 20% and 80% cutoffs computed over the entire matrix rather than separately for each sample, calculate qq once, outside ddply:
qq <- quantile(dat.m$value, probs = c(0.2,0.8))
Then comment out the corresponding line inside the function, for example:
res <- ddply(dat.m, .(variable), function(x) {
  z <- x$value
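Here is a sketch of the whole-matrix variant: qq is computed once over all values, and the per-sample quantile line is commented out so the function uses the global limits. As before, base R's split()/lapply() stands in for ddply, and the data, the helper name filter_sample and the strict inequalities are assumptions.

```r
# Hypothetical stand-in for dat.m: one row per (gene, sample) pair
dat.m <- data.frame(
  ID       = rep(c("g1", "g2", "g3", "g4", "g5", "g6"), times = 2),
  variable = rep(c("sample1", "sample2"), each = 6),
  value    = c(1, 2, 3, 4, 5, 6,   10, 20, 30, 40, 50, 60)
)

# Cutoffs computed once over the entire matrix (all samples pooled)
qq <- quantile(dat.m$value, probs = c(0.2, 0.8))

# Same filter, but it now reads the global qq defined above
filter_sample <- function(x) {
  z <- x$value
  # qq <- quantile(z, probs = c(0.2, 0.8))   # per-sample version, commented out
  x[z > qq[1] & z < qq[2], ]
}

# Base-R stand-in for ddply(dat.m, .(variable), filter_sample)
res <- do.call(rbind, lapply(split(dat.m, dat.m$variable), filter_sample))
rownames(res) <- NULL
res
```

With pooled limits (3.2 and 38) different genes survive in each sample: g4 to g6 in sample1, g1 to g3 in sample2.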
PS: here is the data used above:
dat <- read.table(text = '
ID           sample1 sample2 sample3 sample4 sample5 sample6
ILMN_1762337 7.86    5.05    4.89    5.74    6.78    6.41
ILMN_2055271 5.72    4.29    4.64    5.00    6.30    8.02
ILMN_1736007 3.82    6.48    6.06    7.13    8.20    4.06
ILMN_2383229 6.34    4.34    6.12    6.83    4.82    5.57
ILMN_1806310 6.15    6.37    5.54    5.22    4.59    6.28
ILMN_1653355 7.01    4.73    6.62    6.27    4.77    6.12
ILMN_1705025 6.09    6.68    6.80    6.85    8.35    4.15
ILMN_1814316 5.77    5.17    5.94    6.51    7.12    7.20
ILMN_1814317 5.97    5.97    5.97    5.97    5.97    5.97
ILMN_1814318 5.97    5.97    5.97    5.97    5.97    5.97
ILMN_1814319 5.97    5.97    5.97    5.97    5.97    5.97
', header = TRUE)
agstudy