How to specify columns in R to be used in matches (without listing separately)?

Question


Suppose I have three data columns (sample1, sample2 and sample3). I need all the rows in which the letter b or h appears in any of these columns. This works great:

    data <- data.frame(row_name = c("s1_100", "s1_200", "s1_300", "s1_400", "s1_500"),
                       sample1 = rep("a", 5),
                       sample2 = c(rep("b", 2), rep("a", 3)),
                       sample3 = c(rep("a", 4), "h"))
    data
    #   row_name sample1 sample2 sample3
    # 1   s1_100       a       b       a
    # 2   s1_200       a       b       a
    # 3   s1_300       a       a       a
    # 4   s1_400       a       a       a
    # 5   s1_500       a       a       h

    bh <- c('b', 'h')
    bh_data <- subset(data, sample1 %in% bh | sample2 %in% bh | sample3 %in% bh)
    bh_data
    #   row_name sample1 sample2 sample3
    # 1   s1_100       a       b       a
    # 2   s1_200       a       b       a
    # 5   s1_500       a       a       h

However, since I ask the same question of each column, is there a more concise way to do this?

In reality, we have more than 800 columns and more than 70,000 rows, and we will want to restrict the search to a particular subset of those columns, possibly hundreds of them. Typing out hundreds of column names just doesn't seem practical, unless I write a script to generate the condition.
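Generating the condition does not require a separate script: R can build the same `subset()` condition as a call object. A minimal sketch, assuming the target columns can be picked by a name pattern (the `^sample` prefix here is an illustrative choice) and reusing the question's `data` and `bh`:

```r
# Sample data from the question.
data <- data.frame(row_name = c("s1_100", "s1_200", "s1_300", "s1_400", "s1_500"),
                   sample1 = rep("a", 5),
                   sample2 = c(rep("b", 2), rep("a", 3)),
                   sample3 = c(rep("a", 4), "h"),
                   stringsAsFactors = FALSE)
bh <- c("b", "h")

# Pick the columns by pattern instead of typing them; any character
# vector of column names (e.g. read from a file) would work here too.
cols <- grep("^sample", names(data), value = TRUE)

# Build one unevaluated `%in%` test per column ...
tests <- lapply(cols, function(cl) call("%in%", as.name(cl), quote(bh)))
# ... and chain them with `|`: sample1 %in% bh | sample2 %in% bh | ...
cond <- Reduce(function(a, b) call("|", a, b), tests)
print(cond)

# Evaluate the generated condition with the data frame's columns in scope.
bh_data <- data[eval(cond, data), ]
bh_data
```

This reproduces the hand-written `subset()` condition exactly; the accepted answer applies the same `Reduce()` idea to the logical vectors directly rather than to code.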


Tags: r, data.table, subset

Christopher bottoms Oct 10 '14 at 17:50


1 answer

akrun · Accepted Answer · 2014-10-10T18:06:28+0000

Try

    # bh is c('b', 'h') from the question; df is the data shown under "data" below.
    indx <- Reduce(`|`, lapply(df[, -1], `%in%`, bh))
    df[indx, ]
    #   row_name sample1 sample2 sample3
    # 1   s1_100       a       b       a
    # 2   s1_200       a       b       a
    # 5   s1_500       a       a       h
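To see what the `Reduce()` one-liner does, this sketch splits it into its two steps and searches only a named subset of columns. The `cols` vector is an illustrative choice; `df[, -1]` in the answer simply means every column except `row_name`.

```r
df <- data.frame(row_name = c("s1_100", "s1_200", "s1_300", "s1_400", "s1_500"),
                 sample1 = rep("a", 5),
                 sample2 = c(rep("b", 2), rep("a", 3)),
                 sample3 = c(rep("a", 4), "h"),
                 stringsAsFactors = FALSE)
bh <- c("b", "h")

# Illustrative subset of columns to search (could be hundreds of names).
cols <- c("sample2", "sample3")

# Step 1: one logical vector per column, TRUE where the value is in bh.
hits <- lapply(df[cols], `%in%`, bh)
str(hits)

# Step 2: Reduce() folds `|` over the list pairwise,
# i.e. hits[[1]] | hits[[2]] | ... for any number of columns.
indx <- Reduce(`|`, hits)
stopifnot(identical(indx, hits$sample2 | hits$sample3))

df[indx, ]
```

Because the list from `lapply()` can have any length, the same two lines work unchanged whether `cols` holds three names or eight hundred.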

Or using data.table

    library(data.table)
    nm1 <- paste0("sample", 1:3)
    setDT(df)[df[, Reduce(`|`, lapply(.SD, `%in%`, bh)), .SDcols = nm1]]
    #    row_name sample1 sample2 sample3
    # 1:   s1_100       a       b       a
    # 2:   s1_200       a       b       a
    # 3:   s1_500       a       a       h

data

    df <- structure(list(row_name = c("s1_100", "s1_200", "s1_300", "s1_400", "s1_500"),
                         sample1 = c("a", "a", "a", "a", "a"),
                         sample2 = c("b", "b", "a", "a", "a"),
                         sample3 = c("a", "a", "a", "a", "h")),
                    .Names = c("row_name", "sample1", "sample2", "sample3"),
                    class = "data.frame", row.names = c(NA, -5L))
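A different base-R technique, not taken from the answer, converts the searched columns into one character matrix and counts hits per row with `rowSums()`. A minimal sketch, assuming all searched columns are character; note that `as.matrix()` makes a full copy of the selected columns, so memory use grows with the number of rows times the number of columns.

```r
df <- data.frame(row_name = c("s1_100", "s1_200", "s1_300", "s1_400", "s1_500"),
                 sample1 = rep("a", 5),
                 sample2 = c(rep("b", 2), rep("a", 3)),
                 sample3 = c(rep("a", 4), "h"),
                 stringsAsFactors = FALSE)
bh <- c("b", "h")
cols <- paste0("sample", 1:3)

# All-character columns become a character matrix.
m <- as.matrix(df[cols])

# %in% returns a plain vector (column-major order), so restore the
# matrix shape before summing the hits in each row.
hit <- matrix(m %in% bh, nrow = nrow(m))
keep <- rowSums(hit) > 0

df[keep, ]
```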
