How to specify which columns in R are used for matching (without listing each one separately)?

Suppose I have three data columns (sample1, sample2 and sample3). I need all the rows in which the letter b or h appears in any of these columns. This works fine:

    data <- data.frame(row_name = c("s1_100", "s1_200", "s1_300", "s1_400", "s1_500"),
                       sample1  = rep("a", 5),
                       sample2  = c(rep("b", 2), rep("a", 3)),
                       sample3  = c(rep("a", 4), "h"))
    data
    #   row_name sample1 sample2 sample3
    #     s1_100       a       b       a
    #     s1_200       a       b       a
    #     s1_300       a       a       a
    #     s1_400       a       a       a
    #     s1_500       a       a       h

    bh <- c('b', 'h')
    # Keep rows where any of the three sample columns contains "b" or "h"
    bh_data <- subset(data, (sample1 %in% bh | sample2 %in% bh | sample3 %in% bh))
    bh_data
    #   row_name sample1 sample2 sample3
    #     s1_100       a       b       a
    #     s1_200       a       b       a
    #     s1_500       a       a       h

However, since I am asking the same question of each column, is there a more concise way to do this?

In reality, we have more than 800 columns and more than 70,000 rows, and we will want to select many specific columns for the search. Typing out hundreds of column names by hand just doesn't seem practical unless I write a script to generate the R code.

1 answer

Try

    # One logical vector per column (all but the first), combined with elementwise OR
    indx <- Reduce(`|`, lapply(df[,-1], `%in%`, bh))
    df[indx,]
    #  row_name sample1 sample2 sample3
    #1   s1_100       a       b       a
    #2   s1_200       a       b       a
    #5   s1_500       a       a       h
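For the 800-column case, the columns to search could be picked by a name pattern instead of by position. The following is only a sketch: it assumes the columns of interest share a `sample` prefix, as they do in the example data, and the names `cols` and `res` are introduced here for illustration.

```r
# Example data, same values as the answer's `df`
df <- data.frame(
  row_name = c("s1_100", "s1_200", "s1_300", "s1_400", "s1_500"),
  sample1  = rep("a", 5),
  sample2  = c("b", "b", "a", "a", "a"),
  sample3  = c("a", "a", "a", "a", "h"),
  stringsAsFactors = FALSE
)
bh <- c("b", "h")

# Assumption: the columns to search all start with "sample"
cols <- grep("^sample", names(df), value = TRUE)

# Test each selected column against `bh`, then OR the results row by row
indx <- Reduce(`|`, lapply(df[cols], `%in%`, bh))
res <- df[indx, ]
```

Any character vector of column names can stand in for `cols`, for example one read from a file, so the names never have to be typed into the condition itself.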

Or using data.table

    library(data.table)
    # Names of the columns to search
    nm1 <- paste0("sample", 1:3)
    setDT(df)[df[, Reduce(`|`,lapply(.SD, `%in%`, bh)), .SDcols=nm1]]
    #   row_name sample1 sample2 sample3
    #1:   s1_100       a       b       a
    #2:   s1_200       a       b       a
    #3:   s1_500       a       a       h
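When nearly every column should be searched, it may be easier to name the columns to exclude instead. A minimal sketch of the same data.table idea, assuming `row_name` is the only identifier column; `dt`, `keep` and `res_dt` are names made up for this example.

```r
library(data.table)

dt <- data.table(
  row_name = c("s1_100", "s1_200", "s1_300", "s1_400", "s1_500"),
  sample1  = rep("a", 5),
  sample2  = c("b", "b", "a", "a", "a"),
  sample3  = c("a", "a", "a", "a", "h")
)
bh <- c("b", "h")

# Assumption: search every column except the identifier column
nm1 <- setdiff(names(dt), "row_name")

# Build the logical row index first, then subset with it
keep <- dt[, Reduce(`|`, lapply(.SD, `%in%`, bh)), .SDcols = nm1]
res_dt <- dt[keep]
```

Splitting the index computation from the subsetting step makes it possible to inspect `keep` (for example with `sum(keep)`) before filtering a 70,000-row table.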

data

    df <- structure(list(row_name = c("s1_100", "s1_200", "s1_300", "s1_400", "s1_500"),
                         sample1 = c("a", "a", "a", "a", "a"),
                         sample2 = c("b", "b", "a", "a", "a"),
                         sample3 = c("a", "a", "a", "a", "h")),
                    .Names = c("row_name", "sample1", "sample2", "sample3"),
                    class = "data.frame", row.names = c(NA, -5L))