Extract rows from data.frame based on shared list values

Question


I am looking for an easy way to filter rows from a data.frame based on a list of number sequences.

Here is an example:

My original data frame:

data <- data.frame(x=c(0,1,2,0,1,2,3,4,5,12,2,0,10,11,12,13),y="other_data")

My list:

 list1 <- list(1:5,10:13)

My goal is to keep only the rows of "data" where column "x" contains exactly one of the numerical sequences in "list1", in the same order and without gaps. Thus, the output data.frame should be:

 finaldata <- data.frame(x=c(1:5,10:13),y="other_data")

Any ideas for this?

+7

list r filtering

jeff6868 Sep 17 '15 at 11:27


4 answers

Why not use rollapply from zoo:

 library(zoo)

 ind = lapply(list1, function(x) {
   n = length(x)
   which(rollapply(data$x, n, function(y) all(y == x))) + 0:(n - 1)
 })

 data[unlist(ind), ]
 #    x          y
 #5   1 other_data
 #6   2 other_data
 #7   3 other_data
 #8   4 other_data
 #9   5 other_data
 #13 10 other_data
 #14 11 other_data
 #15 12 other_data
 #16 13 other_data
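Note that the `+ 0:(n-1)` step relies on each sequence occurring exactly once in `data$x`; with two or more matches the starts and offsets would be recycled against each other. The following is a minimal base-R sketch of the same sliding-window idea that expands every matching start separately. The helper name `find_runs` is hypothetical and not part of any answer here, and the sketch assumes the sequences are compared with `==` (exact equality).

```r
# Hypothetical helper (not from the answers): row indices of every exact,
# contiguous occurrence of `pattern` in the vector `v`.
find_runs <- function(pattern, v) {
  n <- length(pattern)
  if (n == 0 || n > length(v)) return(integer(0))
  # only consider windows that fit entirely inside v
  starts <- seq_len(length(v) - n + 1)
  # TRUE where the window beginning at a given start equals the pattern;
  # isTRUE() treats comparisons involving NA as "no match"
  hit <- vapply(starts,
                function(i) isTRUE(all(v[i:(i + n - 1)] == pattern)),
                logical(1))
  # expand each matching start into its full index range
  unlist(lapply(starts[hit], function(s) s:(s + n - 1)))
}

data  <- data.frame(x = c(0,1,2,0,1,2,3,4,5,12,2,0,10,11,12,13), y = "other_data")
list1 <- list(1:5, 10:13)

idx <- unlist(lapply(list1, find_runs, v = data$x))
data[idx, ]
```

On the example data this selects rows 5 to 9 and 13 to 16, the same rows as the zoo version. If a sequence appears several times, each occurrence is returned in full.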

+1

Colonel beauvel Sep 17 '15 at 11:47


 extract_fun <- function(x, dat){
   # Index where the sequences start
   ind <- which(dat == x[1])
   # Indexes (within dat) where the sequence should be
   ind_seq <- lapply(ind, seq, length.out = length(x))
   # Extract the values from dat at the position
   dat_val <- mapply(`[`, list(dat), ind_seq)
   # Check if values within dat == those in list1
   i <- which(as.logical(apply(dat_val, 2, all.equal, x))) # which one is equal?
   # Return the correct indices
   ind_seq[[i]]
 }

Get the indices for each element of list1 and combine them into the row indices to extract:

 all_ind <- do.call(c, lapply(list1, extract_fun, data$x))
 data[all_ind, ]

Result:

     x          y
 5   1 other_data
 6   2 other_data
 7   3 other_data
 8   4 other_data
 9   5 other_data
 13 10 other_data
 14 11 other_data
 15 12 other_data
 16 13 other_data

+1

Rentrop Sep 17 '15 at 13:08


The match2 function steps through each x value and compares it, together with the following n-1 values, against a vector of length n. Reduce is then used to build the sequence of row indices from the start to the end of the match.

 match2 <- function(vec) {
   start <- which(sapply(1:nrow(data), function(i) all(data$x[i:(i+length(vec)-1)] == vec)))
   Reduce(':', c(start, start+length(vec)-1))
 }

We can then use sapply to repeat the process for each element of list1.

 s <- sapply(list1, match2)
 data[unlist(s), ]
 #     x          y
 # 5   1 other_data
 # 6   2 other_data
 # 7   3 other_data
 # 8   4 other_data
 # 9   5 other_data
 # 13 10 other_data
 # 14 11 other_data
 # 15 12 other_data
 # 16 13 other_data

0

Pierre lafortune Sep 17 '15 at 12:45


Heroka · Accepted Answer · 2015-09-17T12:03:22+0000

I started with a custom function to get the subset for one sequence; from there it is easy to extend to the whole list with lapply.

 # function that takes a sequence and a vector and returns a logical
 # vector marking the elements of the vector that form a complete sequence
 get_row_indices <- function(sequence, v){
   # get run lengths of whether vector is in sequence
   rle_d <- rle(v %in% sequence)
   # test if it is complete, so both v in sequence and length of
   # matches is length of sequence
   select <- rep(length(sequence) == rle_d$lengths & rle_d$values, rle_d$lengths)
   return(select)
 }

 # add row ID to data to show selection
 data$row_id <- 1:nrow(data)

 res <- do.call(rbind, lapply(list1, function(x){
   return(data[get_row_indices(sequence = x, v = data$x), ])
 }))

 > res
     x          y row_id
 5   1 other_data      5
 6   2 other_data      6
 7   3 other_data      7
 8   4 other_data      8
 9   5 other_data      9
 13 10 other_data     13
 14 11 other_data     14
 15 12 other_data     15
 16 13 other_data     16
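To see what get_row_indices is doing, it may help to look at the intermediate run-length encoding for the first sequence. The snippet below is only an illustration on the question's data; the variable names `x_vals`, `r`, `v2` and `sel2` are made up for this sketch.

```r
x_vals   <- c(0,1,2,0,1,2,3,4,5,12,2,0,10,11,12,13)
sequence <- 1:5

# runs of consecutive values that are (or are not) members of the sequence
r <- rle(x_vals %in% sequence)
r$lengths
r$values

# only a TRUE run whose length equals length(sequence) is kept
keep <- rep(r$lengths == length(sequence) & r$values, r$lengths)
which(keep)

# Caveat: the test is based on membership and run length, not on order,
# so a permutation of the sequence is selected as well
v2   <- c(0, 5, 4, 3, 2, 1, 0)
r2   <- rle(v2 %in% sequence)
sel2 <- rep(r2$lengths == length(sequence) & r2$values, r2$lengths)
which(sel2)
```

Here the run lengths are 1, 2, 1, 5, 1, 1, 5, and only the TRUE run of length 5 (rows 5 to 9) passes the test. The last part shows that 5, 4, 3, 2, 1 would also be selected, and a complete sequence directly preceded or followed by another member value would be missed because the run becomes longer than the sequence. If exact order matters, a window-by-window comparison is the safer choice.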
