Group rows based on condition in time series and ignoring false values

Question


I have a set of animal locations recorded at varying sampling intervals. I want to group and label sequences in which the sampling interval meets a certain criterion (for example, stays at or below a certain value). This is a revised version of an earlier question, which was marked as a duplicate of another one. The difference in this revised question is that all values that do NOT meet the criterion should be ignored rather than labelled.

Let me illustrate with some dummy data:

    start <- Sys.time()
    timediff <- c(rep(5, 3), rep(20, 3), rep(5, 2))
    timediff <- cumsum(timediff)

    # Set up a dataframe with a couple of time values
    df <- data.frame(TimeDate = start + timediff)

    # For understanding purposes, I will note the time differences in a separate column
    df$TimeDiff <- c(diff(df$TimeDate), NA)

Using @Josh O'Brien's answer, you can define a function that groups values that meet certain criteria.

    number.groups <- function(input){
      input[is.na(input)] <- FALSE  # to eliminate NA
      return(head(cumsum(c(TRUE, !input)), -1))
    }

    # Define the criteria and apply the function
    df$Group <- number.groups(df$TimeDiff <= 5)

    # output
                 TimeDate TimeDiff Group
    1 2016-03-16 15:41:51        5     1
    2 2016-03-16 15:41:56        5     1
    3 2016-03-16 15:42:01       20     1
    4 2016-03-16 15:42:21       20     2
    5 2016-03-16 15:42:41       20     3
    6 2016-03-16 15:43:01        5     4
    7 2016-03-16 15:43:06        5     4
    8 2016-03-16 15:43:11       NA     4
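To see where the extra groups come from, it may help to trace the intermediate vectors of the `cumsum()` trick by hand. The sketch below hard-codes the logical vector that `df$TimeDiff <= 5` produces for the eight example rows; the names `input`, `starts` and `group` are only illustrative.

```r
# Result of df$TimeDiff <= 5 for the eight example rows (last value is NA)
input <- c(TRUE, TRUE, FALSE, FALSE, FALSE, TRUE, TRUE, NA)
input[is.na(input)] <- FALSE

# Every FALSE marks the following row as the start of a new group;
# the leading TRUE starts the first group
starts <- c(TRUE, !input)

# cumsum() turns the start markers into running group ids;
# head(..., -1) drops the surplus element added by the leading TRUE
group <- head(cumsum(starts), -1)
print(group)
# [1] 1 1 1 2 3 4 4 4
```

Each consecutive FALSE opens a group of its own, which is why rows 4 and 5 receive the ids 2 and 3 instead of being skipped.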

The problem is that rows 4 and 5 are labelled as separate groups instead of being ignored. Is there a way to make sure that rows that do NOT meet the criterion are not assigned to any group (for example, remain NA)?


r grouping

Ratnanil Mar 16 '16 at 14:47


1 answer

Ratnanil · Accepted Answer · 2016-03-17T15:01:12+0000

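I think I found a way to solve the problem. The approach is to compare each group id with its neighbours and use this information to eliminate ids that occur only once, i.e. groups consisting of a single row. The remaining ids are then renumbered consecutively by converting them to a factor and relabelling its levels.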

    number.groups <- function(input){
      # Replace NAs with FALSE for cumsum() to work
      input[is.na(input)] <- FALSE
      # Make groups using cumsum()
      group <- head(cumsum(c(TRUE, !input)), -1)
      # Compare each value with the next
      compare <- head(group, -1) == tail(group, -1)
      # Determine unique values (ids that match neither neighbour)
      uniques <- !(c(compare, FALSE) | c(FALSE, compare))
      # Remove unique values
      group[which(uniques)] <- NA
      # Convert into factors
      group <- as.factor(group)
      # Rename the factor levels
      levels(group) <- seq_along(levels(group))
      return(group)
    }

    # apply the function
    df$Group <- number.groups(df$TimeDiff <= 5)

    # output
                 TimeDate TimeDiff Group
    1 2016-03-17 15:44:28        5     1
    2 2016-03-17 15:44:33        5     1
    3 2016-03-17 15:44:38       20     1
    4 2016-03-17 15:44:58       20  <NA>
    5 2016-03-17 15:45:18       20  <NA>
    6 2016-03-17 15:45:38        5     2
    7 2016-03-17 15:45:43        5     2
    8 2016-03-17 15:45:48       NA     2
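As a possible variation (not part of the answer above), the single-row groups could also be dropped with `rle()`, since the ids produced by `cumsum()` never decrease and each run therefore corresponds to exactly one group. This is only a minimal sketch: `number.groups.rle` is a hypothetical name, and it returns an integer vector rather than a factor.

```r
# Hypothetical alternative to number.groups(), based on run-length encoding
number.groups.rle <- function(input) {
  # Replace NAs with FALSE for cumsum() to work
  input[is.na(input)] <- FALSE
  # Same grouping step as in the original function
  group <- head(cumsum(c(TRUE, !input)), -1)
  # One run per group id, because the ids are non-decreasing
  runs <- rle(group)
  # Keep only groups that span more than one row
  keep <- runs$lengths > 1
  # Renumber kept groups consecutively; single-row groups become NA
  runs$values <- ifelse(keep, cumsum(keep), NA_integer_)
  inverse.rle(runs)
}

# Same logical vector that df$TimeDiff <= 5 yields for the example data
res <- number.groups.rle(c(TRUE, TRUE, FALSE, FALSE, FALSE, TRUE, TRUE, NA))
print(res)
# [1]  1  1  1 NA NA  2  2  2
```

With the example data frame this would be applied as `df$Group <- number.groups.rle(df$TimeDiff <= 5)`; wrap the result in `factor()` if a factor column is preferred.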
