Assigning sequence numbers to groups of consecutive rows, leaving some rows blank

I am trying to group runs of consecutive rows, assigning the same value to every row in a run, and to leave the other rows blank when a certain condition is not met.

My data consist of locations (x/y coordinates), the date/time at which each was measured, and the time interval between consecutive measurements (Span). In simplified form, they look like this:

    ID  XY         Time   Span
    1   3445 7671  0:00   -
    2   3312 7677  4:00   4
    3   3309 7680  12:00  8
    4   3299 7681  16:00  4
    5   3243 7655  20:00  4
    6   3222 7612  4:00   8
    7   3260 7633  0:00   4
    8   3254 7641  8:00   8
    9   3230 7612  0:00   16
    10  3203 7656  4:00   4
    11  3202 7678  8:00   4
    12  3159 7609  20:00  12
    ...
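For reproducibility, the sample rows above can be entered in R as a data frame. This is a sketch under assumptions the question does not state: X and Y are treated as two separate numeric columns, Time is kept as text, and Span is stored as character because the first row contains "-". The rows hidden behind "..." are not included. The name DF matches the first answer; the second answer calls the same table test.

```r
# Sample data from the question, typed in by hand.
# Assumption: X and Y are separate numeric columns; Span is character
# because row 1 holds "-" instead of a number.
DF <- data.frame(
  ID   = 1:12,
  X    = c(3445, 3312, 3309, 3299, 3243, 3222, 3260, 3254, 3230, 3203, 3202, 3159),
  Y    = c(7671, 7677, 7680, 7681, 7655, 7612, 7633, 7641, 7612, 7656, 7678, 7609),
  Time = c("0:00", "4:00", "12:00", "16:00", "20:00", "4:00",
           "0:00", "8:00", "0:00", "4:00", "8:00", "20:00"),
  Span = c("-", "4", "8", "4", "4", "8", "4", "8", "16", "4", "4", "12"),
  stringsAsFactors = FALSE
)
str(DF)
```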

I would like to assign a number to each run of consecutive locations measured at 4-hour intervals (Span equal to 4), so that my data look like this:

    ID  XY         Time   Span  Sequence
    1   3445 7671  0:00   -     -
    2   3312 7677  4:00   4     1
    3   3309 7680  12:00  8     NA
    4   3299 7681  16:00  4     2
    5   3243 7655  20:00  4     2
    6   3222 7612  4:00   8     NA
    7   3260 7633  0:00   4     3
    8   3254 7641  8:00   8     NA
    9   3230 7612  0:00   16    NA
    10  3203 7656  4:00   4     4
    11  3202 7678  8:00   4     4
    12  3159 7609  20:00  12    NA

I tried several approaches combining a for loop with ifelse, for example:

    Sequence <- for (i in 1:max(ID)) {
      ifelse(Span <= 4, i + 1, "NA")
    }

No luck. I know my attempt is wrong, but my programming skills are very basic and I could not find a similar problem online.

Any ideas would be greatly appreciated!

3 answers

Here is a one-liner:

    ifelse(x <- DF$Span == 4, cumsum(c(head(x, 1), tail(x, -1) - head(x, -1) == 1)), NA)
    # [1] NA  1 NA  2  2 NA  3 NA NA  4  4 NA

Explanation:

  • x is a TRUE/FALSE vector showing where Span == 4.
  • tail(x, -1) is a safe way to write x[2:length(x)]: unlike the indexing form, it also returns the right result when x has fewer than two elements.
  • head(x, -1) is, likewise, a safe way to write x[1:(length(x)-1)].
  • tail(x, -1) - head(x, -1) == 1 is a TRUE/FALSE vector showing where Span goes from != 4 to == 4.
  • Since that vector is one element shorter than x, I prepend head(x, 1) to it; head(x, 1) is a safe way to write x[1].
  • cumsum then turns the TRUE/FALSE vector into a vector of non-decreasing integers: it increases by 1 wherever Span goes from != 4 to == 4 and stays constant otherwise.
  • Everything is wrapped in ifelse, so numbers appear only where x is TRUE, i.e. where Span == 4.
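The steps in the list above can be traced on the question's Span column by storing each intermediate vector instead of nesting the calls. This is a minimal sketch, assuming Span is a character vector (so "-" simply compares as FALSE); the names changes, starts and runId are illustrative only.

```r
# Span values from the question (character, because row 1 holds "-")
span <- c("-", "4", "8", "4", "4", "8", "4", "8", "16", "4", "4", "12")

x <- span == 4                        # TRUE where Span is 4 ("4" == "4")
changes <- tail(x, -1) - head(x, -1)  # 1 = FALSE->TRUE, -1 = TRUE->FALSE, 0 = no change
starts <- c(head(x, 1), changes == 1) # TRUE at the first row of every run of 4s
runId <- cumsum(starts)               # goes up by 1 at each run start
Sequence <- ifelse(x, runId, NA)      # keep the number only where Span == 4

print(data.frame(span, x, starts, runId, Sequence))
```

Printing the intermediate columns side by side makes it easy to check that runId only changes at a run start and that ifelse hides it everywhere else.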

Here is another alternative using rle and rep. Suppose your data.frame is called test.

First, initialize the Sequence column by filling it with NA.

    test$Sequence <- NA

Second, specify the condition to match, in this case test$Span == 4.

    x <- test$Span == 4

Third, combine the two components of the rle output (lengths and values) to get the length of each run in which the condition is TRUE.

    spanSeq <- rle(x)$lengths[rle(x)$values == TRUE]

Finally, use rep with the times argument set to the run lengths from step 3. Select the test$Sequence values at the rows where test$Span == 4 and replace them with the new sequence.

    test$Sequence[x] <- rep(seq_along(spanSeq), times = spanSeq)
    test
    #    ID        XY  Time Span Sequence
    # 1   1 3445 7671  0:00    -       NA
    # 2   2 3312 7677  4:00    4        1
    # 3   3 3309 7680 12:00    8       NA
    # 4   4 3299 7681 16:00    4        2
    # 5   5 3243 7655 20:00    4        2
    # 6   6 3222 7612  4:00    8       NA
    # 7   7 3260 7633  0:00    4        3
    # 8   8 3254 7641  8:00    8       NA
    # 9   9 3230 7612  0:00   16       NA
    # 10 10 3203 7656  4:00    4        4
    # 11 11 3202 7678  8:00    4        4
    # 12 12 3159 7609 20:00   12       NA
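To see the intermediate values behind steps 3 and 4, the sketch below runs rle on the condition vector built from the question's Span values (assumed character, as in the sample table). Storing rle(x) once in r, a name chosen for illustration, also avoids computing the run-length encoding twice.

```r
# Condition vector from the question's Span column
x <- c("-", "4", "8", "4", "4", "8", "4", "8", "16", "4", "4", "12") == 4

r <- rle(x)                    # run-length encoding of the TRUE/FALSE vector
r$lengths                      # 1 1 1 2 1 1 2 2 1
r$values                       # FALSE TRUE FALSE TRUE FALSE TRUE FALSE TRUE FALSE

spanSeq <- r$lengths[r$values] # lengths of the TRUE runs only: 1 2 1 2
seqValues <- rep(seq_along(spanSeq), times = spanSeq)
seqValues                      # 1 2 2 3 4 4, one value per row where x is TRUE
```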

Once you understand the steps, you can also do it all at once with within(). The following gives the same result:

    within(test, {
      Sequence <- NA
      spanSeq <- rle(Span == 4)$lengths[rle(Span == 4)$values == TRUE]
      Sequence[Span == 4] <- rep(seq_along(spanSeq), times = spanSeq)
      rm(spanSeq)
    })
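If the same numbering is needed for other conditions, the rle approach can be wrapped in a small function. This is a sketch, not part of the answer above: the name number_runs is hypothetical, and turning NA comparisons into FALSE is an added assumption that matters when Span is numeric with NA in place of "-".

```r
# Hypothetical helper: number the runs of consecutive TRUE values in a
# logical vector; positions outside a run get NA.
number_runs <- function(cond) {
  cond[is.na(cond)] <- FALSE              # assumption: NA means "not in a run"
  r <- rle(cond)
  runLengths <- r$lengths[r$values]       # lengths of the TRUE runs
  out <- rep(NA_integer_, length(cond))
  out[cond] <- rep(seq_along(runLengths), times = runLengths)
  out
}

test <- data.frame(
  Span = c("-", "4", "8", "4", "4", "8", "4", "8", "16", "4", "4", "12"),
  stringsAsFactors = FALSE
)
test$Sequence <- number_runs(test$Span == 4)
test$Sequence
```

Passing the condition rather than the column keeps the helper generic: any logical vector of the same length as the data can be numbered this way.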
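    s <- suppressWarnings(as.numeric(as.character(Span)))  # "-" becomes NA
    Sequence <- rep(NA, length(s))  # rows outside a run stay NA
    count <- 0
    for (i in seq_along(s)) {
      inRun <- !is.na(s[i]) && s[i] <= 4
      if (inRun && (i == 1 || is.na(s[i - 1]) || s[i - 1] > 4)) count <- count + 1  # a new run starts
      if (inRun) Sequence[i] <- count
    }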
