Count unique days with overlapping and spaces in date ranges

Question

Group       Start            End             Days
A           5/12/2015        5/14/2015       3
A           5/12/2015        5/14/2015       3
B           1/1/2015         1/3/2015        3
B           1/1/2015         1/3/2015        3
H           1/8/2015         1/9/2015        2
H           1/8/2015         1/9/2015        2
H           1/13/2015        1/15/2015       3
H           1/7/2015         1/17/2015       3
H           1/12/2015        1/22/2015       7

I have attached a sample of my dataset above. I am trying to calculate the number of unique days for each group in R. For some groups this is pretty simple, e.g. A and B. However, several groups have overlapping date ranges as well as gaps between ranges, e.g. H.

How can I summarize the number of unique days for each group in R, counting overlapping days only once and leaving out the days that fall in gaps? That is, A and B should each return 3 days, and H should return 16 days.

Group   Count
A       3
B       3
H       16

My best guess is to use dplyr and its summarize function, but I was not able to wrap my head around a solution. Any help is appreciated! Thank you.

+4

date r dplyr

Michael luu May 12, '16 at 19:36

2 Answers

With data.table (assuming Start and End are Date columns):

library(data.table)
setDT(mydf)[, .(dates = seq.Date(Start,End,'day')) , by = .(Group,1:nrow(mydf))
            ][, .(count = uniqueN(dates)), by = Group][]

This gives:

   Group count
1:     A     3
2:     B     3
3:     H    16

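Explanation: each row is expanded into a daily sequence from Start to End, and uniqueN then counts the distinct dates per group. Grouping by 1:nrow(mydf) as well as by Group makes seq.Date run once per row.

For group H the ranges overlap and together cover every day from 1/7/2015 through 1/22/2015, so the count is 16.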
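As a quick sanity check on the H result, the sketch below expands the five H ranges from the question into individual days and counts the distinct ones. The variable names (starts, ends, days, h_unique, h_span) are made up for this illustration and are not part of the answer's code.

```r
# Stand-alone check for group H; the ranges are copied from the question's table.
starts <- as.Date(c("2015-01-08", "2015-01-08", "2015-01-13", "2015-01-07", "2015-01-12"))
ends   <- as.Date(c("2015-01-09", "2015-01-09", "2015-01-15", "2015-01-17", "2015-01-22"))

# Expand every range into its individual days, stored as day counts since 1970-01-01.
days <- unlist(lapply(seq_along(starts), function(i) {
  as.numeric(seq(starts[i], ends[i], by = "day"))
}))

# Number of distinct days covered by at least one range.
h_unique <- length(unique(days))

# Number of days from the earliest start to the latest end, inclusive.
h_span <- as.numeric(max(ends) - min(starts)) + 1

h_unique
h_span
```

If the two numbers agree, the ranges cover the whole span without a gap; if h_unique is smaller, some days between the earliest start and the latest end are not covered by any range.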

In base R:

l <- mapply(seq.Date, mydf$Start, mydf$End, 1)
df2 <- data.frame(group = rep(mydf$Group,sapply(l,length)),
                  dates = unlist(l))
aggregate(dates ~ group, df2, function(x) length(unique(x)))

which gives:

  group dates
1     A     3
2     B     3
3     H    16

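Note that the dates column in df2 is numeric, because unlist drops the Date class. Use as.Date(unlist(l), origin = '1970-01-01') instead of unlist(l) if you need actual dates.

Used data: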
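To see why the conversion matters, here is a minimal sketch using two short daily sequences as a stand-in for the list l. The names l_demo, raw and dates are assumptions made for this example only.

```r
# Two small daily sequences, standing in for the list `l` built with mapply.
l_demo <- list(seq(as.Date("2015-01-01"), as.Date("2015-01-03"), by = "day"),
               seq(as.Date("2015-01-08"), as.Date("2015-01-09"), by = "day"))

# unlist() drops the Date class, leaving plain day counts since 1970-01-01.
raw <- unlist(l_demo)

# Turn the day counts back into Date values.
dates <- as.Date(raw, origin = "1970-01-01")

class(raw)
class(dates)
format(dates)
```

Counting unique values works the same on either vector, so the conversion only matters when the dates themselves are needed later, for example for printing or plotting.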

mydf <- structure(list(Group = c("A", "A", "B", "B", "H", "H", "H", "H", "H"), 
                       Start = structure(c(16567, 16567, 16436, 16436, 16443, 16443, 16448, 16442, 16447), class = "Date"), 
                       End = structure(c(16569, 16569, 16438, 16438, 16444, 16444, 16450, 16452, 16457), class = "Date"), 
                       Days = c(3L, 3L, 3L, 3L, 2L, 2L, 3L, 3L, 7L)), 
                  .Names = c("Group", "Start", "End", "Days"), row.names = c(NA, -9L), class = "data.frame")

+4

Jaap 12 '16 20:16

shreyasgm · Accepted Answer · 2016-05-12T20:40:09+0000

Here is a dplyr option:

library(dplyr)

df %>%
    group_by(Group,rn = row_number()) %>%
    do(data.frame(.,Date = seq(as.Date(.$Start,format = '%m/%d/%Y'),
                               as.Date(.$End,format = '%m/%d/%Y'),
                               '1 day'))) %>%
    group_by(Group) %>%
    summarise(numDays = n_distinct(Date))

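This groups by Group and row number, expands each row into a daily sequence from Start to End, regroups by Group, and counts the distinct dates with n_distinct.

Output: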

   Group numDays
  (fctr)   (int)
1      A       3
2      B       3
3      H      16
