Count the number of events per client relative to several key dates

I'm new to R and have a dataset of client numbers and dates for several thousand events. The data is formatted as follows:

data <- data.frame("Client"=c(rep(1, 4), rep(2, 3), rep(3, 2)), "Date"=as.Date(c("2015-11-20", "2015-12-04", "2016-01-08", "2016-04-07", "2015-12-19", "2016-02-02", "2016-02-21", "2016-01-04", "2016-02-12")), "Event"=rep(1, 9))
data
  Client       Date Event
1      1 2015-11-20     1
2      1 2015-12-04     1
3      1 2016-01-08     1
4      1 2016-04-07     1
5      2 2015-12-19     1
6      2 2016-02-02     1
7      2 2016-02-21     1
8      3 2016-01-04     1
9      3 2016-02-12     1

Given a set of key dates,

 refdates <- as.Date(c("2016-01-01", "2016-03-01"))

For each client and each key date, I would like to count the number of events that fall (1) within 30 days after the key date, (2) 0-30 days before the key date and (3) 31-60 days before the key date.

I want the output to be a data frame that looks like this:

  Client    RefDate post30 prior30 prior31.60
1      1 2016-01-01      1       1          1
2      1 2016-03-01      0       0          1
3      2 2016-01-01      0       1          0
4      2 2016-03-01      0       2          0
5      3 2016-01-01      1       0          0
6      3 2016-03-01      0       1          1

I suspect this can be done with plyr, but I'm a bit out of my depth. Can someone point me in the right direction, please?


Here is a solution in base R:

do.call(rbind, lapply(refdates, FUN=function(i) {
  # signed difference in days between each event and the key date i
  d <- as.numeric(data$Date - i)
  aggregate(cbind("post30"=d > 0 & d < 31,          # 1 to 30 days after
                  "prior30"=d > -31 & d < 1,        # 0 to 30 days before
                  "prior31.60"=d > -61 & d < -30),  # 31 to 60 days before
            list(data$Client), FUN=sum)
}))

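How it works:

  • aggregate sums each column within each group, which here is the client.
  • cbind builds one logical column per window, so each sum is a count of matching events.
  • lapply runs the aggregate once for each key date.
  • Finally, do.call takes the resulting list of data.frames and rbinds them into a single data.frame.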
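The call above labels its grouping column Group.1 and does not record which key date each row belongs to. A minimal sketch of one way to get the layout asked for in the question follows. The helper name count_windows and the variables d and res are made up for illustration, and the sketch assumes that an event on the key date itself counts as prior30, following the "0-30 days before" wording.

```r
data <- data.frame(
  Client = c(rep(1, 4), rep(2, 3), rep(3, 2)),
  Date = as.Date(c("2015-11-20", "2015-12-04", "2016-01-08", "2016-04-07",
                   "2015-12-19", "2016-02-02", "2016-02-21",
                   "2016-01-04", "2016-02-12")),
  Event = rep(1, 9)
)
refdates <- as.Date(c("2016-01-01", "2016-03-01"))

# Hypothetical helper: counts per client for a single key date `ref`
count_windows <- function(ref) {
  d <- as.numeric(data$Date - ref)  # signed offset in days
  res <- aggregate(
    cbind(post30     = d >= 1   & d <= 30,
          prior30    = d >= -30 & d <= 0,    # assumption: day 0 counts as "prior"
          prior31.60 = d >= -60 & d <= -31),
    by = list(Client = data$Client), FUN = sum
  )
  # Attach the key date so rows stay identifiable after rbind
  data.frame(Client = res$Client, RefDate = ref, res[-1])
}

out <- do.call(rbind, lapply(refdates, count_windows))
out <- out[order(out$Client, out$RefDate), ]  # same row order as the desired output
rownames(out) <- NULL
out
```

Naming the element of the by list is what replaces Group.1 with Client. The RefDate column is added after aggregating because it is constant within each call.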

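Here is a dplyr approach. It pairs every event with every key date, flags which window the event falls in, and then sums the flags per client and key date.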

library(dplyr)

data <- data.frame("Client"=c(rep(1, 4), rep(2, 3), rep(3, 2)), "Date"=as.Date(c("2015-11-20", "2015-12-04", "2016-01-08", "2016-04-07", "2015-12-19", "2016-02-02", "2016-02-21", "2016-01-04", "2016-02-12")), "Event"=rep(1, 9))
data

refdates <- as.Date(c("2016-01-01", "2016-03-01"))

data %>%
  merge(refdates) %>%          # no shared columns, so this is a cross join
  rename(RefDate = y) %>%      # the merged vector arrives as column y
  mutate(
    days = as.numeric(Date - RefDate),   # signed offset in days
    post30 = ifelse(between(days, 1, 30), 1, 0),
    prior30 = ifelse(between(days, -30, 0), 1, 0),
    prior31.60 = ifelse(between(days, -60, -31), 1, 0)
  ) %>%
  group_by(Client, RefDate) %>%
  summarise(post30 = sum(post30),
            prior30 = sum(prior30),
            prior31.60 = sum(prior31.60)
  )

Result:

  Client    RefDate post30 prior30 prior31.60
   (dbl)     (date)  (dbl)   (dbl)      (dbl)
1      1 2016-01-01      1       1          1
2      1 2016-03-01      0       0          1
3      2 2016-01-01      0       1          0
4      2 2016-03-01      0       2          0
5      3 2016-01-01      1       0          0
6      3 2016-03-01      0       1          1

Another option with dplyr:

library(dplyr)
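out <- data %>%
  merge(refdates) %>%                 # cross join: every event paired with every key date
  rename(RefDate = y) %>%
  group_by(Client, RefDate) %>%
  mutate(Date.diff = as.numeric(Date - RefDate)) %>%   # signed offset in days
  # inclusive bounds, so offsets of exactly 0, 30, -30 and -60 days are not dropped
  summarise(post30 = sum(Date.diff >= 1 & Date.diff <= 30),
            prior30 = sum(Date.diff >= -30 & Date.diff <= 0),
            prior31.60 = sum(Date.diff >= -60 & Date.diff <= -31))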

out
  Client    RefDate post30 prior30 prior31.60
   (dbl)     (date)  (int)   (int)      (int)
1      1 2016-01-01      1       1          1
2      1 2016-03-01      0       0          1
3      2 2016-01-01      0       1          0
4      2 2016-03-01      0       2          0
5      3 2016-01-01      1       0          0
6      3 2016-03-01      0       1          1
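Since the three windows differ only in where their boundaries sit, it can help to check the edge cases by hand. The sketch below is not part of the answer above: it bins a few hand-picked day offsets with base R's cut(), assuming the window definitions from the question (1 to 30 days after, 0 to 30 days before, 31 to 60 days before). The names offsets and window are illustrative.

```r
# Hand-picked offsets in days (event date minus key date), chosen to sit on the window edges
offsets <- c(-61, -60, -31, -30, -1, 0, 1, 30, 31)

# cut() uses right-closed intervals by default: (-61,-31], (-31,0], (0,30]
window <- cut(offsets,
              breaks = c(-61, -31, 0, 30),
              labels = c("prior31.60", "prior30", "post30"))

# Offsets outside all three windows come back as NA
data.frame(offsets, window)
```

With whole-day offsets, these breaks put -60 and -31 in prior31.60, -30 and 0 in prior30, and 1 and 30 in post30, while -61 and 31 fall outside every window.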