Retrieving event types from the last 21-day window

My data frame looks like this. The two rightmost columns are the ones I want to create.

    Name  ActivityType  ActivityDate  Email(last21days)  Webinar(last21days)
    John  Email         1/1/2014      NA                 NA
    John  Webinar       1/5/2014      NA                 NA
    John  Sale          1/20/2014     Yes                Yes
    John  Webinar       3/25/2014     NA                 NA
    John  Sale          4/1/2014      No                 Yes
    John  Sale          7/1/2014      No                 No
    Tom   Email         1/1/2015      NA                 NA
    Tom   Webinar       1/5/2015      NA                 NA
    Tom   Sale          1/20/2015     Yes                Yes
    Tom   Webinar       3/25/2015     NA                 NA
    Tom   Sale          4/1/2015      No                 Yes
    Tom   Sale          7/1/2015      No                 No

I'm trying to create a yes/no variable that indicates, for each sale transaction, whether there was an email or webinar in the preceding 21 days. I was thinking of something along these lines with dplyr (pseudocode):

    custlife %>%
      group_by(Name) %>%
      mutate(Email(last21days) = lag(ifelse(ActivityType = "Email" &
        ActivityDate of email within (activity date of sale - 21), Yes, No)))

I am not sure how to implement this. Kindly help. Your help is truly appreciated!

r zoo dplyr
2 answers

Here's a data.table solution. I create two temporary datasets, one for Sale and one for the other activities, and then join them with a rolling window of 21 days, using by = .EACHI to check the conditions in each join. Then I join the result back to the original dataset.

Convert the date column to a date class (IDate) and key the data by Name and ActivityDate (for the final and rolling joins)

    library(data.table)
    setkey(setDT(df)[, ActivityDate := as.IDate(ActivityDate, "%m/%d/%Y")], Name, ActivityDate)

Create two temporary datasets, one for sales and one for all other activities

    Saletemp <- df[ActivityType == "Sale", .(Name, ActivityDate)]
    Elsetemp <- df[ActivityType != "Sale", .(Name, ActivityDate, ActivityType)]

Join by a rolling window of 21 days to the temporary sales dataset while checking the conditions

    Saletemp[Elsetemp,
             `:=`(Email21 = as.logical(which(i.ActivityType == "Email")),
                  Webinar21 = as.logical(which(i.ActivityType == "Webinar"))),
             roll = -21, by = .EACHI]

Join everything back

    df[Saletemp, `:=`(Email21 = i.Email21, Webinar21 = i.Webinar21)]
    df
    #     Name ActivityType ActivityDate Email21 Webinar21
    #  1: John        Email   2014-01-01      NA        NA
    #  2: John      Webinar   2014-01-05      NA        NA
    #  3: John         Sale   2014-01-20    TRUE      TRUE
    #  4: John      Webinar   2014-03-25      NA        NA
    #  5: John         Sale   2014-04-01      NA      TRUE
    #  6: John         Sale   2014-07-01      NA        NA
    #  7:  Tom        Email   2015-01-01      NA        NA
    #  8:  Tom      Webinar   2015-01-05      NA        NA
    #  9:  Tom         Sale   2015-01-20    TRUE      TRUE
    # 10:  Tom      Webinar   2015-03-25      NA        NA
    # 11:  Tom         Sale   2015-04-01      NA      TRUE
    # 12:  Tom         Sale   2015-07-01      NA        NA
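In the output above, sale rows with no matching activity show NA rather than FALSE, whereas the question asks for "No" on those rows. If that matters, one possible follow-up is to overwrite NA with FALSE on the Sale rows only. This is a minimal sketch: `res` is a small hypothetical stand-in with the same columns as the result, not the original data.

```r
library(data.table)

# Hypothetical stand-in for the joined result (one customer only)
res <- data.table(
  ActivityType = c("Email", "Webinar", "Sale", "Webinar", "Sale", "Sale"),
  Email21      = c(NA, NA, TRUE, NA, NA, NA),
  Webinar21    = c(NA, NA, TRUE, NA, TRUE, NA)
)

# On Sale rows only, treat "no match" (NA) as FALSE; non-sale rows stay NA
res[ActivityType == "Sale" & is.na(Email21),   Email21   := FALSE]
res[ActivityType == "Sale" & is.na(Webinar21), Webinar21 := FALSE]
res
```

Restricting the update to Sale rows keeps NA on the Email and Webinar rows, which matches the layout requested in the question.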

Here is another option with base R:

df is first split by Name. Then, within each subset, for each Sale, the code checks whether there was an Email (or Webinar) in the 21 days up to the sale date. Finally, the list is unsplit according to Name.
You just need to replace FALSE with "No" and TRUE with "Yes".

    df_split <- split(df, df$Name)
    df_split <- lapply(df_split, function(tab) {
      i_s <- which(tab$ActivityType == "Sale")
      # start with NA everywhere so non-sale rows stay NA
      tab$Email21 <- tab$Webinar21 <- NA
      # TRUE if an activity of the given type falls within the 21 days up to the sale;
      # the upper bound (<= d_s) excludes activities that happen after the sale
      tab$Email21[i_s] <- sapply(tab[i_s, 3], function(d_s) {
        d_a <- tab[tab$ActivityType == "Email", 3]
        any(d_a >= d_s - 21 & d_a <= d_s)
      })
      tab$Webinar21[i_s] <- sapply(tab[i_s, 3], function(d_s) {
        d_a <- tab[tab$ActivityType == "Webinar", 3]
        any(d_a >= d_s - 21 & d_a <= d_s)
      })
      tab
    })
    df_res <- unsplit(df_split, df$Name)
    df_res
    #    Name ActivityType ActivityDate Email21 Webinar21
    # 1  John        Email   2014-01-01      NA        NA
    # 2  John      Webinar   2014-01-05      NA        NA
    # 3  John         Sale   2014-01-20    TRUE      TRUE
    # 4  John      Webinar   2014-03-25      NA        NA
    # 5  John         Sale   2014-04-01   FALSE      TRUE
    # 6  John         Sale   2014-07-01   FALSE     FALSE
    # 7   Tom        Email   2015-01-01      NA        NA
    # 8   Tom      Webinar   2015-01-05      NA        NA
    # 9   Tom         Sale   2015-01-20    TRUE      TRUE
    # 10  Tom      Webinar   2015-03-25      NA        NA
    # 11  Tom         Sale   2015-04-01   FALSE      TRUE
    # 12  Tom         Sale   2015-07-01   FALSE     FALSE
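The replacement of TRUE/FALSE with yes/no mentioned above could be done with `ifelse()`, which leaves NA untouched. The following is a minimal sketch on a hypothetical stand-in table `res` with the same flag columns; the labels "Yes" and "No" follow the question.

```r
# Hypothetical stand-in for df_res (one customer only)
res <- data.frame(
  ActivityType = c("Email", "Webinar", "Sale", "Webinar", "Sale", "Sale"),
  Email21      = c(NA, NA, TRUE, NA, FALSE, FALSE),
  Webinar21    = c(NA, NA, TRUE, NA, TRUE, FALSE)
)

# Map TRUE -> "Yes" and FALSE -> "No" in both flag columns; NA stays NA
flag_cols <- c("Email21", "Webinar21")
res[flag_cols] <- lapply(res[flag_cols], function(x) ifelse(x, "Yes", "No"))
res
```

The same two lines should work on `df_res` itself, since only the two flag columns are touched.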

**Data**

    df <- structure(list(
      Name = c("John", "John", "John", "John", "John", "John",
               "Tom", "Tom", "Tom", "Tom", "Tom", "Tom"),
      ActivityType = c("Email", "Webinar", "Sale", "Webinar", "Sale", "Sale",
                       "Email", "Webinar", "Sale", "Webinar", "Sale", "Sale"),
      ActivityDate = structure(c(16071, 16075, 16090, 16154, 16161, 16252,
                                 16436, 16440, 16455, 16519, 16526, 16617),
                               class = "Date")),
      .Names = c("Name", "ActivityType", "ActivityDate"),
      row.names = c(NA, -12L),
      index = structure(integer(0),
                        ActivityType = c(1L, 7L, 3L, 5L, 6L, 9L,
                                         11L, 12L, 2L, 4L, 8L, 10L)),
      class = "data.frame")