How to create the required matrix using a Dataframe in R

I have one data frame that looks like this:

DF_1>

T_id D1 D2 Num type type_2 fig xt-1 2017-05-01 2017-03-25 12:11:45 10 AX 25.20 xt-2 2017-05-01 2017-03-25 21:05:25 20 AY 20.15 xt-3 2017-05-01 2017-03-25 08:10:55 25 BX 15.11 xt-4 2017-05-03 2017-03-25 07:19:35 30 BY 22.56 xt-5 2017-05-03 2017-03-25 13:12:56 45 CZ 35.45 xt-6 2017-05-03 2017-03-25 18:14:44 20 DZ 27.21 xt-7 2017-04-06 2017-03-25 19:21:35 15 AZ 23.20 xt-8 2017-04-06 2017-03-25 21:11:15 40 CX 21.40 xt-9 2017-04-08 2017-02-25 22:25:04 20 AA 27.50 xt-10 2017-04-06 2017-02-25 16:04:08 30 AY 32.20 xt-11 2017-04-05 2017-02-25 18:15:25 20 CZ 30.20 xt-12 2017-04-01 2017-01-25 19:22:25 50 AZ 33.15 xt-13 2017-04-02 2017-01-25 23:19:05 15 AA 30.12 xt-14 2017-03-03 2017-01-25 14:25:09 15 DY 31.25 xt-15 2017-03-10 2017-01-25 23:25:36 40 AX 25.45 

From the above data frame, I want the two matrices mentioned below:

 1. Date (Last Three Date from `sys.date()`) D1 count sum mean_num total_sum count_A sum_A count_other sum_other mean_fig mean_TAT 2017-05-03 3 95 31.66 6 0 0 3 95 28.40 2017-05-02 0 0 0 3 0 0 0 0 0.00 2017-05-01 3 55 18.33 3 2 30 1 25 20.15 
  • To calculate mean_TAT : substract D2 - D1 and what to take for this day based on the count value for the same date.
  • total_sum will be cumulative from the first date of the month.
  • count_A and sum_A based on type as A for a particular day.
  • count_other and sum_other for those where type is other than A

2. Base of the month (last three months in the data frame)

enter image description here Depending on the month, the format will be the same, only the calculation will be monthly.

  • There are 5 additional rows and 2 columns for each month, where the first three will be the top 3 type_2 based on the count for a specific month.
  • increase_% will be calculated in the previous month (i.e. if the count for May-17 is 50 than in April-17 100, what would be -50% and the same for the remaining 5 rows compared to their previous month count and sum .
  • The fourth A will be constant for each month for the value, where type_2 is "A".
  • The fifth Other would be different than those mentioned above 4 type_2 .
  • Total will be according to the column for count and sum will be an addition, and for mean will be an average value.

It seems that I could not explain correctly, I hope that the matrix is ​​understandable using a dataframe.

Looking forward to some help.

+8
r dataframe ggplot2 dplyr
source share
2 answers

Here is the first part:

 library(lubridate) library(dplyr) df2 <- df1 %>% mutate(ym = year(D1)*100+month(D1)) %>% arrange(D1) %>% group_by(D1,ym) %>% summarize(count = n(), sum=sum(Num), mean_num=mean(Num), count_A=sum(type=='A'), sum_A=sum(Num * (type=='A')), count_other=sum(type!='A'), sum_other=sum(Num * (type!='A')), mean_fig = mean(fig), mean_TAT = mean(D2-D1)) %>% group_by(ym) %>% mutate(total_sum=cumsum(count)) %>% ungroup %>% arrange(desc(D1)) %>% select(D1,count,sum,mean_num,total_sum,count_A,sum_A,count_other,sum_other,mean_fig,mean_TAT) # # A tibble: 9 x 11 # D1 count sum mean_num total_sum count_A sum_A count_other sum_other mean_fig mean_TAT # <date> <int> <int> <dbl> <int> <int> <int> <int> <int> <dbl> <time> # 1 2017-05-03 3 95 31.66667 6 0 0 3 95 28.40667 -39.00000 days # 2 2017-05-01 3 55 18.33333 3 2 30 1 25 20.15333 -37.00000 days # 3 2017-04-08 1 20 20.00000 7 1 20 0 0 27.50000 -42.00000 days # 4 2017-04-06 3 85 28.33333 6 2 45 1 40 25.60000 -21.33333 days # 5 2017-04-05 1 20 20.00000 3 0 0 1 20 30.20000 -39.00000 days # 6 2017-04-02 1 15 15.00000 2 1 15 0 0 30.12000 -67.00000 days # 7 2017-04-01 1 50 50.00000 1 1 50 0 0 33.15000 -66.00000 days # 8 2017-03-10 1 40 40.00000 2 1 40 0 0 25.45000 -44.00000 days # 9 2017-03-03 1 15 15.00000 1 0 0 1 15 31.25000 -37.00000 days 

<strong> data

 df1 <- read.table(text="T_id D1 D2 Num type type_2 fig xt-1 2017-05-01 '2017-03-25 12:11:45' 10 AX 25.20 xt-2 2017-05-01 '2017-03-25 21:05:25' 20 AY 20.15 xt-3 2017-05-01 '2017-03-25 08:10:55' 25 BX 15.11 xt-4 2017-05-03 '2017-03-25 07:19:35' 30 BY 22.56 xt-5 2017-05-03 '2017-03-25 13:12:56' 45 CZ 35.45 xt-6 2017-05-03 '2017-03-25 18:14:44' 20 DZ 27.21 xt-7 2017-04-06 '2017-03-25 19:21:35' 15 AZ 23.20 xt-8 2017-04-06 '2017-03-25 21:11:15' 40 CW 21.40 xt-9 2017-04-08 '2017-02-25 22:25:04' 20 AQ 27.50 xt-10 2017-04-06 '2017-02-25 16:04:08' 30 AW 32.20 xt-11 2017-04-05 '2017-02-25 18:15:25' 20 CV 30.20 xt-12 2017-04-01 '2017-01-25 19:22:25' 50 AZ 33.15 xt-13 2017-04-02 '2017-01-25 23:19:05' 15 AZ 30.12 xt-14 2017-03-03 '2017-01-25 14:25:09' 15 DY 31.25 xt-15 2017-03-10 '2017-01-25 23:25:36' 40 AX 25.45",h=T,strin=F) df1$D1 <- as.Date(df1$D1,"%Y-%m-%d") df1$D2 <- as.Date(df1$D2,"%Y-%m-%d") expected_output <- read.table(text="D1 count sum mean_num total_sum count_A sum_A count_other sum_other mean_fig 2017-05-03 3 95 31.66 6 0 0 3 95 28.40 2017-05-02 0 0 0 3 0 0 0 0 0.00 2017-05-01 3 55 18.33 3 2 30 1 25 20.15") 

Some tips for part 2:

I cannot work miracles without processing your questions (providing an accurate reproducible result is more than necessary here). But here is a way to get closer, I hope:

 df_month <- df1 %>% mutate(ym = year(D1)*100+month(D1)) %>% arrange(D1) %>% group_by(ym) %>% summarize(count = n(), sum=sum(Num), mean_num=mean(Num), count_A=sum(type=='A'), sum_A=sum(Num * (type=='A')), count_other=sum(type!='A'), sum_other=sum(Num * (type!='A')), mean_fig = mean(fig), mean_TAT = mean(D2-D1)) %>% mutate(type_2=paste0(month.abb[ym%% 100],"-",ym %/% 100 -2000)) %>% select(ym,type_2,count,sum,mean_num,count_A,sum_A,count_other,sum_other,mean_fig,mean_TAT) df_top3 <- df1 %>% filter(type_2 !="A") %>% mutate(ym = year(D1)*100+month(D1)) %>% arrange(desc(ym)) %>% group_by(ym,type_2) %>% summarize(count = n(), sum=sum(Num), mean_num=mean(Num), count_A=sum(type=='A'), sum_A=sum(Num * (type=='A')), count_other=sum(type!='A'), sum_other=sum(Num * (type!='A')), mean_fig = mean(fig), mean_TAT = mean(D2-D1)) %>% group_by(ym) %>% arrange(desc(count)) %>% slice(1:3) %>% ungroup %>% select(ym,type_2,count,sum,mean_num,count_A,sum_A,count_other,sum_other,mean_fig,mean_TAT) df_A <- df1 %>% filter(type_2 == "A") %>% mutate(ym = year(D1)*100+month(D1)) %>% arrange(desc(ym)) %>% group_by(ym,type_2) %>% summarize(count = n(), sum=sum(Num), mean_num=mean(Num), count_A=sum(type=='A'), sum_A=sum(Num * (type=='A')), count_other=sum(type!='A'), sum_other=sum(Num * (type!='A')), mean_fig = mean(fig), mean_TAT = mean(D2-D1)) %>% select(ym,type_2,count,sum,mean_num,count_A,sum_A,count_other,sum_other,mean_fig,mean_TAT) df_other <- df1 %>% mutate(ym = year(D1)*100+month(D1)) %>% anti_join(bind_rows(df_top3,df_A),by = c("ym","type_2")) %>% mutate(type_2="Other") %>% arrange(desc(ym)) %>% group_by(ym,type_2) %>% summarize(count = n(), sum=sum(Num), mean_num=mean(Num), count_A=sum(type=='A'), sum_A=sum(Num * (type=='A')), count_other=sum(type!='A'), sum_other=sum(Num * (type!='A')), mean_fig = mean(fig), mean_TAT = mean(D2-D1)) %>% select(ym,type_2,count,sum,mean_num,count_A,sum_A,count_other,sum_other,mean_fig,mean_TAT) # it empty with your example data bind_rows(df_month,df_top3,df_A,df_other) %>% arrange(ym) %>% select(-ym) %>% rename(Month = type_2) 
+5
source share

Extending the previous work, here is the code for table 2 (with 2 exceptions):

 library(dplyr) library(lubridate) library(magrittr) table2 <- df1 %>% mutate(ym = year(D1)*100+month(D1)) %>% mutate(monthLabel = paste0(year(D1), "-", month(D1, label = TRUE))) %>% group_by(type_2, ym, monthLabel) %>% summarize(count = n(), sum=sum(Num), mean_num=round(mean(Num), 1), count_A=sum(type=='A'), sum_A=sum(Num * (type=='A')), count_other=sum(type!='A'), sum_other=sum(Num * (type!='A')), mean_fig = round(mean(fig)), mean_TAT = round(mean(D2-D1))) %>% ungroup() %>% group_by(ym, monthLabel) %>% mutate(total_sum = sum(count)) %>% ungroup() %>% group_by(type_2) %>% arrange(ym) %>% mutate(priorC = lag(count)) %>% mutate(countIncrease = ifelse(count == 0, "", round(100*(count-priorC)/count))) %>% mutate(priorS = lag(sum)) %>% mutate(sumIncrease = ifelse(sum == 0, "", round(100*(sum-priorS)/sum))) %>% select(-priorC, -priorS) %>% ungroup() %>% arrange(desc(ym), desc(count)) %>% select(monthLabel, type_2, count, countIncrease, sum, sumIncrease, mean_num, total_sum, count_A, sum_A, count_other, sum_other, mean_fig, mean_TAT) 

Omissions / Exceptions:

  • There is no row A, since A is never Type_2 in data samples.
  • All Type_2 are displayed for each month, as in the data samples no more than 4 in each month.
+3
source share

All Articles