I would like to turn the first table into the second one. For each ID group I want to take the last observation for a and b, the first observation for c, the sum over the group for d and e, and for f check whether a valid date exists and, if so, use that date.
Table 1:
    ID  a   b    c     d      e       f
    1   10  100  1000  10000  100000  ?
    1   10  100  1001  10010  100100  5/07/1977
    1   11  111  1002  10020  100200  5/07/1977
    2   22  222  2000  20000  200000  6/02/1980
    3   33  333  3000  30000  300000  20/12/1978
    3   33  333  3001  30010  300100  ?
    4   40  400  4000  40000  400000  ?
    4   40  400  4001  40010  400100  ?
    4   40  400  4002  40020  400200  7/06/1944
    4   44  444  4003  40030  400300  ?
    4   44  444  4004  40040  400400  ?
    4   44  444  4005  40050  400500  ?
    5   55  555  5000  50000  500000  31/05/1976
    5   55  555  5001  50010  500100  31/05/1976
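For reproducibility, the first table can be built directly in R instead of being read from a CSV file. This is only a sketch: it assumes that `?` marks a missing value (stored here as `NA`) and that f is kept as text rather than parsed into a date.

```r
library(data.table)

# Table 1 typed in by hand. Assumption: "?" in column f means "missing",
# so it is stored as NA; f stays a character column.
DT = data.table(
  ID = c(1, 1, 1, 2, 3, 3, 4, 4, 4, 4, 4, 4, 5, 5),
  a  = c(10, 10, 11, 22, 33, 33, 40, 40, 40, 44, 44, 44, 55, 55),
  b  = c(100, 100, 111, 222, 333, 333, 400, 400, 400, 444, 444, 444, 555, 555),
  c  = c(1000, 1001, 1002, 2000, 3000, 3001,
         4000, 4001, 4002, 4003, 4004, 4005, 5000, 5001),
  d  = c(10000, 10010, 10020, 20000, 30000, 30010,
         40000, 40010, 40020, 40030, 40040, 40050, 50000, 50010),
  e  = c(100000, 100100, 100200, 200000, 300000, 300100,
         400000, 400100, 400200, 400300, 400400, 400500, 500000, 500100),
  f  = c(NA, "5/07/1977", "5/07/1977", "6/02/1980", "20/12/1978", NA,
         NA, NA, "7/06/1944", NA, NA, NA, "31/05/1976", "31/05/1976")
)
DT
```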
Table 2:
    ID  a   b    c     d       e        f
    1   11  111  1000  30030   300300   5/07/1977
    2   22  222  2000  20000   200000   6/02/1980
    3   33  333  3000  60010   600100   20/12/1978
    4   44  444  4000  240150  2401500  7/06/1944
    5   55  555  5000  100010  1000100  31/05/1976
I searched StackOverflow and only found questions that cover parts of this. This is how far I have got:
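    library(data.table)
    setwd('D:/Work/BRB/StackOverflow')
    # fread() already returns a data.table, so no data.table() wrapper is needed
    DT = fread('datatable.csv', header = TRUE)
    # a, b: last row of each ID group
    AB = DT[, .SD[.N], by = ID]
    AB = AB[, c('a', 'b')]
    # c: first row of each ID group
    C = DT[, .SD[1], by = ID]
    C = C[, 'c']
    # d, e: sums within each ID group
    DE = DT[, .(d = sum(d), e = sum(e)), by = ID]
    Final = cbind(AB, C, DE)
    Final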
My question is: can I compute a, b, c, d and e in a single transformation, without splitting the work into three steps?
Also, I have no idea how to handle f. Any suggestions?
Finally, I'm new to R. Is there anything else I can improve in my code?
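For illustration, a single grouped call along the following lines may cover all six columns at once. It is a sketch, not a definitive solution, and it rests on two assumptions: `?` has been read in as `NA`, and "a valid date exists" means "the first non-missing f in the group". The small `DT` below is a stand-in containing only IDs 1 and 3 from the first table.

```r
library(data.table)

# Stand-in for Table 1 (IDs 1 and 3 only). Assumption: "?" is stored as NA.
DT = data.table(
  ID = c(1, 1, 1, 3, 3),
  a  = c(10, 10, 11, 33, 33),
  b  = c(100, 100, 111, 333, 333),
  c  = c(1000, 1001, 1002, 3000, 3001),
  d  = c(10000, 10010, 10020, 30000, 30010),
  e  = c(100000, 100100, 100200, 300000, 300100),
  f  = c(NA, "5/07/1977", "5/07/1977", "20/12/1978", NA)
)

# One pass over the groups: .N is the number of rows in the current group,
# so x[.N] is the last value and x[1] is the first.
Final = DT[, .(
  a = a[.N],             # last observation
  b = b[.N],             # last observation
  c = c[1],              # first observation
  d = sum(d),            # group total
  e = sum(e),            # group total
  f = f[!is.na(f)][1]    # first non-missing date, NA if the group has none
), by = ID]
Final
```

Because ID is the grouping column, it comes first in the result and nothing needs to be bound together with `cbind()`. If f should be a real date rather than text, converting it beforehand with `as.IDate(f, format = "%d/%m/%Y")` is one option.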
r data.table
apotheosied