I'm working in R with data structured like dataset 1 below, and I want to create a new data.frame from it.
For each unique value of ID_1, I would like two new columns: one containing a list of the ID_2 values that share that ID_1 and have Direction == 1, and the other containing a list of the ID_2 values that share that ID_1 and have Direction == 0 (see dataset 2 below).
Dataset 1 (initial):
ID_1    ID_2    Direction
100001  1       1
100001  11      1
100001  111     1
100001  1111    0
100001  11111   0
100001  111111  0
100002  2       1
100002  22      1
100002  222     0
100002  2222    0
100003  3       1
100003  33      1
100003  333     1
100003  3333    0
100003  33333   0
100003  333333  1
100004  4       1
100004  44      1
Converted to:
Dataset 2 (desired result):
ID_1    ID_2_D1          ID_2_D0
100001  1,11,111         1111,11111,111111
100002  2,22             222,2222
100003  3,33,333,333333  3333,33333
100004  4,44
I have code that does this (looping over the unique ID_1 values and taking a subset for each one), but I need to run it over many millions of unique "ID_1s", and it takes a very long time (hours!).
Any advice, perhaps using apply() or the plyr package, that could make this run faster?
Code for reference:
DF <- data.frame(
  ID_1 = c(100001, 100001, 100001, 100001, 100001, 100001,
           100002, 100002, 100002, 100002,
           100003, 100003, 100003, 100003, 100003, 100003,
           100004, 100004),
  ID_2 = c(1, 11, 111, 1111, 11111, 111111,
           2, 22, 222, 2222,
           3, 33, 333, 3333, 33333, 333333,
           4, 44),
  Direction = c(1, 1, 1, 0, 0, 0,
                1, 1, 0, 0,
                1, 1, 1, 0, 0, 1,
                1, 1)
)
My current (too slow) code is:
DF2 <- data.frame(ID_1 = DF[!duplicated(DF$ID_1), ][, 1])
for (i in 1:length(unique(DF2$ID_1))) {
  DF2$ID_2_D1[i] <- list(subset(DF, ID_1 == unique(DF2$ID_1)[i] & Direction == 1)$ID_2)
  DF2$ID_2_D0[i] <- list(subset(DF, ID_1 == unique(DF2$ID_1)[i] & Direction == 0)$ID_2)
}
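For comparison, here is a minimal sketch of one loop-free possibility using base R's split(), which groups all the ID_2 values in a single pass instead of calling subset() once per ID_1. It assumes Direction only ever takes the values 0 and 1; the names ids, grp, d1, d0 and DF2_fast are illustrative only, and it has not been timed on data of the size described above.

```r
# Sample data from the question
DF <- data.frame(
  ID_1 = c(100001, 100001, 100001, 100001, 100001, 100001,
           100002, 100002, 100002, 100002,
           100003, 100003, 100003, 100003, 100003, 100003,
           100004, 100004),
  ID_2 = c(1, 11, 111, 1111, 11111, 111111,
           2, 22, 222, 2222,
           3, 33, 333, 3333, 33333, 333333,
           4, 44),
  Direction = c(1, 1, 1, 0, 0, 0, 1, 1, 0, 0, 1, 1, 1, 0, 0, 1, 1, 1)
)

# Fix the ID_1 levels once, so both splits return the same groups in the
# same order and keep empty groups (100004 has no Direction == 0 rows)
ids <- unique(DF$ID_1)
grp <- factor(DF$ID_1, levels = ids)

# Split ID_2 by ID_1 separately for each direction
is_d1 <- DF$Direction == 1
is_d0 <- DF$Direction == 0
d1 <- split(DF$ID_2[is_d1], grp[is_d1])
d0 <- split(DF$ID_2[is_d0], grp[is_d0])

# Assemble the result with two list columns, one row per unique ID_1
DF2_fast <- data.frame(ID_1 = ids)
DF2_fast$ID_2_D1 <- unname(d1)
DF2_fast$ID_2_D0 <- unname(d0)
```

The factor with fixed levels is what keeps the two list columns aligned with ID_1: without it, an ID_1 that has no rows for one direction would simply be missing from that split and the lists would have different lengths.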