Is there a way in data.table to assign an identifier to groups based on the identifier?

Suppose I have data that looks like this:

Name Date Bank1 1/1/2005 Bank1 6/1/2005 Bank1 1/1/2006 Bank1 6/1/2006 Bank1 1/1/2007 Bank2 6/1/2006 Bank2 1/1/2007 Bank3 6/1/2005 Bank3 1/1/2006 Bank3 6/1/2006 Bank3 1/1/2007 .... 

Here the first date is 1/1/2005, and the last is 1/1/2007. I would like to assign an identifier so that I have:

 Name Date ID Bank1 1/1/2005 1 Bank1 6/1/2005 2 Bank1 1/1/2006 3 Bank1 6/1/2006 4 Bank1 1/1/2007 5 Bank2 6/1/2006 4 Bank2 1/1/2007 5 Bank3 6/1/2005 2 Bank3 1/1/2006 3 Bank3 6/1/2006 4 Bank3 1/1/2007 5 .... 

Basically, I want to assign an identifier for a date that matches its order of all dates. In addition, Bank1 has a complete list of all dates. I was wondering if there was an easy way in data.table to do this. I tried things like loops, but with higher dimensions it gets harder. Any suggestions would be greatly appreciated, thanks!

+5
source share
2 answers

I think you are looking for a special .GRP character:

 dt[, id := .GRP, by=Date] # Name Date id # 1: Bank1 1/1/2005 1 # 2: Bank1 6/1/2005 2 # 3: Bank1 1/1/2006 3 # 4: Bank1 6/1/2006 4 # 5: Bank1 1/1/2007 5 # 6: Bank2 6/1/2006 4 # 7: Bank2 1/1/2007 5 # 8: Bank3 6/1/2005 2 # 9: Bank3 1/1/2006 3 # 10: Bank3 6/1/2006 4 # 11: Bank3 1/1/2007 5 
+6
source

Here is the dplyr solution:

 library(dplyr) df %>% select(Date) %>% slice(unique(Date)) %>% mutate(ID = row_number(as.Date(Date, format = "%d/%m/%Y"))) %>% left_join(df, .) %>% arrange(Name) 

What gives:

 #Joining by: "Date" # Name Date ID #1 Bank1 1/1/2005 1 #2 Bank1 6/1/2005 2 #3 Bank1 1/1/2006 3 #4 Bank1 6/1/2006 4 #5 Bank1 1/1/2007 5 #6 Bank2 6/1/2006 4 #7 Bank2 1/1/2007 5 #8 Bank3 6/1/2005 2 #9 Bank3 1/1/2006 3 #10 Bank3 6/1/2006 4 #11 Bank3 1/1/2007 5 
+2
source

Source: https://habr.com/ru/post/1214865/


All Articles