Create a new column in the data frame: index in the group (not unique between groups)

I have a data frame with two columns: the first column contains the group to which each person belongs, and the second contains an individual identifier. See below:

df <- data.frame( group=c('G1','G1','G1','G1','G2','G2','G2','G2'), 
      indiv=c('indiv1','indiv1','indiv2','indiv2','indiv3',
              'indiv3','indiv4','indiv4'))

   group   indiv
1     G1  indiv1
2     G1  indiv1
3     G1  indiv2
4     G1  indiv2
5     G2  indiv3
6     G2  indiv3
7     G2  indiv4
8     G2  indiv4

I would like to create a new column in my data frame (keeping a long format) with the index of each person in the group, that is:

   group   indiv  Ineed
1     G1  indiv1      1
2     G1  indiv1      1
3     G1  indiv2      2
4     G1  indiv2      2
5     G2  indiv3      1
6     G2  indiv3      1
7     G2  indiv4      2
8     G2  indiv4      2

I tried data.table's .N and .GRP, without success (good work on data.table, by the way!).

Any help is much appreciated!

3 answers

Here you can use the new rleid function (from the development version, v >= 1.9.5):

setDT(df)[, Ineed := rleid(indiv), group][]
#    group  indiv Ineed
# 1:    G1 indiv1     1
# 2:    G1 indiv1     1
# 3:    G1 indiv2     2
# 4:    G1 indiv2     2
# 5:    G2 indiv3     1
# 6:    G2 indiv3     1
# 7:    G2 indiv4     2
# 8:    G2 indiv4     2

Or, with the CRAN version (v <= 1.9.4):

setDT(df)[, Ineed := as.numeric(factor(indiv)), group][]
#    group  indiv Ineed
# 1:    G1 indiv1     1
# 2:    G1 indiv1     1
# 3:    G1 indiv2     2
# 4:    G1 indiv2     2
# 5:    G2 indiv3     1
# 6:    G2 indiv3     1
# 7:    G2 indiv4     2
# 8:    G2 indiv4     2

In 1.9.5 (the development version) there is also frank (and frankv). For example:

require(data.table) ## 1.9.5+
setDT(df)[, col := frank(indiv, ties.method="dense"), by=group]
df
#    group  indiv col
# 1:    G1 indiv1   1
# 2:    G1 indiv1   1
# 3:    G1 indiv2   2
# 4:    G1 indiv2   2
# 5:    G2 indiv3   1
# 6:    G2 indiv3   1
# 7:    G2 indiv4   2
# 8:    G2 indiv4   2

Another option using base R

# factor() first, so this also works when indiv is a character column
df$Ineed <- with(df, ave(as.numeric(factor(indiv)), group, 
                  FUN=function(x) cumsum(!duplicated(x))))
df
#  group  indiv Ineed
#1    G1 indiv1     1
#2    G1 indiv1     1
#3    G1 indiv2     2
#4    G1 indiv2     2
#5    G2 indiv3     1
#6    G2 indiv3     1
#7    G2 indiv4     2
#8    G2 indiv4     2

The data.table version will be

setDT(df)[, Ineed := cumsum(!duplicated(indiv)), group][]
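
The answers above all give the same result on the example data because each person's rows are contiguous within a group and the identifiers happen to sort in order of appearance. As a rough sketch of what to do when that is not guaranteed, the base R snippet below numbers individuals by first appearance within each group using match() and unique(). The data frame df2 and its out-of-order rows are hypothetical, made up for illustration only.

```r
# Hypothetical data: indiv1 reappears after indiv2 within G1,
# and G2 lists indiv4 before indiv3.
df2 <- data.frame(
  group = c("G1", "G1", "G1", "G2", "G2"),
  indiv = c("indiv1", "indiv2", "indiv1", "indiv4", "indiv3"),
  stringsAsFactors = FALSE
)

# Index each person by order of first appearance within the group.
# ave() is given row positions (integers) so its result stays integer;
# the positions are then used to look up indiv inside each group.
df2$Ineed <- ave(
  seq_along(df2$indiv), df2$group,
  FUN = function(i) match(df2$indiv[i], unique(df2$indiv[i]))
)

df2$Ineed
# [1] 1 2 1 1 2
```

On data like this the other approaches would likely differ: rleid starts a new id at every change of value, cumsum(!duplicated(...)) does not go back to an earlier index when a person reappears, and the factor-based version numbers people in sorted rather than first-seen order.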