How to assign a counter to subsets of a data.frame defined by a combination of factors?

I have a data frame with some factor variables. I want to add a new column to this data frame that holds a running index within each subset defined by these factor variables.

    data <- data.frame(fac1 = factor(rep(1:2, 5)), fac2 = sample(letters[1:3], 10, replace = TRUE))

This gives me something like:

       fac1 fac2
    1     1    a
    2     2    c
    3     1    b
    4     2    a
    5     1    c
    6     2    b
    7     1    a
    8     2    a
    9     1    b
    10    2    c

What I want is a counter that tracks how many times each combination of factors has appeared so far, like this:

       fac1 fac2 counter
    1     1    a       1
    2     2    c       1
    3     1    b       1
    4     2    a       1
    5     1    c       1
    6     2    b       1
    7     1    a       2
    8     2    a       2
    9     1    b       2
    10    2    c       2

So far I have been thinking about using tapply to get a counter for all factor combinations, which works great:

    counter <- tapply(data$fac1, list(data$fac1, data$fac2), function(x) 1:length(x))

But I do not know how to assign the resulting list of counters (e.g., unlisted) back to the matching rows of the data frame without using an inefficient loop :)

4 answers

This is a job for the ave() function:

    # Use set.seed for reproducible examples
    # when random number generation is involved
    set.seed(1)
    myDF <- data.frame(fac1 = factor(rep(1:2, 7)),
                       fac2 = sample(letters[1:3], 14, replace = TRUE),
                       stringsAsFactors = FALSE)
    myDF$counter <- ave(myDF$fac2, myDF$fac1, myDF$fac2, FUN = seq_along)
    myDF
    #    fac1 fac2 counter
    # 1     1    a       1
    # 2     2    b       1
    # 3     1    b       1
    # 4     2    c       1
    # 5     1    a       2
    # 6     2    c       2
    # 7     1    c       1
    # 8     2    b       2
    # 9     1    b       2
    # 10    2    a       1
    # 11    1    a       3
    # 12    2    a       2
    # 13    1    c       2
    # 14    2    b       3

Note the use of stringsAsFactors=FALSE in the data.frame() step. If you did not do this and fac2 is a factor, you can still get the same output by converting it first: myDF$counter <- ave(as.character(myDF$fac2), myDF$fac1, myDF$fac2, FUN = seq_along).
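One detail worth keeping in mind: ave() writes the results of FUN back into a copy of its first argument, so passing the character column fac2 gives a counter of type character ("1", "2", ...). A minimal sketch, assuming an integer counter is wanted, is to pass an integer vector as the first argument instead. The data frame df below is hypothetical toy data, not the data from the question.

```r
# Hypothetical toy data, not the data from the question
df <- data.frame(fac1 = factor(c(1, 2, 1, 2, 1, 1)),
                 fac2 = c("a", "b", "a", "b", "b", "a"),
                 stringsAsFactors = FALSE)

# Character first argument: the counter comes back as character
chr_counter <- ave(df$fac2, df$fac1, df$fac2, FUN = seq_along)

# Integer first argument: the counter stays integer, whether
# fac2 is stored as a character vector or as a factor
df$counter <- ave(seq_along(df$fac2), df$fac1, df$fac2, FUN = seq_along)
```

With this variant the as.character() workaround is not needed, because the grouping columns are only used to define the groups.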


A data.table solution:

    library(data.table)
    DT <- data.table(data)
    DT[, counter := seq_len(.N), by = list(fac1, fac2)]

This is a base R method that avoids an (explicit) loop.

    data$counter <- with(data, {
      inter <- as.character(interaction(fac1, fac2))
      names(inter) <- seq_along(inter)
      inter.ordered <- inter[order(inter)]
      counter <- with(rle(inter.ordered), unlist(sapply(lengths, sequence)))
      counter[match(names(inter), names(inter.ordered))]
    })

Here's a small loop-based option (I renamed your variable to "x" because "data" is also the name of an R function):

    x <- data.frame(fac1 = rep(1:2, 5), fac2 = sample(letters[1:3], 10, replace = TRUE))
    x$fac3 <- paste(x$fac1, x$fac2, sep = "")
    x$ctr <- 1
    y <- table(x$fac3)
    for (i in 1:length(rownames(y))) {
      x$ctr[x$fac3 == rownames(y)[i]] <- 1:length(x$ctr[x$fac3 == rownames(y)[i]])
    }
    x <- x[-3]  # drop the helper column fac3

I don't know whether this is efficient on a large data.frame, but it works!
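To get a rough feel for how a loop like this scales, one option is to time it against ave() on a larger data frame. The following is only a sketch: the size n and the names big, ctr_loop, and ctr_ave are made up for illustration, the loop is a condensed variant of the one above, and actual timings will depend on the machine and the data.

```r
set.seed(1)
n <- 1e4  # hypothetical size, adjust as needed
big <- data.frame(fac1 = rep(1:2, n / 2),
                  fac2 = sample(letters[1:3], n, replace = TRUE))

# Loop over the distinct combinations (condensed form of the loop above)
loop_time <- system.time({
  key <- paste(big$fac1, big$fac2, sep = "")
  ctr_loop <- integer(n)
  for (k in unique(key)) {
    ctr_loop[key == k] <- seq_len(sum(key == k))
  }
})

# Vectorised alternative with ave()
ave_time <- system.time(
  ctr_ave <- ave(seq_len(n), big$fac1, big$fac2, FUN = seq_along)
)

# Both approaches should produce the same counter
stopifnot(identical(ctr_loop, ctr_ave))
```

The loop runs once per distinct combination and scans the whole key vector on each pass, so its cost grows with the number of combinations as well as with the number of rows.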
