Increment by 1 for each change in a column

Let's say I have the following data frame:

set.seed(123)
df <- data.frame(var1=(runif(10)>0.5)*1)

var1 can be of any type and have any number of levels, not necessarily just 0s and 1s.

I would like to create var2, a column that increments by 1 every time var1 changes, without using a for loop.

Expected result in this case:

data.frame(var1=df$var1, var2=c(1, 2, 3, 4, 4, 5, 6, 6, 6, 7))

var1 var2
   0    1
   1    2
   0    3
   1    4
   1    4
   0    5
   1    6
   1    6
   1    6
   0    7

Another possible data frame might be:

df <- data.frame(var1=c("a", "a", "1", "0", "b", "b", "b", "c", "1", "1"))

In this case, the result should be:

var1 var2
   a    1
   a    1
   1    2
   0    3
   b    4
   b    4
   b    4
   c    5
   1    6
   1    6
4 answers

Based on Mr. Flick's answer:

df$var2 <- cumsum(c(0,as.numeric(diff(df$var1))!=0))

But if you do not want to use diff, you can still use:

df$var2 <- c(0,cumsum(as.numeric(with(df,var1[1:(length(var1)-1)] != var1[2:length(var1)]))))

It starts at 0, not 1, but I'm sure you see how to change it if you want.
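
For reference, here is a minimal sketch of the diff-free variant that starts at 1. It assumes the character example from the question, and the intermediate name `changed` is only illustrative:

```r
df <- data.frame(var1 = c("a", "a", "1", "0", "b", "b", "b", "c", "1", "1"),
                 stringsAsFactors = FALSE)

# TRUE wherever an element differs from the one before it
changed <- df$var1[-1] != df$var1[-nrow(df)]

# Seed the cumulative sum with 1 so the first run is numbered 1
df$var2 <- cumsum(c(1, changed))
df$var2
```

Because it only compares neighbouring elements with `!=`, this version does not need the column to be numeric.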


Use diff() and cumsum():

df$var2 <- cumsum(c(1,diff(df$var1)!=0))

Use run-length encoding (rle):

x = c("a", "a", "1", "0", "b", "b", "b", "c", "1", "1")
r = rle(x)

> rle(x)
Run Length Encoding
  lengths: int [1:6] 2 1 1 3 1 2
  values : chr [1:6] "a" "1" "0" "b" "c" "1"

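This says that the first value ("a") occurs 2 times in a row, then "1" occurs once, and so on. To get the result, create a sequence along the run lengths and repeat each element by the length of its run: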

> rep(seq_along(r$lengths), r$lengths)
 [1] 1 1 2 3 4 4 4 5 6 6

The other answers are somewhat fragile because they rely on the column being a factor; they fail when the column is actually a character vector.

> diff(x)
Error in r[i1] - r[-length(r):-(length(r) - lag + 1L)] : 
  non-numeric argument to binary operator

A workaround is to map the characters to integers, along the lines of

> diff(match(x, x))
[1]  0  2  1  1  0  0  3 -5  0
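
Carrying that workaround through to the counter could look like the sketch below. The names `codes` and `var2` are only illustrative, and it assumes the same `x` as above:

```r
x <- c("a", "a", "1", "0", "b", "b", "b", "c", "1", "1")

# match(x, x) replaces each value with the index of its first occurrence,
# so equal strings map to equal integers and diff() becomes applicable
codes <- match(x, x)

# A non-zero difference marks the start of a new run; seeding with 1
# numbers the first run as 1
var2 <- cumsum(c(1, diff(codes) != 0))
var2
```

This gives the same grouping as the rep(seq_along(...)) approach on this input.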

Hmm, but having said that, I find that rle doesn't work on factors!

> f = factor(x)
> rle(f)
Error in rle(f) : 'x' must be a vector of an atomic type
> rle(as.vector(f))
Run Length Encoding
  lengths: int [1:6] 2 1 1 3 1 2
  values : chr [1:6] "a" "1" "0" "b" "c" "1"

Here is another base R solution, using inverse.rle():

df <- data.frame(var1=c("a", "a", "1", "0", "b", "b", "b", "c", "1", "1"))
r <- rle(as.character(df$var1))
r$values <- seq_along(r$values)
df$var2 <- inverse.rle(r)

Short version:

df$var2 <- with(rle(as.character(df$var1)), rep(seq_along(values), lengths))

Here is a solution with data.table:

library("data.table")
dt <- data.table(var1=c("a", "a", "1", "0", "b", "b", "b", "c", "1", "1"))
dt[, var2:=rleid(var1)]