Finding an easy way to complete Stata bysort tasks in R

Question


I am very new to R and have spent a couple of days trying to do something that Stata does quite simply. A friend gave me a relatively complicated answer to this question, but I was wondering if there is an easy way to do the following.

Suppose I have a data frame with two variables, organized as shown below:

    category var1
    a           1
    a           2
    a           3
    b           4
    b           6
    b           8
    b          10
    c          11
    c          14
    c          17
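If you want to follow along in R, the same two columns can be typed in directly as a data frame. This is a minimal sketch; the object name `dat` matches the one the answers use, and storing `category` as character rather than as a factor is an assumption (the answers' code works with either).

```r
# Build the example data: one grouping column and one numeric column
dat <- data.frame(
  category = c("a", "a", "a", "b", "b", "b", "b", "c", "c", "c"),
  var1     = c(1, 2, 3, 4, 6, 8, 10, 11, 14, 17),
  stringsAsFactors = FALSE  # keep category as character (assumption)
)
str(dat)
```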

I want to generate five additional variables, var2, var3, var4, var5 and var6, each of which must be added to the same data frame:

(1) var2 is a dummy variable that takes the value 1 for the first observation in each category (i.e. each of the three groups defined by category), and 0 otherwise.

(2) var3 is a dummy variable that takes the value 1 for the last observation in each category, and 0 otherwise.

(3) var4 records how many cases are in the group to which each case belongs (i.e. 3 for category a, 4 for category b and 3 for category c).

(4) var5 records the difference between each observation in var1 and the observation directly above it (the previous row), regardless of category.

(5) var6 records the same difference as var5, but only within the groups defined by category, so it is missing for the first observation in each group.
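Apart from (4), every definition above is computed one category at a time. As an illustration only (a base-R split-apply-combine sketch, not the method used in either answer), each group can be handled by a small function and the pieces bound back together; the helper name `per_group` is hypothetical.

```r
dat <- data.frame(
  category = c("a", "a", "a", "b", "b", "b", "b", "c", "c", "c"),
  var1     = c(1, 2, 3, 4, 6, 8, 10, 11, 14, 17),
  stringsAsFactors = FALSE
)

# Hypothetical helper: compute the per-group variables for one category
per_group <- function(g) {
  n <- nrow(g)
  g$var2 <- as.integer(seq_len(n) == 1)  # (1) first row of the group
  g$var3 <- as.integer(seq_len(n) == n)  # (2) last row of the group
  g$var4 <- n                            # (3) number of rows in the group
  g$var6 <- c(NA, diff(g$var1))          # (5) difference within the group
  g
}

# split() returns the groups in sorted order of category; rbind them back together
out <- do.call(rbind, lapply(split(dat, dat$category), per_group))
rownames(out) <- NULL

# (4) difference with the previous row, ignoring groups
out$var5 <- c(NA, diff(out$var1))
out <- out[, c("category", "var1", "var2", "var3", "var4", "var5", "var6")]
out
```

Because `split()` orders the groups by `category`, the result comes back sorted by group even if the input was not, which is roughly what `bysort` does.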

I am quite familiar with Stata, and I believe all of the above is easy to do there with the bysort prefix. For example, var2 is easily generated with bysort category: gen var2 = 1 if _n==1. But I have spent the last day tearing my hair out trying to figure out how to do the same in R. I'm sure there are several solutions (my friend's answer used the ddplyr package, which seemed a step above my pay grade), but is there nothing in R as easy as bysort?
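Stata's `bysort category:` sorts the data and then evaluates `_n` (position within the group) and `_N` (group size) separately in each group. A rough base-R analogue, sketched here under the assumption that rows should also be ordered by `var1` within each group (i.e. `bysort category (var1):`), is to sort with `order()` and build the two counters with `ave()`; `n_` and `N_` are hypothetical names chosen to mirror Stata.

```r
# Deliberately unsorted copy of the example data
dat <- data.frame(
  category = c("b", "a", "c", "a", "b", "c", "b", "a", "c", "b"),
  var1     = c(6, 2, 14, 1, 4, 17, 10, 3, 11, 8),
  stringsAsFactors = FALSE
)

# Like bysort category (var1): sort by category, then by var1 within category
dat <- dat[order(dat$category, dat$var1), ]
rownames(dat) <- NULL

# Stata's _n: running row number within each category
n_ <- ave(seq_along(dat$var1), dat$category, FUN = seq_along)
# Stata's _N: number of rows in each category
N_ <- ave(seq_along(dat$var1), dat$category, FUN = length)

dat$var2 <- as.integer(n_ == 1)   # bysort category: gen var2 = (_n == 1)
dat$var3 <- as.integer(n_ == N_)  # bysort category: gen var3 = (_n == _N)
dat$var4 <- N_                    # bysort category: gen var4 = _N
dat
```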

The final data set should look something like this:

    category var1 var2 var3 var4 var5 var6
    a           1    1    0    3  n/a  n/a
    a           2    0    0    3    1    1
    a           3    0    1    3    1    1
    b           4    1    0    4    1  n/a
    b           6    0    0    4    2    2
    b           8    0    0    4    2    2
    b          10    0    1    4    2    2
    c          11    1    0    3    1  n/a
    c          14    0    0    3    3    3
    c          17    0    1    3    3    3

Thanks in advance for any suggestions. Sorry for the rookie question; I am sure this has been answered somewhere else, but I could not find it despite hours of searching.


r stata

daanoo Aug 13 '14 at 2:18


2 answers

Answer using dplyr

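    library(dplyr)

    dat <- dat %>%
      group_by(category) %>%
      mutate(var2 = ifelse(row_number() == 1, 1, 0),    # first row of each category
             var3 = ifelse(row_number() == n(), 1, 0),  # last row of each category
             var4 = n(),                                # number of rows in the category
             var6 = var1 - lag(var1)) %>%               # difference within the category
      ungroup() %>%
      mutate(var5 = var1 - lag(var1))                   # difference ignoring categories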


Matthew Jun 06 '15 at 16:15


rawr · Accepted Answer · 2014-08-13T02:38:57+0000

    dat <- read.table(header = TRUE, text = 'category var1
    a 1
    a 2
    a 3
    b 4
    b 6
    b 8
    b 10
    c 11
    c 14
    c 17')

    (dat <- within(dat, {
      var6 <- ave(var1, category, FUN = function(x) c(NA, diff(x)))
      var5 <- c(NA, diff(var1))
      var4 <- ave(var1, category, FUN = length)
      var3 <- rev(!duplicated(rev(category))) * 1
      var2 <- (!duplicated(category)) * 1
    }))

    #    category var1 var2 var3 var4 var5 var6
    # 1         a    1    1    0    3   NA   NA
    # 2         a    2    0    0    3    1    1
    # 3         a    3    0    1    3    1    1
    # 4         b    4    1    0    4    1   NA
    # 5         b    6    0    0    4    2    2
    # 6         b    8    0    0    4    2    2
    # 7         b   10    0    1    4    2    2
    # 8         c   11    1    0    3    1   NA
    # 9         c   14    0    0    3    3    3
    # 10        c   17    0    1    3    3    3
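The `var3` line reverses `category`, flags first occurrences with `duplicated()`, and reverses back, which marks the last occurrence of each category. Base R's `duplicated()` also accepts a `fromLast` argument that gives the same flag without the double `rev()`. The sketch below checks on a standalone copy of the `category` column that the two forms agree.

```r
category <- c("a", "a", "a", "b", "b", "b", "b", "c", "c", "c")

# As in the answer: reverse, flag first occurrences, reverse back
last_rev <- rev(!duplicated(rev(category))) * 1
# Same flag using duplicated()'s fromLast argument
last_fl <- (!duplicated(category, fromLast = TRUE)) * 1

identical(last_rev, last_fl)
```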
