I am very new to R, and I have spent a couple of days trying to do something that Stata does quite simply. A friend gave me a relatively complicated answer to this question, but I was wondering if there is an easy way to do the following.
Suppose I have a data frame with two variables, organized as shown below:
    category  var1
    a            1
    a            2
    a            3
    b            4
    b            6
    b            8
    b           10
    c           11
    c           14
    c           17
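To make the example reproducible, the data can be entered in R along these lines. This is only a sketch: the name df is a hypothetical choice, and category is assumed to be stored as plain character strings rather than a factor.

```r
# Hypothetical name `df`; the values are the ones listed above.
df <- data.frame(
  category = c("a", "a", "a", "b", "b", "b", "b", "c", "c", "c"),
  var1     = c(1, 2, 3, 4, 6, 8, 10, 11, 14, 17),
  stringsAsFactors = FALSE  # keep `category` as character, not factor
)

# Show the structure: 10 observations of 2 variables.
str(df)
```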
I want to generate five additional variables, each of which must be added to the same data frame: var2, var3, var4, var5 and var6.
(1) var2 is a dummy variable that takes the value 1 for the first observation in each category (i.e. each of the three groups defined by category), and 0 otherwise.
(2) var3 is a dummy variable that takes the value 1 for the last observation in each category, 0 otherwise.
(3) var4 counts how many cases are in the group to which each case belongs (i.e. 3 for category a, 4 for category b and 3 for category c).
(4) var5 records the difference between each observation in var1 and the observation immediately preceding it.
(5) var6 records the difference between each observation in var1 and the observation immediately preceding it, but only within the groups defined by category.
I am quite familiar with Stata, and I believe none of the above is difficult to do there using the bysort prefix. For example, var2 is easily generated with bysort category: gen var2 = (_n == 1). But I have spent the last day tearing my hair out trying to figure out how to do the same in R. I'm sure there are several solutions (my friend's used ddply, which seemed a step above my pay grade). Is there nothing as easy as bysort?
The final data set should look something like this:
    category  var1  var2  var3  var4  var5  var6
    a            1     1     0     3   n/a   n/a
    a            2     0     0     3     1     1
    a            3     0     1     3     1     1
    b            4     1     0     4     1   n/a
    b            6     0     0     4     2     2
    b            8     0     0     4     2     2
    b           10     0     1     4     2     2
    c           11     1     0     3     1   n/a
    c           14     0     0     3     3     3
    c           17     0     1     3     3     3
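For reference, one way the five definitions might be written in base R is sketched below. It is not meant as the definitive approach: the name df is hypothetical, the rows are assumed to be already sorted by category, and R's NA stands in for the n/a entries. The grouped steps rely on ave(), which applies a function within each group and returns a result the same length as its input, which is roughly the role bysort plays in Stata.

```r
# Hypothetical data frame `df`, rebuilt here so the sketch runs on its own.
df <- data.frame(
  category = rep(c("a", "b", "c"), times = c(3, 4, 3)),
  var1     = c(1, 2, 3, 4, 6, 8, 10, 11, 14, 17),
  stringsAsFactors = FALSE
)

# (1) 1 on the first row of each category: duplicated() is FALSE there.
df$var2 <- as.integer(!duplicated(df$category))

# (2) 1 on the last row of each category: same idea, scanning from the bottom.
df$var3 <- as.integer(!duplicated(df$category, fromLast = TRUE))

# (3) Group size, repeated on every row of the group.
df$var4 <- ave(df$var1, df$category, FUN = length)

# (4) Difference from the preceding row, ignoring groups (NA for row 1).
df$var5 <- c(NA, diff(df$var1))

# (5) The same difference, restarted within each category.
df$var6 <- ave(df$var1, df$category, FUN = function(x) c(NA, diff(x)))

print(df)
```

Steps (4) and (5) depend on row order, so a data frame that is not yet sorted would need something like df <- df[order(df$category), ] first.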
Thanks so much in advance for any suggestions. Sorry for the rookie question; I am sure this is answered somewhere else, but I could not find it despite hours of searching.