Reshape a data frame from wide to panel (long) format with several time-varying variables and some time-invariant ones

This is a basic data analysis task that Stata handles in one step.

Create a wide data frame with time-invariant data (x0) and time-varying data for 2000 and 2005 (x1, x2):

d1 <- data.frame(subject = c("id1", "id2"), x0 = c("male", "female"), x1_2000 = 1:2, x1_2005 = 5:6, x2_2000 = 1:2, x2_2005 = 5:6 ) 


   subject     x0 x1_2000 x1_2005 x2_2000 x2_2005
 1     id1   male       1       5       1       5
 2     id2 female       2       6       2       6

I want to reshape it into a panel, so that the data looks like this:

   subject     x0 time x1 x2
 1     id1   male 2000  1  1
 2     id2 female 2000  2  2
 3     id1   male 2005  5  5
 4     id2 female 2005  6  6

I can do it with reshape:

 d2 <- reshape(d1, idvar = "subject",
               varying = list(c("x1_2000", "x1_2005"),
                              c("x2_2000", "x2_2005")),
               v.names = c("x1", "x2"),
               times = c(2000, 2005),
               direction = "long", sep = "_")

My main concern is that with dozens of variables this command becomes very long. In Stata you can simply type:

 reshape long x1 x2, i(subject) j(year) 

Is there a similarly simple solution in R?

r reshape data-manipulation panel stata
2 answers

reshape can guess many of its arguments. In this case it is enough to specify the following. No packages are needed.

  reshape(d1, dir = "long", varying = 3:6, sep = "_") 

giving:

        subject     x0 time x1 x2 id
 1.2000     id1   male 2000  1  1  1
 2.2000     id2 female 2000  2  2  2
 1.2005     id1   male 2005  5  5  1
 2.2005     id2 female 2005  6  6  2
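
The call above can be taken one step further so that the column positions are not hard-coded and no extra id column is added. This is only a sketch: it assumes that the time-varying columns are exactly those whose names contain an underscore, and the names varying_cols and long are introduced here for illustration.

```r
# Same toy data as in the question.
d1 <- data.frame(subject = c("id1", "id2"), x0 = c("male", "female"),
                 x1_2000 = 1:2, x1_2005 = 5:6,
                 x2_2000 = 1:2, x2_2005 = 5:6)

# Assumption: only the time-varying columns contain "_" in their names.
varying_cols <- grep("_", names(d1), fixed = TRUE)

# idvar = "subject" reuses the existing identifier, so reshape does not
# create a new `id` column.
long <- reshape(d1, direction = "long", varying = varying_cols,
                sep = "_", idvar = "subject")

# Optional: drop the "id1.2000"-style row names.
rownames(long) <- NULL
long
```

With dozens of variables the call stays the same length, since reshape splits each column name at sep to recover the variable stems and the times.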

Here is a short example using the reshape2 package:

 library(reshape2)
 library(stringr)

 # it is always useful to start with melt
 d2 <- melt(d1, id=c("subject", "x0"))

 # redefine the time and x1, x2, ... separately
 d2 <- transform(d2, time = str_replace(variable, "^.*_", ""),
                 variable = str_replace(variable, "_.*$", ""))

 # finally, cast as you want
 d3 <- dcast(d2, subject+x0+time~variable)

Now you don't even need to specify x1 and x2.
The same code still works when more variables are added:

 > d1 <- data.frame(subject = c("id1", "id2"), x0 = c("male", "female"),
 +                  x1_2000 = 1:2,
 +                  x1_2005 = 5:6,
 +                  x2_2000 = 1:2,
 +                  x2_2005 = 5:6,
 +                  x3_2000 = 1:2,
 +                  x3_2005 = 5:6,
 +                  x4_2000 = 1:2,
 +                  x4_2005 = 5:6
 + )
 >
 > d2 <- melt(d1, id=c("subject", "x0"))
 > d2 <- transform(d2, time = str_replace(variable, "^.*_", ""),
 +                 variable = str_replace(variable, "_.*$", ""))
 >
 > d3 <- dcast(d2, subject+x0+time~variable)
 >
 > d3
   subject     x0 time x1 x2 x3 x4
 1     id1   male 2000  1  1  1  1
 2     id1   male 2005  5  5  5  5
 3     id2 female 2000  2  2  2  2
 4     id2 female 2005  6  6  6  6
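
The two regular expressions in the transform step can be checked in isolation. The sketch below uses base R's sub in place of str_replace so that it needs no packages (both replace only the first match); the names cols, stem and time are introduced here for illustration.

```r
cols <- c("x1_2000", "x1_2005", "x2_2000", "x2_2005")

# "^.*_" is greedy: it matches everything up to and including the last
# underscore, so removing it leaves the time part.
time <- sub("^.*_", "", cols)

# "_.*$" matches from the first underscore to the end of the string,
# so removing it leaves the variable stem.
stem <- sub("_.*$", "", cols)

data.frame(cols, stem, time)
```

Note that a column name with more than one underscore, such as x_1_2000, would be split into stem x and time 2000, so this approach assumes a single underscore per name.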
