How to convert a wide data frame to a long data frame for a multilevel structure with four levels of nesting?

I did a study that, in retrospect (you live and learn :-)), turns out to have generated multilevel data. Now I am trying to restructure the data set from wide to long so that I can analyze it using, for example, lme4.

In doing so, I ran into a problem that I have encountered several times before, but for which I never found a good solution. This time I searched again, but I'm probably using the wrong keywords, or this problem is much less common than I thought.

Basically, in this data set, the variable names indicate which measurement each value belongs to. I asked participants to rate (evaluate) interventions (these could be anything, really). Each intervention belongs to one of 6 behavior domains. In addition, participants rated each intervention when it was presented on its own, simultaneously with one other intervention, or simultaneously with two other interventions. There were three types of interventions, and all of them were rated both before (t0) and after (t1) I presented participants with some information.

So, effectively, I have a data frame that can be reproduced as follows:

    ### Elements of the variable names
    measurementMomentsVector <- c("t0", "t1");
    interventionTypesVector <- c("fear", "know", "scd");
    nrOfInterventionsSimultaneouslyVector <- c(1, 2, 3);
    behaviorDomainsVector <- c("diet", "pox", "alc", "smoking", "traff", "adh");

    ### Generate a vector with all variable names
    variableNames <- apply(expand.grid(measurementMomentsVector,
                                       interventionTypesVector,
                                       nrOfInterventionsSimultaneouslyVector,
                                       behaviorDomainsVector),
                           1, paste0, collapse="_");

    ### Generate 5 'participants' worth of data
    wideData <- data.frame(matrix(rnorm(5*length(variableNames)), nrow=5));

    ### Assign names
    names(wideData) <- variableNames;

    ### Add unique id variable for every participant
    wideData$id <- 1:5;

Using head(wideData)[, 1:5], you can see roughly what the data frame looks like:

      t0_fear_1_diet t1_fear_1_diet t0_know_1_diet t1_know_1_diet t0_scd_1_diet
    1     -0.9338191      0.9747453      1.0069036      0.3500103  -0.844699708
    2      0.8921867      1.3687834     -1.2005791      0.2747955   1.316768219
    3      1.6200200      0.5245470     -1.2910586      1.3211912  -0.174795144
    4      0.1543738      0.7535642      0.4726131     -0.3464789  -0.009190702
    5     -1.3676692     -0.4491574     -2.0902003     -0.3484678  -2.537501824

Now I want to convert this data into a long data frame with six variables, for example "id", "measureMoment", "interventionType", "nrOfInterventionsSimultaneous", "behaviorDomain" and "rating". The first variable indicates which participant the record belongs to, the last variable is the rating that participant gave to a specific intervention, and the four variables in between indicate exactly which intervention was rated.

I could write some custom code just for this problem, but I expect R already has something for this. I played around with reshape, for example:

 longData <- reshape(wideData, varying=1:(ncol(wideData)-1), idvar="id", sep="_", direction="long") 

But it fails to guess the time-varying variables:

 Error in guess(varying) : failed to guess time-varying variables from their names 

I have struggled with this several times, and I have been unable to find answers on the Internet. Now I really need to move on, so I decided to ask here as a last resort before I start writing something custom :-)

I would really appreciate any pointers anyone can give.

r dataframe reshape2 reshape
1 answer

I think your problem can be solved with a two-step approach:

  • melt your data into a long data.frame (or, as I did, into a long data.table )
  • separate the variable column with all labels into separate columns for each grouping variable you want.

Since the information for this is indicated in the labels, this can be easily achieved using the tstrsplit function from the data.table package.

This is probably what you are looking for:

    library(data.table)
    longData <- melt(setDT(wideData), id.vars="id")
    longData[, c("moment", "intervention", "number", "behavior") :=
               tstrsplit(variable, "_", type.convert = TRUE)
             ][, variable:=NULL]

result:

    > head(longData,15)
        id       value moment intervention number behavior
     1:  1 -0.07747254     t0         fear      1     diet
     2:  2 -0.76207379     t0         fear      1     diet
     3:  3  1.15501244     t0         fear      1     diet
     4:  4  1.24792369     t0         fear      1     diet
     5:  5 -0.28226121     t0         fear      1     diet
     6:  1 -1.04875354     t1         fear      1     diet
     7:  2 -0.91436882     t1         fear      1     diet
     8:  3  0.72863487     t1         fear      1     diet
     9:  4  0.10934261     t1         fear      1     diet
    10:  5 -0.06093002     t1         fear      1     diet
    11:  1 -0.70725760     t0         know      1     diet
    12:  2  1.06309003     t0         know      1     diet
    13:  3  0.89501164     t0         know      1     diet
    14:  4  1.48148316     t0         know      1     diet
    15:  5  0.22086835     t0         know      1     diet

As an alternative to tstrsplit, you can also split the variable column using the cSplit function of the splitstackshape package, after which you will have to rename the resulting columns. Note that this has to be applied to the melted data while it still contains the variable column:

    library(splitstackshape)
    longData <- cSplit(longData, sep="_", "variable", "wide", type.convert=TRUE)
    names(longData) <- c("id","value","moment","intervention","number","behavior")

or using tidyr :

 library(tidyr) separate(longData, variable, c("moment", "intervention", "number", "behavior"), sep="_", remove=TRUE) 
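If you would rather not load any extra package, the same two steps (stack, then split the labels) can be sketched in base R. This is only a minimal sketch under a few assumptions: it rebuilds the example data with paste() instead of apply(), and the names grid, ratingCols, parts and longBase are made up for this illustration.

```r
# Recreate the example data (same naming scheme as in the question)
set.seed(1)
grid <- expand.grid(moment       = c("t0", "t1"),
                    intervention = c("fear", "know", "scd"),
                    number       = 1:3,
                    behavior     = c("diet", "pox", "alc", "smoking", "traff", "adh"),
                    stringsAsFactors = FALSE)
variableNames <- paste(grid$moment, grid$intervention,
                       grid$number, grid$behavior, sep = "_")
wideData <- data.frame(matrix(rnorm(5 * length(variableNames)), nrow = 5))
names(wideData) <- variableNames
wideData$id <- 1:5

# Step 1: stack all rating columns into one long column
# (ratingCols and longBase are hypothetical names, not from the answer above)
ratingCols <- setdiff(names(wideData), "id")
longBase <- data.frame(
  id       = rep(wideData$id, times = length(ratingCols)),
  variable = rep(ratingCols, each = nrow(wideData)),
  rating   = unlist(wideData[ratingCols], use.names = FALSE),
  stringsAsFactors = FALSE
)

# Step 2: split the labels on "_" into one column per grouping variable
parts <- do.call(rbind, strsplit(longBase$variable, "_", fixed = TRUE))
longBase$moment       <- parts[, 1]
longBase$intervention <- parts[, 2]
longBase$number       <- as.integer(parts[, 3])
longBase$behavior     <- parts[, 4]
longBase$variable     <- NULL

str(longBase)
```

With 5 participants and 2 x 3 x 3 x 6 = 108 rating columns, longBase should end up with 540 rows and 6 columns. The package-based versions above are shorter; this variant mainly shows what melt plus the split do under the hood.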