Convert R-frames from long to wide format, but with unequal group sizes, for use with qcc

I would like to convert a data frame from a long format to a wide format, but with unequal group sizes.

Possible use will be in "qcc", which requires a data frame or matrix with each row consisting of one group, using NA in groups with fewer samples.

The following code will create an example data set and also show manual conversion to the desired format.

# This is an example of the initial data that I have # * 10 sample measurements, over 3 groups with 3, 2, and 5 elements respectively x <- rnorm(10) x_df <- data.frame( time = c( rep('2001 Q1',3), rep('2001 Q2',2), rep('2001 Q3',5) ), measure = x ) x_df # This is a manual conversion into the desired format x_pad <- c( x[1:3], NA, NA, x[4:5], NA, NA, NA, x[6:10] ) x_matrix <- matrix( x_pad, nrow = 3, ncol = 5, byrow = TRUE, dimnames = list(c('2001 Q1','2001 Q2','2001 Q3')) ) x_matrix # desired format # An example of how it will be used library(qcc) plot(qcc(x_matrix, type = 'xbar', plot = FALSE)) 

So, I would like to convert this:

  time measure 1 2001 Q1 0.14680685 2 2001 Q1 0.53593193 3 2001 Q1 0.56097974 4 2001 Q2 -1.48102689 5 2001 Q2 0.18150972 6 2001 Q3 1.72018147 7 2001 Q3 -0.08480855 8 2001 Q3 -2.23208877 9 2001 Q3 -1.15269107 10 2001 Q3 0.57975023 

... to that...

  [,1] [,2] [,3] [,4] [,5] 2001 Q1 0.1468068 0.53593193 0.5609797 NA NA 2001 Q2 -1.4810269 0.18150972 NA NA NA 2001 Q3 1.7201815 -0.08480855 -2.2320888 -1.152691 0.5797502 

There is probably a simple way (maybe some use of reshape or reshape2 changes that I am not familiar with?), But a bunch of searches have not helped me yet.

Thanks for any help!

===========

In one of the solutions below, the following will generate the final qcc xbar graph, including group labels:

 library(splitstackshape) out_df <- dcast( getanID( x_df, 'time' ), time~.id, value.var='measure' ) qcc( out_df[,-1], type = 'xbar', labels = out_df[,1] ) 
+5
source share
3 answers

You can create a sequence column ('.id') using getanID from splitstackshape and use dcast from data.table to convert long format to wide format. The output of splitstackshape is a data table. When we load splitstackshape , the data table will also be loaded. So, if you already have the version of data.table installed, then you can use dcast from data.table .

 library(splitstackshape) dcast(getanID(df1, 'time'), time~.id, value.var='measure') # time 1 2 3 4 5 #1: 2001 Q1 0.1468068 0.53593193 0.5609797 NA NA #2: 2001 Q2 -1.4810269 0.18150972 NA NA NA #3: 2001 Q3 1.7201815 -0.08480855 -2.2320888 -1.152691 0.5797502 
+6
source

You will need an intermediate variable that gives the identifier "inside time". You can create it and change it this way.

 library(tidyr) library(dplyr) group_by(X, time) %>% mutate(seq = 1:n()) %>% ungroup() %>% spread(seq, measure) 
+6
source

Another splitstackshape approach

 cSplit(setDT(df)[, toString(measure), by='time'], 'V1', ',') # time V1_1 V1_2 V1_3 V1_4 V1_5 #1: 2001 Q1 0.1468068 0.53593193 0.5609797 NA NA #2: 2001 Q2 -1.4810269 0.18150972 NA NA NA #3: 2001 Q3 1.7201815 -0.08480855 -2.2320888 -1.152691 0.5797502 

Or using the version of devel data.table , a similar approach after gluing a β€œmeasure” over the β€œtime” grouping column will use tstrsplit to separate the β€œV1” column generated from toString(measure) .

  setDT(df)[, toString(measure), by ='time'][, c(list(time), tstrsplit(V1, ', '))] 

Alternatively, we can add type.convert=TRUE to tstrsplit to convert the class split columns. By default it is FALSE .

+5
source

All Articles