I have data from an online survey where respondents go through a series of questions 1-3 times. The survey software (Qualtrics) writes this data into multiple columns, i.e. Q3.2 in the survey will have Q3.2.1. columns Q3.2.1. Q3.2.2. and Q3.2.3. :
df <- data.frame( id = 1:10, time = as.Date('2009-01-01') + 0:9, Q3.2.1. = rnorm(10, 0, 1), Q3.2.2. = rnorm(10, 0, 1), Q3.2.3. = rnorm(10, 0, 1), Q3.3.1. = rnorm(10, 0, 1), Q3.3.2. = rnorm(10, 0, 1), Q3.3.3. = rnorm(10, 0, 1) ) # Sample data id time Q3.2.1. Q3.2.2. Q3.2.3. Q3.3.1. Q3.3.2. Q3.3.3. 1 1 2009-01-01 -0.2059165 -0.29177677 -0.7107192 1.52718069 -0.4484351 -1.21550600 2 2 2009-01-02 -0.1981136 -1.19813815 1.1750200 -0.40380049 -1.8376094 1.03588482 3 3 2009-01-03 0.3514795 -0.27425539 1.1171712 -1.02641801 -2.0646661 -0.35353058 ...
I want to combine all the columns of QN.N * into neat separate columns of QN.N, eventually getting something like this:
id time loop_number Q3.2 Q3.3 1 1 2009-01-01 1 -0.20591649 1.52718069 2 2 2009-01-02 1 -0.19811357 -0.40380049 3 3 2009-01-03 1 0.35147949 -1.02641801 ... 11 1 2009-01-01 2 -0.29177677 -0.4484351 12 2 2009-01-02 2 -1.19813815 -1.8376094 13 3 2009-01-03 2 -0.27425539 -2.0646661 ... 21 1 2009-01-01 3 -0.71071921 -1.21550600 22 2 2009-01-02 3 1.17501999 1.03588482 23 3 2009-01-03 3 1.11717121 -0.35353058 ...
The tidyr library has a gather() function, which is great for combining one set of columns:
library(dplyr) library(tidyr) library(stringr) df %>% gather(loop_number, Q3.2, starts_with("Q3.2")) %>% mutate(loop_number = str_sub(loop_number,-2,-2)) %>% select(id, time, loop_number, Q3.2) id time loop_number Q3.2 1 1 2009-01-01 1 -0.20591649 2 2 2009-01-02 1 -0.19811357 3 3 2009-01-03 1 0.35147949 ... 29 9 2009-01-09 3 -0.58581232 30 10 2009-01-10 3 -2.33393981
The resulting data frame has 30 rows, as expected (10 individuals, 3 loops each). However, collecting the second set of columns does not work correctly - it successfully makes two combined columns Q3.2 and Q3.3 , but ends in 90 rows instead of 30 (all combinations of 10 people, 3 cycles of Q3.2, and 3 cycles of Q3.3, the combinations will be increase significantly for each group of columns in the actual data):
df %>% gather(loop_number, Q3.2, starts_with("Q3.2")) %>% gather(loop_number, Q3.3, starts_with("Q3.3")) %>% mutate(loop_number = str_sub(loop_number,-2,-2)) id time loop_number Q3.2 Q3.3 1 1 2009-01-01 1 -0.20591649 1.52718069 2 2 2009-01-02 1 -0.19811357 -0.40380049 3 3 2009-01-03 1 0.35147949 -1.02641801 ... 89 9 2009-01-09 3 -0.58581232 -0.13187024 90 10 2009-01-10 3 -2.33393981 -0.48502131
Is there a way to use multiple calls to gather() like this, combining small subsets of columns like this, while maintaining the correct number of rows?