R, data.table or dplyr, decoding with long colnames formats

Imagine that I have a dataframe with column names like Mary1, Mary2, Mary3, Bob1, Bob2, Bob3, Pam1, Pam2, Pam3 etc., but with many columns.

Here is a simpler reproducible example.

set.seed(1) mydata <- data.frame() mydata <- rbind(mydata,c(1,round(runif(30),3))) mydata <- rbind(mydata,c(2,round(runif(30),3))) mydata <- rbind(mydata,c(3,round(runif(30),3))) colnames(mydata) <- c("id", paste0(rep(LETTERS[1:10], each=3), 1:3)) 

which gives:

 id A1 A2 A3 B1 B2 B3 C1 C2 C3 D1 D2 D3 E1 E2 E3 F1 F2 F3 G1 G2 G3 H1 H2 H3 I1 I2 I3 J1 J2 J3 ... 1 0.266 0.372 0.573 0.908 0.202 0.898 0.945 0.661 0.629 0.062 0.206 0.177 0.687 0.384 0.770 0.498 0.718 0.992 0.380 0.777 0.935 0.212 0.652 0.126 0.267 0.386 0.013 0.382 0.870 0.340 ... 2 0.482 0.600 0.494 0.186 0.827 0.668 0.794 0.108 0.724 0.411 0.821 0.647 0.783 0.553 0.530 0.789 0.023 0.477 0.732 0.693 0.478 0.861 0.438 0.245 0.071 0.099 0.316 0.519 0.662 0.407 ... 3 0.913 0.294 0.459 0.332 0.651 0.258 0.479 0.766 0.084 0.875 0.339 0.839 0.347 0.334 0.476 0.892 0.864 0.390 0.777 0.961 0.435 0.713 0.400 0.325 0.757 0.203 0.711 0.122 0.245 0.143 ... 

I want to get the format of a long table, for example:

 set.seed(1) mydata <- data.frame() mydata <- rbind(mydata,c(1,1,round(runif(10),3))) mydata <- rbind(mydata,c(1,2,round(runif(10),3))) mydata <- rbind(mydata,c(1,3,round(runif(10),3))) mydata <- rbind(mydata,c(2,1,round(runif(10),3))) mydata <- rbind(mydata,c(2,2,round(runif(10),3))) mydata <- rbind(mydata,c(2,3,round(runif(10),3))) colnames(mydata) <- c("id","N", LETTERS[1:10]) 

this is:

  id NABCDEFGHIJ 1 1 0.266 0.372 0.573 0.908 0.202 0.898 0.945 0.661 0.629 0.062 1 2 0.206 0.177 0.687 0.384 0.770 0.498 0.718 0.992 0.380 0.777 1 3 0.482 0.600 0.494 0.186 0.827 0.668 0.794 0.108 0.724 0.411 2 1 0.935 0.212 0.652 0.126 0.267 0.386 0.013 0.382 0.870 0.340 2 2 0.821 0.647 0.783 0.553 0.530 0.789 0.023 0.477 0.732 0.693 2 3 0.478 0.861 0.438 0.245 0.071 0.099 0.316 0.519 0.662 0.407 

How can I get it with data.table or dplyr / tidyr? or any other simple option.

If i try

 melt(mydata, id=1) 

the result is a single column.

I checked the official help and vignettes, but I can only find simpler examples, with a small number of columns, the user specifies each of them manually and one example template (), but I canโ€™t adapt it to my example.

Other threads use gsub, but this is confusing for me.

What I really want to do is a little more complicated, but I think this is the first step (I will repeat it later). Imagine that my columns are Mary1, Mary2, Bob1, Bob2, Pam1, Pam2 ... I want to create new columns with the differences from the two above: Mary1-Mary2, Bob1-Bob2, Pam1-Pam2 ...

Summing up: I do not want to write the name of all columns manually, but automatically delete them, deleting the last digit.

PD: Okay, I am updating my question. It should work not only for names such as A1, A2 ... but also for longer names such as

 colnames(mydata) <- c("id", paste0(rep(LETTERS[1:10], each=3), rep(LETTERS[1:10], each=3), 1:3)) 

I am not against speed, I look at something simple, not mysterious.

+1
r reshape
source share
3 answers

Using data.table::melt :

 require(data.table) n = unique(gsub("[0-9]+$", "", names(mydata)[-1L])) p = paste0("^", n) melt(setDT(mydata), measure=patterns(p), value.name=n, variable.name="N") # id NABCDEFGHIJ # 1: 1 1 0.266 0.908 0.945 0.062 0.687 0.498 0.380 0.212 0.267 0.382 # 2: 2 1 0.482 0.186 0.794 0.411 0.783 0.789 0.732 0.861 0.071 0.519 # 3: 3 1 0.913 0.332 0.479 0.875 0.347 0.892 0.777 0.713 0.757 0.122 # 4: 1 2 0.372 0.202 0.661 0.206 0.384 0.718 0.777 0.652 0.386 0.870 # 5: 2 2 0.600 0.827 0.108 0.821 0.553 0.023 0.693 0.438 0.099 0.662 # 6: 3 2 0.294 0.651 0.766 0.339 0.334 0.864 0.961 0.400 0.203 0.245 # 7: 1 3 0.573 0.898 0.629 0.177 0.770 0.992 0.935 0.126 0.013 0.340 # 8: 2 3 0.494 0.668 0.724 0.647 0.530 0.477 0.478 0.245 0.316 0.407 # 9: 3 3 0.459 0.258 0.084 0.839 0.476 0.390 0.435 0.325 0.711 0.143 
+5
source share
 set.seed(1) mydata <- data.frame() mydata <- rbind(mydata,c(1,round(runif(30),3))) mydata <- rbind(mydata,c(2,round(runif(30),3))) mydata <- rbind(mydata,c(3,round(runif(30),3))) colnames(mydata) <- c("id", paste0(rep(LETTERS[1:10], each=3), 1:3)) reshape(mydata, dir = 'long', varying = names(mydata)[-1], sep = '', timevar = 'N') # id NABCDEFGHIJ # 1.1 1 1 0.266 0.908 0.945 0.062 0.687 0.498 0.380 0.212 0.267 0.382 # 2.1 2 1 0.482 0.186 0.794 0.411 0.783 0.789 0.732 0.861 0.071 0.519 # 3.1 3 1 0.913 0.332 0.479 0.875 0.347 0.892 0.777 0.713 0.757 0.122 # 1.2 1 2 0.372 0.202 0.661 0.206 0.384 0.718 0.777 0.652 0.386 0.870 # 2.2 2 2 0.600 0.827 0.108 0.821 0.553 0.023 0.693 0.438 0.099 0.662 # 3.2 3 2 0.294 0.651 0.766 0.339 0.334 0.864 0.961 0.400 0.203 0.245 # 1.3 1 3 0.573 0.898 0.629 0.177 0.770 0.992 0.935 0.126 0.013 0.340 # 2.3 2 3 0.494 0.668 0.724 0.647 0.530 0.477 0.478 0.245 0.316 0.407 # 3.3 3 3 0.459 0.258 0.084 0.839 0.476 0.390 0.435 0.325 0.711 0.143 
+4
source share

Here is one solution with tidyr :

 library(tidyr) mydata %>% gather(key, value, -id) %>% separate(key, into = c('key1', 'key2'), sep = '(?<=[a-zA-Z])(?=[0-9])') %>% spread(key1, value) 

Result:

  id key2 ABCDEFGHIJ 1 1 1 0.266 0.908 0.945 0.062 0.687 0.498 0.380 0.212 0.267 0.382 2 1 2 0.372 0.202 0.661 0.206 0.384 0.718 0.777 0.652 0.386 0.870 3 1 3 0.573 0.898 0.629 0.177 0.770 0.992 0.935 0.126 0.013 0.340 4 2 1 0.482 0.186 0.794 0.411 0.783 0.789 0.732 0.861 0.071 0.519 5 2 2 0.600 0.827 0.108 0.821 0.553 0.023 0.693 0.438 0.099 0.662 6 2 3 0.494 0.668 0.724 0.647 0.530 0.477 0.478 0.245 0.316 0.407 7 3 1 0.913 0.332 0.479 0.875 0.347 0.892 0.777 0.713 0.757 0.122 8 3 2 0.294 0.651 0.766 0.339 0.334 0.864 0.961 0.400 0.203 0.245 9 3 3 0.459 0.258 0.084 0.839 0.476 0.390 0.435 0.325 0.711 0.143 
+3
source share

All Articles