R: combining multiple columns into one by group (twice in the same frame)

Question

R: combining multiple columns into one by group (twice in the same frame)

So I have a dataframe that looks like this:

Day HL.Average D.Average LL.Average noHKB.Average HL.SD D.SD LL.SD noHKB.SD 1 0.00000 8760 8900 10000 8030 2337.844 924.2742 1120.785 1592.646 2 1.90625 13300 11900 12100 3860 1016.291 2308.2661 3581.763 1031.057 3 3.00000 14500 7320 12300 1750 2945.098 1308.0389 4338.897 1793.583 4 4.00000 16200 9160 15100 2710 1006.893 514.2177 4362.261 2691.648

And I'm trying to organize it in this:

(The numbers in this table are slightly different, because I made this table in excel instead ...)

 Day Group Average SD 0 HL 8.76E+03 2337.843612 2 HL 1.33E+04 1016.291341 3 HL 1.45E+04 2945.098059 4 HL 1.62E+04 1006.892976 0 LL 1.00E+04 1120.785009 2 LL 1.21E+04 3581.762692 3 LL 1.23E+04 4338.897375 4 LL 1.51E+04 4362.260616 0 D 8.90E+03 924.2741667 2 D 1.19E+04 2308.266118 3 D 7.32E+03 1308.038881 4 D 9.16E+03 514.2176747

I thought about using stack () and then merge (), but the problem is that my original "xx.Average" and "xx.SD" headers complicate the resulting ind from the stack ...

Does anyone know of a command / method for easily organizing data?

Thank you very much!

+5

r

Fenrir Aug 4 '15 at 12:49

source share

4 answers

Here is my awkward decision. It works, but for the operator package %>% some package is required ( dplyr , but this is only a personal choice.). It has a melt function from the reshape2 package, separate from tidyr to separate columns (e.g. LL.Average in LL and Average ) and a dcast from reshape2 to distribution with the formula. Please note that I have not changed noHKB , but this can be done with an easy step, it is just a personal choice where to do it.

 library(reshape2) library(dplyr) library(tidyr) df %>% melt(id.vars = "Day") %>% separate(variable, c("Group", "Measure"), sep = "\\.") %>% dcast(...~ Measure) -> df1 df1 Day Group Average SD 1 0.00000 D 8900 924.2742 2 0.00000 HL 8760 2337.8440 3 0.00000 LL 10000 1120.7850 4 0.00000 noHKB 8030 1592.6460 5 1.90625 D 11900 2308.2661 6 1.90625 HL 13300 1016.2910 7 1.90625 LL 12100 3581.7630 8 1.90625 noHKB 3860 1031.0570 9 3.00000 D 7320 1308.0389 10 3.00000 HL 14500 2945.0980 11 3.00000 LL 12300 4338.8970 12 3.00000 noHKB 1750 1793.5830 13 4.00000 D 9160 514.2177 14 4.00000 HL 16200 1006.8930 15 4.00000 LL 15100 4362.2610 16 4.00000 noHKB 2710 2691.6480

+3

Sabdem Aug 4 '15 at 1:13

source share

You can also use the version of devel data.table ie v1.9.5 , which can take multiple measure columns.

We get the unique column names of the dataset that are not Day columns, and remove the part of the suffix that begins with . . Convert 'data.frame' to 'data.table' ( setDT(df1) ). Use melt and specify regex patterns in the measure argument. Other options include changing the column names "value" and "variable" ( value.name , variable.name ). By default, the Group column will be a numerical index that can be replaced with unique names from an earlier stage.

 library(data.table)#v1.9.5+ nm1 <- unique(sub('\\..*', '', colnames(df1)[-1])) melt(setDT(df1), measure=patterns('Average$', 'SD$'), value.name=c('Average', 'SD'), variable.name='Group')[, Group:= nm1[Group]][] # Day Group Average SD # 1: 0.00000 HL 8760 2337.8440 # 2: 1.90625 HL 13300 1016.2910 # 3: 3.00000 HL 14500 2945.0980 # 4: 4.00000 HL 16200 1006.8930 # 5: 0.00000 D 8900 924.2742 # 6: 1.90625 D 11900 2308.2661 # 7: 3.00000 D 7320 1308.0389 # 8: 4.00000 D 9160 514.2177 # 9: 0.00000 LL 10000 1120.7850 #10: 1.90625 LL 12100 3581.7630 #11: 3.00000 LL 12300 4338.8970 #12: 4.00000 LL 15100 4362.2610 #13: 0.00000 noHKB 8030 1592.6460 #14: 1.90625 noHKB 3860 1031.0570 #15: 3.00000 noHKB 1750 1793.5830 #16: 4.00000 noHKB 2710 2691.6480

Installation instructions for the devel version: here

data

 df1 <- structure(list(Day = c(0, 1.90625, 3, 4), HL.Average = c(8760L, 13300L, 14500L, 16200L), D.Average = c(8900L, 11900L, 7320L, 9160L), LL.Average = c(10000L, 12100L, 12300L, 15100L), noHKB.Average = c(8030L, 3860L, 1750L, 2710L), HL.SD = c(2337.844, 1016.291, 2945.098, 1006.893), D.SD = c(924.2742, 2308.2661, 1308.0389, 514.2177), LL.SD = c(1120.785, 3581.763, 4338.897, 4362.261), noHKB.SD = c(1592.646, 1031.057, 1793.583, 2691.648)), .Names = c("Day", "HL.Average", "D.Average", "LL.Average", "noHKB.Average", "HL.SD", "D.SD", "LL.SD", "noHKB.SD"), class = "data.frame", row.names = c(NA, -4L))

+2

akrun Aug 4 '15 at 6:48

source share

Here is another way to use gather , separate and spread from tidyr

 library(dplyr) library(tidyr) df %>% gather(key, value, -Day) %>% # gather all columns except "Day" into "key" and "value" columns separate(key, c("Group", "tmp")) %>% # separate the "key" column using "." into "Group", "tmp" spread(tmp, value) # change "value" column from long to wide format using "tmp" # Day Group Average SD #1 0.00000 D 8900 924.2742 #2 0.00000 HL 8760 2337.8440 #3 0.00000 LL 10000 1120.7850 #4 0.00000 noHKB 8030 1592.6460 #5 1.90625 D 11900 2308.2661 #6 1.90625 HL 13300 1016.2910 #7 1.90625 LL 12100 3581.7630 #8 1.90625 noHKB 3860 1031.0570 #9 3.00000 D 7320 1308.0389 #10 3.00000 HL 14500 2945.0980 #11 3.00000 LL 12300 4338.8970 #12 3.00000 noHKB 1750 1793.5830 #13 4.00000 D 9160 514.2177 #14 4.00000 HL 16200 1006.8930 #15 4.00000 LL 15100 4362.2610 #16 4.00000 noHKB 2710 2691.6480

+1

Veerendra gadekar Aug 4 '15 at 9:09

source share

Scott ritchie · Accepted Answer · 2015-08-04T01:25:08+0000

@ ananda-mahto splitstackshape package is designed for this type of problem!

Unfortunately, he expects variable names to be prefixes, not postfixes, so we will need to change the variable names first

 library(splitstackshape) # reverse variable names: names(df) <- sapply(strsplit(names(df), "\\."), function(x) paste(rev(x), collapse=".")) # Reshape the data frame: newdf <- merged.stack(df, id.vars=c("Day"), var.stubs=c("Average", "SD"), sep=".") setnames(newdf, ".time_1", "Group") newdf ## Day Group Average SD ## 1: 0.00000 D 8900 924.2742 ## 2: 0.00000 HL 8760 2337.8440 ## 3: 0.00000 LL 10000 1120.7850 ## 4: 0.00000 noHKB 8030 1592.6460 ## 5: 1.90625 D 11900 2308.2661 ## 6: 1.90625 HL 13300 1016.2910 ## 7: 1.90625 LL 12100 3581.7630 ## 8: 1.90625 noHKB 3860 1031.0570 ## 9: 3.00000 D 7320 1308.0389 ## 10: 3.00000 HL 14500 2945.0980 ## 11: 3.00000 LL 12300 4338.8970 ## 12: 3.00000 noHKB 1750 1793.5830 ## 13: 4.00000 D 9160 514.2177 ## 14: 4.00000 HL 16200 1006.8930 ## 15: 4.00000 LL 15100 4362.2610 ## 16: 4.00000 noHKB 2710 2691.6480

R: combining multiple columns into one by group (twice in the same frame)

data

More articles: