tidyr: unite columns that share a name pattern

I have a dataset that looks something like this.

    site  <- c("A", "B", "C", "D", "E")
    D01_1 <- c(1, 0, 0, 0, 1)
    D01_2 <- c(1, 1, 0, 1, 1)
    D02_1 <- c(1, 0, 1, 0, 1)
    D02_2 <- c(0, 1, 0, 0, 1)
    D03_1 <- c(1, 1, 0, 0, 0)
    D03_2 <- c(0, 1, 0, 0, 1)
    df <- data.frame(site, D01_1, D01_2, D02_1, D02_2, D03_1, D03_2)

I am trying to combine each pair of columns D0x_1 and D0x_2 into a single column D0x, with the two values separated by a slash. I can do this with the following code, and it works fine:

    library(dplyr)
    library(tidyr)

    df.unite <- df %>%
      unite(D01, D01_1, D01_2, sep = "/", remove = TRUE) %>%
      unite(D02, D02_1, D02_2, sep = "/", remove = TRUE) %>%
      unite(D03, D03_1, D03_2, sep = "/", remove = TRUE)

... but the problem is that this requires me to write a separate unite call for every pair, which is cumbersome for the large number of columns in my dataset. Is there a way in dplyr or tidyr to iterate over column names that share a pattern and unite the matching columns? unite_each does not seem to exist.

3 answers

Two options, which are really just variations on the same idea.


Option 1: Mostly base R

First, you can use lapply to apply unite_ (the standard-evaluation version, to which you can pass strings) programmatically across the columns. To do this, build a vector of column-name prefixes to iterate over, then wrap lapply in do.call(cbind, ...) to collect the resulting columns, and cbind site back onto the result. Altogether:

    cols <- unique(substr(names(df)[-1], 1, 3))

    cbind(site = df$site,
          do.call(cbind,
                  lapply(cols, function(x) {
                    unite_(df, x, grep(x, names(df), value = TRUE),
                           sep = '/', remove = TRUE) %>%
                      select_(x)
                  })))
    #   site D01 D02 D03
    # 1    A 1/1 1/0 1/0
    # 2    B 0/1 0/1 1/1
    # 3    C 0/0 1/0 0/0
    # 4    D 0/1 0/0 0/0
    # 5    E 1/1 1/1 0/1

Option 2: Charming Chains

Alternatively, if you really like pipes, you can hack it all into a single chain (lapply included!), swapping a few base functions for their dplyr equivalents:

    df %>%
      select(-site) %>%
      names() %>%
      substr(1, 3) %>%
      unique() %>%
      lapply(function(x) {
        unite_(df, x, grep(x, names(df), value = TRUE),
               sep = '/', remove = TRUE) %>%
          select_(x)
      }) %>%
      bind_cols() %>%
      mutate(site = as.character(df$site)) %>%
      select(site, starts_with('D'))
    # Source: local data frame [5 x 4]
    #
    #    site   D01   D02   D03
    #   (chr) (chr) (chr) (chr)
    # 1     A   1/1   1/0   1/0
    # 2     B   0/1   0/1   1/1
    # 3     C   0/0   1/0   0/0
    # 4     D   0/1   0/0   0/0
    # 5     E   1/1   1/1   0/1

Inspect the intermediate results to see how the pieces fit together; the logic is almost the same as in the base approach.
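
Note that unite_() and select_() are deprecated in current releases of tidyr and dplyr. As a minimal sketch of the same idea with the non-underscore verbs, you could fold unite() over the prefixes with Reduce(). The names prefixes and df_united are introduced here purely for illustration, and the sketch assumes every column after site follows the PREFIX_N naming pattern:

```r
library(tidyr)

df <- data.frame(
  site  = c("A", "B", "C", "D", "E"),
  D01_1 = c(1, 0, 0, 0, 1), D01_2 = c(1, 1, 0, 1, 1),
  D02_1 = c(1, 0, 1, 0, 1), D02_2 = c(0, 1, 0, 0, 1),
  D03_1 = c(1, 1, 0, 0, 0), D03_2 = c(0, 1, 0, 0, 1)
)

# Shared prefixes, obtained by stripping the trailing "_<digits>" (assumed pattern)
prefixes <- unique(sub("_\\d+$", "", names(df)[-1]))

# Each step unites all columns starting with "<prefix>_" into a column named <prefix>
df_united <- Reduce(
  function(data, p) unite(data, !!p, starts_with(paste0(p, "_")), sep = "/"),
  prefixes,
  df
)

df_united
```

Because the columns are selected by prefix rather than by position, this should also work when a group has more than two columns.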


Here is a solution using base R functions (plus the magrittr pipe). First, I find the indices of the *_1 columns. I also build the column names for the final result using gsub() and unique(). The sapply() part pastes two columns together with /. If x = 1, then x + 1 = 2, so you always pick two adjacent columns and paste them; this relies on each *_2 column sitting immediately after its *_1 column. Then I add site using cbind() and create a data frame. The final step is to assign the column names.

    library(magrittr)

    ind <- grep(pattern = "1$", x = names(df))
    names <- unique(gsub(pattern = "_\\d+$", replacement = "", x = names(df)))

    sapply(ind, function(x) {
      foo <- paste(df[, x], df[, x + 1], sep = "/")
      foo
    }) %>%
      cbind(as.character(df$site), .) %>%
      data.frame -> out

    names(out) <- names

    #   site D01 D02 D03
    # 1    A 1/1 1/0 1/0
    # 2    B 0/1 0/1 1/1
    # 3    C 0/0 1/0 0/0
    # 4    D 0/1 0/0 0/0
    # 5    E 1/1 1/1 0/1
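
To make the index pairing concrete, here is a small sketch that prints which columns get pasted together and checks the adjacency assumption before relying on it. The variable partner and the stricter "_1$" pattern are illustrative choices, not part of the answer above:

```r
df <- data.frame(
  site  = c("A", "B", "C", "D", "E"),
  D01_1 = c(1, 0, 0, 0, 1), D01_2 = c(1, 1, 0, 1, 1),
  D02_1 = c(1, 0, 1, 0, 1), D02_2 = c(0, 1, 0, 0, 1),
  D03_1 = c(1, 1, 0, 0, 0), D03_2 = c(0, 1, 0, 0, 1)
)

# Positions of the *_1 columns
ind <- grep("_1$", names(df))
ind

# The column immediately to the right of each *_1 column
partner <- names(df)[ind + 1]
partner

# Guard: each neighbour must be the matching *_2 column,
# otherwise pasting by position would combine the wrong pair
stopifnot(identical(sub("_1$", "_2", names(df)[ind]), partner))
```

If the columns were reordered so that a pair is no longer adjacent, the guard would stop with an error instead of silently pasting unrelated columns.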

You can also use a plain base R approach:

    cols <- split(names(df)[-1], sub("_\\d+", "", names(df)[-1]))

    cbind(df[1], sapply(names(cols), function(col) {
      do.call(paste, c(df[cols[[col]]], sep = "/"))
    }))
    #   site D01 D02 D03
    # 1    A 1/1 1/0 1/0
    # 2    B 0/1 0/1 1/1
    # 3    C 0/0 1/0 0/0
    # 4    D 0/1 0/0 0/0
    # 5    E 1/1 1/1 0/1
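
To see why this works, it may help to look at the intermediate pieces. The sketch below rebuilds cols and then pastes a single group by hand; the variable d01 is introduced only for illustration, and the pattern is anchored with $ here as a small defensive tweak:

```r
df <- data.frame(
  site  = c("A", "B", "C", "D", "E"),
  D01_1 = c(1, 0, 0, 0, 1), D01_2 = c(1, 1, 0, 1, 1),
  D02_1 = c(1, 0, 1, 0, 1), D02_2 = c(0, 1, 0, 0, 1),
  D03_1 = c(1, 1, 0, 0, 0), D03_2 = c(0, 1, 0, 0, 1)
)

# Group the column names by prefix: a named list with one entry per new column
cols <- split(names(df)[-1], sub("_\\d+$", "", names(df)[-1]))
str(cols)

# For one group, do.call() passes every column in the group to paste(),
# so this is equivalent to paste(df$D01_1, df$D01_2, sep = "/")
d01 <- do.call(paste, c(df[cols[["D01"]]], sep = "/"))
d01
```

Since do.call() passes however many columns each group contains, the same code handles groups of three or more columns without changes.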