Merge (opposite of split) pairs of columns in R

Question

I have a data frame like the one shown below. The columns come in pairs with the suffixes "a" and "b", for example col1a, col1b, colNa, colNb, and so on to the end of the file (> 50,000).

    mydataf <- data.frame(Ind = 1:5,
                          col1a = sample(c(1:3), 5, replace = TRUE),
                          col1b = sample(c(1:3), 5, replace = TRUE),
                          colNa = sample(c(1:3), 5, replace = TRUE),
                          colNb = sample(c(1:3), 5, replace = TRUE),
                          K_a = sample(c("A", "B"), 5, replace = TRUE),
                          K_b = sample(c("A", "B"), 5, replace = TRUE))
    mydataf
      Ind col1a col1b colNa colNb K_a K_b
    1   1     1     1     2     3   B   A
    2   2     1     3     2     2   B   B
    3   3     2     1     1     1   B   B
    4   4     3     1     1     3   A   B
    5   5     1     1     3     2   B   A

Except for the first column (Ind), I want to collapse each pair of columns so that the data frame looks like the following. At the same time, the suffixes "a" and "b" should be dropped. Also, the combined characters or numbers must be sorted: 1 before 2, A before B.

    Ind col1 colN K_
      1   11   23 AB
      2   13   22 BB
      3   12   11 BB
      4   13   13 AB
      5   11   23 AB
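For a single pair, the requested operation amounts to sorting the two values within each row and pasting them together. A minimal sketch, assuming the col1 and K_ values printed in the example above (the variable names are illustrative only):

```r
# Values copied from the printed example data frame above
col1a <- c(1, 1, 2, 3, 1)
col1b <- c(1, 3, 1, 1, 1)
K_a   <- c("B", "B", "B", "A", "B")
K_b   <- c("A", "B", "B", "B", "A")

# pmin()/pmax() pick the smaller/larger value element-wise,
# so each pasted pair comes out in sorted order
merged_num <- paste0(pmin(col1a, col1b), pmax(col1a, col1b))
merged_chr <- paste0(pmin(K_a, K_b), pmax(K_a, K_b))

merged_num  # "11" "13" "12" "13" "11"
merged_chr  # "AB" "BB" "BB" "AB" "AB"
```

This only works when each group has exactly two columns and the character columns are plain character vectors rather than factors.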

Edit: the grep call in the answer (possibly) has a problem when column names are similar to each other.

    mydataf <- data.frame(col_1_a = sample(c(1:3), 5, replace = TRUE),
                          col_1_b = sample(c(1:3), 5, replace = TRUE),
                          col_1_Na = sample(c(1:3), 5, replace = TRUE),
                          col_1_Nb = sample(c(1:3), 5, replace = TRUE),
                          K_a = sample(c("A", "B"), 5, replace = TRUE),
                          K_b = sample(c("A", "B"), 5, replace = TRUE))
    n <- names(mydataf)
    nm <- c(unique(substr(n, 1, nchar(n) - 1)))
    df <- data.frame(sapply(nm, function(x) {
      idx <- grep(x, n)
      cols <- mydataf[idx]
      x <- apply(cols, 1, function(z) paste(sort(z), collapse = ""))
      return(x)
    }))
    names(df) <- nm
    df
      col_1_ col_1_N K_
    1   2233      23 BB
    2   2233      22 BB
    3   1123      13 AB
    4   1223      12 AB
    5   2333      33 AB
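The four-character values in the col_1_ column come from the prefix "col_1_" also matching col_1_Na and col_1_Nb. A small sketch of the difference between the bare prefix and a pattern anchored at both ends (the names `loose` and `strict` are illustrative, not from the original code):

```r
n <- c("col_1_a", "col_1_b", "col_1_Na", "col_1_Nb", "K_a", "K_b")

# The bare prefix matches every name that contains it
loose <- grep("col_1_", n, value = TRUE)

# Anchoring with ^ and $ and requiring exactly one trailing a or b
# restricts the match to the intended pair
strict <- grep("^col_1_[ab]$", n, value = TRUE)

loose   # "col_1_a"  "col_1_b"  "col_1_Na" "col_1_Nb"
strict  # "col_1_a" "col_1_b"
```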

merge r collapse

shNIL Jul 27 '12 at 20:54

1 answer

Julius · Accepted Answer · 2012-07-27T21:52:42+0000

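    mydataf
      Ind col1a col1b colNa colNb K_a K_b
    1   1     2     1     1     1   A   A
    2   2     1     2     1     3   B   A
    3   3     1     2     3     2   A   A
    4   4     1     2     3     1   A   B
    5   5     1     2     2     1   A   A

    n <- names(mydataf)
    # Strip the last character of every name except "Ind" to get the stems
    nm <- c("Ind", unique(substr(n, 1, nchar(n) - 1)[-1]))
    df <- data.frame(sapply(nm, function(x) {
      # "[ab]?$" requires the name to end right after the optional suffix,
      # so a stem no longer matches longer names that merely contain it
      idx <- grep(paste0(x, "[ab]?$"), n)
      cols <- mydataf[idx]
      # Sort the values within each row, then paste them together
      x <- apply(cols, 1, function(z) paste(sort(z), collapse = ""))
      return(x)
    }))
    names(df) <- nm
    df
      Ind col1 colN K_
    1   1   12   11 AA
    2   2   12   13 AB
    3   3   12   23 AA
    4   4   12   13 AB
    5   5   12   12 AA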
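As a possible variation, the columns can be selected by exact name instead of by regular expression, which sidesteps the similar-name problem entirely. This is only a sketch: `merge_pairs` is a hypothetical helper name, and it assumes every column other than the id column ends in "a" or "b" and has a partner. The data are the values printed above.

```r
# Fixed example data (values taken from the printed data frame above)
mydataf <- data.frame(
  Ind   = 1:5,
  col1a = c(2, 1, 1, 1, 1),
  col1b = c(1, 2, 2, 2, 2),
  colNa = c(1, 1, 3, 3, 2),
  colNb = c(1, 3, 2, 1, 1),
  K_a   = c("A", "B", "A", "A", "A"),
  K_b   = c("A", "A", "A", "B", "A"),
  stringsAsFactors = FALSE
)

# merge_pairs() is a hypothetical helper, not part of base R
merge_pairs <- function(d, id = "Ind") {
  n <- setdiff(names(d), id)
  # Drop the trailing a/b to get one stem per pair
  stems <- unique(sub("[ab]$", "", n))
  merged <- lapply(stems, function(s) {
    # Select the pair by exact column names, no pattern matching
    pair <- d[paste0(s, c("a", "b"))]
    # Sort within each row and paste the two values together
    apply(pair, 1, function(z) paste(sort(z), collapse = ""))
  })
  names(merged) <- stems
  data.frame(d[id], merged, stringsAsFactors = FALSE)
}

res <- merge_pairs(mydataf)
res
```

With more than 50,000 columns, a row-wise apply per pair may be slow; that trade-off is not measured here.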
