I have this table (data1) with four columns:

    SNP      rs6576700  rs17054099  rs7730126
    sample1  GG         TT          GG
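For reproducibility, the table above can be rebuilt with a few lines of R. This is only a minimal sketch: it assumes data1 is an ordinary data.frame whose genotype columns are character strings, which may differ from how the real data is loaded.

```r
# Minimal reproducible version of data1.
# Assumption: a plain data.frame with character (not factor) columns.
data1 <- data.frame(
  SNP        = "sample1",
  rs6576700  = "GG",
  rs17054099 = "TT",
  rs7730126  = "GG",
  stringsAsFactors = FALSE
)

dim(data1)    # number of rows and columns
names(data1)  # SNP plus the three rs identifiers
```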
I need to split each of columns 2-4 into two columns, one per allele, so that the new output has 7 columns, like this:

    SNP      rs6576700  rs6576700  rs17054099  rs17054099  rs7730126  rs7730126
    sample1  G          G          T           T           C          C
With the following function I can split all the columns at once, but the result is not in the format I need:

    split <- function(x) {
      # An empty separator splits each genotype into single characters;
      # split = "-" would leave strings such as "TT" unsplit.
      strsplit(as.character(x), split = "")
    }
    data2 <- apply(data1[, -1], 2, split)
    data2
    $rs17054099
    $rs17054099[[1]]
    [1] "T" "T"

    $rs7730126
    $rs7730126[[1]]
    [1] "G" "G"

    $rs6576700
    $rs6576700[[1]]
    [1] "C" "C"
On Stack Overflow I found a method for converting strsplit output to a data frame, but it puts the rs numbers in rows rather than in columns. (I got similar output with the other methods in the thread "strsplit by row and distribute the results by column in data.frame".)

    > n <- max(sapply(data2, length))
    > l <- lapply(data2, function(X) c(X, rep(NA, n - length(X))))
    > data.frame(t(do.call(cbind, l)))
               t.do.call.cbind..l..
    rs17054099                 T, T
    rs7730126                  G, G
    rs2061700                  C, C
If I omit the transpose, i.e. the t() in t(do.call(...)), the output is a list that I cannot write to a file.
I would like a solution in R so that it can be part of my pipeline.
I forgot to say that I need to apply this to a million columns.