Combining tabs of delim files into one file using R

Question

I have several txt files with 3 columns in each file, for example:

file 1:

    ProbeID  X_Signal_intensity  X_P-Value
    xxx      2.34                .89
    xxx      6.45                .04
    xxx      1.09                .91
    xxx      5.87                .70
    .        .                   .
    .        .                   .
    .        .                   .

file 2:

    ProbeID  Y_Signal_intensity  Y_P-Value
    xxx      1.4                 .92
    xxx      2.55                .14
    xxx      4.19                .16
    xxx      3.47                .80
    .        .                   .
    .        .                   .
    .        .                   .

file 3:

    ProbeID  Z_Signal_intensity  Z_P-Value
    xxx      9.40                .82
    xxx      1.55                .04
    xxx      3.19                .56
    xxx      2.47                .90
    .        .                   .
    .        .                   .
    .        .                   .

In all of the above files, the values in the ProbeID column are identical, but the other columns differ. Now I want to merge all of the above files, using a for loop, into one file as follows:

    ProbeID  X_intensity  X_P-Value  Y_intensity  Y_P-Value  Z_intensity  Z_P-Value
    xxx      2.34         .89        1.4          .92        9.40         .82
    xxx      6.45         .04        2.55         .14        1.55         .04
    xxx      1.09         .91        4.19         .16        3.19         .56
    xxx      5.87         .70        3.47         .80        2.47         .90

Please help me.

+4

merge r dataframe read.table

Dinesh Aug 4 '11 at 13:53

4 answers

Read in the files as Richie Cotton pointed out, but be sure to pass the additional arguments in the lapply call. For example, add header=TRUE.

    file.names <- c("file X.txt", "file Y.txt", "file Z.txt")
    file.list <- lapply(file.names, read.table, header=TRUE)

Then you might need merge_recurse from the reshape package:

    require(reshape)
    mynewframe <- merge_recurse(file.list, all.x=TRUE, all.y=TRUE, by="ProbeID")

This will work for any number of data frames, as long as you don't have a billion of them. See the ?merge help page for more information on the arguments used.

CORRECTION: in merge_recurse you need to use all.x and all.y, as shown in the corrected code above. You cannot just use the shortcut all, or you will get errors.

Little demo:

    X2 <- data.frame(ProbeID=(2:4), Z2=4:6)
    X1 <- data.frame(ProbeID=1:3, Z1=1:3)
    X3 <- data.frame(ProbeID=1:3, Z3=7:9)
    file.list <- list(X1, X2, X3)
    mynewframe <- merge_recurse(file.list, all.x=TRUE, all.y=TRUE, by="ProbeID")

    > mynewframe
      ProbeID Z1 Z2 Z3
    1       1  1 NA  7
    2       2  2  4  8
    3       3  3  5  9
    4       4 NA  6 NA

+4

Joris Meys Aug 4 '11 at 14:05

Read in your files

    filenames <- c("file X.txt", "file Y.txt", "file Z.txt")
    data_list <- lapply(filenames, read.table)

Combine them into one big data frame

~~all_data <- do.call (cbind, data_list)~~

~~all_data <- do.call (merge, data_list, by = "ProbeID")~~

This is a good lesson in "always test your answer." cbind is not smart enough to match identifiers, and merge is not smart enough to handle more than two data frames. Take a look at Joris's answer and use merge_recurse instead. Or forget what you thought you wanted and use my other suggestion below.
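To illustrate the cbind problem, here is a minimal sketch. The data frames a and b are made up for the example, and they deliberately list the probes in a different order, which is an assumption that does not hold for the files in the question:

```r
# Hypothetical frames with the same probes listed in a different order.
a <- data.frame(ProbeID = c("p1", "p2"), X = c(1, 2))
b <- data.frame(ProbeID = c("p2", "p1"), Y = c(20, 10))

# cbind pastes columns side by side: ProbeID appears twice
# and the rows are not aligned by probe.
side_by_side <- cbind(a, b)
print(side_by_side)

# merge matches rows on the key, so each probe gets its own X and Y.
matched <- merge(a, b, by = "ProbeID")
print(matched)
```

Even when the row order is identical, cbind still leaves one ProbeID column per input, which is usually not what is wanted.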

In fact, rather than having a large number of columns, the best idea is probably a data frame with only 4 columns: ProbeID, Signal_intensity, P_value, and Source_file.

    data_list <- lapply(data_list, function(x) {
      colnames(x) <- c("ProbeID", "Signal_intensity", "P_value")
      x
    })
    all_data <- do.call(rbind, data_list)
    all_data$Source_file <- rep(filenames, times = sapply(data_list, nrow))
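A self-contained sketch of this long format follows. The file contents are hypothetical and inlined as text so that it runs without any files on disk, and header = TRUE is assumed so that the first line is read as column names rather than data:

```r
# Hypothetical file contents, inlined as text so the sketch runs without files.
sources <- list(
  "file X.txt" = "ProbeID X_Signal_intensity X_P-Value\np1 2.34 .89\np2 6.45 .04",
  "file Y.txt" = "ProbeID Y_Signal_intensity Y_P-Value\np1 1.40 .92\np2 2.55 .14"
)

# header = TRUE so the first line is treated as column names, not data.
data_list <- lapply(sources, function(txt) read.table(text = txt, header = TRUE))

# Give every data frame the same column names so rbind can stack them.
data_list <- lapply(data_list, function(x) {
  colnames(x) <- c("ProbeID", "Signal_intensity", "P_value")
  x
})

# Stack the rows and record which source each row came from.
all_data <- do.call(rbind, data_list)
all_data$Source_file <- rep(names(sources), times = sapply(data_list, nrow))
rownames(all_data) <- NULL
print(all_data)
```

With the data in this shape, each file contributes rows rather than columns, so adding another file does not change the column layout.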

+2

Richie Cotton Aug 4 '11 at 13:57

I'm going to add another approach to the mix, one that uses Reduce:

    Reduce(function(...) merge(..., all = TRUE), file.list)
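As a minimal sketch of how the Reduce approach behaves, assume three small in-memory data frames (x, y and z, with made-up values standing in for the files read from disk) that share a ProbeID column. Passing by = "ProbeID" explicitly is an addition here; without it, merge joins on all column names the two inputs have in common:

```r
# Hypothetical stand-ins for the three files; the values are made up.
x <- data.frame(ProbeID = c("p1", "p2", "p3"), X_Signal = c(2.34, 6.45, 1.09))
y <- data.frame(ProbeID = c("p1", "p2", "p3"), Y_Signal = c(1.40, 2.55, 4.19))
z <- data.frame(ProbeID = c("p1", "p2", "p3"), Z_Signal = c(9.40, 1.55, 3.19))
file.list <- list(x, y, z)

# Reduce folds merge() over the list pairwise: merge(merge(x, y), z).
# all = TRUE keeps probes that are missing from one of the inputs.
merged <- Reduce(function(a, b) merge(a, b, by = "ProbeID", all = TRUE), file.list)
print(merged)
```

This needs only base R, so it avoids the dependency on the reshape package.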

0

Ramnath Aug 4 '11 at 19:19

Sarah west · Accepted Answer · 2011-08-04T13:59:45+0000

My approach is to read the files into data.frames.

See help(read.delim) for the reading options.

Once you have the three data.frames, you can use

    total <- merge(dataframeA, dataframeB, by="ProbeID")

See http://www.statmethods.net/management/merging.html for documentation.
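Since merge joins two data frames at a time, a third one needs a second call. A minimal sketch, assuming small made-up data frames in place of the real files (dataframeC is a hypothetical name following the pattern above):

```r
# Hypothetical data frames standing in for the three files after reading.
dataframeA <- data.frame(ProbeID = c("p1", "p2"), X_Signal = c(2.34, 6.45))
dataframeB <- data.frame(ProbeID = c("p1", "p2"), Y_Signal = c(1.40, 2.55))
dataframeC <- data.frame(ProbeID = c("p1", "p2"), Z_Signal = c(9.40, 1.55))

# merge() joins exactly two data frames, so chain the calls for a third.
total <- merge(dataframeA, dataframeB, by = "ProbeID")
total <- merge(total, dataframeC, by = "ProbeID")
print(total)
```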
