How to “flatten” or “collapse” a 2D data frame into a 1D data frame in R?

I have a two-dimensional table of distances in a data.frame in R (imported from CSV):

             CP000036 CP001063 CP001368
    CP000036        0        a        b
    CP001063        a        0        c
    CP001368        b        c        0

I would like to flatten it, so that I have one axis value in the first column, the other axis value in the second column, and the distance in the third column:

    Genome1  Genome2  Dist
    CP000036 CP001063 a
    CP000036 CP001368 b
    CP001063 CP001368 c

The above is ideal, but it would be perfectly fine to have repetition, so that each cell in the input matrix gets its own row:

    Genome1  Genome2  Dist
    CP000036 CP000036 0
    CP000036 CP001063 a
    CP000036 CP001368 b
    CP001063 CP000036 a
    CP001063 CP001063 0
    CP001063 CP001368 c
    CP001368 CP000036 b
    CP001368 CP001063 c
    CP001368 CP001368 0

This example is a 3x3 matrix, but my data set is much larger (about 2000x2000). I would do this in Excel, but I need ~3 million rows for the output, while Excel's maximum is ~1 million.

This question is very similar to "How to “flatten” or “collapse” a 2D Excel spreadsheet into 1D?"

1 answer

So this is one solution using melt from the reshape2 package:

    dm <- data.frame(
      CP000036 = c( "0", "a", "b" ),
      CP001063 = c( "a", "0", "c" ),
      CP001368 = c( "b", "c", "0" ),
      stringsAsFactors = FALSE,
      row.names = c( "CP000036", "CP001063", "CP001368" )
    )

    # assuming the distance follows a metric we avoid everything below and on the diagonal
    dm[ lower.tri( dm, diag = TRUE ) ] <- NA
    dm$Genome1 <- rownames( dm )

    # finally melt and avoid the entries below the diagonal with na.rm = TRUE
    library(reshape2)
    dm.molten <- melt( dm, na.rm = TRUE, id.vars = "Genome1",
                       value.name = "Dist", variable.name = "Genome2" )

    print( dm.molten )

Output:

       Genome1  Genome2 Dist
    4 CP000036 CP001063    a
    7 CP000036 CP001368    b
    8 CP001063 CP001368    c

There are probably more efficient solutions, but I like this one because it is plain and simple.
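For a table of roughly 2000x2000, a base R variant that indexes the matrix directly may be worth trying. The following is only a sketch, not part of the original answer: `flatten_dist` and its `upper_only` argument are hypothetical names introduced here, and it assumes the table carries the genome IDs as row names (for instance after `read.csv(..., row.names = 1)`) and that the distances are symmetric.

```r
# Hypothetical helper (not from the original answer): flatten a square
# distance table into long format using base R only.
flatten_dist <- function(dm, upper_only = TRUE) {
  m <- as.matrix(dm)
  # Cells to keep: strictly above the diagonal, or every cell
  keep <- if (upper_only) upper.tri(m) else matrix(TRUE, nrow(m), ncol(m))
  # Row/column positions of the kept cells, in column-major order
  idx <- which(keep, arr.ind = TRUE)
  data.frame(
    Genome1 = rownames(m)[idx[, "row"]],
    Genome2 = colnames(m)[idx[, "col"]],
    Dist    = m[idx],
    stringsAsFactors = FALSE
  )
}

# Same toy data as in the answer above
dm <- data.frame(
  CP000036 = c("0", "a", "b"),
  CP001063 = c("a", "0", "c"),
  CP001368 = c("b", "c", "0"),
  stringsAsFactors = FALSE,
  row.names = c("CP000036", "CP001063", "CP001368")
)

long <- flatten_dist(dm)                      # one row per pair
full <- flatten_dist(dm, upper_only = FALSE)  # one row per cell
print(long)
```

With `upper_only = TRUE` this gives the three-row form from the question; with `upper_only = FALSE` every cell gets its own row, including the diagonal. It avoids writing NA into the data frame and needs no extra package, though whether it is actually faster on the real data would have to be measured.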
