Dendrogram export to table in R

I would like to export an hclust dendrogram from R to a data table so that I can subsequently import it into other ("home-made") software. str(unclass(fit)) provides a text overview of the dendrogram, but what I'm really looking for is a numeric table. I looked at the Bioconductor ctc package, but the output it produces looks somewhat cryptic. I would like to get something similar to this table: http://stn.spotfire.com/spotfire_client_help/heat/heat_importing_exporting_dendrograms.htm Is there any way to get this from an hclust object in R?

2 answers

If anyone is interested in exporting dendrograms, here is my solution. It is most likely not the best one, since I only started using R recently, but at least it works. Suggestions for improving the code are welcome.

So, if hr is my hclust object and df is my data, whose first column contains a simple index starting at 0 and whose row names are the names of the clustered elements:

# Retrieve the leaf order (row index and row name, in dendrogram order).
# leaf.order is not referenced again below; it is kept for inspection.
leaf.names <- hr$labels[hr$order]
leaf.order <- data.frame(row.num = df[match(leaf.names, rownames(df)), 1],
                         row.name = leaf.names)

hr.merge <- hr$merge
n <- max(df[,1])

# Re-index all clustered leaves and nodes. Leaves (negative entries in hr$merge)
# are indexed starting from 0; nodes are indexed starting from the maximum leaf index + 1.
is.leaf <- hr.merge < 0
hr.merge[is.leaf] <- abs(hr.merge[is.leaf]) - 1
hr.merge[!is.leaf] <- hr.merge[!is.leaf] + n
node.id <- 0:length(hr.merge)

# Generate dendrogram matrix with node index in the first column.
dend <- matrix(data=NA, nrow=length(node.id), ncol=6,
           dimnames=list(c(0:(length(node.id)-1)),
              c("node.id", "parent.id", "pruning.level",
              "height", "leaf.order", "row.name")) )
dend[,1] <- c(0:((2*nrow(df))-2))  # Insert a leaf/node index

# Calculate parent ID for each leaf/node:
# 1) For each leaf/node index, find the corresponding row number within the merge-table.
# 2) Add the maximum leaf index to the row number as indexing the nodes starts after indexing all the leaves.
for (i in 1:(nrow(dend)-1)) {
  dend[i,2] <- row(hr.merge)[which(hr.merge %in% dend[i,1])]+n
}

# Generate a table of all leaves in dendrogram order: 0-based position (1st column)
# and the corresponding row name (3rd column). It is built directly as a data frame
# so that the position column stays numeric regardless of the stringsAsFactors default.
hr.order <- data.frame(order.number = 0:(length(hr$labels) - 1),
                       leaf.id = NA,
                       row.name = hr$labels[hr$order],
                       stringsAsFactors = FALSE)

# Assign the row name to each leaf.
dend <- as.data.frame(dend)
for (i in 1:nrow(df)) {
  dend[which(dend[,1] %in% df[i,1]), 6] <- rownames(df)[i]
}

# Assign the 0-based position on the dendrogram (from left to right) to each leaf.
for (i in 1:nrow(hr.order)) {
  dend[which(dend[,6] %in% hr.order[i,3]), 5] <- hr.order[i,1]
}

# Insert height for each node.
dend[c((n+2):nrow(dend)),4] <- hr$height

# All leaves get the highest possible pruning level
dend[which(dend[,1] <= n),3] <- nrow(hr.merge)

# The nodes get a decreasing index, starting from the pruning level of the
# leaves minus 1; consecutive nodes with equal height share the same level.
for (i in (n+2):nrow(dend)) {
  if (is.na(dend[i-1,4]) || dend[i,4] != dend[i-1,4]) {
    dend[i,3] <- dend[i-1,3] - 1
  } else {
    dend[i,3] <- dend[i-1,3]
  }
}
dend[,3] <- dend[,3]-min(dend[,3])

dend <- dend[order(-node.id),]

# Write results table.
write.table(dend, file="path", sep=";", row.names=FALSE)  # replace "path" with the output file name
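
To try the script above, hr and df have to exist in the form described at the top of this answer. A minimal sketch of such a setup, assuming the built-in USArrests data set, Euclidean distances and average linkage (all three are illustrative choices, not part of the original answer):

```r
# Assumed example data: the built-in USArrests data set.
data(USArrests)

# df: the first column is a 0-based index, the row names are the names
# of the clustered elements.
df <- data.frame(index = 0:(nrow(USArrests) - 1),
                 USArrests,
                 row.names = rownames(USArrests))

# hr: hclust object built from the data columns only (the index column is excluded).
# The rows of df are in the same order as the observations passed to dist(),
# which the re-indexing of hr$merge in the script relies on.
hr <- hclust(dist(df[, -1]), method = "average")
```

With these two objects defined, the script can be run from top to bottom; only the file argument of write.table still needs a real path.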

There is a package that does exactly what you want: Labeltodendro ;-)

But seriously, can't you just manually extract the elements from the hclust object (e.g., $merge, $height, $order) and build your own table from them?
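
A rough sketch of that manual approach, assuming the built-in USArrests data set as example input; hclust_to_table is a hypothetical helper name used only for illustration and does not come from any package:

```r
# Hypothetical helper: flatten an hclust object into one row per merge step.
hclust_to_table <- function(hc) {
  data.frame(
    step   = seq_len(nrow(hc$merge)),
    left   = hc$merge[, 1],  # negative: original observation; positive: earlier merge step
    right  = hc$merge[, 2],
    height = hc$height       # height at which the two branches are joined
  )
}

# Assumed example data: the built-in USArrests data set.
hc <- hclust(dist(USArrests))
merge.table <- hclust_to_table(hc)

# Leaf order from left to right, together with the labels.
leaf.table <- data.frame(position    = seq_along(hc$order),
                         observation = hc$order,
                         label       = hc$labels[hc$order])

head(merge.table)
head(leaf.table)
```

Both tables can then be written out with write.table. Turning them into a parent/child layout like the one in the question still requires re-indexing the entries of $merge.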
