Converting ndarray generated by hcluster to Newick string for use with ete2 package

I have a list of vectors created at startup:

import hcluster
import numpy as np
from ete2 import Tree

vecs = [np.array(i) for i in document_list] 

where document_list is a collection of web documents that I am analyzing. Then I do hierarchical clustering:

Z = hcluster.linkage(vecs, metric='cosine') 

This generates ndarray, for example:

[[ 12.          19.           0.           1.        ]
[ 15.          21.           0.           3.        ]
[ 18.          22.           0.           4.        ]
[  3.          16.           0.           7.        ]
[  8.          23.           0.           6.        ]
[  5.          27.           0.           6.        ]
[  1.          28.           0.           7.        ]
[  0.          21.           0.           2.        ]
[  5.          29.           0.18350472   2.        ]
[  2.          10.           0.18350472   3.        ]
[ 47.          30.           0.29289577   9.        ]
[ 13.          28.           0.29289577  13.        ]
[ 73.          32.           0.29289577  18.        ]
[ 26.          12.           0.42264521   5.        ]
[  5.          33.           0.42264521  12.        ]
[ 14.          35.           0.42264521  12.        ]
[ 19.          35.           0.42264521  18.        ]
[  4.          20.           0.31174826   3.        ]
[ 34.          21.           0.5         19.        ]
[ 38.          29.           0.31174826  21.        ]]

Is it possible to convert this ndarray to a new line that can be passed to the ete2 Tree () constructor so that I can draw and manipulate the new tree using the tools provided by ete2?

Does it even make sense to try and do it, and if there is no other way that I can generate a tree / dendrogram using the same data and ete2 (I understand that there are other packages that can draw dendrograms like dendropy and hcluster, but would prefer to use ete2 anyway)?

Thank!

+5
2

:

from hcluster import linkage, to_tree
from ete2 import Tree

#hcluster part
Y = dist_matrix(items, dist_fn)
Z = linkage(Y, "single")
T = to_tree(Z)

#ete2 section
root = Tree()
root.dist = 0
root.name = "root"
item2node = {T: root}

to_visit = [T]
while to_visit:
    node = to_visit.pop()
    cl_dist = node.dist /2.0
    for ch_node in [node.left, node.right]:
        if ch_node:
            ch = Tree()
            ch.dist = cl_dist
            ch.name = str(ch_node.id)
            item2node[node].add_child(ch)
            item2node[ch_node] = ch
            to_visit.append(ch_node)

# This is your ETE tree structure
tree = root
+3

:

from hcluster import linkage, to_tree
from ete2 import Tree

#hcluster part
Y = dist_matrix(items, dist_fn)
Z = linkage(Y, "single")

R,T       = to_tree( mat, rd=True )
#print "ROOT", R, "TREE", T
root      = Tree()
root.dist = 0
root.name = 'root'
item2node = {R.get_id(): root}
to_visit  = T

while to_visit:
    node = to_visit.pop()
    #print "NODE", node
    cl_dist = node.dist / 2.0

    for ch_node in [node.get_left(), node.get_right()]:
        if ch_node:
            ch_node_id         = ch_node.get_id()
            ch_node_name       = str(ch_node_id)
            ch                 = Tree()
            ch.dist            = cl_dist
            ch.name            = ch_node_name

            if nodeNames:
                if ch_node_id < len(nodeNames):
                    ch.name    = nodeNames[ ch_node_id ]

            item2node[ch_node_id] = ch
            item2node[ch_node_id].add_child(ch)
            to_visit.append(ch_node)
0

All Articles