Sklearn implements one agglomerative clustering algorithm, Ward's method, which minimizes variance. Sklearn is usually well documented, with many good usage examples, but I could not find any example of how to use this function (`ward_tree`).
Basically, I want to draw a dendrogram of the clustering of my data, but I do not understand the function's output. The documentation says that it returns the children, the number of connected components, the number of leaves, and the parent of each node.
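Since the goal is a dendrogram, one possible route (an assumption on my part, not something `ward_tree` does by itself) is to build the complete tree (`n_clusters=None`) with `return_distance=True`, assuming your scikit-learn version supports that flag, and convert it into the linkage-matrix format that `scipy.cluster.hierarchy.dendrogram` expects. `children_to_linkage` below is a hypothetical helper. It relies on SciPy using the same numbering as the scikit-learn docs describe for children: merge `i` creates node `n_leaves + i`. A partial tree such as the one in my output (30 merges for 32 samples) would not be a valid linkage matrix, because SciPy needs exactly `n_leaves - 1` rows.

```python
import numpy as np

def children_to_linkage(children, distances, n_leaves):
    """Hypothetical helper: turn a complete merge tree into a SciPy-style
    linkage matrix with rows [child_a, child_b, distance, n_samples_in_node]."""
    children = np.asarray(children)
    distances = np.asarray(distances, dtype=float)
    if children.shape != (n_leaves - 1, 2):
        raise ValueError("need a complete tree: exactly n_leaves - 1 merges")
    counts = np.zeros(len(children))
    for i, (a, b) in enumerate(children):
        # Indices below n_leaves are original samples (size 1); any other
        # index refers to the node created by merge (index - n_leaves),
        # which always happened earlier, so its count is already known.
        counts[i] = sum(1 if c < n_leaves else counts[c - n_leaves] for c in (a, b))
    return np.column_stack([children, distances, counts])

# Toy tree with 4 samples: merge (0, 1) -> node 4, (2, 3) -> node 5, (4, 5) -> node 6.
Z = children_to_linkage([[0, 1], [2, 3], [4, 5]], [1.0, 2.0, 5.0], n_leaves=4)
print(Z)
```

The resulting `Z` could then be passed to `scipy.cluster.hierarchy.dendrogram(Z)`.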
However, for my sample data the results do not make sense to me. For my (32, 542) matrix, clustered with a connectivity matrix, this is the output:
```python
>>> wt = ward_tree(mymat, connectivity=connectivity, n_clusters=2)
>>> mymat.shape
(32, 542)
>>> wt
(array([[16,  0], [17,  1], [18,  2], [19,  3], [20,  4], [21,  5],
       [22,  6], [23,  7], [24,  8], [25,  9], [26, 10], [27, 11],
       [28, 12], [29, 13], [30, 14], [31, 15], [34, 33], [47, 46],
       [41, 40], [36, 35], [45, 44], [48, 32], [50, 42], [38, 37],
       [52, 43], [54, 39], [53, 51], [58, 55], [56, 49], [60, 57]]),
 1,
 32,
 array([32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47,
        32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47,
        53, 48, 48, 51, 51, 55, 55, 57, 50, 50, 54, 56, 52, 52, 49, 49,
        53, 60, 54, 58, 56, 58, 57, 59, 60, 61, 59, 59, 61, 61]))
```
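If the tuple is read as `(children, n_connected_components, n_leaves, parents)`, the last array may be where the two clusters are hiding: with `n_clusters=2` the tree stops after 30 merges, so two nodes should remain that are their own parent. Below is a minimal sketch, assuming that interpretation (a root is a node whose parent is itself), that follows the parent links of every sample up to its root. `root_of` is a hypothetical helper, and the label numbers 0/1 are arbitrary.

```python
# parents array copied from the ward_tree output above (62 nodes: 32 leaves + 30 merges).
parents = [
    32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47,
    32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47,
    53, 48, 48, 51, 51, 55, 55, 57, 50, 50, 54, 56, 52, 52, 49, 49,
    53, 60, 54, 58, 56, 58, 57, 59, 60, 61, 59, 59, 61, 61,
]
n_leaves = 32

def root_of(node, parents):
    """Hypothetical helper: follow parent links until a node is its own parent."""
    while parents[node] != node:
        node = parents[node]
    return node

# Each distinct root is one cluster; number the clusters by sorted root index.
roots = sorted({root_of(leaf, parents) for leaf in range(n_leaves)})
labels = [roots.index(root_of(leaf, parents)) for leaf in range(n_leaves)]
print(roots)  # [59, 61]
print(labels)
```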
In this case, I asked for two clusters from 32 feature vectors. But where are the two clusters in this output? And what do the children actually mean; how can a child index be larger than the total number of samples?
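The scikit-learn documentation for `AgglomerativeClustering.children_` describes the numbering like this: values below `n_samples` are leaves (original samples), and a node `i >= n_samples` is an internal node whose children are `children[i - n_samples]`. Assuming `ward_tree` uses the same convention, indices above 31 are simply merged nodes. A minimal sketch that expands a node into the samples it contains, using the `children` array from my output (`leaves_under` is a hypothetical helper):

```python
# children array copied from the ward_tree output above.
children = [
    [16, 0], [17, 1], [18, 2], [19, 3], [20, 4], [21, 5],
    [22, 6], [23, 7], [24, 8], [25, 9], [26, 10], [27, 11],
    [28, 12], [29, 13], [30, 14], [31, 15], [34, 33], [47, 46],
    [41, 40], [36, 35], [45, 44], [48, 32], [50, 42], [38, 37],
    [52, 43], [54, 39], [53, 51], [58, 55], [56, 49], [60, 57],
]
n_leaves = 32

def leaves_under(node, children, n_leaves):
    """Hypothetical helper: sorted sample indices contained in a tree node.

    Indices below n_leaves are samples; index n_leaves + i is the node
    created by merging the pair children[i].
    """
    if node < n_leaves:
        return [node]
    left, right = children[node - n_leaves]
    return sorted(leaves_under(left, children, n_leaves)
                  + leaves_under(right, children, n_leaves))

print(leaves_under(32, children, n_leaves))  # [0, 16]
print(leaves_under(59, children, n_leaves))
print(leaves_under(61, children, n_leaves))
```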