Why does Kruskal clustering generate suboptimal classes?

Question

Why does Kruskal clustering generate suboptimal classes?

I tried to develop a clustering algorithm whose task was to find the classes k in a set of two-dimensional points (with k indicated as input) using the Kruskal algorithm, slightly modified to find k spanning trees instead of one.

I compared my result with the proposed optimal (1) using the rand index, which at k = 7 was 95.5%. The comparison can be seen on the link below.

Problem:

The set contains 5 clearly spaced clusters that are easily classified according to the algorithm, but the results are rather disappointing for k> 5, and this is when things start to get complicated. I believe my algorithm is correct, and maybe the data is especially bad for Kruskal's approach. It is known that single-bond agglomeration clustering, for example, Kruskal's, does not cope well with some problems, since it reduces the assessment of cluster quality to the only similarity between the two points.

The idea of the algorithm is very simple:

Make a complete graph with a dataset, with the weight of the ribs being the Euclidean distance between the pair.
Sort the list of edges by weight.
For each edge (in order), add it to the spanning forest if it does not form a cycle. Stop when all edges have been covered or when there are k trees in the remaining forest.

enter image description here

Bottomline: ? ? , ? Kruskal?

(1): Gionis, A., H. Mannila P. Tsaparas, Clustering aggregation. ACM- (TKDD), 2007.1 (1): .1-30.

+4

algorithm tree cluster-analysis minimum-spanning-tree kruskals-algorithm

rgcalsaverini 05 . '13 3:47

3

, , , . - , .

-, - - , .

+1

mcdowella 05 . '13 5:50

, , .

0

Bytemain 05 . '13 5:37

Anony-Mousse · Accepted Answer · 2013-12-06T09:36:14+0000

.

Kruskal . " " O(n^3), O(n^2 log n) - n^2.

, SLINK O(n^2) O(n) .

, . ELKI .

bette, ( O(n^3) runtime) , DBSCAN ( O(n^2) O(n log n) ). epsilon=2 minPts=5 .

Why does Kruskal clustering generate suboptimal classes?

More articles: