Questions about Clustering Methods

I recently started studying clustering in the field of data mining, and so far I have covered sequential clustering, hierarchical clustering, and k-means.

I also read a statement that distinguishes k-means from the other two clustering methods: k-means does not handle nominal attributes very well. The text does not explain this point, though. The only difference I can see is that for k-means we know in advance that we need exactly K clusters, whereas for the other two methods we do not know beforehand how many clusters we need.

So can anyone give me some idea of why such a statement exists, i.e. does k-means really have this problem with nominal attributes, and is there a way to overcome it?

Thanks in advance.

+6
artificial-intelligence machine-learning neural-network data-mining
1 answer

The k-means algorithm computes the centroid of each cluster by taking the average of all points in the cluster. If an attribute is nominal, you cannot take its average.
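To make this concrete, here is a minimal sketch of just the centroid step (not a full k-means implementation). The helper name `centroid` and the sample data are made up for illustration. Averaging works for numeric points, but the same arithmetic has no meaning for nominal values, so Python refuses to do it.

```python
def centroid(points):
    """Component-wise mean of a list of equal-length tuples."""
    n = len(points)
    # zip(*points) groups the values of each attribute together
    return tuple(sum(coords) / n for coords in zip(*points))

# Numeric attributes: the mean is well defined.
numeric_cluster = [(1.0, 2.0), (3.0, 4.0), (5.0, 9.0)]
numeric_centroid = centroid(numeric_cluster)

# Nominal attribute: there is nothing to add up or divide.
nominal_cluster = [("Windows",), ("Linux",), ("OSX",)]
try:
    centroid(nominal_cluster)
    nominal_ok = True
except TypeError:
    # sum() cannot add strings to numbers, so no average exists
    nominal_ok = False

print(numeric_centroid, nominal_ok)
```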

Sometimes nominal values can be put into some kind of order and then mapped to real values. For example, the days of the week can be mapped to the range [1.0-7.0]. Sometimes, however, this is impossible, for example for an attribute with the values [Windows, Linux, OSX].
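A small sketch of that distinction, with hypothetical mappings chosen only for illustration: weekdays have a natural order, so their average means something, while any numbering of operating systems is arbitrary and the "average" changes with the numbering you happen to pick.

```python
# Days have a natural order, so a numeric mapping is defensible.
DAY_TO_NUMBER = {"Mon": 1.0, "Tue": 2.0, "Wed": 3.0, "Thu": 4.0,
                 "Fri": 5.0, "Sat": 6.0, "Sun": 7.0}

days = ["Mon", "Wed", "Fri"]
mean_day = sum(DAY_TO_NUMBER[d] for d in days) / len(days)

# Operating systems have no natural order: both codings below are
# equally (in)valid, yet they give different "averages".
coding_a = {"Windows": 1.0, "Linux": 2.0, "OSX": 3.0}
coding_b = {"Windows": 1.0, "OSX": 2.0, "Linux": 3.0}

cluster = ["Windows", "OSX"]
mean_a = sum(coding_a[v] for v in cluster) / len(cluster)
mean_b = sum(coding_b[v] for v in cluster) / len(cluster)

# Under coding_a the mean lands exactly on the code for "Linux",
# under coding_b it lands between two codes; neither is meaningful.
print(mean_day, mean_a, mean_b)
```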

+5
