MATLAB: kmeans gives different results every time

I run kmeans in MATLAB on a 400x1000 matrix, and every time I run the algorithm I get different results. Here is sample code:

[idx, ~, ~, ~] = kmeans(factor_matrix, 10, 'dist','sqeuclidean','replicates',20); 

Why does the output change every time I run this code? Any ideas?

I use it to identify multicollinearity problems.

Thanks for the help!

+8
matlab k-means feature-selection
3 answers

The implementation of k-means in MATLAB has a randomized component: the choice of initial centers. This leads to different results. In practice, however, when 'Replicates' is greater than 1 (you use 20), MATLAB runs k-means that many times and returns the clustering with the lowest distortion (total within-cluster sum of distances). If you get substantially different clusters each time, this may mean that your data does not contain the kind of (spherical) clusters that k-means looks for, which is a hint to try other clustering algorithms (for example, spectral clustering).
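
A minimal MATLAB sketch for checking whether two runs really differ, assuming a random stand-in `X` for the asker's `factor_matrix` (hypothetical data) and the Statistics and Machine Learning Toolbox `kmeans`. It prints each run's total within-cluster distance (`sum(sumd)`) and compares the two partitions through a contingency table, because cluster numbers are arbitrary: two runs can give the same groups different labels.

```matlab
% Stand-in for the 400x1000 factor_matrix (hypothetical random data).
X = randn(400, 1000);
k = 10;

% Two independent runs, each keeping the best of 20 replicates.
[idx1, ~, sumd1] = kmeans(X, k, 'Distance', 'sqeuclidean', 'Replicates', 20);
[idx2, ~, sumd2] = kmeans(X, k, 'Distance', 'sqeuclidean', 'Replicates', 20);

% Total within-cluster sum of squared distances ("distortion") of each run.
fprintf('Run 1 distortion: %.4f\nRun 2 distortion: %.4f\n', sum(sumd1), sum(sumd2));

% Cluster numbers are arbitrary, so compare partitions via a contingency table:
% T(a, b) counts the points labelled a in run 1 and b in run 2.
T = accumarray([idx1 idx2], 1, [k k]);

% Same partition up to relabeling <=> every row and column has exactly one nonzero.
samePartition = all(sum(T > 0, 1) == 1) && all(sum(T > 0, 2) == 1);
fprintf('Same partition up to relabeling: %d\n', samePartition);
```

If the distortions match and `samePartition` is true, the runs only differ in how the clusters are numbered.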

You can get deterministic behavior by passing the initial set of centers as an argument (the 'Start' parameter). This gives you the same output clustering every time. There are several heuristics for selecting an initial set of centers (for example, k-means++).
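
One way to apply this, sketched under assumptions: the seeding loop below is a plain k-means++ implementation written for illustration (not MATLAB's internal code), driven by a private `RandStream` with a fixed seed, and its result `C0` is passed as the `'Start'` matrix. `X` is a hypothetical random stand-in, and the code relies on implicit expansion (R2016b or later). Recent MATLAB releases already use k-means++ seeding by default (`'Start','plus'`), but they draw from the global random generator, which is why plain calls vary.

```matlab
% Stand-in data (hypothetical); replace with your own matrix.
X = randn(400, 1000);
k = 10;

% k-means++ seeding from a private, fixed-seed stream: the same initial
% centers are produced on every run and the global rng is left untouched.
s = RandStream('mt19937ar', 'Seed', 1);
n = size(X, 1);
C0 = zeros(k, size(X, 2));
C0(1, :) = X(randi(s, n), :);            % first center: uniform random pick
d2 = sum((X - C0(1, :)).^2, 2);          % squared distance to the nearest center
for j = 2:k
    c = cumsum(d2);
    % Pick the next center with probability proportional to d2.
    next = find(c >= rand(s) * c(end), 1, 'first');
    C0(j, :) = X(next, :);
    d2 = min(d2, sum((X - C0(j, :)).^2, 2));
end

% With an explicit k-by-p 'Start' matrix, kmeans has no random step left.
idxA = kmeans(X, k, 'Distance', 'sqeuclidean', 'Start', C0);
idxB = kmeans(X, k, 'Distance', 'sqeuclidean', 'Start', C0);
disp(isequal(idxA, idxB))                % same clustering both times
```

Using a private stream rather than `rng` keeps the seeding reproducible without affecting any other random code in the session.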

+18

As you can read on the wiki, k-means algorithms are usually heuristic and partly probabilistic, and MATLAB's is no exception.

This means the algorithm has a random component (in MATLAB's case, it repeatedly uses random starting points in an attempt to find a global solution). This makes kmeans output clusters of good average quality. But given the pseudo-random nature of the algorithm, you will receive slightly different clusters every time; this is normal behavior.
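
Because the randomness is pseudo-random, it can be made repeatable by resetting MATLAB's global generator before each call. A minimal sketch, assuming the default serial options (parallel replicates via `statset('UseParallel', true)` need their own stream settings) and a random stand-in for `factor_matrix`:

```matlab
X = randn(400, 1000);   % stand-in for factor_matrix (hypothetical data)

rng(42);                % fix the global generator kmeans draws its starts from
idx1 = kmeans(X, 10, 'Distance', 'sqeuclidean', 'Replicates', 20);

rng(42);                % reset to the same state before the second run
idx2 = kmeans(X, 10, 'Distance', 'sqeuclidean', 'Replicates', 20);

disp(isequal(idx1, idx2))   % identical labels, because the random starts repeat
```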

+6

This is called an initialization problem: kmeans starts from random initial points to cluster your data. MATLAB selects k random points, calculates the distance from each point in your data to these locations, assigns each point to the nearest one, and then computes new centroids to further reduce the distance, repeating until the assignments settle. So you can get different centroid locations, but the answer is similar.
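
The loop described above can be sketched as a plain Lloyd iteration. This is an illustrative version assuming uniformly random initial points and a random stand-in data matrix; it is not MATLAB's actual `kmeans` code, which adds smarter seeding, replicates, and an optional online phase. It uses implicit expansion (R2016b or later).

```matlab
% Minimal Lloyd iteration (illustrative only; not MATLAB's kmeans internals).
X = randn(400, 1000);   % stand-in data (hypothetical)
k = 10;
maxIter = 100;

C = X(randperm(size(X, 1), k), :);   % k random data points as initial centers
idx = zeros(size(X, 1), 1);
for iter = 1:maxIter
    % Assignment step: squared Euclidean distance from every point to every center.
    D = sum(X.^2, 2) + sum(C.^2, 2)' - 2 * (X * C');
    [~, newIdx] = min(D, [], 2);
    if isequal(newIdx, idx)
        break;                       % assignments stopped changing: converged
    end
    idx = newIdx;
    % Update step: move each center to the mean of its assigned points.
    for j = 1:k
        members = (idx == j);
        if any(members)
            C(j, :) = mean(X(members, :), 1);
        end                          % an empty cluster keeps its old center
    end
end
fprintf('Stopped after %d iterations\n', iter);
```

Running it twice can stop at different centroids purely because `randperm` picks different starting points.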

+2
