Matlab: K-Means Clustering

I have a matrix A (369x10) that I want to group into 19 clusters. I use this method

[idx ctrs]=kmeans(A,19) 

which gives idx (369x1) and ctrs (19x10)

I follow everything up to this point: all my rows are grouped into 19 clusters.

Now I have a matrix B (49x10). I want to know which of the 19 clusters each row of B belongs to.

How is this possible in MATLAB?

Thank you in advance

matlab machine-learning cluster-analysis k-means
5 answers

I can't think of a better way to do this than what you described. A built-in function would save one line, but I couldn't find one. Here is the code I would use:

    [ids, ctrs] = kmeans(A, 19);
    D = dist([testpoint; ctrs]');                 % dist (Neural Network Toolbox) works column-wise, so transpose; testpoint is 1x10 and D will be 20x20
    [distance, testpointID] = min(D(1, 2:end));
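To label every row of B (49x10) at once rather than a single test point, here is a sketch of the same idea using pdist2, assuming the Statistics and Machine Learning Toolbox is available:

    D2 = pdist2(B, ctrs);                        % 49x19 Euclidean distances from each row of B to each centroid
    [distances, testpointIDs] = min(D2, [], 2);  % nearest centroid index for each row of B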

The following is a complete example of clustering:

    %% generate sample data
    K = 3;
    numObservarations = 100;
    dimensions = 3;
    data = rand([numObservarations dimensions]);

    %% cluster
    opts = statset('MaxIter', 500, 'Display', 'iter');
    [clustIDX, clusters, interClustSum, Dist] = kmeans(data, K, 'options',opts, ...
        'distance','sqEuclidean', 'EmptyAction','singleton', 'replicates',3);

    %% plot data+clusters
    figure, hold on
    scatter3(data(:,1),data(:,2),data(:,3), 50, clustIDX, 'filled')
    scatter3(clusters(:,1),clusters(:,2),clusters(:,3), 200, (1:K)', 'filled')
    hold off, xlabel('x'), ylabel('y'), zlabel('z')

    %% plot clusters quality
    figure
    [silh,h] = silhouette(data, clustIDX);
    avrgScore = mean(silh);

    %% Assign data to clusters
    % calculate distance (squared) of all instances to each cluster centroid
    D = zeros(numObservarations, K);     % init distances
    for k = 1:K
        % d = sum((x-y).^2).^0.5
        D(:,k) = sum( ((data - repmat(clusters(k,:),numObservarations,1)).^2), 2);
    end

    % find for all instances the cluster closest to it
    [minDists, clusterIndices] = min(D, [], 2);

    % compare it with what you expect it to be
    sum(clusterIndices == clustIDX)
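To place the rows of a new matrix B into the clusters found above, the same nearest-centroid step can be reused. A sketch, assuming B has the same number of columns as the clustered data:

    DB = zeros(size(B,1), K);                    % squared distance of each row of B to each centroid
    for k = 1:K
        DB(:,k) = sum((B - repmat(clusters(k,:), size(B,1), 1)).^2, 2);
    end
    [minDistsB, clusterOfB] = min(DB, [], 2);    % clusterOfB(i) = cluster assigned to B(i,:)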

I'm not sure I understood you correctly, but if you want to know which cluster your points belong to, you can use the knnsearch function. It takes two arguments and, for each row of the second argument, looks in the first argument for the closest row.
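A minimal sketch of that approach (knnsearch comes with the Statistics and Machine Learning Toolbox):

    [idx, ctrs] = kmeans(A, 19);        % cluster the original 369x10 data
    clusterOfB = knnsearch(ctrs, B);    % 49x1 vector: index of the nearest centroid for each row of B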


Assuming you are using the squared Euclidean distance metric, try the following:

    d = zeros(size(B,1), size(ctrs,1));                           % preallocate the 49x19 distance matrix
    for i = 1:size(ctrs,1)                                        % loop over the 19 centroids (rows of ctrs)
        d(:,i) = sum((B - ctrs(repmat(i,size(B,1),1),:)).^2, 2);  % squared distance from each row of B to centroid i
    end
    [distances, predicted] = min(d, [], 2)

predicted should then contain the index of the nearest centroid for each row of B, and distances the corresponding squared distances.

Take a look inside the kmeans function, in the distfun subfunction. This will show you how to do this, and also contains equivalents for other distance metrics.
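For example, a possible adaptation of the same loop to the city-block (L1) metric that kmeans also supports as 'cityblock' (a sketch of my own, not lifted from distfun):

    d = zeros(size(B,1), size(ctrs,1));
    for i = 1:size(ctrs,1)
        d(:,i) = sum(abs(B - repmat(ctrs(i,:), size(B,1), 1)), 2);   % L1 distance from each row of B to centroid i
    end
    [distances, predicted] = min(d, [], 2);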


For a small amount of data you can do

    [testpointID, dum] = find(permute(all(bsxfun(@eq, B, permute(ctrs, [3,2,1])), 2), [3,1,2]))

but it is somewhat obscure: bsxfun with the permuted ctrs builds a 49 x 10 x 19 logical array, which is then all-ed over the second dimension, permuted back, and the matching row indices are extracted with find. Again, this is probably impractical for large amounts of data.
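For exact row matches, ismember with the 'rows' option is a more readable alternative (a sketch; testpointID is 0 for rows of B that match no centroid exactly):

    [isCentroid, testpointID] = ismember(B, ctrs, 'rows');   % testpointID(i) = index of the centroid equal to B(i,:), or 0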

