Sort a vector based on the frequency of unique values

I am grouping the rows of an NxM matrix using kmeans.

 clustIdx = kmeans(data, N_CLUST, 'EmptyAction', 'drop'); 

Then I reorder the rows of my matrix so that rows in the same cluster are adjacent:

 [~, order] = sort(clustIdx); %# row order that groups equal cluster labels
 dataClustered = data(order,:); 

However, every time I run the cluster analysis, I get more or less the same clusters, but with different identifiers. Thus, the structure in dataClustered looks the same after each run, but the groups appear in a different order.

I would like to relabel my cluster identifiers so that lower identifiers represent denser clusters (more members) and higher identifiers represent sparser clusters (fewer members).

Is there an easy and/or intuitive way to do this?

I.e., convert

 clustIdx = [1 2 3 2 3 2 4 4 4 4]; 

to

 clustIdx = [4 2 3 2 3 2 1 1 1 1] 

The identifiers themselves are arbitrary; the information is contained in the grouping.

2 answers

If I understand correctly, you want to assign cluster label 1 to the cluster with the most points, cluster label 2 to the cluster with the second-most points, etc.

Suppose you have an array of cluster labels named idx:

 >> idx = [1 1 2 2 2 2 3 3 3]'; 

Now you can relabel idx as follows:

 %# count the number of occurrences
 cts = hist(idx,1:max(idx));
 %# sort the counts - now we know that 1 should be last
 [~,sortIdx] = sort(cts,'descend')
 sortIdx =
      2     3     1
 %# create a mapping vector (thanks @angainor)
 map(sortIdx) = 1:length(sortIdx)
 map =
      3     1     2
 %# and remap indices
 map(idx)
 ans =
      3     3     1     1     1     1     2     2     2
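
To check the mapping against the example in the question, here is a minimal sketch of the same idea. It assumes the labels are positive integers, and it swaps in accumarray for hist to do the counting (my substitution, not part of the answer above). The names counts and newIdx are illustrative only.

```matlab
% Example labels from the question
clustIdx = [1 2 3 2 3 2 4 4 4 4];

% Count the members of each label (accumarray used here in place of hist)
counts = accumarray(clustIdx(:), 1);              % [1; 3; 2; 4]

% Rank the labels from most to least populated
[~, sortIdx] = sort(counts, 'descend');           % [4; 2; 3; 1]

% Build the lookup so that map(oldLabel) = newLabel
map = zeros(1, numel(counts));
map(sortIdx) = 1:numel(sortIdx);                  % [4 2 3 1]

% Relabel, keeping the orientation of clustIdx
newIdx = reshape(map(clustIdx), size(clustIdx));  % [4 2 3 2 3 2 1 1 1 1]
```

A label that never occurs (for example, a cluster dropped by 'EmptyAction','drop') gets a count of zero and therefore ends up with one of the highest new identifiers.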

This may be inefficient, but a simple way is to first determine how dense each cluster is.

Then you can create an n-by-2 matrix whose columns are Density and ClusterIdx.

After that, simply sorting the rows by Density gives you the ClusterIdx values in the correct order.
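
The steps described here could be sketched roughly as follows. This assumes that "density" simply means the number of points in a cluster, which the answer does not spell out. The variable names labels, density, tbl and newIdx are hypothetical.

```matlab
% Labels reused from the other answer's example
idx = [1 1 2 2 2 2 3 3 3]';

% Distinct cluster labels and the number of members of each
labels  = unique(idx);                            % [1; 2; 3]
density = arrayfun(@(k) sum(idx == k), labels);   % [2; 4; 3]

% n-by-2 matrix [Density, ClusterIdx], sorted by density, largest first
tbl = sortrows([density, labels], -1);            % [4 2; 3 3; 2 1]

% The row position of each old label in tbl is its new label
[~, newIdx] = ismember(idx, tbl(:,2));            % [3 3 1 1 1 1 2 2 2]'
```

Unlike a lookup vector indexed by label, this variant does not require the labels to be consecutive, since unique and ismember only work with the labels that actually occur.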
