Cluster Center Search

I have the following problem - an annotation made to identify key issues.

I have 10 points, each of which is at some distance from the other. I want

  • we can find the center of the cluster, that is, the point for which the pair distance to each other’s points is minimized. Let p (j) ~ p (k) represent the distance between the points j and kp (i) in pairs - the center point of the cluster if p (i) st min [sum (p (j) ~ p (k))] for all 0 <j, k <= n, where we have n points in the cluster
  • determine how to split a cluster into two clusters as soon as the number of data points in the cluster exceeds a certain threshold t.

This is not Euclidean space. But the distances can be summarized as follows: p (i) - point i:

p(1) p(2) p(3) p(4) p(5) p(6) p(7) p(8) p(9) p(10) p(1) 0 2 1 3 2 3 3 2 3 4 p(2) 2 0 1 3 2 3 3 2 3 4 p(3) 1 1 0 2 0 1 2 1 2 3 p(4) 3 3 2 0 1 2 3 2 3 4 p(5) 2 2 1 1 0 1 2 1 2 3 p(6) 3 3 2 2 1 0 3 2 3 4 p(7) 3 3 2 3 2 3 0 1 2 3 p(8) 2 2 1 2 1 2 1 0 1 2 p(9) 3 3 2 3 2 3 2 1 0 1 p(10) 4 4 3 4 3 4 3 2 1 0 

How would I calculate what is the central point of this cluster?

+6
algorithm cluster-analysis data-mining
source share
4 answers

As I understand it, this is similar to K Mans Clustering, and what you are looking for is usually referred to as "honeyoid."

See here: http://en.wikipedia.org/wiki/Medoids or here: http://en.wikipedia.org/wiki/K-medoids

+8
source share

I’m probably going to make this frisson, which appears before showing complete stupidity. But isn't that easy to brute force? In Python:

 distances = [ [ 0 , 2 , 1 , 3 , 2 , 3 , 3 , 2 , 3 , 4 , ], [ 2 , 0 , 1 , 3 , 2 , 3 , 3 , 2 , 3 , 4 , ], [ 1 , 1 , 0 , 2 , 0 , 1 , 2 , 1 , 2 , 3 , ], [ 3 , 3 , 2 , 0 , 1 , 2 , 3 , 2 , 3 , 4 , ], [ 2 , 2 , 1 , 1 , 0 , 1 , 2 , 1 , 2 , 3 , ], [ 3 , 3 , 2 , 2 , 1 , 0 , 3 , 2 , 3 , 4 , ], [ 3 , 3 , 2 , 3 , 2 , 3 , 0 , 1 , 2 , 3 , ], [ 2 , 2 , 1 , 2 , 1 , 2 , 1 , 0 , 1 , 2 , ], [ 3 , 3 , 2 , 3 , 2 , 3 , 2 , 1 , 0 , 1 , ], [ 4 , 4 , 3 , 4 , 3 , 4 , 3 , 2 , 1 , 0 , ], ] currentMinimum = 99999 for point in range ( 10 ) : distance_sum = 0 for second_point in range ( 10 ) : if point == second_point : continue distance_sum += distances [ point ] [ second_point ] print '>>>>>', point, distance_sum if distance_sum < currentMinimum : currentMinimum = distance_sum centre = point print centre 
+2
source share

but)

  • Find the average or average values ​​of all distances. = avgAll
  • For each p, find the average distance to other cars. = avgP (i)
  • Choose a closer center. avgAll ~ = avgP (i)

b) I don’t know yet.

maybe for every p, find a closer car.

according to this logic makes the graph.

something (I don’t know yet) will divide the schedule

+1
source share

What you are trying to do, or at least (b), belongs to cluster analysis. A branch of mathematics / statistics / econometrics, where data (for example, points in n-dimensional space) is shared between groups or clusters. How to do this, these are not trivial questions; there are many, many possible ways.

Read more on the Wikipedia article on cluster analysis .

+1
source share

All Articles