K-means with ellipsoids

I have n points in R ^ 3 that I want to cover k ellipsoids or cylinders (I don't care, whichever is simpler). I want to minimize volume pooling to a minimum. Let n be tens of thousands and k a handful. Development time (i.e. Simplicity) is more important than runtime.

Obviously, I can use k-tools and use perfect balls for my ellipsoids. Or I can run a k-tool, and then use the minimum spanning ellipsoids per cluster, rather than covering with balls, although in the worst case it is not better. I saw talk about processing anisotropy using k-means, but the links I saw seemed to think I had a tensor in my hand; I don’t know, I just know that the data will be a union of ellipsoids. Any suggestions?

[Edit: There are a couple of voices for picking up a mixture of multivariate Gaussians that seems viable to try. Calling the EM code for this will not reduce the volume of the union, but, of course, the k-tool also does not minimize the volume.]

+4
source share
3 answers

So, you probably know that the k-tool is NP-hard, and this problem is even more general (more complicated). Since you want to make ellipsoids, it can be very useful to choose a mixture of k multidimensional Gaussian distributions. You will probably want to try and find a maximum likelihood solution that is non-convex optimization, but at least it's easy to formulate and probably the code is available.

In addition to this, you probably have to write your own heuristic search algorithm from scratch, this is just a huge task.

+3
source

I did something similar with multivariate gausses using this method . The authors use kurtosis as a split measure, and I found that this is a satisfactory method for my application, the clustering points obtained using a laser rangefinder (i.e. computer vision).

+1
source

If ellipsoids can intersect a lot, then methods like k-value that try to assign points to single clusters will not work very well. Part of each ellipsoid should correspond to the surface of your object, but the rest can be inside it without caring. That is, spanning algorithms seem to me completely different from clustering / splitting algorithms; unions are not splits.

Gaussian mixtures with lots of overlap? I don’t know, but look at the image and code on Numerical Recipes with. 845 .

Coverages are difficult even in 2d, see find-near-minimal-covering-set-of-discs-on-a-2-d-plane .

0
source

All Articles