OPTICS Clustering algorithm. How to get the best epsilon

Question

OPTICS Clustering algorithm. How to get the best epsilon

I am implementing a project that should cluster geographic points. The OPTICS algorithm seems to be a very good solution. As input, only 2 parameters are required (MinPts and Epsilon), which, respectively, represent the minimum number of points needed to be considered as a cluster, and the distance value used for comparison if two points are in the same cluster.

My problem is that due to the huge variety of points I can’t install a fixed epsilon. Just look at the image below.

The same structure of the dots, but on a different scale will result in very different ones. Suppose to set MinPts = 2 and epsilon = 1 km. On the left, the algorithm will create 2 clusters (red and blue), but on the right it will create one cluster containing all the points (red), but I would like to get 2 clusters even on the right.

So my question is: is there a way to dynamically calculate the epsilon value to get this result?

UPDATE June 5, 2012 15.15: I thought I was using an implementation of the OPTICS algorithm from the javaml library, but it seems that this is actually an implementation of the DBSCAN algorithm. So now the question is: does anyone know a Java-based OPTICS implementation of the algorithm?

Thank you very much and sorry for my bad english.

Marco

+6

java algorithm cluster-analysis data-mining optics-algorithm

smellyarmpits Jun 04 '12 at 16:37

source share

4 answers

You can try to scale epsilon to the total size of the enclosing rectangle. For example, your left data is about 4 km x 6 km (using the Mark I measuring box), and on the right is about 2 km x 2 km. So, the epsilon on the right should be about 2.5 times smaller.

Of course, this does not work reliably. If in your right data there was an additional single point 4 km to the right and 2 km down, this would make the rectangle with the closure on the right the same as on the left, and you would get similar (incorrect) results.

+1

user949300 Jun 04 '12 at 19:02

source share

You can try a minimal spanning tree and then remove the longest edge. The remaining spanning tree and their center is the best center for OPTICS, and you can count the number of points around it.

+1

Gigamegs Jul 01 '12 at 20:37

source share

In your explanation above, this is a zooming that creates uncertainty. When your scale increases, your epsilon should change accordingly. Since they are on two very different scales, the two images you presented are not the same set of points. They will not respond equally to your OPTICS algorithm without changing the parameters.

In short, no. There is no way to dynamically compute epsilon to get this result. Clustering is already similar to NP-Hard, and these clustering algorithms (optics, k-means, veroni) can only approximate the optimal solution.

0

Peaches491 Jun 04 '12 at 17:36

source share

Anony-mousse · Accepted Answer · 2012-06-04T20:20:10+0000

The OPTICS epsilon value is intended solely to limit the complexity of execution when using index structures. If you do not have an acceleration index, you can set it to infinity .

Quote Wikipedia on OPTICS

The \ varepsilon parameter, strictly speaking, is not needed. It can be set to the maximum value. When a spatial index is available, it, however, plays a practical role when it comes to complexity.

What you seem to be more like DBSCAN than OPTICS. In OPTICS, you don’t have to choose epsilon (its authors should have called max-epsilon!), But your cluster extraction method will take care of this. Are you using the Xi highlight suggested in the OPTICS document?

minPts is much more important. You should try a value of at least 5 or 10, not 2. With 2, you essentially run clusters with one connection!

The above example should work fine once you increase minPts!

Re: edit: As you can even see in the Wikipedia article, ELKI has the correct implementation of OPTICS and its in Java.

OPTICS Clustering algorithm. How to get the best epsilon

More articles: