Clustering Tracing: Which Clustering Method?

As a newbie to Machine Learning, I have a set of toolpaths that can have different lengths. I want to group them, because some of them are actually the same way , and they just SEEM are different due to noise.

In addition, not all of them have the same length . Therefore, it is possible, although the path A does not coincide with the path B, but it is part of the path B. I want to represent this property after clustering.

I have only a little knowledge of K-means Clustering and Fuzzy N-means Clustering . How can I choose between the two? Or should I use other methods?

Any method that takes into account "affiliation"? (for example, after clustering, I have 3 clusters A, B and C One particular trajectory X refers to cluster A And the shorter trajectory Y , although it is not grouped into A , is identified as part of trajectory B )

==================== UPDATE =======================

The above trajectories are those of pedestrians. They can be represented either as a series of points (x, y) , or a series of step vectors (length, direction) . The presentation form is under my control.

+7
algorithm machine-learning cluster-analysis data-mining
source share
4 answers

It might be a little late, but I'm also working on the same issue. I suggest you take a look at TRACLUS , an algorithm created by Jae-Gil Lee, Jiawei Han and Kyu-Young Wang, published on SIGMOD07. http://web.engr.illinois.edu/~hanj/pdf/sigmod07_jglee.pdf

This is the best approach I've seen for clustering toolpaths because:

  • Common sub-trajectories can be detected.
  • Focuses on segments instead of dots (therefore, filters out noise emissions ).
  • He works on trajectories of various lengths .

In principle, this is a two-phase approach:

  • First phase . Section. Divide the paths into segments. This is done using MDL optimization with complexity O (n), where n is the number of points in a given trajectory. Here, the input is a set of paths, and the output is a set of segments.

    • Difficulty: O (n), where n is the number of points on the trajectory
    • Entrance: a set of trajectories.
    • Output: setting D segments
  • Stage Two . Group. This phase discovers clusters using some version of density-based clustering, as in DBSCAN. The entrance to this phase is a set of segments obtained from the first phase, and some parameters of what constitutes the neighborhood, and the minimum number of lines that can make up a cluster. The output is a set of clusters. Clustering is done by segments. They define their distance measure, consisting of three components: parallel distance, perpendicular distance and angular distance. This phase has complexity O (n log n), where n is the number of segments.

    • Difficulty: O (n log n), where n is the number of segments on the set D
    • Input: set D segments, parameter E, which sets the proximity error, and parameter MinLns, which is the minimum number of rows.
    • Output: set the C cluster, that is, a cluster of segments (trajectory clusters).

Finally, they compute for each cluster a a representative trajectory that is not something else that has detected a common subtraction in each cluster .

They have some pretty interesting examples, and the article is very well explained. Once again, this is not my algorithm, so be sure to bring them if you are doing research.

PS: I made some slides based on their work, for educational purposes only: http://www.slideshare.net/ivansanchez1988/trajectory-clustering-traclus-algorithm

+9
source share

Each clustering algorithm requires a metric. You need to determine the distance between your samples. In your case, a simple Euclidean distance is not a good idea, especially if the paths can have different lengths.

If you define a metric, you can use any clustering algorithm that allows you to customize the metric. You probably do not know the correct number of clusters in advance, and hierarchical clustering is a good option. K-tool does not allow you to customize the metric, but there are modifications of K-tools that do (for example, K-medoids)

The difficult part determines the distance between two trajectories (time series). A common approach is DTW (Dynamic Time Warping). To increase productivity, you can zoom in on your trajectory with fewer points (there are many algorithms for this).

+6
source share

Nothing will work. Because what does it mean here

Check out distance-based clustering methods like hierarchical clustering (for small datasets, but you probably don't have thousands of paths) and DBSCAN.

Then you only need to select the appropriate distance function, which allows, for example, time differences and spatial resolution of the trajectories.

Distance functions, such as dynamic time coefficient change (DTW) , can accommodate this.

+5
source share

This is a good concept and opportunity for real-time applications. In my opinion, you can use any clustering, but you need to choose the appropriate measure of dissimilarity, and then think about the complexity of the calculations. Paper ( http://link.springer.com/chapter/10.1007/978-81-8489-203-1_15 ) used Hausdorff and proposed a technique for reducing complexity and paper ( http://www.cit.iit.bas.bg /CIT_2015/v-15-2/4-5-TCMVS%20A-edited-md-Gotovop.pdf ) described the use of "Methods of trajectory clustering based on similarity with several views"

0
source share

All Articles