I have an array of floats like this:
[1.91, 2.87, 3.61, 10.91, 11.91, 12.82, 100.73, 100.71, 101.89, 200]
Now I want to split the array as follows:
[[1.91, 2.87, 3.61], [10.91, 11.91, 12.82], [100.73, 100.71, 101.89], [200]]
// [200] is considered an outlier because too few points support its cluster
I need to find this kind of segmentation for many arrays, and I don't know in advance how many segments there should be. I tried hierarchical (agglomerative) clustering, and it gives me satisfactory results. However, I was asked not to use clustering algorithms for a one-dimensional problem, since their theoretical justification applies to multidimensional data, not to a case like this.
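If the agglomerative clustering used single linkage, note that in one dimension it reduces to something very simple: sort the values and start a new group wherever the gap between neighbours exceeds the dendrogram cut height. Below is a minimal C++ sketch that assumes single linkage. The function name `splitAtGaps` and the cut height `cut` are hypothetical, and other linkages (e.g. Ward) do not reduce this way.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper. 1-D single linkage: cutting the dendrogram at height
// `cut` leaves exactly the runs of sorted values whose consecutive gaps are all <= cut.
std::vector<std::vector<double>> splitAtGaps(std::vector<double> xs, double cut) {
    assert(cut >= 0.0);
    std::sort(xs.begin(), xs.end());
    std::vector<std::vector<double>> groups;
    for (std::size_t i = 0; i < xs.size(); ++i) {
        if (i == 0 || xs[i] - xs[i - 1] > cut)
            groups.emplace_back();  // gap larger than the cut: start a new group
        groups.back().push_back(xs[i]);
    }
    return groups;
}
```

With a hand-picked `cut = 5.0`, calling `splitAtGaps` on the array above returns four groups of sizes 3, 3, 3 and 1, matching the desired split (values inside each group come out sorted). Groups below some minimum size, also a free choice such as 2, could then be flagged as outliers, as with [200]. The cut height plays the same role as the unknown number of segments.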
I spent a lot of time looking for a solution. However, the suggestions I found differ widely: this and this vs. that and that.
I found another suggestion that is not clustering: Jenks natural breaks optimization. However, it also requires specifying the number of partitions in advance, just like K-means (right?).
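For a fixed number of classes k, the objective behind Jenks natural breaks (minimize the total within-class sum of squared deviations over contiguous classes of the sorted data) can be solved exactly with Fisher's dynamic program, which makes the dependence on k explicit. Below is a minimal C++ sketch under that formulation. The name `naturalBreaks` and its return convention (the start index of each class in sorted order) are hypothetical, and the O(k * n^2) loop is written for clarity, not speed.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Within-class sum of squared deviations of sorted x[i..j-1] (half-open),
// in O(1) from prefix sums of values (s) and of squares (q).
static double ssd(const std::vector<double>& s, const std::vector<double>& q,
                  std::size_t i, std::size_t j) {
    const double n = static_cast<double>(j - i);
    const double sum = s[j] - s[i];
    return (q[j] - q[i]) - sum * sum / n;
}

// Hypothetical helper: exact partition of the sorted values into k contiguous
// classes minimizing total within-class SSD. Returns each class's start index.
std::vector<std::size_t> naturalBreaks(std::vector<double> xs, std::size_t k) {
    const std::size_t n = xs.size();
    assert(k >= 1 && k <= n);
    std::sort(xs.begin(), xs.end());
    std::vector<double> s(n + 1, 0.0), q(n + 1, 0.0);
    for (std::size_t i = 0; i < n; ++i) {
        s[i + 1] = s[i] + xs[i];
        q[i + 1] = q[i] + xs[i] * xs[i];
    }
    const double inf = std::numeric_limits<double>::infinity();
    // cost[c][j]: least SSD for the first j values in c classes;
    // from[c][j]: where the last of those c classes starts.
    std::vector<std::vector<double>> cost(k + 1, std::vector<double>(n + 1, inf));
    std::vector<std::vector<std::size_t>> from(k + 1, std::vector<std::size_t>(n + 1, 0));
    cost[0][0] = 0.0;
    for (std::size_t c = 1; c <= k; ++c)
        for (std::size_t j = c; j <= n; ++j)
            for (std::size_t i = c - 1; i < j; ++i) {  // last class = xs[i..j-1]
                const double v = cost[c - 1][i] + ssd(s, q, i, j);
                if (v < cost[c][j]) { cost[c][j] = v; from[c][j] = i; }
            }
    // Follow the back-pointers from the full array to recover class starts.
    std::vector<std::size_t> starts(k);
    std::size_t j = n;
    for (std::size_t c = k; c >= 1; --c) {
        starts[c - 1] = from[c][j];
        j = from[c][j];
    }
    return starts;
}
```

On the array above, k = 4 yields starts {0, 3, 6, 9} (the three groups plus [200]), while k = 3 yields {0, 6, 9}: the first two groups merge and 200 still gets a class of its own. The prefix-sum form of the variance can lose precision for values of very large magnitude; subtracting the mean first helps.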
This is a problem for me, especially because I have to segment many arrays and cannot know the optimal number of partitions for each one in advance.
Is there a theoretically justified way to find partitions that minimize the variance within partitions and maximize the separation between them?
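One frequently suggested non-clustering approach for 1-D data is to estimate the density with a kernel density estimate (KDE) and cut the sorted values at the valleys (local minima) of that density, so each segment corresponds to one mode. The number of segments then follows from the data, but the bandwidth h becomes the free parameter; rules of thumb such as Silverman's assume roughly unimodal data and can oversmooth a clearly multimodal sample. Below is a minimal C++ sketch with a Gaussian kernel and a hand-picked h. The names `kdeSplit`, `h` and `steps` (grid points per gap) are hypothetical, and the valley test used here (the density somewhere between two neighbours drops below its value at both of them) is one simple choice among several.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Unnormalized Gaussian KDE at x: the constant 1/(n*h*sqrt(2*pi)) is
// dropped because it does not move the minima.
static double density(const std::vector<double>& xs, double h, double x) {
    double f = 0.0;
    for (double xi : xs) {
        const double u = (x - xi) / h;
        f += std::exp(-0.5 * u * u);
    }
    return f;
}

// Hypothetical helper: sorts xs and starts a new segment between neighbours a, b
// whenever the density at some interior grid point is below both f(a) and f(b).
std::vector<std::vector<double>> kdeSplit(std::vector<double> xs, double h,
                                          int steps = 200) {
    assert(h > 0.0 && steps >= 2);
    std::sort(xs.begin(), xs.end());
    std::vector<std::vector<double>> segments;
    for (std::size_t i = 0; i < xs.size(); ++i) {
        bool valley = false;
        if (i > 0) {
            const double a = xs[i - 1], b = xs[i];
            const double lower = std::min(density(xs, h, a), density(xs, h, b));
            for (int t = 1; t < steps && !valley; ++t)
                valley = density(xs, h, a + (b - a) * t / steps) < lower;
        }
        if (i == 0 || valley) segments.emplace_back();  // valley found: new segment
        segments.back().push_back(xs[i]);
    }
    return segments;
}
```

With h = 1.0, `kdeSplit` on the array above gives four segments of sizes 3, 3, 3 and 1. Each gap test costs O(n * steps) density evaluations, which is fine for short arrays.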
Any pointers to articles or papers with some theoretical justification (ideally with implementations available in C, C++ or Java) would be very useful to me.