Clustering data with specified cluster centers in Python

I have a 1-dimensional numerical dataset (though my question also applies to an n-dimensional numerical dataset) that I want to cluster, and I already know the values of the cluster centers. So I only need to assign each data point to its corresponding cluster center (the one closest to the data point).

I could write a custom function, but I would prefer to use a scientific Python library such as SciPy that is optimized to work on pandas.Series or numpy arrays, because my dataset is very large (hundreds of millions of data points).

How can I do this?

Thanks!

1 answer

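You can use vq from scipy.cluster.vq.

It takes two arguments: first the observations (your data), second the code book (the cluster centers). It returns a tuple of two arrays, the index of the nearest center for each observation and the distance from each observation to that center: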

>>> from numpy import array
>>> from scipy.cluster.vq import vq
>>> vq( array([0,5,5]), array([1,2,3]) )
(array([0, 2, 2]), array([ 1.,  2.,  2.]))
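
For the n-dimensional case mentioned in the question, the same call should work with one observation per row and one feature per column. Below is a minimal sketch under that assumption; the 2-D points and centers are made up for illustration, and the names `points`, `centers`, `labels`, `dists`, and `assigned` are hypothetical, not from the original post.

```python
import numpy as np
from scipy.cluster.vq import vq

# Hypothetical 2-D data: one observation per row, one feature per column.
points = np.array([[0.0, 0.0],
                   [4.0, 5.0],
                   [9.0, 1.0],
                   [1.0, 1.0]])

# Hypothetical, already-known cluster centers (the "code book").
centers = np.array([[0.0, 0.0],
                    [5.0, 5.0],
                    [10.0, 0.0]])

# labels[i] is the row index in `centers` of the center nearest to points[i];
# dists[i] is the Euclidean distance from points[i] to that center.
labels, dists = vq(points, centers)

# Look up the coordinates of the assigned center for every point.
assigned = centers[labels]
```

If the data lives in a pandas.Series, converting it with `.to_numpy()` before the call is one option. Indexing `centers` with `labels` is a plain NumPy lookup, so mapping each point back to its center value needs no explicit loop.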