Clustering data with specified cluster centers in Python

Question

I have a 1-dimensional numerical dataset (although my question also applies to n-dimensional numerical data) that I want to cluster, and I already know the values of the cluster centers. So I only need to assign each data point to its corresponding cluster center (the one closest to that data point).

I could write a custom function, but I would prefer to use a scientific Python library such as SciPy that is optimized to work on pandas.Series or numpy arrays, because my dataset is very large (hundreds of millions of data points).

How can I do this?

Thanks!

python numpy pandas

sweeeeeet Aug 14 '14 at 9:53

1 answer

goncalopp · Accepted Answer · 2014-08-14T10:14:38+0000

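You are looking for scipy's vq (scipy.cluster.vq.vq).

Given a set of observations and a set of centers, it assigns each observation to its closest center. It returns a tuple (codes, distances), where codes holds the index of the closest center for each observation and distances holds the distance from each observation to that center: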

>>> from numpy import array
>>> from scipy.cluster.vq import vq
>>> vq( array([0,5,5]), array([1,2,3]) )
(array([0, 2, 2]), array([ 1.,  2.,  2.]))
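
For the n-dimensional case mentioned in the question, the same call should work on 2-D arrays of shape (n_points, n_features). The following is only a minimal sketch: the helper name assign_to_centers and the sample data are made up for illustration and are not part of SciPy.

```python
import numpy as np
from scipy.cluster.vq import vq


def assign_to_centers(points, centers):
    """Hypothetical helper: nearest-center index and distance for each point."""
    points = np.asarray(points, dtype=float)
    centers = np.asarray(centers, dtype=float)
    # Promote 1-D input to a single-feature column so both cases share one path.
    if points.ndim == 1:
        points = points[:, np.newaxis]
    if centers.ndim == 1:
        centers = centers[:, np.newaxis]
    # vq returns (index of the closest center, distance to that center).
    labels, distances = vq(points, centers)
    return labels, distances


# Made-up 2-D data: three points and three known centers.
points = np.array([[0.0, 0.0], [4.0, 4.0], [9.0, 1.0]])
centers = np.array([[1.0, 1.0], [5.0, 5.0], [10.0, 0.0]])

labels, distances = assign_to_centers(points, centers)
nearest = centers[labels]  # coordinates of the center assigned to each point
```

Indexing centers with labels gives the assigned center for every point. For a pandas.Series, passing series.to_numpy() (or the Series itself, which np.asarray converts) should be enough. With hundreds of millions of points it may be worth processing the data in chunks to keep memory use bounded.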
