Spherical k-means implementation in Python

I have been using scipy's k-means for quite some time, and I am very pleased with how it works in terms of usability and efficiency. However, I now want to study variants of k-means; more specifically, I would like to apply spherical k-means to some of my problems.

Do you know of a good Python implementation of spherical k-means (i.e. one that looks like scipy's k-means)? If not, how difficult would it be to modify the scipy source code to turn its k-means algorithm into a spherical one?

Thanks.

python scipy k-means
3 answers

Changing the distance function in k-means to cosine distance is not enough, because spherical k-means also requires the cluster centers to lie on the unit sphere.

In particular, the centers must be re-normalized after each maximization (center-update) step. When both the centers and the data points are normalized to unit length, there is a one-to-one relationship between cosine similarity and Euclidean distance:

||a - b||_2^2 = 2 * (1 - cos(a, b))    (for unit vectors a and b)
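A quick numerical check of this identity, assuming only NumPy: two random vectors are projected onto the unit sphere, and their squared Euclidean distance is compared with 2 * (1 - cos(a, b)). Because the right-hand side decreases as cosine similarity grows, choosing the nearest center by Euclidean distance and choosing the most similar center by cosine give the same answer once everything is normalized.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two random vectors, projected onto the unit sphere
a = rng.normal(size=5)
b = rng.normal(size=5)
a /= np.linalg.norm(a)
b /= np.linalg.norm(b)

squared_euclidean = np.sum((a - b) ** 2)
cosine_similarity = a @ b  # equals cos(a, b) because both vectors have unit norm

print(squared_euclidean, 2 * (1 - cosine_similarity))  # the two numbers agree
```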

A new package, clara-labs/spherecluster, adapts scikit-learn's KMeans into spherical k-means and also provides another algorithm for clustering on the sphere (a mixture of von Mises-Fisher distributions).
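If you would rather not add a dependency, the loop this answer describes (cosine assignment, then re-normalizing the centers after every update) can be sketched in plain NumPy. This is a minimal sketch, not spherecluster's implementation: `spherical_kmeans` is a hypothetical helper, initialization simply picks random data points, and an empty cluster keeps its previous center.

```python
import numpy as np


def spherical_kmeans(X, n_clusters, n_iter=100, seed=0):
    """Hypothetical minimal spherical k-means: assign by cosine similarity,
    then re-normalize every center onto the unit sphere after each update."""
    rng = np.random.default_rng(seed)
    # Project every data point onto the unit sphere
    X = X / np.linalg.norm(X, axis=1, keepdims=True)
    # Initialize centers with randomly chosen (already normalized) data points
    centers = X[rng.choice(len(X), size=n_clusters, replace=False)]
    labels = None
    for _ in range(n_iter):
        # Assignment step: the largest dot product is the largest cosine similarity
        new_labels = np.argmax(X @ centers.T, axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break  # assignments stopped changing, so the centers are stable too
        labels = new_labels
        # Update step: sum the members, then project the result back onto the sphere
        for k in range(n_clusters):
            total = X[labels == k].sum(axis=0)
            norm = np.linalg.norm(total)
            if norm > 0:  # an empty (or degenerate) cluster keeps its previous center
                centers[k] = total / norm
    return labels, centers


rng = np.random.default_rng(1)
X = np.vstack([rng.normal([5.0, 0.0, 0.0], 0.5, size=(50, 3)),
               rng.normal([0.0, 5.0, 0.0], 0.5, size=(50, 3))])
labels, centers = spherical_kmeans(X, n_clusters=2)
print(np.linalg.norm(centers, axis=1))  # every center has unit norm
```

Summing the members instead of averaging them is a small shortcut: the mean and the sum point in the same direction, and only the direction survives the normalization.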


It seems that the main feature of spherical k-means is the use of cosine distance instead of the standard Euclidean metric. With that in mind, another answer on SO has a nice, clean numpy/scipy version:

Can I specify my own distance function with Scikits.Learn K-Means Clustering?

If that is not what you are looking for, you could try sklearn.cluster.
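As a sketch of the cosine-distance idea, the assignment step can be written with SciPy's `scipy.spatial.distance.cdist`, whose `"cosine"` metric returns 1 - cos(x, c) for every pair of rows. `assign_by_cosine` is a hypothetical helper, and it covers only the assignment step: for true spherical k-means the centers must still be re-normalized after each update, as the first answer points out.

```python
import numpy as np
from scipy.spatial.distance import cdist


def assign_by_cosine(X, centers):
    """Assign each row of X to the center with the smallest cosine distance."""
    # cdist's "cosine" metric computes 1 - cos(x, c) for every (row, center) pair
    distances = cdist(X, centers, metric="cosine")
    return np.argmin(distances, axis=1)


X = np.array([[1.0, 0.1], [10.0, 0.5], [0.1, 1.0], [0.2, 7.0]])
centers = np.array([[1.0, 0.0], [0.0, 1.0]])
print(assign_by_cosine(X, centers))  # [0 0 1 1]
```

Note that [10, 0.5] lands in the same cluster as [1, 0.1]: with cosine distance only the direction of a point matters, not its magnitude.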


Here's how to do it if you have polar coordinates on a 3D sphere, for example (lat, lon) pairs:

  • If your (lat, lon) coordinates are measured in degrees, you can write a function that converts these points to Cartesian coordinates, for example:

     import numpy as np

     def cartesian_encoder(coord, r_E=6371):
         """Convert (lat, lon) in degrees to Cartesian points on the Earth's surface.

         Input
         -----
         coord : numpy 2darray (size=(N, 2))
         r_E : radius of Earth (km)

         Output
         ------
         out : numpy 2darray (size=(N, 3))
         """
         def _to_rad(deg):
             return deg * np.pi / 180.

         theta = _to_rad(coord[:, 0])  # lat [radians]
         phi = _to_rad(coord[:, 1])  # lon [radians]

         x = r_E * np.cos(phi) * np.cos(theta)
         y = r_E * np.sin(phi) * np.cos(theta)
         z = r_E * np.sin(theta)

         return np.concatenate([x.reshape(-1, 1), y.reshape(-1, 1), z.reshape(-1, 1)], axis=1)

    If your coordinates are already in radians, skip the _to_rad conversion and use coord[:, 0] and coord[:, 1] directly as theta and phi.

  • Install the spherecluster package with pip. If your polar data, given as (lat, lon) rows, is called X and you want to find 10 clusters in it, the final code for spherical k-means clustering is:

     import numpy as np
     from spherecluster import SphericalKMeans

     X_cart = cartesian_encoder(X)
     kmeans_labels = SphericalKMeans(n_clusters=10).fit_predict(X_cart)
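Since spherical k-means returns its centers as Cartesian vectors, you may want them back as (lat, lon). A minimal sketch, assuming the fitted model exposes its centers as `cluster_centers_` the way scikit-learn's `KMeans` does; `cartesian_decoder` is a hypothetical helper, and the compact encoder below repeats the encoder's math so the block runs on its own.

```python
import numpy as np


def cartesian_encoder(coord, r_E=6371):
    """(lat, lon) in degrees -> Cartesian (x, y, z); same math as the encoder above."""
    lat, lon = np.radians(coord[:, 0]), np.radians(coord[:, 1])
    return np.column_stack([r_E * np.cos(lon) * np.cos(lat),
                            r_E * np.sin(lon) * np.cos(lat),
                            r_E * np.sin(lat)])


def cartesian_decoder(xyz):
    """Hypothetical inverse: Cartesian (x, y, z) -> (lat, lon) in degrees.
    Dividing by the norm means the sphere's radius does not matter."""
    r = np.linalg.norm(xyz, axis=1)
    lat = np.degrees(np.arcsin(xyz[:, 2] / r))
    lon = np.degrees(np.arctan2(xyz[:, 1], xyz[:, 0]))
    return np.column_stack([lat, lon])


coords = np.array([[48.85, 2.35], [-33.87, 151.21], [40.71, -74.01]])
print(np.allclose(cartesian_decoder(cartesian_encoder(coords)), coords))  # True
```

With a fitted model, something like cartesian_decoder(model.cluster_centers_) would then give the centers in degrees, whether they lie on the unit sphere or on the Earth-radius sphere.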
