Changing the distance function in k-means to cosine distance is not enough, because spherical k-means also has to guarantee that the cluster centers lie on the unit sphere.
In particular, the centers should be normalized after each maximization step. Indeed, when both the centers and the data points are normalized, there is a one-to-one relationship between the cosine distance and the Euclidean distance:
||a - b||_2^2 = 2 * (1 - cos(a, b))
This follows from expanding ||a - b||_2^2 = ||a||^2 + ||b||^2 - 2 * (a . b) with ||a|| = ||b|| = 1.
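To make the normalization step concrete, here is a minimal sketch of spherical k-means in plain Python. It is only an illustration under stated assumptions, not the implementation of any particular library: the helper names (`normalize`, `dot`, `spherical_kmeans`), the random initialization from data points, the fixed iteration count, and the toy data are all chosen for this example. It also checks the identity above numerically.

```python
import math
import random


def normalize(v):
    """Scale v to unit Euclidean length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def dot(a, b):
    """Dot product; for unit vectors this equals the cosine similarity."""
    return sum(x * y for x, y in zip(a, b))


def spherical_kmeans(points, k, n_iter=20, seed=0):
    """Toy spherical k-means (hypothetical helper, for illustration only)."""
    rng = random.Random(seed)
    data = [normalize(p) for p in points]
    # Assumption: initialize the centers from k distinct data points.
    centers = [list(c) for c in rng.sample(data, k)]
    labels = [0] * len(data)
    for _ in range(n_iter):
        # Assignment step: pick the center with the highest cosine similarity.
        labels = [max(range(k), key=lambda j: dot(p, centers[j])) for p in data]
        # Update step: average the members, then project the mean back
        # onto the unit sphere. This is the part plain k-means lacks.
        for j in range(k):
            members = [p for p, label in zip(data, labels) if label == j]
            if members:  # keep the old center if a cluster is empty
                mean = [sum(col) / len(members) for col in zip(*members)]
                centers[j] = normalize(mean)
    return centers, labels


# Numerical check of ||a - b||^2 = 2 * (1 - cos(a, b)) for unit vectors.
a, b = normalize([3.0, 1.0]), normalize([1.0, 2.0])
squared_dist = sum((x - y) ** 2 for x, y in zip(a, b))
cosine_dist = 1.0 - dot(a, b)

# Toy data: two groups of directions, with deliberately different lengths.
points = [[1.0, 0.1], [0.9, 0.2], [2.0, 0.1], [0.1, 1.0], [0.2, 0.9], [0.1, 3.0]]
centers, labels = spherical_kmeans(points, k=2)
```

Because the data are normalized first, the vector lengths in `points` do not affect the result; only the directions matter.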
There is also a new package, clara-labs/spherecluster, that adapts scikit-learn's k-means into spherical k-means and also provides another spherical clustering algorithm.
Jaska