Python: how to relate k-means clusters back to each example

Question

I have read the docs here and also followed this tutorial, but I'm still missing something fundamental about using K-means in scikit-learn:

Let's say I have a dataset as such:

| UserName | Variable1 | Variable2 | Variable3 | Cluster |
| bob      | 1         | 3         | 7         |         |
| joe      | 2         | 4         | 8         |         |
| bill     | 1         | 6         | 4         |         |

Since K-means accepts a numpy array, I have to remove the username and use only the numeric variables. But after creating the clusters, how can I associate them with each individual user for further analysis? That is, how can I populate the Cluster column with the corresponding cluster number?

python numpy scikit-learn

Redaven Jan 19 '14 at 2:25

2 answers

I remember the time when I had to face the same question. :-)

Here is what I know. When you feed the data matrix X into KMeans (or any sklearn algorithm, for that matter), the row order is preserved. Let's say you fit a KMeans model:

    from sklearn.cluster import KMeans

    kms = KMeans().fit(X)  # where X is your data

You can then get the cluster labels like this:

    labels = list(kms.labels_)

I usually think in terms of lists or dictionaries, so I tend to convert a lot of things into lists or arrays.

The order of the labels will be identical to the order of the rows in your dataset. In other words, if bob's data is at position 0, then kms.labels_[0] is bob's cluster, and so on for the other rows.

To put the names and labels back together, use zip or map.
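As a minimal sketch of the zip approach, using the three users from the question: the names list, n_clusters=2, n_init=10 and random_state=0 are my own assumptions for illustration, not something the question specifies.

```python
import numpy as np
from sklearn.cluster import KMeans

# Names and feature rows are kept in the same order (data from the question).
names = ['bob', 'joe', 'bill']
X = np.array([[1, 3, 7],
              [2, 4, 8],
              [1, 6, 4]])

# n_clusters=2, n_init=10 and random_state=0 are illustrative assumptions.
kms = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

# labels_[i] is the cluster of row X[i], so zipping restores the pairing.
user_to_cluster = {name: int(label) for name, label in zip(names, kms.labels_)}

for name, label in user_to_cluster.items():
    print(name, label)
```

The cluster numbers themselves are arbitrary identifiers; only which users share a number is meaningful.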

ericmjl Jan 19 '14 at 2:49

fivetentaylor · Accepted Answer · 2014-01-19T03:39:49+0000

Here is an example, assuming you have read the data from a file into a list:

    import numpy as np
    import sklearn.cluster

    data = [
        ['bob', 1, 3, 7],
        ['joe', 2, 4, 8],
        ['bill', 1, 6, 4],
    ]
    labels = [x[0] for x in data]
    a = np.array([x[1:] for x in data])
    clust_centers = 2

    model = sklearn.cluster.k_means(a, clust_centers)

model now contains a tuple of (centroids, labels, inertia).
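As an aside, and assuming the default return value of sklearn.cluster.k_means, the tuple can also be unpacked into named variables rather than indexed by position. The variable names and the n_init=10 and random_state=0 arguments in this sketch are illustrative choices of mine:

```python
import numpy as np
import sklearn.cluster

# Numeric columns only, in the same row order as the users (bob, joe, bill).
a = np.array([[1, 3, 7],
              [2, 4, 8],
              [1, 6, 4]])

# With the default return_n_iter=False, k_means returns a 3-tuple.
centroids, cluster_ids, inertia = sklearn.cluster.k_means(
    a, 2, n_init=10, random_state=0
)

print(centroids.shape)  # one centroid row per cluster
print(cluster_ids)      # one cluster id per input row, same order as a
print(inertia)          # sum of squared distances to the nearest centroid
```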

So, map the cluster ids back to the user names as follows:

    clusters = dict(zip(labels, model[1]))

And print the cluster id for one user:

    print(clusters['bob'])

Or write it back out as CSV as follows:

    for d in data:
        print('%s,%d' % (','.join([str(x) for x in d]), clusters[d[0]]))
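To fill in the Cluster column from the question directly, the same idea can be sketched with the standard csv module. The header names come from the question's table; n_clusters=2, n_init=10, random_state=0 and the in-memory buffer are assumptions made purely for illustration.

```python
import csv
import io

import numpy as np
from sklearn.cluster import KMeans

header = ['UserName', 'Variable1', 'Variable2', 'Variable3', 'Cluster']
data = [
    ['bob', 1, 3, 7],
    ['joe', 2, 4, 8],
    ['bill', 1, 6, 4],
]

# Drop the name column before clustering; the row order is kept.
features = np.array([row[1:] for row in data])

# n_clusters=2, n_init=10 and random_state=0 are illustrative assumptions.
cluster_ids = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(features)

# Append each row's cluster id as the new last column.
rows = [row + [int(cid)] for row, cid in zip(data, cluster_ids)]

# Write to an in-memory buffer here; an open file handle would work the same way.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(header)
writer.writerows(rows)
print(buffer.getvalue())
```

Pairing rows and cluster ids by position, rather than through a dictionary keyed on the name, also keeps things correct if two users happen to share a name.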
