DBSCAN with python and scikit-learn: What exactly are the integer labels returned by make_blobs?

Question

I am trying to understand an example of the DBSCAN algorithm as implemented in scikit-learn ( http://scikit-learn.org/0.13/auto_examples/cluster/plot_dbscan.html ).

I changed the line

X, labels_true = make_blobs(n_samples=750, centers=centers, cluster_std=0.4)

to X = my_own_data, so that I can run DBSCAN on my own data.

Now the variable labels_true, which is the second return value of make_blobs, is used to calculate some evaluation scores, for example:

    print "Homogeneity: %0.3f" % metrics.homogeneity_score(labels_true, labels)
    print "Completeness: %0.3f" % metrics.completeness_score(labels_true, labels)
    print "V-measure: %0.3f" % metrics.v_measure_score(labels_true, labels)
    print "Adjusted Rand Index: %0.3f" % \
        metrics.adjusted_rand_score(labels_true, labels)
    print "Adjusted Mutual Information: %0.3f" % \
        metrics.adjusted_mutual_info_score(labels_true, labels)
    print ("Silhouette Coefficient: %0.3f" %
           metrics.silhouette_score(D, labels, metric='precomputed'))

How can I calculate labels_true from my X data? What exactly does scikit-learn mean by "label" in this case?

Thank you for your help!

+7

python scikit-learn dbscan

otmezger Apr 04 '13 at 18:39

2 answers

My name is Orlando Salazar, and this is my question.

I am new to data analysis. If labels_true holds the real labels of my data (the theoretical values), how do I calculate those values for a DataFrame? Is there a theoretical way to do it, or an algorithm? I do not understand how to calculate a true label, or how hard it is to find one.

Thank you, brother.

0

osalaz Dec 9 '18 at 18:31

Dougal · Accepted Answer · 2013-04-04T18:45:01+0000

labels_true is the "true" assignment of points to labels: which cluster each point should actually belong to. It is available here because make_blobs knows which "blob" it generated each point from.
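To make that concrete, here is a minimal sketch (written for Python 3 and a recent scikit-learn, not taken from the linked example) that inspects what make_blobs returns. The centers list is assumed to match the one used in the example, and random_state=0 is an assumption added only for reproducibility.

```python
import numpy as np
from sklearn.datasets import make_blobs

# Assumed to match the centers used in the linked example
centers = [[1, 1], [-1, -1], [1, -1]]

# random_state is an addition here, only to make the run repeatable
X, labels_true = make_blobs(n_samples=750, centers=centers,
                            cluster_std=0.4, random_state=0)

print(X.shape)                 # (750, 2): one row per generated point
print(labels_true.shape)       # (750,): one integer label per point
print(np.unique(labels_true))  # [0 1 2]: index of the generating center
```

Each entry of labels_true is simply the index of the center that the corresponding row of X was drawn around, so it only exists for synthetic data like this.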

You cannot get this for your own arbitrary data X unless you already have some kind of true labels for the points (in which case you would not be doing clustering anyway). The example is just showing some measures of how well the clustering performed in a fake case where the true answer is known.
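For data without true labels, one option is to drop the metrics that take labels_true and keep only the silhouette coefficient, which needs just the data and the predicted labels. The following is a rough sketch under stated assumptions: my_own_data is a hypothetical stand-in generated with NumPy, and the eps and min_samples values are arbitrary choices for this toy data, not recommendations.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.metrics import silhouette_score

# Hypothetical stand-in for my_own_data: two well-separated groups of points
rng = np.random.default_rng(0)
my_own_data = np.vstack([
    rng.normal(loc=0.0, scale=0.2, size=(100, 2)),
    rng.normal(loc=5.0, scale=0.2, size=(100, 2)),
])

# eps and min_samples are arbitrary values chosen for this toy data
db = DBSCAN(eps=0.5, min_samples=5).fit(my_own_data)
labels = db.labels_  # predicted cluster per point; -1 marks noise

# Count clusters, ignoring the noise label if present
n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
print("Estimated number of clusters:", n_clusters)

# The silhouette coefficient is undefined with fewer than 2 distinct labels.
# Note that noise points (-1) are treated as one more group in this score.
if len(set(labels)) > 1:
    score = silhouette_score(my_own_data, labels)
    print("Silhouette Coefficient: %0.3f" % score)
```

Homogeneity, completeness, V-measure, adjusted Rand index and adjusted mutual information all compare against labels_true, so they cannot be computed in this setting.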
