I am trying to specify a custom distance metric for scikit-learn's DBSCAN implementation:
    from geopy.distance import vincenty
    import numpy as np
    from sklearn.cluster import DBSCAN

    def geodistance(latLngA, latLngB):
        print(latLngA, latLngB)  # inspect the arguments DBSCAN passes in
        return vincenty(latLngA, latLngB).miles

    cluster_labels = DBSCAN(
        eps=500,
        min_samples=max(2, len(found_geopoints) // 10),  # integer division: min_samples must be an int
        metric=geodistance
    ).fit(np.array(found_geopoints)).labels_
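For comparison, here is a minimal sketch of the calling convention I expected, using pairwise_distances with a callable metric (the coordinates and the toy metric are made up for illustration):

    import numpy as np
    from sklearn.metrics import pairwise_distances

    pts = np.array([[42.468, 140.868], [-29.768, -62.048]])

    # With a callable metric, pairwise_distances hands the callable two
    # individual rows of pts at a time, which is what I assumed DBSCAN
    # would do with my geodistance function as well.
    dists = pairwise_distances(pts, metric=lambda a, b: np.linalg.norm(a - b))
    print(dists)  # 2x2 symmetric matrix of row-pair distances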
However, when I print out the arguments to my distance function, they are not at all what I would expect:
    [ 0.53084126  0.19584111  0.99640966  0.88013373  0.33753788  0.79983037
      0.71716144  0.85832664  0.63559538  0.23032912]
    [ 0.53084126  0.19584111  0.99640966  0.88013373  0.33753788  0.79983037
      0.71716144  0.85832664  0.63559538  0.23032912]
This is what my found_geopoints array looks like:
    [[  4.24680600e+01   1.40868060e+02]
     [ -2.97677600e+01  -6.20477000e+01]
     [  3.97550400e+01   2.90069000e+00]
     [  4.21144200e+01   1.43442500e+01]
     [  8.56111000e+00   1.24771390e+02]
     ...
So why is my distance function not being called with pairs of latitude/longitude coordinates from my input array?
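For completeness, here is a self-contained version of what I'm running, with a few made-up coordinate rows standing in for my real data, in case anyone wants to reproduce the print output above (vincenty comes from geopy and is deprecated in favor of geodesic in newer releases):

    import numpy as np
    from sklearn.cluster import DBSCAN
    from geopy.distance import vincenty  # geopy.distance.geodesic in newer geopy

    def geodistance(latLngA, latLngB):
        print(latLngA, latLngB)  # show exactly what DBSCAN passes to the metric
        return vincenty(latLngA, latLngB).miles

    # Made-up (lat, lng) rows standing in for my real found_geopoints:
    found_geopoints = np.array([
        [42.46806, 140.86806],
        [-29.76776, -62.0477],
        [39.75504, 2.90069],
    ])

    labels = DBSCAN(eps=500, min_samples=2, metric=geodistance).fit(found_geopoints).labels_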
scikit-learn cluster-analysis dbscan
Nathan breit