I played with various implementations of the Euclidean distance metric, and I noticed that I got different results for Scipy, pure Python, and Java.
Here is how I calculate the distance using SciPy (option 1):
distance = scipy.spatial.distance.euclidean(sample, training_vector)
Here is a Python implementation that I found on a forum (option 2):
distance = math.sqrt(sum([(a - b) ** 2 for a, b in zip(training_vector, sample)]))
And finally, here is my implementation in Java (option 3):
public double distance(int[] a, int[] b) {
    assert a.length == b.length;
    double squaredDistance = 0.0;
    for (int i = 0; i < a.length; i++) {
        squaredDistance += Math.pow(a[i] - b[i], 2.0);
    }
    return Math.sqrt(squaredDistance);
}
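Both sample and training_vector are 1-D arrays of length 784, taken from the MNIST dataset. I tried all three methods with the same sample and training_vector. The problem is that the three methods produce three significantly different distances (i.e. about 1936 for option 1, 1914 for option 2, and 1382 for option 3). Interestingly, when I use the same argument order for sample and training_vector in options 1 and 2 (i.e. swap the arguments of option 1), I get the same result for these two options. But a distance metric is supposed to be symmetric, isn't it...?
What is also interesting: I am using these metrics in a k-NN classifier for MNIST. My Java implementation reaches about 94% accuracy with 100 test samples and 2700 training samples. The Python implementation using option 1, however, only reaches about 75%...
Do you have any idea why I am getting these different results? If it helps, I can upload a CSV with the two arrays and post the link here.
I am using Java 8, Python 2.7, and SciPy 1.0.0.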
Edit:
I changed option 2 to:
distance = math.sqrt(sum([(float(a) - float(b)) ** 2 for a, b in zip(training_vector, sample)]))
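This had the following effects:
- It got rid of a ubyte overflow warning (apparently I had missed this warning before...).
- Changing the argument order for options 1 and 2 no longer makes a difference.
- The results for option 2 (pure Python) and option 3 (Java) are now equal.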
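The ubyte overflow and the order dependence mentioned above can be sketched as follows. This is only a minimal illustration, assuming the pixel values were stored as unsigned 8-bit integers, so that a - b wraps around modulo 256 instead of going negative. The function names and the three example values are made up and are not MNIST data; the `% 256` merely mimics the wraparound in plain Python.

```python
import math

def wrapped_distance(u, v):
    # Assumption: model unsigned 8-bit subtraction, where the difference
    # wraps modulo 256 instead of becoming negative.
    return math.sqrt(sum(((a - b) % 256) ** 2 for a, b in zip(u, v)))

def float_distance(u, v):
    # Convert to float first, so the difference is allowed to be negative.
    return math.sqrt(sum((float(a) - float(b)) ** 2 for a, b in zip(u, v)))

# Hypothetical pixel values in the range 0..255 (not taken from MNIST).
sample = [0, 200, 10]
training_vector = [5, 100, 250]

# With wraparound, the result depends on the argument order.
print(wrapped_distance(sample, training_vector))
print(wrapped_distance(training_vector, sample))

# After the float conversion, both orders give the same distance.
print(float_distance(sample, training_vector))
print(float_distance(training_vector, sample))
```

With wraparound, a difference such as 0 - 5 turns into 251 in one direction but stays 5 in the other, so the squared terms differ and the "distance" is no longer symmetric.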
So only one question remains: why is the result different (i.e. wrong?) when using SciPy?