The easiest way to represent Euclidean distance in Scala

I am writing a data mining algorithm in Scala, and I want to write a Euclidean distance function for a test instance and several training instances. My test and training instances are each stored in an Array[Array[Double]]. I have a method that compares each test instance against all training instances, calculates the distance between them (one test and one training instance per iteration), and returns a Double.

Say, for example, I have the following data:

testInstance = Array(Array(3.2, 2.1, 4.3, 2.8))
trainPoints = Array(Array(3.9, 4.1, 6.2, 7.3), Array(4.5, 6.1, 8.3, 3.8), Array(5.2, 4.6, 7.4, 9.8), Array(5.1, 7.1, 4.4, 6.9))

I have a method stub (showing where the distance function is called) that should return the neighbors of a given test instance:

def predictClass(testPoints: Array[Array[Double]], trainPoints: Array[Array[Double]], k: Int): Array[Double] = {

    for(testInstance <- testPoints)
    {
        for(trainInstance <- trainPoints) 
        {
            for(i <- 0 to k) 
            {
                distance = euclideanDistanceBetween(testInstance, trainInstance) //need help in defining this function
            }
        }
    }    
    return distance
}

I know how to write a generalized Euclidean Distance formula as:

math.sqrt(math.pow((x1 - y1), 2) + math.pow((x2 - y2), 2))

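However, I am not sure how to apply it to two arrays of arbitrary length. Here are the pseudo-steps I want the function to perform: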

def distanceBetween(testInstance: Array[Double], trainInstance: Array[Double]): Double = {
  // subtract each element of testInstance from the matching element of trainInstance
  // for example,
  // iteration 1 will do [Array(3.9, 4.1, 6.2, 7.3) - Array(3.2, 2.1, 4.3, 2.8)]
  // i.e. sqrt((3.9-3.2)^2+(4.1-2.1)^2+(6.2-4.3)^2+(7.3-2.8)^2)
  // return result
  // iteration 2 will do [Array(4.5, 6.1, 8.3, 3.8) - Array(3.2, 2.1, 4.3, 2.8)]
  // i.e. sqrt((4.5-3.2)^2+(6.1-2.1)^2+(8.3-4.3)^2+(3.8-2.8)^2)
  // return result, and so on......
}

How can I implement this?


For arrays of any length, the formula can be described step by step:

for each position i:
  subtract the ith element of Y from the ith element of X
  square it
add all of those up
square root the whole thing
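
The steps above map almost line for line onto a loop. The following is only a sketch of that imperative reading: the name distanceImperative is made up for illustration, and stopping at the shorter array is an assumption about how mismatched lengths should be handled.

```scala
// Hypothetical helper: a direct translation of the step-by-step description.
def distanceImperative(xs: Array[Double], ys: Array[Double]): Double = {
  var sumOfSquares = 0.0
  var i = 0
  // Assumption: stop at the shorter array so indexing never goes out of bounds.
  val n = math.min(xs.length, ys.length)
  while (i < n) {
    val diff = xs(i) - ys(i)     // subtract the ith element of Y from the ith element of X
    sumOfSquares += diff * diff  // square it and add it to the running total
    i += 1
  }
  math.sqrt(sumOfSquares)        // square root the whole thing
}

val d = distanceImperative(Array(0.0, 0.0), Array(3.0, 4.0))
// d == 5.0
```

It works, but the mutable counter and accumulator are exactly the bookkeeping that a collection-based formulation avoids.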

Or, stated more functionally:

square root the:
  sum of:
    zip X and Y into pairs
    for each pair, square the difference

In Scala, that becomes:

import math._

def distance(xs: Array[Double], ys: Array[Double]) = {
  sqrt((xs zip ys).map { case (x,y) => pow(y - x, 2) }.sum)
}

val testInstances = Array(Array(5.0, 4.8, 7.5, 10.0), Array(3.2, 2.1, 4.3, 2.8))
val trainPoints = Array(Array(3.9, 4.1, 6.2, 7.3), Array(4.5, 6.1, 8.3, 3.8), Array(5.2, 4.6, 7.4, 9.8), Array(5.1, 7.1, 4.4, 6.9))

distance(testInstances.head, trainPoints.head)
// 3.2680269276736382
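
One thing to be aware of: zip silently drops the extra elements when the two arrays differ in length, so a dimension mismatch would produce a plausible-looking but wrong distance. If that matters for your data, a guarded variant might look like the sketch below. The name checkedDistance is hypothetical, and failing with require is just one possible policy.

```scala
import scala.math.sqrt

// Hypothetical variant of `distance`: rejects inputs of different lengths
// instead of letting zip drop the extra elements.
def checkedDistance(xs: Array[Double], ys: Array[Double]): Double = {
  require(xs.length == ys.length,
    s"dimension mismatch: ${xs.length} vs ${ys.length}")
  sqrt(xs.zip(ys).map { case (x, y) => (y - x) * (y - x) }.sum)
}

val ok = checkedDistance(Array(1.0, 2.0), Array(4.0, 6.0))
// ok == 5.0

// require throws IllegalArgumentException, which Try captures as a Failure.
val mismatch = scala.util.Try(checkedDistance(Array(1.0, 2.0, 3.0), Array(1.0, 2.0)))
// mismatch.isFailure == true
```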

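As for your predictClass stub, as written, it computes one Double per pair of instances, and only the last one would be returned. So what should it actually return? If the class of a training instance is simply its index, c, is something like this what you are after?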

def findNearestClasses(testPoints: Array[Array[Double]], trainPoints: Array[Array[Double]]): Array[Int] = {
  testPoints.map { testInstance =>
    trainPoints.zipWithIndex.map { case (trainInstance, c) =>
      c -> distance(testInstance, trainInstance)
    }.minBy(_._2)._1
  }
}    

findNearestClasses(testInstances, trainPoints)
// Array(2, 0)

Or, if you want a vote among the k nearest neighbors:

def findKNearestClasses(testPoints: Array[Array[Double]], trainPoints: Array[Array[Double]], k: Int): Array[Int] = {
  testPoints.map { testInstance =>
    val distances = 
      trainPoints.zipWithIndex.map { case (trainInstance, c) =>
        c -> distance(testInstance, trainInstance)
      }
    val classes = distances.sortBy(_._2).take(k).map(_._1)
    // plain map instead of mapValues, which is deprecated as of Scala 2.13
    val classCounts = classes.groupBy(identity).map { case (c, votes) => c -> votes.length }
    classCounts.maxBy(_._2)._1
  }
}    

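findKNearestClasses(testInstances, trainPoints, k = 2) // k is required; 2 is an assumed value
// reported output: Array(2, 1)
// (each training point is its own class here, so every vote ties and the winner among the k nearest is arbitrary)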
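
Using the training index as the class means no class can ever receive more than one vote, so the majority vote only becomes meaningful once each training point carries a real label. The following is a sketch under that assumption: trainLabels, its values, and the name predictLabels are all made up for illustration and do not come from the question.

```scala
import scala.math.sqrt

def distance(xs: Array[Double], ys: Array[Double]): Double =
  sqrt(xs.zip(ys).map { case (x, y) => (y - x) * (y - x) }.sum)

// Assumed input: trainLabels(i) is the class of trainPoints(i).
def predictLabels(testPoints: Array[Array[Double]],
                  trainPoints: Array[Array[Double]],
                  trainLabels: Array[String],
                  k: Int): Array[String] = {
  require(trainPoints.length == trainLabels.length, "one label per training point")
  testPoints.map { testInstance =>
    val nearestLabels = trainPoints.zip(trainLabels)
      .map { case (trainInstance, label) => label -> distance(testInstance, trainInstance) }
      .sortBy(_._2) // closest first
      .take(k)
      .map(_._1)
    // Majority vote among the k nearest labels.
    nearestLabels.groupBy(identity)
      .map { case (label, votes) => label -> votes.length }
      .maxBy(_._2)._1
  }
}

val testInstances = Array(Array(5.0, 4.8, 7.5, 10.0), Array(3.2, 2.1, 4.3, 2.8))
val trainPoints = Array(Array(3.9, 4.1, 6.2, 7.3), Array(4.5, 6.1, 8.3, 3.8), Array(5.2, 4.6, 7.4, 9.8), Array(5.1, 7.1, 4.4, 6.9))
val trainLabels = Array("a", "a", "b", "b") // made-up labels for illustration

val predicted = predictLabels(testInstances, trainPoints, trainLabels, k = 3)
// Array(b, a)
```

With k = 3, the first test instance has neighbors labeled b, a, b and the second has a, a, b, so neither vote ties. An odd k is a common way to avoid ties when there are two classes.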