I am trying to do binary classification with the SVM implementation in flink-ml. When I evaluated the classifier, I got an error rate of about 85% on the training set. I plotted the data in 3D, and it looked like the two classes could be separated perfectly with a hyperplane.
When I tried to get the weight vector out of the SVM, the only option I found was the weight vector without the intercept of the hyperplane. So the hyperplane always passes through (0,0,0).
I have no clue where the error might be and would appreciate any hints.
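To make the intercept issue concrete, here is a minimal plain-Scala sketch (no Flink involved) of one common workaround: append a constant 1.0 feature to every point, so that the last component of a weight vector through the origin acts as the intercept. The names `withBias`, `dot` and `decide`, and the example weights, are made up for illustration and are not part of flink-ml.

```scala
object BiasFeatureSketch {
  // Append a constant 1.0 so the last weight plays the role of the intercept b.
  def withBias(x: Array[Double]): Array[Double] = x :+ 1.0

  // Plain dot product of two vectors of equal length.
  def dot(w: Array[Double], x: Array[Double]): Double = {
    require(w.length == x.length, "dimension mismatch")
    w.zip(x).map { case (a, b) => a * b }.sum
  }

  // Decision of a hyperplane through the origin: +1.0 if w . x >= 0, else -1.0.
  def decide(w: Array[Double], x: Array[Double]): Double =
    if (dot(w, x) >= 0.0) 1.0 else -1.0
}
```

For example, if the true boundary were x1 = 5, the origin-bound weights (1, 0, 0) would label the point (3, 0, 0) as positive, while the augmented weights (1, 0, 0, -5) applied to `withBias` of the same point would label it as negative. In the Flink job this would presumably amount to building `DenseVector(Array(t._4, t._5, t._6, 1.0))`, but I have not verified that this fixes the error rate.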
    import org.apache.flink.api.scala._
    import org.apache.flink.ml.classification.SVM
    import org.apache.flink.ml.common.LabeledVector
    import org.apache.flink.ml.math.DenseVector
    import org.apache.flink.ml.preprocessing.Splitter

    val env = ExecutionEnvironment.getExecutionEnvironment

    // columns: two Int fields, the Boolean label, then the three features
    val input: DataSet[(Int, Int, Boolean, Double, Double, Double)] =
      env.readCsvFile(filepathTraining, ignoreFirstLine = true, fieldDelimiter = ";")

    // map the Boolean label to +1.0 / -1.0 and pack the features into a vector
    val inputLV = input.map(t =>
      LabeledVector(if (t._3) 1.0 else -1.0, DenseVector(Array(t._4, t._5, t._6))))

    val trainTestDataSet = Splitter.trainTestSplit(inputLV, 0.8, precise = true, seed = 100)
    val trainLV = trainTestDataSet.training
    val testLV = trainTestDataSet.testing

    val svm = SVM()
    svm.fit(trainLV)

    val testVD = testLV.map(lv => (lv.vector, lv.label))
    val evalSet = svm.evaluate(testVD)

    // groups the data in false negatives, false positives, true negatives, true positives
    evalSet.map(t => (t._1, t._2, 1))
      .groupBy(0, 1)
      .reduce((x1, x2) => (x1._1, x1._2, x1._3 + x2._3))
      .print()
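For reference, this is how I understand the grouping step and the error rate, written as a plain-Scala sketch on an in-memory collection instead of a Flink `DataSet`. It assumes that each pair is (true label, predicted label) and that both are encoded as +1.0 / -1.0; `confusion` and `errorRate` are hypothetical helper names, not flink-ml functions.

```scala
object ConfusionSketch {
  // Count how often each (truth, prediction) combination occurs.
  def confusion(pairs: Seq[(Double, Double)]): Map[(Double, Double), Int] =
    pairs.groupBy(identity).map { case (key, group) => key -> group.size }

  // Fraction of pairs whose prediction differs from the true label.
  def errorRate(pairs: Seq[(Double, Double)]): Double =
    if (pairs.isEmpty) 0.0
    else pairs.count { case (truth, pred) => truth != pred }.toDouble / pairs.size
}
```

With the four pairs (1.0, 1.0), (1.0, -1.0), (-1.0, -1.0), (-1.0, -1.0), the count for (-1.0, -1.0) is 2 and the error rate is 0.25. If the predictions were raw decision values rather than +1.0 / -1.0, the equality check would not be meaningful, so that assumption matters.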
The plotted data is shown here:

scala svm apache-flink flinkml
hucko