Spark Combinebykey JABA lambda expression

I want to use the lambda function to calculate the average value by key ( JavaPairRDD<Integer, Double> pairs ). For this reason, I developed the following code:

 java.util.function.Function<Double, Tuple2<Double, Integer>> createAcc = x -> new Tuple2<Double, Integer>(x, 1); BiFunction<Tuple2<Double, Integer>, Double, Tuple2<Double, Integer>> addAndCount = (Tuple2<Double, Integer> x, Double y) -> { return new Tuple2(x._1()+y, x._2()+1 ); }; BiFunction<Tuple2<Double, Integer>, Tuple2<Double, Integer>, Tuple2<Double, Integer>> combine = (Tuple2<Double, Integer> x, Tuple2<Double, Integer> y) -> { return new Tuple2(x._1()+y._1(), x._2()+y._2() ); }; JavaPairRDD<Integer, Tuple2<Double, Integer>> avgCounts = pairs.combineByKey(createAcc, addAndCount, combine); 

However, eclipse does this error :

 The method combineByKey(Function<Double,C>, Function2<C,Double,C>, Function2<C,C,C>) in the type JavaPairRDD<Integer,Double> is not applicable for the arguments (Function<Double,Tuple2<Double,Integer>>, BiFunction<Tuple2<Double,Integer>,Double,Tuple2<Double,Integer>>, BiFunction<Tuple2<Double,Integer>,Tuple2<Double,Integer>,Tuple2<Double,Integer>>) 
+5
source share
1 answer

The combByKey method expects org.apache.spark.api.java.function.Function2 instead of java.util.function.BiFunction . Therefore, either you write:

 java.util.function.Function<Double, Tuple2<Double, Integer>> createAcc = x -> new Tuple2<Double, Integer>(x, 1); Function2<Tuple2<Double, Integer>, Double, Tuple2<Double, Integer>> addAndCount = (Tuple2<Double, Integer> x, Double y) -> { return new Tuple2(x._1()+y, x._2()+1 ); }; Function2<Tuple2<Double, Integer>, Tuple2<Double, Integer>, Tuple2<Double, Integer>> combine = (Tuple2<Double, Integer> x, Tuple2<Double, Integer> y) -> { return new Tuple2(x._1()+y._1(), x._2()+y._2() ); }; JavaPairRDD<Integer, Tuple2<Double, Integer>> avgCounts = pairs.combineByKey(createAcc, addAndCount, combine); 
+5
source

Source: https://habr.com/ru/post/1214461/


All Articles