Apache Spark - reduceByKey - Java

I am trying to understand how reduceByKey works in Spark, using Java as the programming language.

Say that I have the sentence "I am who I am". I break the sentence into words and store it as a list: [I, am, who, I, am].
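The splitting step itself is ordinary Java, independent of Spark; a minimal sketch (the class and method names are my own, not from the question):

```java
import java.util.Arrays;
import java.util.List;

public class SplitWords {
    // Split a sentence into words on whitespace.
    public static List<String> split(String sentence) {
        return Arrays.asList(sentence.trim().split("\\s+"));
    }

    public static void main(String[] args) {
        System.out.println(split("I am who I am")); // [I, am, who, I, am]
    }
}
```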

Now this function assigns a count of 1 to each word:

JavaPairRDD<String, Integer> ones = words.mapToPair(new PairFunction<String, String, Integer>() {
    @Override
    public Tuple2<String, Integer> call(String s) {
        return new Tuple2<String, Integer>(s, 1);
    }
});

So, the output looks something like this:

(I,1) 
(am,1)
(who,1)
(I,1)
(am,1)
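Outside Spark, the same word-to-(word, 1) step can be sketched with plain Java collections; this is only an illustration, with Map.Entry standing in for Tuple2 and the class name my own:

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class OnesStep {
    // Mirrors words.mapToPair(...): pair each word with the count 1.
    public static List<Map.Entry<String, Integer>> ones(List<String> words) {
        return words.stream()
                .map(w -> Map.entry(w, 1))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        for (Map.Entry<String, Integer> pair : ones(List.of("I", "am", "who", "I", "am"))) {
            System.out.println("(" + pair.getKey() + "," + pair.getValue() + ")");
        }
    }
}
```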

Now, if I have 3 reducers, each reducer will receive a key and the values associated with that key:

reducer 1:
    (I,1)
    (I,1)

reducer 2:
    (am,1)
    (am,1)

reducer 3:
    (who,1)

I wanted to know:

a. What exactly happens in the function below?
b. What are the parameters of new Function2<Integer, Integer, Integer>?
c. Basically, how is the JavaPairRDD formed?

JavaPairRDD<String, Integer> counts = ones.reduceByKey(new Function2<Integer, Integer, Integer>() {
    @Override
    public Integer call(Integer i1, Integer i2) {
        return i1 + i2;
    }
});
3 answers

Yes, as you noted, the function receives 2 values and returns 1; it is not like a Hadoop Reducer, which receives all of a key's values at once.

The API requires a function that takes 2 values and returns 1, all of the same type. It is applied repeatedly to a key's values until only 1 value remains: N-to-1 achieved through repeated 2-to-1 steps. For the result to be well defined, the function must be associative (and commutative), because the order in which values are combined is not guaranteed.

(K, V) pairs go in, and (K, V) pairs come out.

The Mapper and Reducer of Hadoop MapReduce do not map directly onto Spark operations (and that is deliberate). In MapReduce terms, reduceByKey behaves more like a combiner, since combining also happens on the map side, before the shuffle.

If you really want Mappers and Reducers, the closest Spark analogues are mapPartitions and, roughly, groupByKey. But you should not translate a MapReduce program into Spark operation by operation; express the computation in terms of Spark's transformations instead.

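The repeated 2-to-1 application described above can be sketched in plain Java, with a BinaryOperator standing in for Function2 (the class and method names are illustrative, not part of any Spark API):

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.BinaryOperator;

public class TwoToOne {
    // Repeatedly apply a 2-to-1 function to collapse N values into 1,
    // mirroring how reduceByKey folds all of a key's values together.
    public static Integer reduceAll(List<Integer> values, BinaryOperator<Integer> f) {
        Integer acc = values.get(0);
        for (int i = 1; i < values.size(); i++) {
            acc = f.apply(acc, values.get(i));
        }
        return acc;
    }

    public static void main(String[] args) {
        // Three (word, 1) counts for the same key collapse to 3.
        System.out.println(reduceAll(Arrays.asList(1, 1, 1), Integer::sum)); // 3
    }
}
```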

Here is how reduceByKey works:

The RDD is split into partitions, and within each partition the values for a key are combined pairwise. Whenever two pairs with the same key, [K, V1] and [K, V2], are encountered, V1 and V2 are passed to the Function2() as follows:

  • the first argument is the value accumulated so far for key K, i.e. V1;
  • the second argument is the next value found for key K, i.e. V2;
  • the returned value, the combination of V1 and V2 (here, their sum), becomes the new accumulated value for K.

Because this combining runs on each partition before any data leaves the node, only one partial result per key per partition is shuffled across the network, not every individual pair.

Hope this helps.

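The per-partition combining step can be simulated in plain Java (a sketch only; Map.Entry stands in for Tuple2 and the names are my own):

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class PartitionCombine {
    // Combine values per key within a single partition: the local step
    // reduceByKey performs before any data is shuffled across the network.
    public static Map<String, Integer> combineLocally(List<Map.Entry<String, Integer>> partition) {
        Map<String, Integer> local = new TreeMap<>(); // sorted for stable printing
        for (Map.Entry<String, Integer> pair : partition) {
            local.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return local;
    }

    public static void main(String[] args) {
        // One partition holding (I,1), (am,1), (I,1): the two I's collapse locally.
        Map<String, Integer> local = combineLocally(Arrays.asList(
                Map.entry("I", 1), Map.entry("am", 1), Map.entry("I", 1)));
        System.out.println(local); // {I=2, am=1}
    }
}
```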

In short, consider the following:

Input: {(a:1),(b:2),(c:2),(a:3),(b:2),(c:3)}

Pass it through reduceByKey.

Output: {(a:4),(b:4),(c:5)}
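This end-to-end example can be checked with a plain-Java stand-in for reduceByKey (the class and method names are illustrative):

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class WordCountExample {
    // Sum all values sharing a key, like reduceByKey with (i1, i2) -> i1 + i2.
    public static Map<String, Integer> reduceByKey(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> out = new TreeMap<>(); // sorted for stable printing
        for (Map.Entry<String, Integer> p : pairs) {
            out.merge(p.getKey(), p.getValue(), Integer::sum);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> input = Arrays.asList(
                Map.entry("a", 1), Map.entry("b", 2), Map.entry("c", 2),
                Map.entry("a", 3), Map.entry("b", 2), Map.entry("c", 3));
        System.out.println(reduceByKey(input)); // {a=4, b=4, c=5}
    }
}
```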

