Apache Spark mapPartitionsWithIndex

Question

Can someone give an example of the correct use of mapPartitionsWithIndex in Java? I have found many Scala examples, but hardly any for Java. My understanding is that, with this function, each partition is processed separately and can be handled by a different node.

I get the following compile error:

method mapPartitionsWithIndex in class JavaRDD<T> cannot be applied to given types;
    JavaRDD<String> rdd = sc.textFile(filename).mapPartitionsWithIndex
    required: Function2<Integer,Iterator<String>,Iterator<R>>,boolean
    found: <anonymous Function2<Integer,Iterator<String>,Iterator<JavaRDD<String>>>>

This happens when compiling the following code:

JavaRDD<String> rdd = sc.textFile(filename).mapPartitionsWithIndex(
    new Function2<Integer, Iterator<String>, Iterator<JavaRDD<String>> >() {

    @Override
    public Iterator<JavaRDD<String>> call(Integer ind, String s) {


java mapreduce apache-spark

YuliaSh. Oct 20 '14 at 12:58


1 answer

Juh_ · Accepted Answer · 2015-03-09T09:07:52+0000

Here is the code I use to drop the first line (the header) of a CSV file:

JavaRDD<String> rawInputRdd = sparkContext.textFile(dataFile);

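// Partition 0 is assumed to hold the start of the file, so its first element is the header.
Function2<Integer, Iterator<String>, Iterator<String>> removeHeader =
    new Function2<Integer, Iterator<String>, Iterator<String>>() {
        @Override
        public Iterator<String> call(Integer ind, Iterator<String> iterator) throws Exception {
            if (ind == 0 && iterator.hasNext()) {
                iterator.next(); // skip the header line
            }
            return iterator;
        }
    };
// The second argument is the boolean the compiler asks for (preservesPartitioning).
JavaRDD<String> inputRdd = rawInputRdd.mapPartitionsWithIndex(removeHeader, false);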
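
Compared with the code in the question, note the three differences the compiler message points at: the second parameter of call is an Iterator<String> (the whole partition) rather than a single String, the return type is an Iterator of elements rather than an Iterator<JavaRDD<String>>, and the call takes a boolean as its second argument. To check the per-partition logic without a cluster, the rough sketch below simulates it in plain Java. It does not use Spark at all: HeaderDropSketch and applyPerPartition are hypothetical names, each inner list stands in for one partition, and java.util.function.BiFunction stands in for Spark's Function2.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.BiFunction;

class HeaderDropSketch {

    // Same logic as removeHeader above: skip the first element of partition 0 only.
    static Iterator<String> dropHeader(Integer index, Iterator<String> lines) {
        if (index == 0 && lines.hasNext()) {
            lines.next();
        }
        return lines;
    }

    // Hypothetical stand-in for mapPartitionsWithIndex (not Spark's implementation):
    // calls f once per partition, passing the partition index and an iterator
    // over that partition's elements, then collects everything f returns.
    static List<String> applyPerPartition(
            List<List<String>> partitions,
            BiFunction<Integer, Iterator<String>, Iterator<String>> f) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < partitions.size(); i++) {
            Iterator<String> result = f.apply(i, partitions.get(i).iterator());
            while (result.hasNext()) {
                out.add(result.next());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Two made-up "partitions"; only the first one starts with a header.
        List<List<String>> partitions = Arrays.asList(
                Arrays.asList("id;name", "1;alice"),
                Arrays.asList("2;bob", "3;carol"));
        for (String line : applyPerPartition(partitions, HeaderDropSketch::dropHeader)) {
            System.out.println(line);
        }
    }
}
```

The function is called once per partition, not once per line, which is why the index check runs a single time for each partition and only the header in partition 0 is removed; the three data rows pass through unchanged.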
