Before I start playing with Scoobi or Scrunch, I thought I would try porting WordCount to Scala (2.9.1) using only the Java Hadoop bindings (0.20.1).
Initially, I had:
class Map extends Mapper[LongWritable, Text, Text, IntWritable] {
  @throws(classOf[IOException])
  @throws(classOf[InterruptedException])
  def map(key: LongWritable, value: Text, context: Context) {
Which compiled fine, but gave me a runtime error:
java.io.IOException: Type mismatch in key from map: expected org.apache.hadoop.io.Text, recieved org.apache.hadoop.io.LongWritable
Looking around a bit, I realized that this was because I had not defined the right map method (the lack of an override modifier should have tipped me off), so I fixed it:
override def map(key: LongWritable, value: Text,
                 context: Mapper[LongWritable, Text, Text, IntWritable]#Context) {
And voilà, no more runtime error.
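For completeness, here is what the whole mapper looks like now. The tokenization is just my take on the standard WordCount logic, so the helper names (one, word, itr) are my own:

import java.io.IOException
import java.util.StringTokenizer
import org.apache.hadoop.io.{IntWritable, LongWritable, Text}
import org.apache.hadoop.mapreduce.Mapper

class Map extends Mapper[LongWritable, Text, Text, IntWritable] {
  private val one = new IntWritable(1)
  private val word = new Text()

  @throws(classOf[IOException])
  @throws(classOf[InterruptedException])
  override def map(key: LongWritable, value: Text,
                   context: Mapper[LongWritable, Text, Text, IntWritable]#Context) {
    // Emit (token, 1) for each whitespace-separated token in the line
    val itr = new StringTokenizer(value.toString)
    while (itr.hasMoreTokens) {
      word.set(itr.nextToken())
      context.write(word, one)
    }
  }
}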
But then I looked at the job's output and realized that my reducer was never running. So I looked at my reducer and noticed that the reduce signature had the same problem as my mapper:
class Reduce extends Reducer[Text, IntWritable, Text, IntWritable] {
  @throws(classOf[IOException])
  @throws(classOf[InterruptedException])
  def reduce(key: Text, values: Iterable[IntWritable], context: Context) {
So I guessed that the identity reducer was being used because of the mismatch.
But when I tried to fix the reduce signature:
override def reduce(key: Text, values: Iterable[IntWritable],
                    context: Reducer[Text, IntWritable, Text, IntWritable]#Context) {
Now I got a compiler error:
[ERROR] /path/to/src/main/scala/WordCount.scala:32: error: method reduce overrides nothing
[INFO]   override def reduce(key: Text, values: Iterable[IntWritable],
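For reference, the method I believe I should be overriding is declared in Hadoop 0.20's Reducer roughly as follows; this is my paraphrase of the Java source as the Scala compiler would see it:

// My reading of Reducer.reduce from the 0.20 source. Note that its values
// parameter is a java.lang.Iterable, while an unqualified Iterable in
// Scala code refers to scala.Iterable.
protected def reduce(key: KEYIN, values: java.lang.Iterable[VALUEIN],
                     context: Reducer[KEYIN, VALUEIN, KEYOUT, VALUEOUT]#Context): Unit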
So I am not sure what I am doing wrong.