MapReduce Output ArrayWritable

I am trying to output an ArrayWritable from a simple MapReduce task. I found several questions about similar problems, but I cannot solve it in my own code, so I would appreciate your help. Thanks :)!

Input: A text file with some sentences.

The output should be:

<word, <length, number of occurrences of the word in the text file>>, for example: Hello 5 2

The result that I actually get:

    hello WordLength_V01$IntArrayWritable@221cf05
    test WordLength_V01$IntArrayWritable@799e525a

I think the problem is in the IntArrayWritable subclass, but I cannot work out the right fix. We use Hadoop 2.5. The following code produces this result:

The main method:

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word length V1");

        // Set Classes
        job.setJarByClass(WordLength_V01.class);
        job.setMapperClass(MyMapper.class);
        // job.setCombinerClass(MyReducer.class);
        job.setReducerClass(MyReducer.class);

        // Set Output and Input Parameters
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntArrayWritable.class);

        // Number of Reducers
        job.setNumReduceTasks(1);

        // Set FileDestination
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }

Mapper:

    public static class MyMapper extends Mapper<Object, Text, Text, IntWritable> {

        // Initialize Variables
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        // Map Method
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {

            // Use Tokenizer
            StringTokenizer itr = new StringTokenizer(value.toString());

            // Select each word
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());

                // Output Pair
                context.write(word, one);
            }
        }
    }

Reducer:

    public static class MyReducer extends Reducer<Text, IntWritable, Text, IntArrayWritable> {

        // Initialize Variables
        private IntWritable count = new IntWritable();
        private IntWritable length = new IntWritable();

        // Reduce Method
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {

            // Count Words
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            count.set(sum);

            // Wordlength
            length.set(key.getLength());

            // Define Output
            IntWritable[] temp = new IntWritable[2];
            IntArrayWritable output = new IntArrayWritable(temp);
            temp[0] = count;
            temp[1] = length;

            // Output
            output.set(temp);
            context.write(key, new IntArrayWritable(output.get()));
        }
    }

Subclass:

    public static class IntArrayWritable extends ArrayWritable {

        public IntArrayWritable(IntWritable[] intWritables) {
            super(IntWritable.class);
        }

        @Override
        public IntWritable[] get() {
            return (IntWritable[]) super.get();
        }

        @Override
        public void write(DataOutput arg0) throws IOException {
            for (IntWritable data : get()) {
                data.write(arg0);
            }
        }
    }

I used the following links to find a solution:

I would be very grateful for any ideas!

-------- Solution --------

New subclass:

    public static class IntArrayWritable extends ArrayWritable {

        public IntArrayWritable(IntWritable[] values) {
            super(IntWritable.class, values);
        }

        @Override
        public IntWritable[] get() {
            return (IntWritable[]) super.get();
        }

        @Override
        public String toString() {
            IntWritable[] values = get();
            return values[0].toString() + ", " + values[1].toString();
        }
    }
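One of the changes in the new subclass is that the constructor now forwards the array to the two-argument superclass constructor instead of dropping it. The following is a minimal plain-Java sketch of that difference. It does not use Hadoop: `Holder`, `DropsValues` and `KeepsValues` are hypothetical stand-ins for `ArrayWritable` and the two versions of the subclass, and `Integer` stands in for `IntWritable`.

```java
public class ConstructorDemo {

    // Hypothetical stand-in for a base class with two constructors,
    // one of which stores the values and one of which does not.
    static class Holder {
        private final Class<?> valueClass;
        private Object[] values;

        Holder(Class<?> valueClass) {
            this.valueClass = valueClass;
        }

        Holder(Class<?> valueClass, Object[] values) {
            this(valueClass);
            this.values = values;
        }

        Class<?> getValueClass() {
            return valueClass;
        }

        Object[] get() {
            return values;
        }
    }

    // Mirrors the first subclass: the constructor argument is never used.
    static class DropsValues extends Holder {
        DropsValues(Integer[] values) {
            super(Integer.class);
        }
    }

    // Mirrors the new subclass: the array is passed on to the base class.
    static class KeepsValues extends Holder {
        KeepsValues(Integer[] values) {
            super(Integer.class, values);
        }
    }

    public static void main(String[] args) {
        Integer[] data = {2, 5};
        System.out.println(new DropsValues(data).get() == null); // prints "true"
        System.out.println(new KeepsValues(data).get().length);  // prints "2"
    }
}
```

With the first variant, any later call that reads the stored array finds nothing there, which is why forwarding the values matters even before `toString()` comes into play.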

New reduce method:

    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {

        // Count Words
        int sum = 0;
        for (IntWritable val : values) {
            sum += val.get();
        }
        count.set(sum);

        // Wordlength
        length.set(key.getLength());

        // Define Output
        IntWritable[] temp = new IntWritable[2];
        temp[0] = count;
        temp[1] = length;

        context.write(key, new IntArrayWritable(temp));
    }
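To sanity-check what the job should print, the same count-and-length logic can be run locally on a small string. This is only a sketch in plain Java without Hadoop: the class name `WordLengthLocal` and the method `countAndLength` are made up for illustration, a `TreeMap` stands in for the sorted reducer keys, and `String.length()` is assumed to match `Text.getLength()`, which holds for ASCII words only because `Text` reports the UTF-8 byte length.

```java
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

public class WordLengthLocal {

    // Returns word -> {count, length}, in the same order as the
    // temp[0] = count, temp[1] = length array built in the reducer.
    static Map<String, int[]> countAndLength(String text) {
        Map<String, int[]> result = new TreeMap<>();
        StringTokenizer itr = new StringTokenizer(text);
        while (itr.hasMoreTokens()) {
            String word = itr.nextToken();
            // First occurrence creates {0, length}; every occurrence bumps the count.
            int[] pair = result.computeIfAbsent(word, w -> new int[] {0, w.length()});
            pair[0]++;
        }
        return result;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, int[]> e : countAndLength("hello test hello").entrySet()) {
            int[] pair = e.getValue();
            // Same "count, length" layout as the overridden toString().
            System.out.println(e.getKey() + "\t" + pair[0] + ", " + pair[1]);
        }
    }
}
```

For the input `hello test hello` this yields count 2 and length 5 for `hello`, and count 1 and length 4 for `test`. Note that the array holds the count first and the length second, which is the reverse of the `<length, count>` order given in the desired output at the top of the question.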
1 answer

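Everything looks fine. You just need to add a method such as printStrings() to the subclass that returns a single string instead of an array. The built-in toStrings() method of ArrayWritable returns an array of strings, and toString() is not overridden, which is why you see the object's address in your output instead of the values. Note that the default TextOutputFormat calls toString() on each value, so the string-building logic has to be reachable from toString() to change what is written to the output file.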

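    public String printStrings() {
        // The values field of ArrayWritable is private, so read it through get()
        IntWritable[] values = get();
        String strings = "";
        for (int i = 0; i < values.length; i++) {
            strings = strings + " " + values[i].toString();
        }
        return strings;
    }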
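The `ClassName@hash` text in the question's output is what Java's default `Object.toString()` produces for any class that does not override it. The sketch below shows this in plain Java, without Hadoop; `PlainPair` and `PrintablePair` are hypothetical stand-ins for the value class before and after adding a `toString()` override.

```java
public class ToStringDemo {

    // Hypothetical value class that does NOT override toString().
    static class PlainPair {
        final int count;
        final int length;

        PlainPair(int count, int length) {
            this.count = count;
            this.length = length;
        }
    }

    // Same data, but with toString() overridden to print the values.
    static class PrintablePair extends PlainPair {
        PrintablePair(int count, int length) {
            super(count, length);
        }

        @Override
        public String toString() {
            return count + ", " + length;
        }
    }

    public static void main(String[] args) {
        // Prints the class name, "@" and a hash code that varies between runs.
        System.out.println("hello\t" + new PlainPair(2, 5));
        // Prints "hello", a tab, then "2, 5".
        System.out.println("hello\t" + new PrintablePair(2, 5));
    }
}
```

A helper with a different name, such as `printStrings()`, only affects the output if something calls it, so delegating to it from `toString()` is one way to combine both approaches.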

Source: https://habr.com/ru/post/1214885/