Why do we need to explicitly set the output key / value classes in a Hadoop program?

The book "Hadoop: The Definitive Guide" has an example program with the code below.

JobConf conf = new JobConf(MaxTemperature.class);  
conf.setJobName("Max temperature");  
FileInputFormat.addInputPath(conf, new Path(args[0]));  
FileOutputFormat.setOutputPath(conf, new Path(args[1]));  
conf.setMapperClass(MaxTemperatureMapper.class);  
conf.setReducerClass(MaxTemperatureReducer.class);  
conf.setOutputKeyClass(Text.class);  
conf.setOutputValueClass(IntWritable.class);  

The MapReduce framework should be able to determine the output key and value classes from the Mapper and Reducer classes that are set on the JobConf. Why do we need to set them explicitly on the JobConf? In addition, there is no similar API for the input key / value pair.
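
For context, the Mapper in that example already declares its key/value types as generic parameters, which is what makes the question natural. A rough sketch along those lines (old org.apache.hadoop.mapred API; details paraphrased rather than quoted from the book):

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

// The output key/value types (Text, IntWritable) are declared right here
// as generic parameters, so it looks as if the framework could read them back.
public class MaxTemperatureMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, IntWritable> {

    public void map(LongWritable key, Text value,
                    OutputCollector<Text, IntWritable> output, Reporter reporter)
            throws IOException {
        // Parsing of the weather record is omitted; emit (year, temperature).
        output.collect(new Text("1950"), new IntWritable(22));
    }
}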

1 answer

Because of type erasure [1]. The K/V types on the Mapper and Reducer are generics, so they are erased at compile time and are not available to the framework at runtime.
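
To see what erasure means in practice, here is a minimal, self-contained Java demonstration (not Hadoop-specific): the generic type arguments are simply gone at runtime, so there is nothing for the framework to inspect.

import java.util.ArrayList;
import java.util.List;

public class ErasureDemo {
    public static void main(String[] args) {
        List<String> strings = new ArrayList<String>();
        List<Integer> numbers = new ArrayList<Integer>();

        // The type arguments <String> and <Integer> are erased by the compiler,
        // so both lists share exactly the same runtime class.
        System.out.println(strings.getClass() == numbers.getClass()); // prints true
    }
}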

The k/v classes are also needed by output formats such as SequenceFiles, which store the key and value types in their header. Without knowing the key and value classes, the SequenceFile cannot be written.
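
As a rough illustration of why the types must be known up front, here is a sketch that writes a SequenceFile directly using the older SequenceFile.createWriter overload; the path and the record are made up for the example.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class SequenceFileSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        Path path = new Path("/tmp/example.seq"); // hypothetical output path

        // The writer must be told the key and value classes up front,
        // because SequenceFile records them in the file header.
        SequenceFile.Writer writer =
            SequenceFile.createWriter(fs, conf, path, Text.class, IntWritable.class);
        try {
            writer.append(new Text("1950"), new IntWritable(22));
        } finally {
            writer.close();
        }
    }
}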

[1] http://download.oracle.com/javase/tutorial/java/generics/erasure.html
