Hadoop MapReduce: Can I define two mappers and two reducers in the same job class?

Question

I have two separate Java classes that run two different MapReduce jobs, and I can run each of them independently. Both jobs work on the same input files. So my question is whether it is possible to define two mappers and two reducers in the same Java class, for example

mapper1.class mapper2.class reducer1.class reducer2.class

and then something like

    job.setMapperClass(mapper1.class);
    job.setMapperClass(mapper2.class);
    job.setCombinerClass(reducer1.class);
    job.setCombinerClass(reducer2.class);
    job.setReducerClass(reducer1.class);
    job.setReducerClass(reducer2.class);

Do these set methods override the previously set classes, or do they add new ones? I tried the code, but it only executes the classes set last, which makes me think that they override. But there should be a way to do this, right?

The reason I ask is that I want to read the input files only once (a single I/O pass) and then run both MapReduce jobs on them. I would also like to know how to write the output files to two different folders. At the moment the two jobs are separate, and each requires its own input and output directory.

mapreduce hadoop

Bob Jun 20 '12 at 15:23

4 answers

Chun · Answer 1 · 2012-06-20T22:23:09+0000

You can have multiple mappers, but a single job can have only one reducer. You will need three features: MultipleInputs, MultipleOutputs, and GenericWritable.

Using MultipleInputs, you can set a mapper and the corresponding InputFormat for each input. Here is my post on how to use it.

Using GenericWritable, you can tell the different input value classes apart in the reducer. Here is my post on how to use it.

Using MultipleOutputs, you can write different output classes from the same reducer.
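To make the "one reducer, several outputs" idea concrete, here is a minimal plain-Java model with no Hadoop dependency. It is only a sketch: the class name, the `write` helper, and the output names `sums` and `maxima` are hypothetical, and the in-memory map merely stands in for the separate output locations that MultipleOutputs would manage in a real job.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java model (no Hadoop classes) of a single reducer that routes
// its results to several named outputs.
class NamedOutputsSketch {

    // Appends one "key<TAB>value" line to the output with the given name.
    // The map is a stand-in for separate output folders (an assumption of this sketch).
    static void write(Map<String, List<String>> outputs, String name, String key, int value) {
        outputs.computeIfAbsent(name, n -> new ArrayList<>()).add(key + "\t" + value);
    }

    // One reduce call computes two results for the key and sends each
    // result to its own named output.
    static void reduce(String key, List<Integer> values, Map<String, List<String>> outputs) {
        int sum = 0;
        int max = Integer.MIN_VALUE;
        for (int v : values) {
            sum += v;
            max = Math.max(max, v);
        }
        write(outputs, "sums", key, sum);
        write(outputs, "maxima", key, max);
    }

    public static void main(String[] args) {
        Map<String, List<String>> outputs = new LinkedHashMap<>();
        reduce("x", Arrays.asList(3, 5, 1), outputs);
        reduce("y", Arrays.asList(2, 2), outputs);
        for (Map.Entry<String, List<String>> e : outputs.entrySet()) {
            System.out.println(e.getKey() + ": " + e.getValue());
        }
    }
}
```

The point of the sketch is only the routing: the values for each key are traversed once, and the two results end up in different places.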

Chris gerken · Answer 2 · 2012-06-20T15:54:01+0000

You can use the MultipleInputs and MultipleOutputs classes for this, but the output of both mappers will go to both reducers. If the data flows for the two mapper/reducer pairs really are independent of each other, then keep them as two separate jobs. By the way, MultipleInputs will run your mappers unchanged, but the reducers will have to be modified to use MultipleOutputs.

pyfunc · Answer 3 · 2012-06-20T15:54:18+0000

As far as I understand, and my understanding comes from using MapReduce with Hadoop Streaming, you can chain multiple mappers and reducers so that each one consumes the output of the previous one.

But you should not be able to run different mappers and reducers simultaneously. The number of mappers depends on the number of blocks to be processed. Mappers are instantiated based on that, not on the variety of mappers available to the job.

[Edit: based on your comment]

I do not think that is possible. You can chain them (in which case the reducers receive all of the input from the mappers). You can run them in sequence, but you cannot run independent sets of mappers and reducers exclusively.

What I think you can do is this: even though both of your reducers receive the input from both mappers, you can have the mappers emit (K, V) pairs in such a way that the reducers can tell which mapper each (K, V) pair came from. That way, each reducer can process only the (K, V) pairs it cares about.
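The tagging idea can be sketched in plain Java without any Hadoop classes. Everything here is an assumption made for illustration: the `Tagged` type, the tags `'A'` and `'B'`, and the two example computations (a word count and the longest line containing a word). In a real job the tag would have to travel inside the Writable key or value that the mappers emit.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java model (no Hadoop classes) of tagging each emitted value with
// the mapper it came from, so reducer logic can pick only its own values.
class TaggedPairsSketch {

    // A value together with a tag naming its source mapper (hypothetical type).
    static final class Tagged {
        final char source;
        final int value;

        Tagged(char source, int value) {
            this.source = source;
            this.value = value;
        }
    }

    // Reads every line exactly once, applies both mapper logics to it, and
    // groups the tagged values by key, much as the shuffle phase would.
    static Map<String, List<Tagged>> mapAndShuffle(List<String> lines) {
        Map<String, List<Tagged>> grouped = new TreeMap<>();
        for (String line : lines) {
            for (String word : line.trim().split("\\s+")) {
                if (word.isEmpty()) {
                    continue;
                }
                List<Tagged> values = grouped.computeIfAbsent(word, k -> new ArrayList<>());
                values.add(new Tagged('A', 1));             // mapper A emits (word, 1)
                values.add(new Tagged('B', line.length())); // mapper B emits (word, line length)
            }
        }
        return grouped;
    }

    // Reducer logic A: sums only the values tagged 'A' (a word count).
    static int reduceCount(List<Tagged> values) {
        int sum = 0;
        for (Tagged t : values) {
            if (t.source == 'A') {
                sum += t.value;
            }
        }
        return sum;
    }

    // Reducer logic B: takes the maximum of the values tagged 'B'
    // (length of the longest line that contains the word).
    static int reduceMaxLineLength(List<Tagged> values) {
        int max = 0;
        for (Tagged t : values) {
            if (t.source == 'B') {
                max = Math.max(max, t.value);
            }
        }
        return max;
    }

    public static void main(String[] args) {
        Map<String, List<Tagged>> grouped = mapAndShuffle(Arrays.asList("a b a", "b c d e"));
        for (Map.Entry<String, List<Tagged>> e : grouped.entrySet()) {
            System.out.println(e.getKey() + " count=" + reduceCount(e.getValue())
                    + " maxLine=" + reduceMaxLineLength(e.getValue()));
        }
    }
}
```

Both reducer logics see the full list of values for a key and simply skip the ones carrying the other tag, which is the selective processing described above.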

jeton · Answer 4 · 2016-04-17T22:44:51+0000

The ChainMapper class allows you to use multiple Mapper classes within a single map task. For an example, look here.
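As a rough illustration of what chaining means, the following plain-Java model pipes each input record through a list of mapper functions, where every stage consumes what the previous stage emitted. It does not use the Hadoop API; the class name, `runChain`, and the two example mappers are hypothetical names chosen for this sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.function.Function;

// Plain-Java model (no Hadoop classes) of chaining mappers: the records
// emitted by one mapper become the input of the next one.
class ChainedMappersSketch {

    // Sends every input record through the whole chain, in order.
    // Each mapper may emit zero or more records per input record.
    static List<String> runChain(List<String> input, List<Function<String, List<String>>> chain) {
        List<String> result = new ArrayList<>();
        for (String record : input) {
            List<String> current = Collections.singletonList(record);
            for (Function<String, List<String>> mapper : chain) {
                List<String> next = new ArrayList<>();
                for (String r : current) {
                    next.addAll(mapper.apply(r));
                }
                current = next;
            }
            result.addAll(current);
        }
        return result;
    }

    // Hypothetical first mapper: splits a line into words.
    static List<String> tokenize(String line) {
        List<String> words = new ArrayList<>();
        for (String w : line.trim().split("\\s+")) {
            if (!w.isEmpty()) {
                words.add(w);
            }
        }
        return words;
    }

    // Hypothetical second mapper: lower-cases a single word.
    static List<String> lowerCase(String word) {
        return Arrays.asList(word.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        List<Function<String, List<String>>> chain = new ArrayList<>();
        chain.add(ChainedMappersSketch::tokenize);
        chain.add(ChainedMappersSketch::lowerCase);
        System.out.println(runChain(Arrays.asList("Hello World", "HELLO"), chain));
    }
}
```

Note that a chain like this is sequential: it does not give you two independent mapper/reducer pipelines over the same input, which is what the question asks for.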
