MultipleInputs: adding the same input to multiple Mappers for comparison

I have two Mapper classes that take files from the same folder as input. The file names contain a timestamp, and based on that timestamp I decide which Mapper each file is given to. Occasionally the same input file has to be given to both Mappers. I have tested that the job works when the two Mappers receive different inputs, but when I give them the same input, one of the Mapper classes does not produce the result that the reducer needs for the comparison.

The code is huge, so instead of posting it I will describe what I did. I go through the files in the directory and, based on the timestamps in their names, put them into two lists. Each list is then added to a different Mapper, because the two sets are processed with different calculations whose results are compared in the reducer. When the same file ends up in both lists, which happens because the time criteria for the two Mappers are almost the same, one of the Mappers does not generate any result. Is this because one Mapper cannot access the file while the other is using it? If so, is there a way around it?

Here MapPath1 is one list and MapPath2 is the other:

for (i = 0; i < MapPath1.size(); i++)
    MultipleInputs.addInputPath(job, new Path(MapPath1.get(i)), TextInputFormat.class, Map1.class);
if (type.equals("comparative"))
    for (i = 0; i < MapPath2.size(); i++)
        MultipleInputs.addInputPath(job, new Path(MapPath2.get(i)), TextInputFormat.class, Map2.class);
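
To confirm that the problem only shows up for files present in both lists, it may help to compute the overlap before registering any paths. The following is a minimal plain-Java sketch, not part of the original job: the class name OverlapCheck, the method sharedPaths, and the sample file names are all hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class OverlapCheck {
    // Returns the paths that occur in both lists, in the order of the first list.
    static List<String> sharedPaths(List<String> mapPath1, List<String> mapPath2) {
        Set<String> shared = new LinkedHashSet<>(mapPath1);
        shared.retainAll(mapPath2); // keep only entries also present in the second list
        return new ArrayList<>(shared);
    }

    public static void main(String[] args) {
        // Made-up file names, used only for illustration.
        List<String> mapPath1 = List.of("in/log_20140101.txt", "in/log_20140102.txt");
        List<String> mapPath2 = List.of("in/log_20140102.txt", "in/log_20140103.txt");
        System.out.println("Registered for both Mappers: " + sharedPaths(mapPath1, mapPath2));
    }
}
```

Logging this list before the job is submitted makes it easy to match a missing result against a shared input file.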

Update

I just found this question (Several mappers in hadoop), which is similar to mine, but I do not want to duplicate the input file, since it can be large. Can someone point me to how I can create two separate jobs that use different mappers and feed their output to a single reducer?
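
One possible way to avoid copying the file, sketched here as an assumption rather than a confirmed fix, is to read a shared file only once and apply both calculations to each record, tagging every output value with the calculation that produced it. The plain-Java sketch below shows only that logic outside Hadoop. The class TaggedMapping, the method mapBoth, and the tags M1 and M2 are hypothetical, and the two lambdas merely stand in for the real calculations of Map1 and Map2.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

class TaggedMapping {
    // Applies both calculations to one input line and tags each result with its origin.
    static List<String> mapBoth(String line,
                                Function<String, String> calc1,
                                Function<String, String> calc2) {
        List<String> out = new ArrayList<>();
        out.add("M1\t" + calc1.apply(line)); // value as the first calculation would emit it
        out.add("M2\t" + calc2.apply(line)); // value as the second calculation would emit it
        return out;
    }

    public static void main(String[] args) {
        // Placeholder calculations; the real ones are not shown in the question.
        Function<String, String> calc1 = String::toUpperCase;
        Function<String, String> calc2 = s -> String.valueOf(s.length());
        for (String value : mapBoth("sample record", calc1, calc2)) {
            System.out.println(value);
        }
    }
}
```

In a real job the body of mapBoth would sit inside the map method of a single Mapper registered for the shared files, so each such path is added to the job only once.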

1 answer

one of the Mapper classes does not generate a result that will be used for comparison in the reducer.

I assume that both mappers run on the same TaskTracker node and that the intermediate location for the map output is shared by the mapper tasks. You should check the TaskTracker nodes where these map tasks run to confirm this.

Also, try running the job with the mappers only, by setting the number of reduce tasks to zero, and check the output. This will show whether each mapper actually writes its own output.

As for a solution to your problem: it looks like you are passing the same file to both mappers, and the data from both mappers goes to a single reducer, so there is some overlap. Does your job's output handle this duplication correctly?
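
When values from two mappers arrive at one reducer under the same key, the reducer needs some way to tell them apart and to notice when one side is missing. The sketch below is plain Java and only illustrates that check. It assumes each value is prefixed with a tag, M1 or M2, followed by a tab. That convention, the class CompareReducer, and the method compare are hypothetical and do not come from the question.

```java
import java.util.List;

class CompareReducer {
    // Compares the two tagged values received for one key.
    // Each value is expected to look like "<tag>\t<payload>" with tag M1 or M2.
    static String compare(List<String> taggedValues) {
        String first = null;
        String second = null;
        for (String value : taggedValues) {
            String[] parts = value.split("\t", 2);
            if (parts.length < 2) {
                continue; // skip values that carry no tag
            }
            if (parts[0].equals("M1")) {
                first = parts[1];
            } else if (parts[0].equals("M2")) {
                second = parts[1];
            }
        }
        if (first == null || second == null) {
            return "MISSING_SIDE"; // one mapper produced nothing for this key
        }
        return first.equals(second) ? "SAME" : "DIFFERENT";
    }

    public static void main(String[] args) {
        System.out.println(compare(List.of("M1\t5", "M2\t7")));
        System.out.println(compare(List.of("M1\t5")));
    }
}
```

Counting how often the missing-side case occurs is one way to check whether a mapper really produced no output for the shared file.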
