I am trying to find out where the output of the map task is saved to disk before the reduce task can use it.
Note: I am using Hadoop 0.20.204 with the new API.
For example, take the map method overridden in my Map class:
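```java
// Fields of the enclosing Mapper<LongWritable, Text, Text, IntWritable> subclass
private final static IntWritable one = new IntWritable(1);
private Text word = new Text();

@Override
public void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
    String line = value.toString();
    StringTokenizer tokenizer = new StringTokenizer(line);
    while (tokenizer.hasMoreTokens()) {
        word.set(tokenizer.nextToken());
        context.write(word, one); // where does this data end up?
    }
}
```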
I want to know where `context.write()` actually writes this data. So far I have found:
```java
FileOutputFormat.getWorkOutputPath(context);
```
This gives me the following location in HDFS:
```
hdfs://localhost:9000/tmp/outputs/1/_temporary/_attempt_201112221334_0001_m_000000_0
```
When I try to use this path as input for another job, I get the following error:
```
org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: hdfs://localhost:9000/tmp/outputs/1/_temporary/_attempt_201112221334_0001_m_000000_0
```
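The path above can be taken apart to see what it refers to. The sketch below assumes the layout `<output dir>/_temporary/_<task attempt ID>`, which matches the path printed above, and the usual `attempt_<jobtracker id>_<job number>_<m|r>_<task number>_<attempt number>` format for attempt IDs. `WorkPathSketch` and its methods are hypothetical helpers written for illustration, not part of Hadoop's API; the code only manipulates strings, so it runs without Hadoop on the classpath.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not Hadoop code): rebuilds and decodes the per-attempt
// work directory, assuming the layout <outputDir>/_temporary/_<attemptId>.
final class WorkPathSketch {
    // Name of the shared directory that holds in-progress task output (assumed).
    static final String TEMP_DIR_NAME = "_temporary";

    // attempt_<jobtracker id>_<job number>_<m|r>_<task number>_<attempt number>
    private static final Pattern ATTEMPT =
            Pattern.compile("attempt_(\\d+)_(\\d+)_([mr])_(\\d+)_(\\d+)");

    // Joins the pieces the same way the observed path is laid out.
    static String workOutputPath(String outputDir, String attemptId) {
        return outputDir + "/" + TEMP_DIR_NAME + "/_" + attemptId;
    }

    // Returns a human-readable description of a task attempt ID.
    static String describe(String attemptId) {
        Matcher m = ATTEMPT.matcher(attemptId);
        if (!m.matches()) {
            throw new IllegalArgumentException("not a task attempt ID: " + attemptId);
        }
        String kind = m.group(3).equals("m") ? "map" : "reduce";
        return String.format("job %s_%s, %s task %d, attempt %d",
                m.group(1), m.group(2), kind,
                Integer.parseInt(m.group(4)), Integer.parseInt(m.group(5)));
    }

    public static void main(String[] args) {
        String attempt = "attempt_201112221334_0001_m_000000_0";
        // prints hdfs://localhost:9000/tmp/outputs/1/_temporary/_attempt_201112221334_0001_m_000000_0
        System.out.println(workOutputPath("hdfs://localhost:9000/tmp/outputs/1", attempt));
        // prints "job 201112221334_0001, map task 0, attempt 0"
        System.out.println(describe(attempt));
    }
}
```

Decoding the ID shows that the directory belongs to one specific attempt (attempt 0) of one map task (task 0) in job 0001, not to the job as a whole.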
Note: the new job is launched from within the Mapper, so technically the temporary folder that the Mapper task writes its output to should exist when the new job starts. Even so, Hadoop still reports that the input path does not exist.
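A small model may help pin down what "exists" means for that directory. The sketch below assumes that the new-API `FileOutputCommitter` in 0.20.x creates only the shared `_temporary` directory at job setup, creates the per-attempt `_attempt_...` directory on demand the first time the task writes through the OutputFormat, and, when the task commits, moves that directory's contents into the output directory and deletes it. It further assumes that, in a job with reducers, the map task's `context.write()` output does not go through the OutputFormat at all. The model uses `java.nio.file` on the local disk instead of HDFS, and `CommitterModel` with all of its methods is hypothetical, not Hadoop's API.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Simplified local-disk model (not Hadoop code) of the assumed committer lifecycle.
final class CommitterModel {
    final Path outputDir;
    final Path workDir; // <outputDir>/_temporary/_<attemptId>

    CommitterModel(Path outputDir, String attemptId) {
        this.outputDir = outputDir;
        this.workDir = outputDir.resolve("_temporary").resolve("_" + attemptId);
    }

    // Job setup: only the shared _temporary directory is created.
    void setupJob() throws IOException {
        Files.createDirectories(outputDir.resolve("_temporary"));
    }

    // The first write through the OutputFormat creates the attempt directory.
    void writeThroughOutputFormat(String fileName, String content) throws IOException {
        Files.createDirectories(workDir);
        Files.write(workDir.resolve(fileName), content.getBytes(StandardCharsets.UTF_8));
    }

    // Task commit: promote the attempt's files, then delete its directory.
    void commitTask() throws IOException {
        if (!Files.exists(workDir)) {
            return; // the attempt never wrote anything, so there is nothing to promote
        }
        List<Path> entries;
        try (Stream<Path> s = Files.list(workDir)) {
            entries = s.collect(Collectors.toList());
        }
        for (Path p : entries) {
            Files.move(p, outputDir.resolve(p.getFileName()), StandardCopyOption.REPLACE_EXISTING);
        }
        Files.delete(workDir);
    }

    public static void main(String[] args) throws IOException {
        Path out = Files.createTempDirectory("outputs");
        CommitterModel task = new CommitterModel(out, "attempt_201112221334_0001_m_000000_0");
        task.setupJob();
        System.out.println(Files.exists(task.workDir)); // false: nothing written yet
        task.writeThroughOutputFormat("part-m-00000", "hello\t1\n");
        System.out.println(Files.exists(task.workDir)); // true
        task.commitTask();
        System.out.println(Files.exists(task.workDir)); // false: promoted and deleted
        System.out.println(Files.exists(out.resolve("part-m-00000"))); // true
    }
}
```

Under these assumptions, `getWorkOutputPath` names a directory the task may write to, not one that is guaranteed to exist while the job runs.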
Any ideas where the temporary output is written? Or, alternatively, where can I find the output of the Map task while a job that has both a Map and a Reduce stage is running?