I am running Hadoop 0.20.1 under SLES 10 (SUSE).
My map task takes an input file and generates a few intermediate files from it; the final results are then produced from these intermediate files. I would like to know where I should place these files so that performance is good and there are no conflicts between tasks. It would be even better if Hadoop could delete the directory automatically.
I am currently using the temporary directory plus the task id to build a unique folder, and then work in subfolders of that folder:
    String reduceTaskId = job.get("mapred.task.id");
    String reduceTempDir = job.get("mapred.temp.dir");

    // one unique scratch folder per task: <mapred.temp.dir>/<task id>/
    String myTemporaryFoldername = reduceTempDir + File.separator + reduceTaskId + File.separator;
    File diseaseParent = new File(myTemporaryFoldername + REDUCE_WORK_FOLDER);
The problem with this approach is that I am not sure it is optimal, and I also have to delete every folder I create myself or I will eventually run out of disk space (a sketch of the cleanup is below). Thanks, akintayo.
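For the manual cleanup, something like this in the task's close() method should work (a sketch, not my exact code; FileUtil.fullyDelete is Hadoop's own recursive delete, and myTemporaryFoldername is the path built above):

    import java.io.File;
    import java.io.IOException;
    import org.apache.hadoop.fs.FileUtil;

    @Override
    public void close() throws IOException {
        // recursively delete this task's scratch folder
        File scratchDir = new File(myTemporaryFoldername);
        if (scratchDir.exists() && !FileUtil.fullyDelete(scratchDir)) {
            System.err.println("could not delete scratch dir " + scratchDir);
        }
    }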
(edit) I found that the best place to store files that you do not want to outlive the map task is job.get("job.local.dir"), which provides a path that is deleted when the map task completes. I am not sure whether the deletion happens on a per-key basis or once per tasktracker.
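In the old mapred API that looks roughly like this (a sketch; MyMapper and the key/value types are placeholders, not from my actual job):

    import java.io.File;
    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.Mapper;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reporter;

    public class MyMapper extends MapReduceBase
            implements Mapper<LongWritable, Text, Text, Text> {

        private File workDir;

        @Override
        public void configure(JobConf job) {
            // job.local.dir is the job's scratch space on the local disk;
            // the framework removes it, so no manual cleanup is needed
            workDir = new File(job.get("job.local.dir"), job.get("mapred.task.id"));
            workDir.mkdirs();
        }

        public void map(LongWritable key, Text value,
                OutputCollector<Text, Text> output, Reporter reporter)
                throws IOException {
            // generate the intermediate files under workDir here,
            // then emit the final results through output.collect(...)
        }
    }

Since the framework owns job.local.dir, anything created under it goes away without the close() cleanup shown above.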