How to recursively use the directory structure in the new Hadoop API?

My file structure is as follows:

/indir/somedir1/somefile
/indir/somedir1/someotherfile...
/indir/somedir2/somefile
/indir/somedir2/someotherfile...

Now I want to pass everything recursively to the MR job, and I'm using the new API. So I did:

FileInputFormat.setInputPaths(job, new Path("/indir"));

But job failed:

Error: java.io.FileNotFoundException: Path is not a file: /indir/somedir1

I am using Hadoop 2.4, and this post states that the new Hadoop 2 API does not support recursive input directories. But I wonder how that can be, because dropping a large nested directory structure into Hadoop seems like the most common thing in the world ...

So is this intended, or is it a bug? Either way, is there a solution other than falling back to the old API?


There is a JIRA issue tracking this; the workaround in the new API is:

  • set mapreduce.input.fileinputformat.input.dir.recursive to true (the older property name, mapred.input.dir.recursive, is deprecated)
  • add the top-level directory with FileInputFormat.addInputPath as before

With the property enabled, the job descends into subdirectories instead of failing on them.
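A minimal driver sketch of that approach (the class name and job name are illustrative, not from the original; assumes the Hadoop 2.x new-API classes on the classpath):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

public class RecursiveInputDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Enable recursive traversal of input directories (new-API property name).
        conf.setBoolean("mapreduce.input.fileinputformat.input.dir.recursive", true);

        Job job = Job.getInstance(conf, "recursive-input");
        // Point the job at the top-level directory; subdirectories are now descended into.
        FileInputFormat.addInputPath(job, new Path("/indir"));
        // ... set mapper, reducer, output format/path, then job.waitForCompletion(true).
    }
}
```

If the driver goes through ToolRunner/GenericOptionsParser, the same property can also be passed on the command line as -Dmapreduce.input.fileinputformat.input.dir.recursive=true without touching the code.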


Alternatively, the new API's FileInputFormat provides a convenience setter that enables recursion directly on the job:

FileInputFormat.setInputDirRecursive(job, true);
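A hedged sketch of wiring that call into a driver (class and job names are illustrative; assumes the org.apache.hadoop.mapreduce.lib.input.FileInputFormat from the new API):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

public class RecursiveSetterDriver {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "recursive-input");
        // Equivalent to setting mapreduce.input.fileinputformat.input.dir.recursive=true,
        // but type-safe and discoverable from the API itself.
        FileInputFormat.setInputDirRecursive(job, true);
        FileInputFormat.addInputPath(job, new Path("/indir"));
        // ... mapper/reducer/output setup and job submission as usual.
    }
}
```

The setter simply writes the same configuration key under the hood, so both answers are two spellings of the same fix; the method form avoids typos in the long property name.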
