How to recursively use the directory structure in the new Hadoop API?

My file structure is as follows:

/indir/somedir1/somefile
/indir/somedir1/someotherfile...
/indir/somedir2/somefile
/indir/somedir2/someotherfile...

Now I want to pass everything recursively to the MR job, and I'm using the new API. So I did:

FileInputFormat.setInputPaths(job, new Path("/indir"));

But job failed:

Error: java.io.FileNotFoundException: Path is not a file: /indir/somedir1

I am using Hadoop 2.4, and this post states that the new Hadoop 2 API does not support recursive input directories. But I wonder how that can be, because dropping a large nested directory structure into Hadoop seems like the most common thing in the world ...

So is this intended, or is it a bug? Either way, is there a solution other than falling back to the old API?


There is a JIRA issue tracking this; the workaround in the new API is:

  • set mapreduce.input.fileinputformat.input.dir.recursive to true (the older property name, mapred.input.dir.recursive, is deprecated)
  • add the top-level directory with FileInputFormat.addInputPath as before

With the property enabled, the job descends into subdirectories instead of failing on them.
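A minimal driver sketch of that approach (the class name and job name are illustrative, not from the original; assumes the Hadoop 2.x new-API classes on the classpath):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

public class RecursiveInputDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Enable recursive traversal of input directories (new-API property name).
        conf.setBoolean("mapreduce.input.fileinputformat.input.dir.recursive", true);

        Job job = Job.getInstance(conf, "recursive-input");
        // Point the job at the top-level directory; subdirectories are now descended into.
        FileInputFormat.addInputPath(job, new Path("/indir"));
        // ... set mapper, reducer, output format/path, then job.waitForCompletion(true).
    }
}
```

If the driver goes through ToolRunner/GenericOptionsParser, the same property can also be passed on the command line as -Dmapreduce.input.fileinputformat.input.dir.recursive=true without touching the code.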


Alternatively, the new API's FileInputFormat provides a convenience setter that enables recursion directly on the job:

FileInputFormat.setInputDirRecursive(job, true);
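A hedged sketch of wiring that call into a driver (class and job names are illustrative; assumes the org.apache.hadoop.mapreduce.lib.input.FileInputFormat from the new API):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

public class RecursiveSetterDriver {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "recursive-input");
        // Equivalent to setting mapreduce.input.fileinputformat.input.dir.recursive=true,
        // but type-safe and discoverable from the API itself.
        FileInputFormat.setInputDirRecursive(job, true);
        FileInputFormat.addInputPath(job, new Path("/indir"));
        // ... mapper/reducer/output setup and job submission as usual.
    }
}
```

The setter simply writes the same configuration key under the hood, so both answers are two spellings of the same fix; the method form avoids typos in the long property name.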
