I have a large number (100+) of gz-compressed files, 2-3 MB each, spread across several directories. For example:
A1/B1/C1/part-0000.gz
A2/B2/C2/part-0000.gz
A1/B1/C1/part-0001.gz
I need to feed all of these files into a single Map job. From what I can tell, using MultipleFileInputFormat requires all input files to be in the same directory. Is it possible to pass multiple directories directly to a job?
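One possible approach for the multiple-directories case: in the `org.apache.hadoop.mapreduce` API, `FileInputFormat.addInputPath(job, path)` can be called once per directory, and `FileInputFormat.addInputPaths(job, commaSeparatedPaths)` accepts a comma-separated list. The sketch below only builds such a list. It assumes the files are visible on the local filesystem (for HDFS, the listing would have to go through Hadoop's `FileSystem` API instead), and the class name `InputDirs` is hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class InputDirs {

    // Walks root recursively, keeps regular files ending in ".gz", and returns
    // their distinct parent directories, sorted and joined with commas.
    public static String gzInputDirs(Path root) throws IOException {
        try (Stream<Path> files = Files.walk(root)) {
            return files
                    .filter(Files::isRegularFile)
                    .filter(p -> p.getFileName().toString().endsWith(".gz"))
                    .map(p -> p.getParent().toString())
                    .distinct()
                    .sorted()
                    .collect(Collectors.joining(","));
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical usage: java InputDirs /data/logs
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        System.out.println(gzInputDirs(root));
    }
}
```

In a driver, the resulting string could then be handed to `FileInputFormat.addInputPaths(job, dirs)`. A directory name containing a comma could be misparsed in that format. In that case, calling `addInputPath` once per directory is the safer choice.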
If not, is there an efficient way to move these files into one directory without name conflicts, or to merge them into a single gz file?
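For the single-directory option, here is a minimal sketch that assumes local-filesystem access. It flattens each file's relative path into its name, so `A1/B1/C1/part-0000.gz` becomes `A1_B1_C1_part-0000.gz`, and files that share a base name in different directories no longer collide. The class name `FlattenGz`, the `_` separator, and copying instead of moving are all illustrative choices. On HDFS, the same naming scheme would have to be applied through Hadoop's `FileSystem` API rather than `java.nio`.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class FlattenGz {

    // Joins the path elements of file (relative to root) with '_',
    // e.g. A1/B1/C1/part-0000.gz -> A1_B1_C1_part-0000.gz.
    public static String flatName(Path root, Path file) {
        StringBuilder sb = new StringBuilder();
        for (Path part : root.relativize(file)) {
            if (sb.length() > 0) {
                sb.append('_');
            }
            sb.append(part.toString());
        }
        return sb.toString();
    }

    // Copies every *.gz file under root into target under its flattened name
    // and returns the destination paths. target should lie outside root.
    public static List<Path> flatten(Path root, Path target) throws IOException {
        Files.createDirectories(target);
        List<Path> sources;
        try (Stream<Path> files = Files.walk(root)) {
            sources = files
                    .filter(Files::isRegularFile)
                    .filter(p -> p.getFileName().toString().endsWith(".gz"))
                    .sorted()
                    .collect(Collectors.toList());
        }
        List<Path> copied = new ArrayList<>();
        for (Path src : sources) {
            Path dst = target.resolve(flatName(root, src));
            // No REPLACE_EXISTING: a name clash throws FileAlreadyExistsException
            // instead of silently overwriting an earlier file.
            Files.copy(src, dst);
            copied.add(dst);
        }
        return copied;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical usage: java FlattenGz <sourceRoot> <targetDir>
        List<Path> out = flatten(Paths.get(args[0]), Paths.get(args[1]));
        System.out.println("Copied " + out.size() + " files");
    }
}
```

`Files.copy` is called without `REPLACE_EXISTING`, so an ambiguous pair such as `A1_B1/C1/x.gz` and `A1/B1_C1/x.gz` raises an exception instead of losing data. Using `Files.move` instead would avoid doubling disk usage.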
Note: I am implementing the Mapper in plain Java, not with Pig or Hadoop Streaming.
Any help on the above issue would be greatly appreciated. Thanks,
Ankit