You do not need to pass a separate file as input to the MapReduce job; the FileInputFormat class already provides an API for accepting a list of multiple files as input to the map program.
public static void setInputPaths(Job job, Path... inputPaths) throws IOException
Sets the given paths as the list of inputs for the map-reduce job. Parameters:
job - the job to modify
inputPaths - the paths of the input files/directories
Sample code from the Apache tutorial:

Job job = Job.getInstance(conf, "word count");
FileInputFormat.addInputPath(job, new Path(args[0]));
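Since addInputPath appends to the job's input list, you can call it once per file or directory. A minimal driver sketch (the paths here are hypothetical placeholders):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

public class MultiInputDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");

        // Each call appends one more file/directory to the job's input list
        FileInputFormat.addInputPath(job, new Path("/data/logs/2015"));
        FileInputFormat.addInputPath(job, new Path("/data/logs/2016"));

        // Alternatively, set several paths in one call (this replaces,
        // rather than appends to, any previously configured inputs):
        // FileInputFormat.setInputPaths(job,
        //         new Path("/data/logs/2015"), new Path("/data/logs/2016"));
    }
}
```

All of the listed paths are fed to the same Mapper class; use MultipleInputs (below) if different inputs need different mappers.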
MultipleInputs provides the following API:
public static void addInputPath(Job job, Path path, Class<? extends InputFormat> inputFormatClass, Class<? extends Mapper> mapperClass)
Adds a path with a custom InputFormat and Mapper to the list of inputs for the map-reduce job.
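This lets each input path carry its own format and mapper. A driver fragment sketching the idea (the paths and the CsvMapper/SeqMapper class names are hypothetical; both mappers must emit the same key/value types):

```java
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.lib.input.MultipleInputs;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;

// Plain-text records go through CsvMapper...
MultipleInputs.addInputPath(job, new Path("/data/csv"),
        TextInputFormat.class, CsvMapper.class);

// ...while sequence-file records go through SeqMapper,
// all within the same job and the same reducer.
MultipleInputs.addInputPath(job, new Path("/data/seq"),
        SequenceFileInputFormat.class, SeqMapper.class);
```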
Related SO question:
Can hadoop take input from multiple directories and files
For your second question about multiple output paths, refer to the MultipleOutputs API.
FileOutputFormat.setOutputPath(job, outDir);

// Defines additional single text based output 'text' for the job
MultipleOutputs.addNamedOutput(job, "text", TextOutputFormat.class, LongWritable.class, Text.class);

// Defines additional sequence-file based output 'seq' for the job
MultipleOutputs.addNamedOutput(job, "seq", SequenceFileOutputFormat.class, LongWritable.class, Text.class);
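Once the named outputs are registered in the driver, the reducer (or mapper) writes to them through a MultipleOutputs instance. A sketch assuming the 'text' and 'seq' names registered above:

```java
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;

public class MultiOutReducer extends Reducer<LongWritable, Text, LongWritable, Text> {
    private MultipleOutputs<LongWritable, Text> mos;

    @Override
    protected void setup(Context context) {
        mos = new MultipleOutputs<>(context);
    }

    @Override
    protected void reduce(LongWritable key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        for (Text value : values) {
            mos.write("text", key, value);  // goes to the 'text' named output
            mos.write("seq", key, value);   // goes to the 'seq' named output
        }
    }

    @Override
    protected void cleanup(Context context) throws IOException, InterruptedException {
        mos.close();  // required, or data buffered for named outputs may be lost
    }
}
```

Each named output produces its own set of files (e.g. text-r-00000, seq-r-00000) alongside the job's default output in outDir.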
Also take a look at these related SO questions about multiple output files:
Writing to multiple folders in hadoop?
hadoop method to send output to multiple directories