You're right. Take a look at the FileInputFormat class and its getSplits() method. It looks up the block locations:
BlockLocation[] blkLocations = fs.getFileBlockLocations(file, 0, length);
This implies a query to the FileSystem. It happens inside the JobClient, and the results are written into a SequenceFile (actually it is just raw bytes). The JobTracker reads this file later, while initializing the job, and pretty much just assigns a task to each InputSplit.
BUT distributing the data itself is the NameNode's job.
Now to your question: normally you extend FileInputFormat. You are then forced to return a list of InputSplit objects, and each split has to be given its location when it is created. Take FileSplit, for example:
public FileSplit(Path file, long start, long length, String[] hosts)
So you do not implement data locality yourself; you just state on which hosts the split can be found. That information is easy to query through the FileSystem interface.
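
To make that concrete, here is a rough sketch in plain Java of how the hosts for each split can be derived from a file's block locations. It does not use the Hadoop classes: Block and Split are hypothetical stand-ins for BlockLocation and FileSplit, and the fixed split size and the "hosts of the block holding the first byte" rule are simplifying assumptions. The real getSplits() handles more cases than this.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java model of the idea. Block and Split are hypothetical stand-ins
// for Hadoop's BlockLocation and FileSplit; they are NOT the real classes.
class SplitHostsSketch {

    // One block of the file and the hosts that store a replica of it.
    static final class Block {
        final long offset;
        final long length;
        final String[] hosts;

        Block(long offset, long length, String... hosts) {
            this.offset = offset;
            this.length = length;
            this.hosts = hosts;
        }
    }

    // One input split, carrying the hosts where its data can be read locally.
    static final class Split {
        final long start;
        final long length;
        final String[] hosts;

        Split(long start, long length, String[] hosts) {
            this.start = start;
            this.length = length;
            this.hosts = hosts;
        }
    }

    // Index of the block that contains the given file offset.
    static int blockIndex(List<Block> blocks, long offset) {
        for (int i = 0; i < blocks.size(); i++) {
            Block b = blocks.get(i);
            if (offset >= b.offset && offset < b.offset + b.length) {
                return i;
            }
        }
        throw new IllegalArgumentException("offset " + offset + " is outside the file");
    }

    // Cut the file into splits of at most splitSize bytes. Each split is
    // tagged with the hosts of the block that holds its first byte
    // (a simplifying assumption made for this sketch).
    static List<Split> computeSplits(List<Block> blocks, long fileLength, long splitSize) {
        if (splitSize <= 0) {
            throw new IllegalArgumentException("splitSize must be positive");
        }
        List<Split> splits = new ArrayList<>();
        for (long start = 0; start < fileLength; start += splitSize) {
            long length = Math.min(splitSize, fileLength - start);
            String[] hosts = blocks.get(blockIndex(blocks, start)).hosts;
            splits.add(new Split(start, length, hosts));
        }
        return splits;
    }

    public static void main(String[] args) {
        // Made-up example: a 200-byte file stored as two blocks.
        List<Block> blocks = Arrays.asList(
                new Block(0, 128, "node1", "node2"),
                new Block(128, 72, "node3", "node1"));
        for (Split s : computeSplits(blocks, 200, 128)) {
            System.out.println(s.start + "+" + s.length + " on " + Arrays.toString(s.hosts));
        }
    }
}
```

In a real InputFormat the block list would come from the getFileBlockLocations call shown above, and the hosts array would be passed to the FileSplit constructor. The scheduler then only uses those hosts as a hint about where to place the task.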