Many classes implement InputFormat:
CombineFileInputFormat, CombineSequenceFileInputFormat, CombineTextInputFormat, CompositeInputFormat, DBInputFormat, FileInputFormat, FixedLengthInputFormat, KeyValueTextInputFormat, MultiFileInputFormat, NLineInputFormat, Parser.Node, SequenceFileAsBinaryInputFormat, SequenceFileAsTextInputFormat, SequenceFileInputFilter, SequenceFileInputFormat, TextInputFormat
Have a look at this article on when to use which type of InputFormat.
Of these, the most commonly used formats are:
FileInputFormat : Base class for all file-based input formats.
KeyValueTextInputFormat : An InputFormat for plain text files. Files are broken into lines; either a line feed or a carriage return signals the end of a line. Each line is divided into key and value parts by a separator byte. If no such byte exists, the key is the entire line and the value is empty.
TextInputFormat : An InputFormat for plain text files. Files are broken into lines; either a line feed or a carriage return signals the end of a line. Keys are the position in the file, and values are the line of text.
NLineInputFormat : Splits the input so that N lines form one split. In many "pleasantly parallel" applications, each process/mapper processes the same input file(s), but with the computation controlled by different parameters.
SequenceFileInputFormat : An InputFormat for sequence files.
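To make the difference between TextInputFormat and KeyValueTextInputFormat concrete, here is a rough sketch in plain Java that only imitates the records each format would hand to a mapper. It does not use the Hadoop API at all: the class `InputFormatSketch`, the `Record` type and both methods are made-up names for illustration. It also assumes ASCII content and `\n` line endings, so that character offsets equal byte offsets.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: imitates the record shapes, not Hadoop's real classes.
public class InputFormatSketch {

    /** One record as a mapper would see it: a key and a value. */
    public static final class Record {
        public final String key;
        public final String value;

        Record(String key, String value) {
            this.key = key;
            this.value = value;
        }
    }

    /** TextInputFormat-style: key = offset where the line starts, value = the line. */
    public static List<Record> textRecords(String content) {
        List<Record> out = new ArrayList<>();
        int offset = 0;
        while (offset < content.length()) {
            int end = content.indexOf('\n', offset);
            if (end < 0) {
                end = content.length(); // last line without a trailing newline
            }
            out.add(new Record(Integer.toString(offset), content.substring(offset, end)));
            offset = end + 1; // skip past the newline
        }
        return out;
    }

    /** KeyValueTextInputFormat-style: split each line at the first separator. */
    public static List<Record> keyValueRecords(String content, char separator) {
        List<Record> out = new ArrayList<>();
        for (Record line : textRecords(content)) {
            int pos = line.value.indexOf(separator);
            if (pos < 0) {
                // No separator: the whole line is the key, the value is empty.
                out.add(new Record(line.value, ""));
            } else {
                out.add(new Record(line.value.substring(0, pos),
                                   line.value.substring(pos + 1)));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String content = "a\t1\nbb\t2\nno-separator\n";
        for (Record r : textRecords(content)) {
            System.out.println("text:     " + r.key + " -> [" + r.value + "]");
        }
        for (Record r : keyValueRecords(content, '\t')) {
            System.out.println("keyvalue: " + r.key + " -> [" + r.value + "]");
        }
    }
}
```

The tab separator in `main` is just the value chosen for this example. In a real job you would not write any of this yourself; you would pick the format class on the job and let Hadoop produce the records.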
As for your second question: first fetch the files from the remote servers, then use the appropriate InputFormat depending on the contents of the files. Hadoop works best with data locality.
Ravindra babu