How many types of InputFormat exist in Hadoop?

Question

I am new to Hadoop and am wondering how many types of InputFormat Hadoop provides, such as TextInputFormat. Is there a specific InputFormat that I can use to read files through HTTP requests to remote data servers?

Thanks:)

hadoop

Trams Dec 08 '15 at 3:46

2 answers

Ravindra babu · Answer 1 · 2015-12-08T06:42:42+0000

There are many classes that implement InputFormat.

 CombineFileInputFormat, CombineSequenceFileInputFormat, CombineTextInputFormat, CompositeInputFormat, DBInputFormat, FileInputFormat, FixedLengthInputFormat, KeyValueTextInputFormat, MultiFileInputFormat, NLineInputFormat, Parser.Node, SequenceFileAsBinaryInputFormat, SequenceFileAsTextInputFormat, SequenceFileInputFilter, SequenceFileInputFormat, TextInputFormat

Take a look at the article on when to use each type of InputFormat.

Of these, the most commonly used formats are:

FileInputFormat : base class for all file-based input formats
KeyValueTextInputFormat : InputFormat for plain text files. Files are broken into lines. Either a line feed or a carriage return signals the end of a line. Each line is divided into key and value parts by a separator byte. If no such byte exists, the key is the entire line and the value is empty.
TextInputFormat : InputFormat for plain text files. Files are broken into lines. Either a line feed or a carriage return signals the end of a line. Keys are the position in the file, and values are the line of text.
NLineInputFormat : splits N lines of input as one split. In many "pleasantly parallel" applications, each process / mapper processes the same input file(s), but with computations controlled by different parameters.
SequenceFileInputFormat : InputFormat for sequence files.
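
To make the difference between the TextInputFormat and KeyValueTextInputFormat record shapes concrete, here is a minimal plain-Java sketch. It does not use the Hadoop API: the class and method names (RecordShapeSketch, textRecords, keyValueRecord) are made up for illustration, and it assumes "\n" line endings and a tab separator (the default separator for KeyValueTextInputFormat).

```java
import java.nio.charset.StandardCharsets;
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Illustration only: mimics the record shapes, not the Hadoop classes themselves.
class RecordShapeSketch {

    // TextInputFormat-style records: key = byte offset of the line start, value = the line.
    // Assumes lines end with '\n' only.
    static List<Map.Entry<Long, String>> textRecords(String content) {
        List<Map.Entry<Long, String>> records = new ArrayList<>();
        long offset = 0;
        int start = 0;
        while (start < content.length()) {
            int end = content.indexOf('\n', start);
            if (end < 0) {
                end = content.length(); // last line without a trailing newline
            }
            String line = content.substring(start, end);
            records.add(new SimpleEntry<>(offset, line));
            // Advance by the line's byte length plus one byte for '\n'.
            offset += line.getBytes(StandardCharsets.UTF_8).length + 1;
            start = end + 1;
        }
        return records;
    }

    // KeyValueTextInputFormat-style record: split at the first separator.
    // If the separator is missing, the whole line is the key and the value is empty.
    static Map.Entry<String, String> keyValueRecord(String line, char separator) {
        int pos = line.indexOf(separator);
        if (pos < 0) {
            return new SimpleEntry<>(line, "");
        }
        return new SimpleEntry<>(line.substring(0, pos), line.substring(pos + 1));
    }

    public static void main(String[] args) {
        String content = "k1\tv1\nk2\tv2\nplain\n";
        for (Map.Entry<Long, String> rec : textRecords(content)) {
            Map.Entry<String, String> kv = keyValueRecord(rec.getValue(), '\t');
            System.out.println("offset=" + rec.getKey()
                    + " | key=" + kv.getKey() + " value=" + kv.getValue());
        }
    }
}
```

In a real job the format is chosen on the job configuration rather than implemented by hand; the sketch only shows what the mapper would receive as key and value in each case.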

As for the second question: first fetch the files from the remote servers, then use the appropriate InputFormat depending on the contents of the files. Hadoop works best with data locality, that is, when the data is stored on the cluster nodes that process it.

Durga Viswanath Gadiraju · Answer 2 · 2015-12-08T04:04:57+0000

Your first question: how many types of InputFormat exist in Hadoop like TextInputFormat?

TextInputFormat - each line is treated as a value
KeyValueTextInputFormat - the part of the line before the delimiter is the key, and the rest is the value
FixedLengthInputFormat - each fixed-length record is treated as a value
NLineInputFormat - every N lines form one split
SequenceFileInputFormat - for binary sequence files
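
As a rough illustration of how NLineInputFormat and FixedLengthInputFormat carve up their input, here is a small plain-Java sketch. It is not Hadoop code: SplitShapeSketch, nLineSplits and fixedLengthRecords are hypothetical names, and rejecting a partial trailing record is a choice made for this sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustration only: mimics the grouping behaviour, not the Hadoop classes themselves.
class SplitShapeSketch {

    // NLineInputFormat-style grouping: every n lines form one split,
    // so each mapper receives at most n lines.
    static List<List<String>> nLineSplits(List<String> lines, int n) {
        List<List<String>> splits = new ArrayList<>();
        for (int i = 0; i < lines.size(); i += n) {
            int end = Math.min(i + n, lines.size());
            splits.add(new ArrayList<>(lines.subList(i, end)));
        }
        return splits;
    }

    // FixedLengthInputFormat-style records: the data is cut into records of
    // exactly recordLength bytes, with no delimiter between them.
    static List<byte[]> fixedLengthRecords(byte[] data, int recordLength) {
        if (recordLength <= 0 || data.length % recordLength != 0) {
            // Sketch choice: refuse input that would leave a partial record.
            throw new IllegalArgumentException("data length must be a multiple of recordLength");
        }
        List<byte[]> records = new ArrayList<>();
        for (int i = 0; i < data.length; i += recordLength) {
            records.add(Arrays.copyOfRange(data, i, i + recordLength));
        }
        return records;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("a", "b", "c", "d", "e");
        System.out.println(nLineSplits(lines, 2));

        byte[] data = "AAAABBBB".getBytes(java.nio.charset.StandardCharsets.US_ASCII);
        for (byte[] rec : fixedLengthRecords(data, 4)) {
            System.out.println(new String(rec, java.nio.charset.StandardCharsets.US_ASCII));
        }
    }
}
```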

There is also DBInputFormat for reading from databases

The second question: there is no input format for reading files through HTTP requests.
