How many types of InputFormat exist in Hadoop?

I am new to Hadoop and wonder how many types of InputFormat there are in Hadoop, such as TextInputFormat. Is there a specific InputFormat that I can use to read files through HTTP requests to remote data servers?

Thanks:)

+7
hadoop
2 answers

There are many classes that implement InputFormat.

 CombineFileInputFormat, CombineSequenceFileInputFormat, CombineTextInputFormat, CompositeInputFormat, DBInputFormat, FileInputFormat, FixedLengthInputFormat, KeyValueTextInputFormat, MultiFileInputFormat, NLineInputFormat, Parser.Node, SequenceFileAsBinaryInputFormat, SequenceFileAsTextInputFormat, SequenceFileInputFilter, SequenceFileInputFormat, TextInputFormat 

Take a look at this article on when to use each InputFormat type.

Of these, the most commonly used formats are:

  • FileInputFormat : Base class for all file-based input formats.
  • KeyValueTextInputFormat : InputFormat for plain text files. Files are broken into lines; either a line feed or a carriage return signals the end of a line. Each line is divided into a key part and a value part by a separator byte. If no such byte exists, the key is the entire line and the value is empty.
  • TextInputFormat : InputFormat for plain text files. Files are broken into lines; either a line feed or a carriage return signals the end of a line. Keys are the position (byte offset) in the file, and values are the line of text.
  • NLineInputFormat : Splits the input so that every N lines form one split. In many "pleasantly parallel" applications, each process/mapper processes the same input file(s), but the computations are controlled by different parameters.
  • SequenceFileInputFormat : InputFormat for sequence files.
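To make the difference between the two text formats concrete, here is a minimal plain-Java sketch of the records each one would hand to a mapper. It is not Hadoop code: the class and method names are hypothetical, it assumes lines end with `\n` only, and it assumes a tab as the separator (the default for KeyValueTextInputFormat).

```java
import java.nio.charset.StandardCharsets;
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Plain-Java illustration of record semantics; hypothetical names, not Hadoop classes. */
final class RecordSemanticsSketch {

    /** TextInputFormat-style records: key = byte offset of the line start, value = the line. */
    static List<Map.Entry<Long, String>> textRecords(String content) {
        List<Map.Entry<Long, String>> records = new ArrayList<>();
        if (content.isEmpty()) {
            return records;
        }
        long offset = 0;
        for (String line : content.split("\n")) {
            records.add(new SimpleEntry<Long, String>(offset, line));
            // +1 accounts for the '\n' terminator (assumption: no "\r\n" endings).
            offset += line.getBytes(StandardCharsets.UTF_8).length + 1;
        }
        return records;
    }

    /** KeyValueTextInputFormat-style records: split each line at the first separator. */
    static List<Map.Entry<String, String>> keyValueRecords(String content, char separator) {
        List<Map.Entry<String, String>> records = new ArrayList<>();
        if (content.isEmpty()) {
            return records;
        }
        for (String line : content.split("\n")) {
            int idx = line.indexOf(separator);
            if (idx < 0) {
                // No separator: the whole line is the key and the value is empty.
                records.add(new SimpleEntry<String, String>(line, ""));
            } else {
                records.add(new SimpleEntry<String, String>(
                        line.substring(0, idx), line.substring(idx + 1)));
            }
        }
        return records;
    }

    public static void main(String[] args) {
        String content = "alpha\t1\nbeta\n";
        System.out.println(textRecords(content));
        System.out.println(keyValueRecords(content, '\t'));
    }
}
```

The same input file therefore yields different key/value pairs depending only on which InputFormat the job is configured with.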

As for the second question, first fetch the files from the remote servers, then use the appropriate InputFormat depending on the contents of the files. Hadoop works best when the data is local to the cluster.
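The "fetch first" step could look roughly like the sketch below, which copies a remote resource to a local file using only the Java standard library. The class name `RemoteFetchSketch` and the method `fetchToLocal` are hypothetical, and the source URL and target path are placeholders you would supply yourself.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

/** Hypothetical helper: download a remote file before loading it into the cluster. */
final class RemoteFetchSketch {

    /** Copies the resource at {@code source} to {@code target}; returns the bytes written. */
    static long fetchToLocal(URI source, Path target) throws IOException {
        try (InputStream in = source.toURL().openStream()) {
            // Overwrite any earlier download of the same file.
            return Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: RemoteFetchSketch <source-url> <local-target>");
            return;
        }
        long bytes = fetchToLocal(URI.create(args[0]), Paths.get(args[1]));
        System.out.println("Fetched " + bytes + " bytes to " + args[1]);
    }
}
```

Once the file is on local disk, it can be copied into HDFS (for example with `hdfs dfs -put`) and read by the job with whichever InputFormat matches its contents.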

+6

Your first question: how many types of InputFormat exist in Hadoop like TextInputFormat?

  • TextInputFormat - each line is a value; the key is the line's byte offset in the file
  • KeyValueTextInputFormat - the part of the line before the delimiter is the key, and the rest is the value
  • FixedLengthInputFormat - each fixed-length record is a value
  • NLineInputFormat - every N lines form one input split, so each mapper receives N lines (each line is still its own record)
  • SequenceFileInputFormat - for binary sequence files

There is also DBInputFormat for reading from databases
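As a rough illustration of what NLineInputFormat does with its N, the plain-Java sketch below groups input lines into chunks of N, one chunk per mapper. It only mimics the grouping; the names `NLineSplitSketch` and `groupIntoSplits` are hypothetical, and in a real job the value of N would be set on the job itself (for instance via `NLineInputFormat.setNumLinesPerSplit`).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Plain-Java illustration of N-lines-per-split grouping; not Hadoop code. */
final class NLineSplitSketch {

    /** Groups lines into consecutive chunks of at most {@code linesPerSplit} lines. */
    static List<List<String>> groupIntoSplits(List<String> lines, int linesPerSplit) {
        if (linesPerSplit <= 0) {
            throw new IllegalArgumentException("linesPerSplit must be positive");
        }
        List<List<String>> splits = new ArrayList<>();
        for (int start = 0; start < lines.size(); start += linesPerSplit) {
            int end = Math.min(start + linesPerSplit, lines.size());
            // Copy the view so each split is independent of the input list.
            splits.add(new ArrayList<>(lines.subList(start, end)));
        }
        return splits;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("a", "b", "c", "d", "e");
        // With N = 2, five lines fall into three splits; the last one is shorter.
        System.out.println(groupIntoSplits(lines, 2));
    }
}
```

Each line inside a split is still delivered to the mapper as a separate record; N only controls how many lines one mapper sees.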

The second question: there is no input format for reading files through HTTP requests.

+3