Basically, you need to understand when to modify the SerDe and when to modify the file format.
From the official documentation: Hive SerDe
What is a SerDe?
1. SerDe is short for "Serializer and Deserializer".
2. Hive uses the SerDe (and the FileFormat) to read and write table rows.
3. HDFS files -> InputFileFormat -> <key, value> -> Deserializer -> Row object
4. Row object -> Serializer -> <key, value> -> OutputFileFormat -> HDFS files
So, points 3 and 4 clearly show the difference. By default, records are separated by the newline character "\n"; to split the input into records in a different way, you need your own file format (input/output). You configure the SerDe when you want to interpret the records that have been read in your own way.
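To make the split of responsibilities concrete, here is a minimal sketch in plain Java. It is not the Hive or Hadoop API: `RecordSplitter` and `Deserializer` are hypothetical stand-ins for the input format and the SerDe's deserializer. The splitter decides where one record ends; the deserializer decides what a record means.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified model of Hive's read path; NOT the real Hive API.
public class ReadPathSketch {

    // Stand-in for the input format: decides where one record ends.
    interface RecordSplitter {
        List<String> split(String fileContents);
    }

    // Stand-in for the SerDe's deserializer: turns one record into row columns.
    interface Deserializer {
        List<String> deserialize(String record);
    }

    // Default-like behaviour: one record per "\n"-separated line.
    static final RecordSplitter LINE_SPLITTER = contents -> {
        List<String> records = new ArrayList<>();
        for (String line : contents.split("\n")) {
            if (!line.isEmpty()) {
                records.add(line);
            }
        }
        return records;
    };

    // Toy deserializer: interprets a record as comma-separated columns.
    static final Deserializer CSV_DESERIALIZER = record -> Arrays.asList(record.split(","));

    // file -> splitter (input format) -> record -> deserializer (SerDe) -> row
    static List<List<String>> read(String contents, RecordSplitter splitter, Deserializer deser) {
        List<List<String>> rows = new ArrayList<>();
        for (String record : splitter.split(contents)) {
            rows.add(deser.deserialize(record));
        }
        return rows;
    }

    public static void main(String[] args) {
        String file = "1,alice\n2,bob\n";
        // prints [[1, alice], [2, bob]]
        System.out.println(read(file, LINE_SPLITTER, CSV_DESERIALIZER));
    }
}
```

Changing what counts as a record means swapping the splitter; changing how a record is interpreted means swapping the deserializer. The two can vary independently.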
Take the example of the widely used JSON format.
Scenario 1: Say you have a JSON input file in which each line contains one JSON entry. Then you only need a custom SerDe to interpret each record the way you want. There is no need for a custom input format, since one line is one record.
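For Scenario 1, the default line-based reading already yields one JSON object per record, so only the deserializing side changes. The sketch below is a hypothetical toy, not Hive's JSON SerDe: it assumes flat objects whose values are strings or numbers, with no nesting and no escaped quotes, and returns each row as a column-name-to-value map.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a "deserializer" for one flat JSON object per line.
// NOT Hive's JSON SerDe: no nesting, no escapes, values kept as strings.
public class JsonLineDeserializerSketch {

    static Map<String, String> deserialize(String line) {
        Map<String, String> row = new LinkedHashMap<>();
        String s = line.trim();
        if (!s.startsWith("{") || !s.endsWith("}")) {
            throw new IllegalArgumentException("not a JSON object: " + line);
        }
        int i = 1;
        int end = s.length() - 1; // index of the closing brace
        while (i < end) {
            char c = s.charAt(i);
            // skip whitespace and commas between fields
            if (c == ' ' || c == '\t' || c == ',') { i++; continue; }
            if (c != '"') throw new IllegalArgumentException("expected key at " + i);
            // quoted key
            int keyEnd = s.indexOf('"', i + 1);
            String key = s.substring(i + 1, keyEnd);
            i = s.indexOf(':', keyEnd) + 1;
            while (s.charAt(i) == ' ') i++;
            String value;
            if (s.charAt(i) == '"') {
                // quoted string value
                int valEnd = s.indexOf('"', i + 1);
                value = s.substring(i + 1, valEnd);
                i = valEnd + 1;
            } else {
                // bare value (e.g. a number): read up to the next comma or the end
                int valEnd = i;
                while (valEnd < end && s.charAt(valEnd) != ',') valEnd++;
                value = s.substring(i, valEnd).trim();
                i = valEnd;
            }
            row.put(key, value);
        }
        return row;
    }

    // Scenario 1: one line is one record, so plain line splitting is enough.
    static List<Map<String, String>> readLines(String contents) {
        List<Map<String, String>> rows = new ArrayList<>();
        for (String line : contents.split("\n")) {
            if (!line.trim().isEmpty()) {
                rows.add(deserialize(line));
            }
        }
        return rows;
    }
}
```

Only `deserialize` is "custom" here; `readLines` just does what the default line-based reading already does.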
Scenario 2: Now suppose a single JSON record in your input file spans several lines. To read it, you must first write a custom input format that reads one complete JSON record, and that record is then passed to the custom SerDe.
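For Scenario 2, the missing piece is the record boundary. A real custom input format would implement Hadoop's `InputFormat`/`RecordReader` classes and also deal with file splits; the sketch below leaves all of that out and only shows the core idea, assuming records are top-level JSON objects. It groups physical lines into one logical record by tracking `{`/`}` depth, ignoring braces inside quoted strings. Each returned string is what would then be handed to the SerDe.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the record-boundary logic a custom input format
// would need for multi-line JSON. NOT the Hadoop/Hive API (no splits, no RecordReader).
public class MultiLineJsonRecordReaderSketch {

    static List<String> readRecords(String contents) {
        List<String> records = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int depth = 0;            // current {...} nesting depth
        boolean inString = false; // inside a quoted JSON string?
        boolean escaped = false;  // previous char was a backslash inside a string?
        for (String line : contents.split("\n")) {
            for (int i = 0; i < line.length(); i++) {
                char c = line.charAt(i);
                if (depth == 0 && c != '{') {
                    continue; // skip whitespace between top-level records
                }
                current.append(c);
                if (inString) {
                    if (escaped) escaped = false;
                    else if (c == '\\') escaped = true;
                    else if (c == '"') inString = false;
                } else if (c == '"') {
                    inString = true;
                } else if (c == '{') {
                    depth++;
                } else if (c == '}') {
                    depth--;
                    if (depth == 0) {
                        // one complete logical record: hand it on (to the SerDe)
                        records.add(current.toString());
                        current.setLength(0);
                    }
                }
            }
            // record continues on the next line: replace the line break with a space
            if (depth > 0) current.append(' ');
        }
        // an unterminated trailing record is dropped in this sketch
        return records;
    }
}
```

Because each emitted record is a single line of text, it could be fed to a line-oriented deserializer like the one in Scenario 1.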
Harry kumar